
Consider the task of constructing a classifier from random data, where the attribute values are generated at random regardless of the class labels. Assume the data set includes records from two classes: “+” and “−”. The first half of the data set is used for training, and the second half is used for testing.

(a)

Assume the data contains an equal number of positive and negative records, and the decision tree classifier predicts that every test record will be positive. What is the classifier’s expected error rate on the test data?

(b)

Repeat the previous analysis, assuming that the classifier predicts each test record to be positive with a probability of 0.8 and negative with a probability of 0.2.

(c)

Assume that two-thirds of the data fall into the positive category and one-third into the negative category. What is the expected error rate of a classifier that predicts every test record to be positive?

(d)

Repeat the previous analysis, assuming that the classifier predicts each test record to be positive with a probability of two-thirds and negative with a probability of one-third.
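
One way to check all four parts is to note that, because the attributes are random with respect to the labels, a prediction can reasonably be treated as independent of the true class. Under that assumption, and assuming the test half has the same class proportions as the full data set, the expected error is the probability that the prediction and the true label disagree. The sketch below is only an illustration of that reasoning; the function name `expected_error` and its parameters `p_pos` and `q_pos` are made up here and do not come from the exercise.

```python
from fractions import Fraction

def expected_error(p_pos, q_pos):
    """Expected error rate of a label-independent classifier.

    p_pos: fraction of test records that are truly positive (assumed to
           match the class proportion stated in the exercise).
    q_pos: probability that the classifier predicts positive.
    Assumes predictions are independent of the true labels.
    """
    p_pos, q_pos = Fraction(p_pos), Fraction(q_pos)
    # Error = P(actual +, predicted -) + P(actual -, predicted +)
    return p_pos * (1 - q_pos) + (1 - p_pos) * q_pos

# (class prior for "+", probability of predicting "+") for each part
cases = {
    "a": (Fraction(1, 2), Fraction(1)),
    "b": (Fraction(1, 2), Fraction(4, 5)),
    "c": (Fraction(2, 3), Fraction(1)),
    "d": (Fraction(2, 3), Fraction(2, 3)),
}

results = {part: expected_error(p, q) for part, (p, q) in cases.items()}

for part, err in results.items():
    print(f"({part}) expected error = {err} (about {float(err):.3f})")
```

Under these assumptions, parts (a) and (b) both come out to 1/2, part (c) to 1/3, and part (d) to 4/9. With balanced classes the prediction probability does not change the expected error, while with a two-thirds majority always predicting the majority class gives a lower error than guessing in proportion to the class frequencies.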