Describe how you would convert the data set with the attributes listed below into a binary transaction data set suitable for association analysis. Specifically, indicate for each attribute in the original data set:

(a) The number of binary attributes it corresponds to in the transaction data set,

(b) How the original attribute values would be mapped to binary attribute values, and

(c) Whether the data values of an attribute contain any hierarchical structure that could be useful for grouping the data into fewer binary attributes.

The following is a list of the data set’s attributes, along with their possible values. Assume that all attributes are gathered on a student-by-student basis:

• Year: Freshman, Sophomore, Junior, Senior, Master’s Degree, PhD Degree, Professional

• Zip code: zip code for a U.S. student’s home address, zip code for a non-U.S. student’s local address

• College: Agriculture, Architecture, Continuing Education, Education, Liberal Arts, Engineering, Natural Sciences, Business, Law, Medical, Dentistry, Pharmacy, Nursing, Veterinary Medicine

• On Campus: 1 if the student resides on campus, 0 otherwise

• Each of the following is a distinct attribute with a value of 1 if the person speaks the language and a value of 0 otherwise:

- Arabic

- Mandarin Chinese


- Brazilian Portuguese
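
As a rough illustration of the kind of mapping the exercise asks about, the Python sketch below turns one student record into a transaction, that is, the set of binary attributes whose value is 1. It is one possible encoding, not the expected answer. The record field names, the item labels, the Undergraduate/Graduate grouping of Year, and the use of a three-digit zip code prefix are all assumptions made for the example.

```python
# Illustrative sketch only: field names, item labels, and the groupings
# below are assumptions, not part of the exercise.

# Assumed hierarchy for Year, relevant to part (c).
YEAR_GROUPS = {
    "Freshman": "Undergraduate",
    "Sophomore": "Undergraduate",
    "Junior": "Undergraduate",
    "Senior": "Undergraduate",
    "Master's Degree": "Graduate",
    "PhD Degree": "Graduate",
    "Professional": "Professional",
}


def to_transaction(student, group_year=False, zip_prefix_len=3):
    """Return the set of binary attributes (items) equal to 1 for one student."""
    items = set()

    # Categorical attribute: one binary attribute per value (or per group).
    year = student["year"]
    items.add("Year=" + (YEAR_GROUPS[year] if group_year else year))

    # Zip code: assume a leading-digit prefix is used to group nearby codes.
    items.add("Zip=" + student["zip_code"][:zip_prefix_len])

    items.add("College=" + student["college"])

    # Asymmetric binary attributes: only a value of 1 produces an item.
    if student["on_campus"] == 1:
        items.add("OnCampus")
    for language in student["languages"]:
        items.add("Speaks=" + language)

    return items


# Hypothetical student record.
student = {
    "year": "Junior",
    "zip_code": "55455",
    "college": "Engineering",
    "on_campus": 1,
    "languages": ["Arabic"],
}
print(sorted(to_transaction(student)))
print(sorted(to_transaction(student, group_year=True)))
```

Calling the function with `group_year=True` shows how a hierarchy reduces the number of binary attributes: the seven Year values collapse to three groups under the assumed grouping.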



Chapter 8:


Consider a data set made up of 2^20 data vectors, each of which has 32 components, where each component is a 4-byte value. Assume that vector quantization is used for compression, with 2^16 prototype vectors. How many bytes of storage does the data set require before and after compression, and what is the compression ratio?
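
The arithmetic behind this question can be checked with a short Python sketch. It assumes that each compressed vector is stored as the index of its prototype, using the smallest whole number of bytes that can address all prototypes, and that the prototype table (codebook) is stored alongside the indices. Whether the codebook should count toward the compressed size is an assumption, so both ratios are computed. The variable names are illustrative.

```python
# Sizes taken from the exercise statement.
n_vectors = 2**20
n_components = 32
bytes_per_value = 4
n_prototypes = 2**16

# Uncompressed: every component of every vector is stored.
bytes_before = n_vectors * n_components * bytes_per_value

# Assumption: each vector is replaced by a prototype index, rounded up
# to a whole number of bytes.
bits_per_index = (n_prototypes - 1).bit_length()
bytes_per_index = (bits_per_index + 7) // 8
index_bytes = n_vectors * bytes_per_index

# Assumption: the codebook of prototype vectors is stored as well.
codebook_bytes = n_prototypes * n_components * bytes_per_value
bytes_after = index_bytes + codebook_bytes

ratio_with_codebook = bytes_before / bytes_after
ratio_indices_only = bytes_before / index_bytes

print(bytes_before, index_bytes, codebook_bytes, bytes_after)
print(ratio_with_codebook, ratio_indices_only)
```

Under these assumptions the data set shrinks from 2^27 bytes to 2^21 + 2^23 bytes, a ratio of 12.8. If the codebook is not counted, the ratio is 64.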


Give an example of a data set with three natural clusters for which K-means would almost always find the correct clusters but bisecting K-means would not.