This exercise compares and contrasts some measures of similarity and distance.

(a)

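For binary data, the L1 distance corresponds to the Hamming distance, that is, the number of bits that differ between two binary vectors. The Jaccard similarity measures how similar two binary vectors are while ignoring positions where both are 0: it is the number of positions where both vectors are 1 divided by the number of positions where at least one vector is 1. Calculate the Hamming distance and the Jaccard similarity between the two binary vectors below.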

x = 0101010001

y = 0100011000
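The two calculations in part (a) can be checked with a minimal Python sketch. The function names `hamming_distance` and `jaccard_similarity` are hypothetical helpers written for this illustration, the vectors are given as bit strings, and returning 0.0 when both vectors are all zeros is an assumed convention rather than part of the exercise.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))


def jaccard_similarity(a: str, b: str) -> float:
    """Number of 1-1 positions divided by positions where at least one bit is 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both_one = sum(p == "1" and q == "1" for p, q in zip(a, b))
    any_one = sum(p == "1" or q == "1" for p, q in zip(a, b))
    # Assumed convention: two all-zero vectors get similarity 0.0.
    return both_one / any_one if any_one else 0.0


x = "0101010001"
y = "0100011000"

print(hamming_distance(x, y))    # 3
print(jaccard_similarity(x, y))  # 0.4
```

The vectors differ in three positions (the 4th, 7th, and 10th), and they share two 1s out of five positions where at least one of them has a 1, which gives 2/5.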

(b)

Which measure, Jaccard similarity or Hamming distance, is more similar to the Simple Matching Coefficient, and which is more similar to the cosine measure? Explain. (Note that Hamming is a distance, whereas the other three measures are similarities.)
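One way to explore part (b) is to write all four measures in terms of the same match counts: f11 (both bits 1), f10 and f01 (exactly one bit 1), and f00 (both bits 0). The sketch below does this for the vectors from part (a). The names `match_counts`, `smc`, and `cosine` are hypothetical helpers introduced for illustration, and returning 0.0 for a zero-length norm is an assumed convention.

```python
import math


def match_counts(a: str, b: str) -> tuple[int, int, int, int]:
    """Return (f11, f10, f01, f00) for two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(a, b))
    f11 = pairs.count(("1", "1"))
    f10 = pairs.count(("1", "0"))
    f01 = pairs.count(("0", "1"))
    f00 = pairs.count(("0", "0"))
    return f11, f10, f01, f00


def smc(a: str, b: str) -> float:
    """Simple Matching Coefficient: matching positions over all positions."""
    f11, f10, f01, f00 = match_counts(a, b)
    return (f11 + f00) / (f11 + f10 + f01 + f00)


def cosine(a: str, b: str) -> float:
    """Cosine similarity; for binary vectors the dot product equals f11."""
    f11, f10, f01, _ = match_counts(a, b)
    norm_product = math.sqrt((f11 + f10) * (f11 + f01))
    # Assumed convention: an all-zero vector gives similarity 0.0.
    return f11 / norm_product if norm_product else 0.0


x = "0101010001"
y = "0100011000"

f11, f10, f01, f00 = match_counts(x, y)
print(f11, f10, f01, f00)       # 2 2 1 5
print(f10 + f01)                # Hamming distance: 3
print(smc(x, y))                # 0.7
print(round(cosine(x, y), 4))   # 0.5774
```

Written this way, the Hamming distance is f10 + f01, which equals the vector length times (1 - SMC), so both depend on f00. Jaccard and cosine use only f11, f10, and f01, so neither changes when 0-0 positions are added or removed.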

(c)

Suppose you are comparing how similar two organisms of different species are in terms of the number of genes they share. Describe which measure, Hamming or Jaccard, you think is more appropriate for comparing the genetic makeup of two organisms. Explain. (Assume that each organism is represented as a binary vector, where each attribute is 1 if a particular gene is present in the organism and 0 otherwise.)

(d)

Would you use the Hamming distance, the Jaccard coefficient, or another measure of similarity or distance to compare the genetic makeup of two organisms of the same species, such as two humans? Explain.