Reason behind using Manhattan distance

Jump to navigation Jump to search

Suppose there are 3 data points:

1 reference data point:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

And 2 data points in the database:

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] (Euclidean distance = 3.87, Squared Euclidean distance = 15, Manhattan distance = 15)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4] (Euclidean distance = 4, Squared Euclidean distance = 16, Manhattan distance = 4)

If noise changes a single 0 into a 4, it will affect Euclidean distance 4x times higher than Manhattan distance. Euclidean distance will pick the first, Manhattan distance will pick the second.

MN (talk)17:51, 28 August 2018

this is a good demonstration! euclidean is sensitive to outliners and prefer the averagely non-bad one rather than some good point with some dimensions being noise.

Xor (talk)03:03, 29 August 2018