Reason behind using Manhattan distance

Suppose there are 3 data points:

1 reference data point:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

And 2 data points in the database:

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] (Euclidean distance = 3.87, Squared Euclidean distance = 15, Manhattan distance = 15)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4] (Euclidean distance = 4, Squared Euclidean distance = 16, Manhattan distance = 4)

If noise changes a single 0 into a 4, that one dimension adds 16 to the squared Euclidean distance but only 4 to the Manhattan distance, so the noise weighs 4 times more heavily under the Euclidean metric. Euclidean distance will pick the first point (3.87 < 4), Manhattan distance will pick the second (4 < 15). If you divide all numbers by 10, so that they all lie between 0 and 0.4 and have less energy than the main dimensions, the result will still be the same.

MN (talk)‎

This is a good demonstration! Euclidean distance is sensitive to outliers and prefers a point that is moderately off in every dimension over a good point with noise in a few dimensions.

Xor (talk)‎
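
For anyone who wants to check the numbers in this thread, here is a minimal Python sketch. The names `reference`, `candidate_a`, `candidate_b` and the helper functions are made up for illustration and are not taken from any bot's code; it only recomputes the three distances for the two example points and shows which point each metric would pick.

```python
import math

# The example from this thread: one reference point and two database points.
reference = [0] * 15
candidate_a = [1] * 15          # slightly off in every dimension
candidate_b = [0] * 14 + [4]    # exact match except one noisy dimension


def manhattan(p, q):
    """Sum of absolute per-dimension differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def squared_euclidean(p, q):
    """Sum of squared per-dimension differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def euclidean(p, q):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(p, q))


def nearest(metric, candidates):
    """Return the name of the candidate closest to the reference point."""
    return min(candidates, key=lambda name: metric(reference, candidates[name]))


candidates = {"a": candidate_a, "b": candidate_b}

for name, point in candidates.items():
    print(
        name,
        round(euclidean(reference, point), 2),
        squared_euclidean(reference, point),
        manhattan(reference, point),
    )

print("Euclidean picks:", nearest(euclidean, candidates))
print("Manhattan picks:", nearest(manhattan, candidates))
```

Scaling both candidates by 1/10 does not change which point either metric picks, since both distances simply scale with the data.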
