Reason behind using Manhattan distance


Rethinking this after 4.5 years: choosing between L1 and L2 distance resembles choosing between the L1 and L2 norm penalties in logistic regression, where the L1 norm tends to find weights with more zeros, while the L2 norm tends to assign equal but non-zero weights to collinear attributes. Since no one uses duplicated attributes anyway (dimensionality is limited), this benefit of the L2 norm is nullified.
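To make that analogy concrete, here is a minimal sketch (class and method names are mine, numbers chosen purely for illustration): with two duplicated attributes whose weights must sum to 1.0, the L1 penalty is indifferent between a sparse split and an equal split, while the L2 penalty strictly prefers spreading the weight.

<syntaxhighlight lang="java">
// Two perfectly collinear (duplicated) attributes must share a total
// weight of 1.0. Compare the penalty of a sparse split (1, 0) against
// an equal split (0.5, 0.5).
public class NormPenaltyDemo {
    public static void main(String[] args) {
        double[] sparse = {1.0, 0.0};  // all weight on one attribute
        double[] equal  = {0.5, 0.5};  // weight spread over both

        // L1 penalty |w1| + |w2|: identical for both splits, so L1 has
        // no preference and readily tolerates zeros.
        System.out.printf("L1: sparse=%.2f equal=%.2f%n",
                l1Penalty(sparse), l1Penalty(equal));   // 1.00 vs 1.00

        // L2 penalty w1^2 + w2^2: strictly smaller for the equal split,
        // so L2 spreads weight across collinear attributes.
        System.out.printf("L2: sparse=%.2f equal=%.2f%n",
                l2Penalty(sparse), l2Penalty(equal));   // 1.00 vs 0.50
    }

    static double l1Penalty(double[] w) {
        double s = 0;
        for (double v : w) s += Math.abs(v);
        return s;
    }

    static double l2Penalty(double[] w) {
        double s = 0;
        for (double v : w) s += v * v;
        return s;
    }
}
</syntaxhighlight>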

The L1 norm's tendency toward more zeros reminds me of pattern matching, where zero means a match and non-zero means a mismatch. Being able to make a partial but best-effort match effectively simulates using a large number of trees, each built on a subset of the attributes.
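A toy illustration of that intuition (again, all names and numbers are mine, purely hypothetical): under L1 distance, a neighbour that matches most attributes exactly but misses one badly can still beat a neighbour that is moderately off on every attribute, whereas L2 ranks them the other way around because squaring makes the single large mismatch dominate.

<syntaxhighlight lang="java">
// q is the query point; A matches 4 of 5 normalized attributes exactly
// but misses the last one badly; B is moderately off on all 5.
public class PartialMatchDemo {
    public static void main(String[] args) {
        double[] q = {0.20, 0.40, 0.60, 0.80, 0.50};
        double[] a = {0.20, 0.40, 0.60, 0.80, 1.00}; // exact on 4, off by 0.5 on one
        double[] b = {0.32, 0.52, 0.72, 0.92, 0.62}; // off by 0.12 everywhere

        System.out.printf("L1: A=%.3f B=%.3f%n", l1(q, a), l1(q, b)); // 0.500 vs 0.600 -> A wins
        System.out.printf("L2: A=%.3f B=%.3f%n", l2(q, a), l2(q, b)); // 0.500 vs 0.268 -> B wins
    }

    static double l1(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += Math.abs(x[i] - y[i]);
        return s;
    }

    static double l2(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += (x[i] - y[i]) * (x[i] - y[i]);
        return Math.sqrt(s);
    }
}
</syntaxhighlight>

The four exact matches contribute nothing to A's L1 distance, so A is ranked as if a tree built on just those four attributes had matched, which is the "partial but best-effort match" behaviour described above.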

Xor (talk) 10:24, 17 January 2023