XanderCat/GFWS
Guess Factors and Wave Surfing (GFWS)
Dynamic Clustering using k-nearest-neighbor
KNN Distance Algorithms
The basic algorithms for finding nearby neighbors can be found on the Dynamic_Clustering page. The most common approach in Robocode, I would guess, is Manhattan distance, which is |x2-x1| + |y2-y1| + .... But what are these distance values?
For XanderCat, the "distance" values are based on the segment indexes. A segmenter slices some attribute into a fixed number of slices n. The "distance" is how far the current situation's segment index is from some candidate situation's segment index. But a straight computation using segment index will not work well, because each segmenter can have a different value for the number of slices n. XanderCat therefore uses a percent difference across the range n instead.
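To make the pieces concrete, here is a minimal sketch of what a segmenter might look like as a Java interface. This is purely illustrative; the names Segmenter, Situation, getSliceCount, and getSegmentIndex are assumptions used for the examples below, not necessarily XanderCat's actual classes.

// Hypothetical segmenter interface for the distance examples that follow.
// A segmenter divides one attribute of the battle situation into n slices.
public interface Segmenter {
    // Number of slices n this segmenter uses.
    int getSliceCount();
    // Index from 0 to getSliceCount()-1 that the given situation falls into.
    // Situation is a placeholder type holding whatever battle data the segmenter reads.
    int getSegmentIndex(Situation situation);
}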
So, assume segmenter A has 10 slices (indices from 0 to 9, n=10). Assume the current situation falls under segment index 4 (call this value a2), while the candidate situation falls under segment index 6 (call this value a1).
Using straight Manhattan distance, this would look like |a2-a1| or |4-6| == 2.
But to account for each segmenter having a different number of slices, XanderCat uses percentages instead.
Using percentages, this would look like |(a2-a1)/n| or |(4-6)/10| == 0.2 (a difference of 20%).
Now assume there is a second segmenter B with a difference of 0.1 (a 10% difference).
Now the Manhattan distance is 0.2 + 0.1 = 0.3 (lower is a closer match).
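Put together, the percentage-based Manhattan distance might look something like the following sketch, using the hypothetical Segmenter and Situation types from above (lower result means a closer match):

// Sketch of the percentage-based Manhattan distance described above.
// For the example: segmenter A gives |(4-6)/10| = 0.2, segmenter B gives 0.1,
// and the total distance is 0.2 + 0.1 = 0.3.
public static double percentageManhattanDistance(
        Segmenter[] segmenters, Situation current, Situation candidate) {
    double distance = 0;
    for (Segmenter segmenter : segmenters) {
        double n = segmenter.getSliceCount();
        int a2 = segmenter.getSegmentIndex(current);
        int a1 = segmenter.getSegmentIndex(candidate);
        distance += Math.abs((a2 - a1) / n);
    }
    return distance;
}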
While this type of distance calculation will be tested soon, initial versions of XanderCat did not use this approach exactly. Instead, they multiplied the inverse percentages together, like so:
(1-0.2)*(1-0.1) = 0.8 * 0.9 = 0.72 (higher is a closer match)
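As a sketch, the inverse-percentage approach from those earlier versions could look like this, again using the hypothetical types above (higher result means a closer match):

// Sketch of the multiplied inverse-percentage match score described above.
// For the example: (1 - 0.2) * (1 - 0.1) = 0.8 * 0.9 = 0.72.
public static double inversePercentageMatch(
        Segmenter[] segmenters, Situation current, Situation candidate) {
    double match = 1.0;
    for (Segmenter segmenter : segmenters) {
        double n = segmenter.getSliceCount();
        int a2 = segmenter.getSegmentIndex(current);
        int a1 = segmenter.getSegmentIndex(candidate);
        match *= 1.0 - Math.abs((a2 - a1) / n);
    }
    return match;
}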
Is this better or worse? Honestly, as I type this, I don't know yet. This is something I will be testing out soon. The latter requires all of the segmenters' distances to remain somewhat close to be a good match, while with the former straight Manhattan approach, a large separation on one segmenter can be made up for by the others being closer together.