Talk:Segmentation/Best Segment
I was reading about oldwiki:Segmentation/BestSegment and oldwiki:Segmentation/Prioritizing. Basically my approach is based on slope and peak recognition; I call it "Peak Obviousness". I think the results are pretty decent, but I am not sure which VCS data is considered more "obvious". Note that my approach only works with real numbers in the range [0,1] (i.e., only on smoothed and averaged VCS data sets).
From the oldwiki page, using the information gain method, there are problems with the segments {0,0,0,0,0,3,3,3,3,3} and {2,2,2,2,2,5,2,2,2,2}. Converting these to floats, my method considers the latter segment better by a large margin (0.130 vs 0.330). But these are the segments I am not sure about:
* {0, 1, 2, 3, 6, 5, 3, 2, 1, 0}
* {0, 2, 4, 6, 5, 6, 4, 2, 1, 0}
* {0, 2, 4, 6, 0, 6, 4, 2, 1, 0}
* {0, 2, 4, 6, 6, 6, 4, 2, 1, 0}
(Note: I converted them to integers for the reader's sake.) My current method rates the first array 0.442, the second 0.168, the third 0.325 and the last 0.239. I am not sure which segment is considered better in Robocode's world. What do you guys think?
—Preceding unsigned comment added by Nat (talk • contribs)
Well... one method is to consider it in terms of botwidth. If the enemy will take up two bins, just calculate the ratio of "highest sum of two consecutive bins" to "sum of all other bins" and use that. For a botwidth of two bins, your examples score 0.917, 0.579, 0.667, 0.632. For a botwidth of one bin, your examples score 0.353, 0.250, 0.316, 0.240. In both cases, the results rather clearly show the first example as "best".
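In code, that ratio calculation would look roughly like this (just an illustrative sketch of the idea, with a made-up botWidthRatio helper, not code from any actual bot):

 double botWidthRatio(double[] bins, int botWidth) {
     // Sum of all bins in the segment.
     double total = 0;
     for (double bin : bins) {
         total += bin;
     }
     // Highest sum over any window of botWidth consecutive bins.
     double bestWindow = 0;
     for (int start = 0; start + botWidth <= bins.length; start++) {
         double window = 0;
         for (int i = start; i < start + botWidth; i++) {
             window += bins[i];
         }
         bestWindow = Math.max(bestWindow, window);
     }
     // Ratio of the best window to everything outside it.
     return bestWindow / (total - bestWindow);
 }

For instance, botWidthRatio(new double[]{0, 1, 2, 3, 6, 5, 3, 2, 1, 0}, 2) gives 11/12 ≈ 0.917, which is where the first number above comes from.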
To be honest though, I'm skeptical of approaches along these lines. Imagine a statistician looking at various interpretations of data and picking the one that made it look most likely that he'd win the lottery. It's a fallacy to cherry-pick the optimistic interpretations of data.
I'll put it this way: what makes a particular segmentation bad? Statistically they're all equally valid, actually (assuming they have equal quantities of data). They range from "useful" to "noise", never ever "anti-useful". Ultimately, the noise would average out to nothing if one has enough diverse segmentations. If you had infinitely many different segmentations, the "noise" data should always average out, leaving a fairly weighted sum of the "useful" data. As such it seems like, if it were possible, it would be much better to have infinite segmentations at once than to pick and choose.
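As a rough sketch of what I mean (my own illustration, not any specific bot's code), instead of scoring segmentations against each other you could just sum the bin arrays that every segmentation produces for the current situation, optionally weighted:

 double[] combineBuffers(double[][] binArrays, double[] weights) {
     // Each row of binArrays is the bin array a different segmentation
     // produced for the current situation.
     double[] combined = new double[binArrays[0].length];
     for (int b = 0; b < binArrays.length; b++) {
         for (int i = 0; i < combined.length; i++) {
             combined[i] += weights[b] * binArrays[b][i];
         }
     }
     // Aim at the index with the largest combined value; the "noise"
     // segmentations contribute roughly flat data and mostly cancel out.
     return combined;
 }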
--Rednaxela 18:08, 3 April 2011 (UTC)