Talk:Segmentation/Best Segment
I was reading about oldwiki:Segmentation/BestSegment and oldwiki:Segmentation/Prioritizing. Basically my approach is based on slope and peak recognition; I call it "Peak Obviousness". I think the results are pretty decent, but I am not sure which VCS data should be considered more "obvious". Note that my approach only works with real numbers in the range [0,1] (only smoothed and averaged VCS data sets).
From the oldwiki page, the information gain method has problems with the segments {0,0,0,0,0,3,3,3,3,3} and {2,2,2,2,2,5,2,2,2,2}. After converting these to floats, my method considers the latter segment better by a large margin (0.130 vs 0.330). But these are the segments I am not sure about:
* {0, 1, 2, 3, 6, 5, 3, 2, 1, 0}
* {0, 2, 4, 6, 5, 6, 4, 2, 1, 0}
* {0, 2, 4, 6, 0, 6, 4, 2, 1, 0}
* {0, 2, 4, 6, 6, 6, 4, 2, 1, 0}
(Note: I converted them to integers for the reader's sake.) My current method rates the first array 0.442, the second 0.168, the third 0.325 and the last 0.239. I am not sure which segment is considered better in Robocode's world. What do you guys think?
—Preceding unsigned comment added by Nat (talk • contribs)
Well... one method is to consider it in terms of botwidth. If the enemy will take up two bins, just calculate the ratio of "highest sum of two consecutive bins" to "sum of all other bins" and use that. For a botwidth of two bins, your examples score 0.917, 0.579, 0.667, 0.632. For a botwidth of one bin, your examples score 0.353, 0.250, 0.316, 0.240. In both cases, the results rather clearly show the first example as "best".
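For reference, here is a minimal sketch of that ratio heuristic in Java. The class and method names are just for illustration (this is not code from any actual bot); it only assumes the description above: the best window of `botWidth` consecutive bins divided by the sum of everything outside that window.

```java
// Sketch of the "best window vs. everything else" ratio described above.
public class SegmentRatioScore {

    static double ratioScore(double[] bins, int botWidth) {
        double total = 0;
        for (double b : bins) {
            total += b;
        }
        // Find the highest sum of botWidth consecutive bins.
        double bestWindow = 0;
        for (int start = 0; start + botWidth <= bins.length; start++) {
            double window = 0;
            for (int i = start; i < start + botWidth; i++) {
                window += bins[i];
            }
            bestWindow = Math.max(bestWindow, window);
        }
        // Ratio of the best window to the sum of all other bins.
        return bestWindow / (total - bestWindow);
    }

    public static void main(String[] args) {
        double[] first = {0, 1, 2, 3, 6, 5, 3, 2, 1, 0};
        System.out.println(ratioScore(first, 2)); // ~0.917
        System.out.println(ratioScore(first, 1)); // ~0.353
    }
}
```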
To be honest though, I'm skeptical of approaches along these lines. Imagine a statistician looking at various interpretations of data and picking the one that makes it look most likely that he'll win the lottery. It's a fallacy to cherry-pick the optimistic interpretations of data.
I'll put it this way: what makes a particular segmentation bad? Statistically they're all equally valid (assuming they have equal quantities of data). They range from "useful" to "noise", never ever "anti-useful". Ultimately, the noise would average out to nothing if one has enough diverse segmentations. If you had infinitely many different segmentations, the "noise" data should always average out, leaving a fairly weighted sum of the "useful" data. As such it seems like, if it were possible, it would be much better to have infinite segmentations at once than to pick and choose.
--Rednaxela 18:08, 3 April 2011 (UTC)
Actually, in my gun implementation I am using Precise Intersection, and I construct these bins from overlapping ranges, e.g. for the ranges [0,0.5] [0.4,0.7] [0.8,0.9] I would have bins of {0 1 2 1 0 1 0} (the overlap counts between consecutive range endpoints). I planned to use this instead of plain bins because if I were using dynamic segmentation I would need to store all GF records anyway, and it would waste time to perform bin smoothing for every split. So I can't use an entropy-based algorithm, since the data length is variable. Your ratio approach looks good in terms of VCS data, but I can't use it on range-based data.
Still, I wonder whether my second, third or fourth data set is 'better' in terms of aiming.
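To make the example concrete, here is a rough Java sketch of one way those overlap counts could be computed with a sweep over the range endpoints. The class and method names are made up for illustration and this is not the actual gun code; it just reproduces the {0 1 2 1 0 1 0} example above.

```java
import java.util.Arrays;

// Sketch: count how many firing-angle ranges cover each region between
// consecutive range endpoints (plus the always-empty outer regions).
public class RangeCoverage {

    // ranges[i][0] = start, ranges[i][1] = end of the i-th range.
    static int[] coverageCounts(double[][] ranges) {
        // Collect and sort all distinct endpoints.
        double[] points = new double[ranges.length * 2];
        for (int i = 0; i < ranges.length; i++) {
            points[2 * i] = ranges[i][0];
            points[2 * i + 1] = ranges[i][1];
        }
        points = Arrays.stream(points).distinct().sorted().toArray();

        // One region before the first endpoint, one between each pair of
        // consecutive endpoints, and one after the last endpoint.
        int[] counts = new int[points.length + 1];
        for (int region = 1; region < points.length; region++) {
            double mid = (points[region - 1] + points[region]) / 2;
            for (double[] r : ranges) {
                if (r[0] <= mid && mid <= r[1]) {
                    counts[region]++;
                }
            }
        }
        // counts[0] and counts[counts.length - 1] stay 0 by construction.
        return counts;
    }

    public static void main(String[] args) {
        double[][] ranges = {{0, 0.5}, {0.4, 0.7}, {0.8, 0.9}};
        System.out.println(Arrays.toString(coverageCounts(ranges)));
        // -> [0, 1, 2, 1, 0, 1, 0]
    }
}
```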
I think picking and choosing is the better approach because it can use less segmentation when there is less data, i.e. at the beginning of the match, so we don't need another 'light' segmentation. And if the choosing algorithm is correct, it should eliminate all enemy movement noise. My current choosing algorithm doesn't only do this; it also considers whether the splitting dimension is itself 'obvious' for the segment (e.g. if one segment's dimension data is 0.1, 0.12, 0.13, 0.14, 0.15, 0.16 and another segment's is 1, 1.1, 1.2, 5, 5.1, 5.2, the latter segment is better).
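This is not Nat's actual scoring, but one simple way to quantify the idea in that example is to treat a dimension as "obvious" to split on when its values form well-separated clusters, e.g. when the largest gap between consecutive sorted values dominates the total spread. A hedged sketch, assuming that gap-based measure:

```java
import java.util.Arrays;

// Illustrative only: score how "split-obvious" a dimension's values are
// as (largest gap between consecutive sorted values) / (total spread).
public class SplitObviousness {

    static double gapScore(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double largestGap = 0;
        for (int i = 1; i < sorted.length; i++) {
            largestGap = Math.max(largestGap, sorted[i] - sorted[i - 1]);
        }
        double spread = sorted[sorted.length - 1] - sorted[0];
        return spread == 0 ? 0 : largestGap / spread;
    }

    public static void main(String[] args) {
        System.out.println(gapScore(new double[]{0.1, 0.12, 0.13, 0.14, 0.15, 0.16})); // ~0.33
        System.out.println(gapScore(new double[]{1, 1.1, 1.2, 5, 5.1, 5.2}));          // ~0.90
    }
}
```

Under this measure the second data set scores far higher, matching the intuition that it splits cleanly into two clusters while the first is one tight blob.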