Talk:Wiki Targeting/Dynamic Segmentation
How do you avoid the arbitrary choice of splitting mechanism? In your example you suggest that distance could be the first splitting mechanism. How would you arrive at this dynamically? If you cannot, and I suspect that actually choosing the most significant segment dynamically would be hard, then how do you avoid all the existing hand tuning? The true holy grail, to me, seems to be the ability to decide which segment is the most significant and split on that.
Secondly: after splitting on distance > 500, how would this algorithm allow the distance to be split more finely? Could it somehow determine that it is necessary to split the < 500 segment into > 250 and < 250? If so, how does it determine that this second split is more significant than, say, acceleration observations?
This certainly is an intriguing idea and I will take a much deeper look at it when I get home (still in CA until WED this week). -- jim
The decision is made by a modified spikiness algorithm (as seen on the WikiTargeting page), which returns a VisitsCovered number.
Suppose we have a node that needs to be split, containing the following GF bins (the numbers represent visits; the ^ marks the bins used for targeting, taking botwidth into account):
[########       ]
[##########     ]
[############   ]
[############## ]
[###############]
[###############]
[###############]
[###############]
[###############]
[999999998877665]
[^^^            ]
This node is obviously very flat. Suppose botwidth covers 3 bins. VisitsCovered will be 9+9+9=27, which in this case is not very good. Now suppose our algorithm finds that one of the many possible splitting parameters results in these two child nodes:
[               ]
[       #       ]
[      ###      ]
[     #####     ]
[    #######    ]
[   #########   ]
[  ###########  ]
[ ############# ]
[###############]
[123456787654321]
[      ^^^      ]

[               ]
[#              ]
[##             ]
[###            ]
[####           ]
[#####        ##]
[######     ####]
[#######  ######]
[###############]
[876543211223344]
[^^^            ]
Now we have one VisitsCovered number of 7+8+7 = 22 and one of 8+7+6 = 21, for a total of 22+21 = 43. The algorithm will have found that 43 is much better than 27 (the VisitsCovered of the parent node). Maybe some other splits have resulted in, say, total VisitsCovered numbers of 29, 35, 28, 30, etc. The algorithm will then decide that our example split is the most efficient one, since it covers the most visits.
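For concreteness, here is a minimal Java sketch of the VisitsCovered scoring described above, using the example bins. The class and method names (VisitsCoveredSketch, visitsCovered, BOT_WIDTH_BINS) are illustrative only, not taken from any actual bot.

// Minimal sketch of the VisitsCovered score: the best sum of visits over any
// window of adjacent GF bins the size of the bot width.
public class VisitsCoveredSketch {

    static final int BOT_WIDTH_BINS = 3; // assume botwidth covers 3 bins, as in the example

    static int visitsCovered(int[] bins) {
        int best = 0;
        for (int start = 0; start + BOT_WIDTH_BINS <= bins.length; start++) {
            int sum = 0;
            for (int i = start; i < start + BOT_WIDTH_BINS; i++) {
                sum += bins[i];
            }
            best = Math.max(best, sum);
        }
        return best;
    }

    public static void main(String[] args) {
        // The parent node and the two candidate children from the example above.
        int[] parent = {9, 9, 9, 9, 9, 9, 9, 9, 8, 8, 7, 7, 6, 6, 5};
        int[] childA = {1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1};
        int[] childB = {8, 7, 6, 5, 4, 3, 2, 1, 1, 2, 2, 3, 3, 4, 4};

        int parentScore = visitsCovered(parent);                         // 27
        int splitScore  = visitsCovered(childA) + visitsCovered(childB); // 22 + 21 = 43

        // Among all candidate splitting parameters, the split with the highest
        // total VisitsCovered wins, provided it beats the parent's own score.
        System.out.println(parentScore + " -> " + splitScore);
    }
}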
So there's your holy grail, if I understand you correctly.
Splitting more finely is already covered by the algorithm. The class RecursionLevel takes care of that. An instance of that class is always sent up the tree together with the observation to be processed. When a splitting node is encountered, that node alters a variable in the RecursionLevel object to indicate that a certain parameter has been split upon. Actually, the RecursionLevel class contains two variables per dimension: a max and a min value. In the distance case, the RL object will start life with the distance values 0 and 1000. After the first split it will hold 0 and 500; after further splits it may hold, say, 250 and 500, and so on. Note that this is only an example! It is not carved in stone that the algorithm will start out by splitting on distance several times in a row.
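As a reading aid, here is a hypothetical Java sketch of what such a RecursionLevel object could look like: one min/max pair per dimension, narrowed every time the observation passes a splitting node. The field and method names are mine; the real class may well differ.

// Hypothetical sketch of a RecursionLevel-style object: it travels up the tree
// with each observation and records the range that is still "live" per dimension.
public class RecursionLevel {
    private final double[] min;
    private final double[] max;

    public RecursionLevel(double[] min, double[] max) {
        this.min = min.clone();
        this.max = max.clone();
    }

    // Called by a splitting node when the observation goes to the lower child.
    public void splitBelow(int dimension, double splitValue) {
        max[dimension] = splitValue;
    }

    // Called by a splitting node when the observation goes to the upper child.
    public void splitAbove(int dimension, double splitValue) {
        min[dimension] = splitValue;
    }

    public double getMin(int dimension) { return min[dimension]; }
    public double getMax(int dimension) { return max[dimension]; }
}

In the distance example (say dimension 0), the object starts with min 0 and max 1000; a splitting node at 500 sending the observation to the lower child would call splitBelow(0, 500), leaving the range 0-500, and a later split at 250 towards the upper child would leave 250-500.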
--Vic
Note to self: Place Vic on the rocket scientist list when I get home. -- jim
I really don't think this is rocket science. If it sounds overly complex then I most likely failed to explain it well. Kawigi, Jamougha and probably others have in the past been working on the true rocket science variant: binary decision trees based on entropy. Check out Segmentation/Prioritizing. This WikiTargeting/DynamicSegmentation algorithm is my attempt to simplify that theory. So if you think my explanation is crummy, please let me know. Then I will try to explain it more clearly. --Vic
But Jamougha reached the conclusion that the entropy stuff wouldn't deliver in Robocode, didn't he? Just checking, because this path might be the alchemy of Targeting. =) -- PEZ
Yes he did (although I don't think he actually tried it). Anyway, I abandoned the entropy stuff because I just thought it was too complex. I think my above solution of visits covered is much simpler and much more to the point when working with arrays of GF bins. --Vic
Jamougha just showed that the normal entropy calculation (square of the difference) wouldn't always do intuitive things as far as improving your hit rate. And also, decision tree learning doesn't have to be binary. -- Kawigi
I don't know if Jamougha's doubts apply to my variation of the algorithm. I must also admit that I didn't fully understand Jamougha's 'overfitting' argument. Does this have to do with splitting up segments when there is not enough data to justify it? The thing about my algorithm I most worry about is the forming of the first couple of nodes. The first 40 waves collected will have a crucial impact on the tree, since they will define the first (base) split. Those 40 waves may see the enemy travelling from the spawn point in a corner towards the middle of the battlefield. I don't know if this will cause the algorithm to fail. Testing must show that. --Vic
I imagine the argument about overfitting would involve splitting up segments because of data that incidentally shows some difference due to the random nature of the data, rather than because of actual tendencies. In theory, after, say, 500 rounds, the same tree (give or take) should be formed as in every other 500-round match, hopefully without needing to remove too many splits along the way after realizing they weren't good after all.
It occurs to me that a threaded approach to decision tree learning mixed with guess-factor targeting might be an interesting way to go (periodically re-generate the tree based on entropy or something like it, maybe at the beginning of every round). Or maybe I'm on crack. The real usefulness of d-tree learning may just be to analyze deeply segmented data after a long match to decide which static segments to leave in your bot :-p -- Kawigi
From experience I know that after only 5 or 6 rounds performance becomes an issue when re-arranging the entire graph. I also think that once a segment has enough data (say 300 waves collected) re-arranging becomes unnecessary. The stability of the segment should outweigh the performance penalty of re-arranging. Of course the real boundary here would have to be determined in practice. Maybe it could be as low as 50 or something. So maybe there should be a middle type of node (containing between 40, the splitting minimum, and 300 observations) that gets re-arranged after every round. That should solve my above concern about the formation of the first nodes. Then again, as the tree broadens, more and more of these middle type nodes will be present and performance will become an issue once again. Maybe it is a valid assumption that nodes become more stable higher up in the tree? I mean, they will become spikier and more predictable, so there is less chance of splitting on random spikes. In that case the middle type node's upper boundary could decrease as we go higher up in the tree. In the case of a binary tree this 'stability threshold' should be halved at every level increment to keep the time needed for re-arranging constant. Since I am no math wizard, does that make sense mathematically, Kawigi? --Vic
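(A rough back-of-the-envelope check of that last question, taking the cost of re-arranging a node as roughly proportional to its stability threshold and assuming every node at a given depth actually gets re-arranged:

$\text{nodes at depth } d \approx 2^d, \qquad T_d = \frac{T_0}{2^d}, \qquad \text{work per level} \approx 2^d \cdot T_d = T_0$

so halving the threshold at each level does keep the per-level re-arranging cost roughly constant, at least in that idealised picture.)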
There's also a good chance that once more data comes in (particularly against good movers, like Raiko or something), it won't be spiky anymore. How much data do you need to be "sure" about a split? You might be right about it depending on the depth, but probably more important is the amount of data in that area, which will naturally be more in some parts of the tree than others: for instance, you will have roughly twice as much in an acceleration segment as in a deceleration segment, and depending on the bot, you could have even more in a constant-speed segment. Most bots would have more data in a top-speed segment than in a medium-speed segment, too. And so on. I guess the bottom line is that I don't know how the "stability threshold" should be implemented at all, but I think that data should rule over depth, and another thing that could influence it is the variance or entropy of the data in that segment. Maybe entropy needs to be "adjusted" to account differently for data that isn't in the same bucket but is really close (the normal entropy function is used for classification, where the classes aren't necessarily similar). Anyways, lots of musings there.
The performance hit on relearning the tree need only increase as the depth of the final tree increases, but in either case, it isn't something you want to do every tick, nor is it something worth doing that often, but doing it in the background occasionally seems reasonable. If the data is stored in the form of tiny segments, those can each be processed in basically constant time (not increasing with the amount of data actually in there). There's still plenty of thinking and playing that should be done with this, keep it up ;-)
Random note - Entropy seems to me to be related to statistical variance, except that it is built better for classification problems, but variance might work better when your stats follow a normal distribution. Why doesn't someone just invent something built for us? --Kawigi
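To make that comparison concrete, here is a rough Java sketch of both measures computed over an array of GF-bin visit counts; neither function comes from an existing bot, and no attempt is made at the 'adjusted' entropy mentioned above.

// Rough sketch: Shannon entropy treats every bin as an unrelated class, while
// variance over the bin index treats nearby bins as similar.
public class BinStats {

    // Entropy (in bits) of the normalized visit distribution.
    static double entropy(int[] bins) {
        double total = 0;
        for (int b : bins) total += b;
        if (total == 0) return 0;
        double h = 0;
        for (int b : bins) {
            if (b > 0) {
                double p = b / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Variance of the bin index, using the visit counts as frequencies.
    static double variance(int[] bins) {
        double total = 0, mean = 0;
        for (int i = 0; i < bins.length; i++) {
            total += bins[i];
            mean += i * bins[i];
        }
        if (total == 0) return 0;
        mean /= total;
        double var = 0;
        for (int i = 0; i < bins.length; i++) {
            var += bins[i] * (i - mean) * (i - mean);
        }
        return var / total;
    }
}

A flat profile scores high on both measures and a single spike scores low on both; the difference shows up for, say, two spikes in adjacent bins versus two spikes far apart, which entropy scores the same but variance does not.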
Your first question is the same one we have been asking on the WikiTargeting page. Isn't there some mathematical definition of this in statistics? And, oh yes, I will keep it up. I actually find this an intriguing concept. I'll be back in about three weeks' time. First up: vacation :-D --Vic
A few things that I have been thinking about. First, your visit count method has the problem that, given a flat distribution, the children can still show maximal spikes (i.e. from a flat 99999999 you could get 99990000 and 00009999), so a split can look good even when there is nothing to learn. What I am contemplating is a combination of the visit counts, an entropy calculation, and a maximum hit probability. Also, you don't need to perform the recursive divisions, because if the first split is not the most useful, the profiles will just make a useful split the next time; they will just start off the same. I am also working on a system to see how similar two arrays are, which should be useful both for selecting the most appropriate split and for comparing on-fire behavior to general behavior (if they can be shown to be the same it will increase the learning speed 10-fold). -- Jokester
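One simple way to score how similar two GF-bin arrays are (not necessarily what Jokester has in mind) is the cosine similarity of the visit counts; a rough Java sketch with invented example data:

// Cosine similarity of two visit-count profiles: 1.0 means identical shape,
// values near 0 mean very different profiles.
public class ArraySimilarity {

    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        double[] onFire  = {1, 2, 5, 9, 5, 2, 1};
        double[] general = {1, 3, 6, 8, 4, 2, 1};
        // A value above some threshold (say 0.95) could justify merging the
        // on-fire and general profiles to speed up learning.
        System.out.println(cosineSimilarity(onFire, general));
    }
}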
I'm thinking that my gun for DrussGT will be something along these lines. And, from what I can reason out, it shouldn't matter which split comes first. If it is a 'bad' choice, then the data afterwards should make *each* of the subnodes do a good split once we have more data. It may slow down our learning time, but that shouldn't matter with this gun anyway. If it is a problem, one way to stabilise the initial splits would be to add a few simple segments as defaults (e.g. lateral velocity, walls) and let the dynamic segmentation add in all the fancy stuff (distance, accel, decel, time since decel, distance last X ticks, time since direction change, gunheat, heading change, advancing velocity, etc.) individually for each bot. If you kept a counter it would even be possible to do rolling averages, weighting each entry by how long ago it was collected. As mentioned above, the requirements for splitting a node are that the split balances the data between the two sides (we ARE using a binary tree here), that there are enough entries in the segment beforehand, and that the resultant nodes have a higher entropy than the initial node. Additionally, by using this arrangement, you will never have an empty node when you want to fire, and each subnode is optimised for its conditions: for example, it may be useful to make a decel split inside the nearwall segment, but not otherwise. Anyways, if anybody else wants to check my code when I've got something working, feel free =) -- Skilgannon
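For what it's worth, here is a rough Java sketch of the kind of split-acceptance test described above. The thresholds are invented (40 borrows the "splitting minimum" mentioned earlier), and the entropy condition is read here as requiring positive information gain (weighted child entropy lower than the parent's), which may or may not match what the DrussGT gun will actually do.

// Rough sketch of a split-acceptance test: enough data, a reasonable balance
// between the two children, and an information improvement over the parent.
public class SplitCheck {

    static final int    MIN_ENTRIES = 40;   // enough data before splitting (invented)
    static final double MIN_BALANCE = 0.2;  // smaller side gets at least 20% of the data (invented)

    static double entropy(int[] bins) {
        double total = 0, h = 0;
        for (int b : bins) total += b;
        if (total == 0) return 0;
        for (int b : bins) {
            if (b > 0) {
                double p = b / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    static int sum(int[] bins) {
        int s = 0;
        for (int b : bins) s += b;
        return s;
    }

    // parent, left and right are GF-bin visit counts before / after the candidate split.
    static boolean acceptSplit(int[] parent, int[] left, int[] right) {
        int total = sum(parent);
        if (total < MIN_ENTRIES) return false;

        int nLeft = sum(left), nRight = sum(right);
        double balance = Math.min(nLeft, nRight) / (double) (nLeft + nRight);
        if (balance < MIN_BALANCE) return false;

        // Accept only if the weighted child entropy is lower than the parent's,
        // i.e. the split actually adds information.
        double weightedChildEntropy =
                (nLeft * entropy(left) + nRight * entropy(right)) / (nLeft + nRight);
        return entropy(parent) - weightedChildEntropy > 0;
    }
}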