DC and VCS
So, I have been working on my movement, and I theorized that VCS movements would have a small advantage in the rumble because VCS guns are so common. However, I dislike VCS because I have found KNN so much more intuitive. (Yes, I probably just had a lot of bugs in my VCS gun, but I still like KNN better.) The problem I have is that I can guess fairly well how important attributes would be for a random movement, but I can't guess what people would use in their guns, so I can't do as good a job with those. (Which basically means I just copy Diamond.) So I wanted to try to make my KNN classifiers match VCS guns as much as possible (except for the flattener, where being different is probably beneficial). Step one is to make classifiers that would match "typical" guns, like Raiko, or the GFTargeting tutorial, etc. Any thoughts on what is "typical" for VCS guns? Also, I can't think of how the weights should be chosen. Any ideas?
Well, the same dilemma of "guess how important attributes would be" exists with VCS as well.
Anyway, fun thing is, you can actually determine, in a fairly algorithmic manner, how to make a KNN search that behaves as closely as possible to a given VCS configuration (e.g. Raiko's, like you mention) while still being continuous.
- For each dimension, examine the VCS bins. If the bin sizes are unequal (be careful to note how the minimum/maximum values of the dimension interact with the first/last bins), then you need to devise a continuous function that curve-fits the bin size as a function of the value of the dimension. You can use any form of regression you like for this, provided the function stays above 0 over the dimension's whole range.
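- For each dimension's "bin-size" approximation function, take the integral of its reciprocal (1 / bin-size, i.e. the local bin density), so that regions with narrow bins get stretched and regions with wide bins get compressed, just as the VCS treats them. Since the integrand is always positive, the resulting function is monotonic. We'll call this the "scaling function".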
- Feed each "scaling function" the minimum/maximum values for its associated dimension and note the minimum/maximum values it returns. Then divide the number of bins in the dimension by the difference between those maximum/minimum outputs to get the scaling factor that makes the range of the "scaling function" proportional to the number of bins. Multiply your "scaling function" by this factor.
- Now, before inserting values into the kd-tree, you put them through this "scaling function", which should weight them approximately the same as how the VCS did :)
(Hmm... maybe I should write code to perform this procedure some time...)
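For what it's worth, here is a minimal sketch of that procedure under a simplifying assumption: instead of fitting a smooth regression, it interpolates linearly between the bin edges. That amounts to integrating 1 / bin-size with a piecewise-constant fit, so narrow bins get more resolution, as they do in VCS. The output is already in units of bins, so the final rescaling step comes for free. The class name `BinScaler` and the example edges in the usage note are made up for illustration and are not taken from Raiko or any other bot.

```java
import java.util.Arrays;

/** Hypothetical helper: maps a raw attribute value to a continuous "bin index". */
final class BinScaler {
    // edges[0] is the dimension's minimum, edges[n] its maximum, for n bins.
    private final double[] edges;

    BinScaler(double[] edges) {
        if (edges.length < 2) {
            throw new IllegalArgumentException("need at least one bin");
        }
        for (int i = 1; i < edges.length; i++) {
            if (edges[i] <= edges[i - 1]) {
                throw new IllegalArgumentException("edges must be strictly increasing");
            }
        }
        this.edges = edges.clone();
    }

    /** Returns a value in [0, number of bins]; values outside the range are clamped. */
    double scale(double x) {
        int n = edges.length - 1;
        double clamped = Math.max(edges[0], Math.min(edges[n], x));
        int pos = Arrays.binarySearch(edges, clamped);
        if (pos >= 0) {
            return pos; // exactly on a bin edge
        }
        int upper = -pos - 1; // index of the first edge greater than the value
        int lower = upper - 1;
        double fraction = (clamped - edges[lower]) / (edges[upper] - edges[lower]);
        return lower + fraction;
    }
}
```

With assumed distance edges of 0, 100, 300 and 1000, `new BinScaler(new double[] {0, 100, 300, 1000}).scale(50)` lands halfway through the first bin. You would run each attribute through its own scaler before inserting into (or querying) the kd-tree.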
I've only seen a fraction of the VCS guns out there, but "typical" configurations are probably modeled after the big / influential VCS bots, like Raiko, PEZ's bots, FloodMini, maybe even Dookious.
I guess maybe this question is for other people, since if you're looking at Diamond, you already know what I think about what attributes are best. =) But no shame in starting with a Diamond-inspired configuration and trying to improve from there. I certainly took guidance from Raiko and CassiusClay, and I also keep my eye on what brainiacs like Rednaxela and Skilgannon are doing.
I think a more difficult task is modeling data decay in KNN surf stats. Whenever you get hit, it means that your existing data is inaccurate. (If it were accurate, you wouldn't have been hit.) Where you got hit is the peak in that segment of his gun, so it needs to out-rank the other data points in that segment of your stats, but you have no segments, so you can't just use a rolling average. I think tuning this aspect of your KNN surf stats is at least as important as getting the attributes just right.
DrussGT uses a "bullets shot" classifier to give higher weight to newer data, which works somewhat like data decay. It makes the k-NN search extrapolate a bit, but still helps increase the score against learning opponents.
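To make the idea concrete, here is a rough sketch of a distance function with an extra "bullets shot" dimension. This is not DrussGT's actual code; the class name, the attribute layout and the weights are all assumptions for illustration. The query always uses the current bullet count, which is newer than anything stored, and that is where the extrapolation comes from.

```java
/** Hypothetical distance function with a time-like "bullets shot" dimension. */
final class TimedDistance {
    /**
     * Squared Euclidean distance over the weighted attributes, plus a weighted
     * term for the difference in bullets shot between the two situations.
     */
    static double squaredDistance(double[] a, double aBullets,
                                  double[] b, double bBullets,
                                  double[] weights, double timeWeight) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = weights[i] * (a[i] - b[i]);
            sum += d * d;
        }
        double dt = timeWeight * (aBullets - bBullets);
        return sum + dt * dt;
    }
}
```

With a time weight of, say, 0.005, a data point from 90 bullets ago that matches the query exactly ends up farther away than a slightly different point from 5 bullets ago, so the newer one gets picked first.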
IMHO, the advantage of VCS over DC is CPU performance. But VCS compresses data into bins and some information is lost, while in DC it isn't. So, well-tuned DC should perform better than well-tuned VCS, unless you start skipping turns.
That is for the gun, but a similar thing is definitely useful in movement. Best would be one log which has data rolling, and one without to handle simple bots and make sure you don't forget anything about them. That trick actually comes from Rednaxela, and possibly even originally ABC. It causes the Kd-Tree to be a bit slower as the match progresses, but works wonders against surfers and is behind my recent-ish PL improvements.
I have a similar view on VCS vs DC - VCS is faster at lookups (obviously, index lookups are O(1)) but less accurate due to the discretisation of both the bins and the segments. However, even in a DC environment VCS has its place: if I were surfing DC I would cache my results in an array just like VCS to make lookups faster when evaluating surfing points. Note, DrussGT's movement doesn't actually use segmented VCS in the movement anymore, but instead lists of hit indexes in each segment. This reduced my storage space and my logging-hits time, and actually reduced my retrieve time as well because many of the hits are different representations of the same original hit (from my many buffers). I did some trickery with weighting the hits to make sure it gives exactly the same results as my VCS with a rolling average would have.
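A sketch of the caching idea, assuming the KNN search has already returned the guess factors of the nearest hits along with a weight for each: the danger is computed once per wave into a VCS-style array, and every surfing point after that is an index lookup. The class name, the bin layout over [-1, 1] and the 1 / (1 + u^2) kernel are my own choices for illustration, not anything DrussGT does.

```java
/** Hypothetical per-wave cache of DC danger, laid out like a VCS bin array. */
final class DangerCache {
    /** Builds per-bin danger from the guess factors of the nearest neighbours. */
    static double[] build(double[] neighbourGFs, double[] neighbourWeights,
                          int bins, double bandwidth) {
        double[] danger = new double[bins];
        for (int bin = 0; bin < bins; bin++) {
            double gf = binToGF(bin, bins);
            for (int i = 0; i < neighbourGFs.length; i++) {
                double u = (gf - neighbourGFs[i]) / bandwidth;
                danger[bin] += neighbourWeights[i] / (1 + u * u); // smooth bump per hit
            }
        }
        return danger;
    }

    /** Centre guess factor of a bin, with bin 0 at GF -1 and the last bin at GF +1. */
    static double binToGF(int bin, int bins) {
        return 2.0 * bin / (bins - 1) - 1.0;
    }

    /** Nearest bin for a guess factor; out-of-range values are clamped to [-1, 1]. */
    static int gfToBin(double gf, int bins) {
        double clamped = Math.max(-1.0, Math.min(1.0, gf));
        return (int) Math.round((clamped + 1.0) / 2.0 * (bins - 1));
    }
}
```

When evaluating a predicted position you would then read `danger[DangerCache.gfToBin(gf, bins)]` instead of re-running the search.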
It's worth noting that in movement, you have far less data, so CPU speed is less of an issue. Until you get into flatteners, and even then maybe virtual wave flatteners (which are rare), DC surfing could probably do fine without even using kd-trees. Precise prediction far outweighs information management, I think.
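As a rough illustration of how little is needed without a kd-tree, here is a brute-force search that just sorts the whole log by distance. It is a sketch only: `LinearKnn` is a made-up name, each point is a plain attribute array, and sorting is O(n log n) per query, which should be tolerable for the few hundred points a surfing log typically holds but would not be for gun-sized logs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical brute-force k-nearest-neighbour search, no tree involved. */
final class LinearKnn {
    /** Returns up to k points from the log, closest to the query first. */
    static List<double[]> nearest(List<double[]> points, double[] query, int k) {
        List<double[]> sorted = new ArrayList<>(points); // leave the log untouched
        sorted.sort(Comparator.comparingDouble((double[] p) -> squaredDistance(p, query)));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    /** Plain squared Euclidean distance; attributes are assumed pre-scaled. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```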
And yes, having a dimension that is pure linear time is certainly one of the simpler approaches to KNN data decay... ;)
What I was actually doing in my gun was having a non-linear time dimension, and it worked out a lot better than straight linear. I think I ended up with 1.2*T^(0.4) or so, instead of 0.005*T. Those weights were genetically evolved with my WaveSim-ish setup.
It makes sense that at the beginning you want your data to decay faster than towards the end...
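The two time scalings mentioned above can be compared directly. This is only a sketch: the constants are the ones quoted in the post, but the unit of T (ticks, waves, bullets) is not stated there, so treat it as an assumption, and the class name is made up.

```java
/** Hypothetical comparison of the two time-dimension scalings quoted above. */
final class TimeScaling {
    /** Linear time dimension: 0.005 * T. */
    static double linear(double t) {
        return 0.005 * t;
    }

    /** Non-linear time dimension: 1.2 * T^0.4. */
    static double powerLaw(double t) {
        return 1.2 * Math.pow(t, 0.4);
    }
}
```

With the linear version, a gap of 100 time units always adds the same distance. With the power law, `powerLaw(100) - powerLaw(0)` is much larger than `powerLaw(1000) - powerLaw(900)`, so early data is pushed away quickly while late data stays close together, which matches the "decay faster at the beginning" intuition.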