Time Weighted Rolling Average
Err, Nat, Paul Evans's version, and "Time Weighted Rolling Average" are essentially the same. The way you calculate the numeric constant is different, but the end result is the same. About the statement "The old value that exceed the n limited will be removed from averaged value like a magic!" isn't really anything magic, it's just weighting old values less. --Rednaxela 03:28, 17 February 2009 (UTC)
Thanks! I haven't realize that, just copy from old wiki! I'll change this page and move "Time Weighted Rolling Average" to sample implementations instead. About that statement, the value that exceed n will weight less enough that has really no side effects. The statement "like magic!" is one of statement on the old wiki, I want to keep it.
|Thread title||Replies||Last modified|
|Rolling KD-Tree?||3||06:37, 8 August 2021|
|Rolling Average vs Gradient Descent with Softmax & Cross Entropy||1||06:50, 27 July 2021|
|Rolling Depth||3||05:18, 30 June 2019|
So I am trying to work out how you deal with rolling averages in a KD-Tree DC gun implementation? I'd like a low rolling average anti-surfer gun. Silly question but I have no idea how to do it. I understand you can use time as a dimension in the tree, but then there is the problem of distance. Time is a continuous variable, which means you can't just go "get me the last 3 Guess Factors that match the current situation" as the time dimension in the tree will mean that you are not guaranteed the latest? I'm probably not explaining it very well. If you weight the time the same as other dimensions then you might get recent values, but, given the weighting of time in a 0-1 value distance matching, pulling 3 values out of the tree will not necessarily give you the 3 latest values, just the 3 closest values which might match latest or might be closer in non-time values?
This is a pretty terrible explanation/question sorry, but I'm trying to understand how to integrate low rolling averages with DC.
If you don't put time as a dimension and just pull out the say, 100 closest matches, you are not necessarily going to get any recent matches, so even if you put time-stamps in the data to filter those 100 results, you might not get any at all which are recent?
I've seen several ways of making DC models favor recent data points:
- As you say, add time as a feature to the data points. BeepBoop and DrussGT do this. I'm not sure if it is an advantage or disadvantage having time "compete" with other dimensions instead of it being a separate criteria for selecting points.
- Give your KD-Tree a maximum size where older data points are deleted as new ones are added. For example Diamond's anti-surfer gun combines several KD trees with different max sizes to find the closest matches in the last 125, 400, 1500, and 4000 waves. This of course also makes your tree(s) faster!
- Have the values in your tree store the time as well as the guessfactor. Then sort your top k matches by their age and weight them according to their rank. Diamond does this for its surfing, but not its guns. I suppose instead of ranking them, you could also weight them by 0.99^age or something, but I haven't seen that done before.
They each have different advantages/disadvantages and I'm sure there are other things you could try as well!
Hrm thanks Kev. I should have thought about 2. I didn't see any obvious way to remove points on the tree I'm using (Rednaxela) but it looks like it does support a max size! :facepalm how did I not notice that before? Sigh. Always the simple things!
Edit: Update, looks like the 2nd gen tree supports max size, but the 3rd gen tree, which I'm using only supports setting the bucket size and has no removal of old points. I'll have to check out other KD tree implementations to support limiting tree size.
Adding point removal isn't hard, the hard thing is to keep the right tree structure, which I haven't seen in most implementation, which is necessary to keep the tree performant.
Removing a point is essentially:
- search it as normal.
- iterate through each of the bucket and erases it.
If each Guess Factor bin is considered as an output unit before Softmax (logit), and loss is Cross Entropy, then the gradient of each logit is then:
- qi - 1, if bin is hit
- qi, otherwise
Where qi is the output of the ith unit after Softmax (estimated probability)
If gradients are not applied on logits as normal, but instead applied on qi itself, then:
- qi := qi - η * (qi - 1) = (1 - η) * qi + η * 1, if bin i hit
- qi := qi - η * qi = (1 - η) * qi + η * 0, otherwise
Which is essentially rolling average, where η (learning rate) equals to the α (decay rate) in exponential moving average.
Anyway this analog isn't how rolling average works, as logits don't equal to qi at all. But what if we replace rolling average with gradient descent? I suppose it could learn even faster, as the outputs farther from real value get higher decay rate...
Then one step further, you don't use VCS any more, instead add the logits of velocity bins, accel bins and distance bins, etc., all together. This structure is essentially estimating the probability as a multiplication of probability when e.g. velocity is high and distance is close.
If the movement profile relating to velocity, distance, etc. is independent, this approach will be mostly the same as traditional segmented VCS, with more data points.
Note that this approach is essentially a neural network without hidden units, or multiclass logistic regression.
In Basilisk, the only change I made from 3.2 to 3.3 was increasing the rolling depth by multiplying my bins by .99 rather than .95 every time a wave hits, and it gained me 2 ranking places!(From #8 up to #6). So, I was curious, what rolling depths do other people use in their guns? Also, are 1000 round battles good for testing rolling averages? I like 1000 rounds since it lowers the margin of error, but it might distort my results because it's long term learning, not fast 35 round learning. Thanks!
For non-adaptive movement, not using rolling average works as well, and if added a little, it performs better against multi-mode movement. But for adaptive movement, very fast rolling average is generally used.
1000 rounds battles are certainly not as accurate as running 30 * 35 rounds battles (1050 in total), that's why roborunner is invented, which you may have a try.
Btw, I'm using the whole rumble (1100+ robots) for gun testing now, ~5000 battles * 35 rounds = ~175000 rounds in total, which takes me ~2 hour on i9 8 core.
You might also try to weight non-bullet waves (the one where you don't actually fire a bullet) with a factor 0.1 or 0.2. For non-adaptive movements this should not impact scoring, but for adaptive movements it really increases accuracy. For mini Grimmig and micro I don't use rolling depths, for GresSuffurd I use two same guns, one without rolling depth and one with 0.9 (I think) to handle the wavesurfers. Currently my testing is just a handful of battles against my old self to tackle bugs. I try to only make changes that are logical (to me) and don't change to much at the same time. And don't forget, watch battles !! Especially your bots behaviour near walls/corners, at close range or against opponents where you score relatively low against, can give you insight on things to improve or fix.
OK, I think I'll keep my rolling average for now, but weight the non-bullet waves less like GrubbmGait said.
@Xor, I'll also install RoboRunner, I remember I planned to install it earlier but I keep forgetting :-P On my desktop at home, I have a Ryzen 3 1200 4 Core. 175000 rounds sounds like a lot! I'll probably not test against the entire rumble, but 30 seasons of 35 rounds sounds good!
@GrubbmGait, I try to only change one thing at a time, so that I can tell what change did what. And I always like watching the last 5 rounds of a battle at the end =)
Thanks for the help, both of you!