Great and very detailed article!
Thanks for your work! This may be one of the most concise write-ups about PIF, and it could be a step toward popularizing melee battles, like the well-known GF tutorial and WS tutorial.
Anyway, two questions:
1. In the real world, is this algorithm faster than simply looping through all the scans before BFT? Whether you loop through all of them or branch toward the closer one, the scans up to BFT need to be in cache for either approach to perform well. If the BFT scans don't fit in cache, you will get several cache misses that slow the entire process down, unless you access memory in a fairly predictable order. Anyway, this is only a guess, so a benchmark is still needed.
2. If I understood this algorithm correctly, it requires the scans at times s + 0, s + 1, ..., s + bft to all be stored contiguously in an array. But in melee, if we only store actual scans, looping through all the required scans already has a very small constant factor. If we instead store every scan plus interpolated scans, it may be harder for the CPU to access that data, e.g. more cache misses because of the increased size. So again, compared to the inconsistent-scan version, is this algorithm still faster in the real world?
Btw, in the pseudocode I can see that the data is actually stored in objects accessed through an array of pointers, so each step may cause a cache miss, whereas a loop through contiguous memory may cause none, although it performs more reads from the CPU cache.
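For comparison, the simple loop this question has in mind can be sketched as below. This is a minimal sketch, not code from the article: it assumes scans are kept in hypothetical parallel primitive arrays (`ticks` measured from the fire time, `xs`, `ys`), a bullet fired from `(ox, oy)` at tick 0, and it stops at the first scan the bullet has already reached.

```java
/**
 * Hypothetical linear baseline (not code from the article).
 * Assumes parallel arrays sorted by tick, ticks measured from the fire time,
 * and a bullet fired from (ox, oy) at tick 0 that travels bulletSpeed px per tick.
 */
final class LinearImpactSearch {
    private LinearImpactSearch() {}

    /** Index of the first scan the bullet has reached, or -1 if it reaches none of them. */
    static int firstReached(long[] ticks, double[] xs, double[] ys,
                            double ox, double oy, double bulletSpeed) {
        // Front-to-back reads over primitive arrays: a predictable access pattern.
        for (int i = 0; i < ticks.length; i++) {
            double travelled = bulletSpeed * ticks[i];
            double distance = Math.hypot(xs[i] - ox, ys[i] - oy);
            if (travelled >= distance) {
                return i;
            }
        }
        return -1;
    }
}
```

This visits up to K scans, but every read is sequential, which is the kind of access pattern that tends to be cache-friendly.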
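With only the actual scans stored, the scan for tick s + k no longer sits at index k. One way to locate it, sketched here as an assumption rather than the article's method, is a binary search over the sorted scan ticks using `java.util.Arrays.binarySearch`; `ScanLookup` and `floorIndex` are hypothetical names, and the ticks are assumed to be unique and sorted.

```java
import java.util.Arrays;

/** Hypothetical lookup over irregularly timed ("inconsistent") scans. */
final class ScanLookup {
    private ScanLookup() {}

    /** Index of the last scan taken at or before tick t, or -1 if every scan is later than t. */
    static int floorIndex(long[] ticks, long t) {
        int i = Arrays.binarySearch(ticks, t);
        // When t is absent, binarySearch returns -(insertionPoint) - 1,
        // and the floor is the element just before the insertion point.
        return i >= 0 ? i : -i - 2;
    }
}
```

For example, with scans at ticks 0, 4, 9 and 13, `floorIndex(ticks, 10)` returns 2, the scan taken at tick 9.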
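The two layouts being compared can be sketched in Java as follows. `Scan` and `ScanColumns` are hypothetical names: in Java an array such as `Scan[]` holds references, so each element is a separate heap object that may or may not sit next to its neighbours, whereas each `long[]` or `double[]` is a single contiguous block.

```java
/** Hypothetical per-scan record, stored as an array of references (Scan[]). */
final class Scan {
    final long tick;
    final double x;
    final double y;

    Scan(long tick, double x, double y) {
        this.tick = tick;
        this.x = x;
        this.y = y;
    }
}

/** The same scans as parallel primitive arrays, each one a single contiguous block. */
final class ScanColumns {
    final long[] ticks;
    final double[] xs;
    final double[] ys;

    ScanColumns(Scan[] scans) {
        int n = scans.length;
        ticks = new long[n];
        xs = new double[n];
        ys = new double[n];
        for (int i = 0; i < n; i++) {
            ticks[i] = scans[i].tick;
            xs[i] = scans[i].x;
            ys[i] = scans[i].y;
        }
    }

    /** Example hot loop: reads only primitives, front to back, with no pointer chasing. */
    double sumDistances(double ox, double oy) {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            sum += Math.hypot(xs[i] - ox, ys[i] - oy);
        }
        return sum;
    }
}
```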
Algorithmic optimisation will pretty much always outperform low-level optimisations such as reducing cache misses.
You really shouldn't need to worry about cache misses in Java!
Optimising from O(n) to O(log n) will give a big performance benefit!
Yes, I agree with that. It just happens that in a scenario where you get on average one scan every 4-5 ticks and the average BFT is 50 or less, even the theoretical improvement becomes negligible. But I'm a guy who likes to have the worst-case situations nicely covered :P
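Plugging the numbers from this comment into worst-case comparison counts gives a rough idea of the gap (K stored scans, linear scan vs. binary search, cache effects ignored):

<math>K \approx \frac{\text{BFT}}{\text{ticks per scan}} \approx \frac{50}{5} \text{ to } \frac{50}{4} \approx 10 \text{ to } 12</math>

<math>\text{linear: } K \approx 10 \text{ to } 12 \text{ steps}, \qquad \text{binary: } \lceil \log_2 (K+1) \rceil = 4 \text{ steps}</math>

<math>\text{worst case } K = \text{BFT} = 50: \qquad \text{linear: } 50, \qquad \text{binary: } \lceil \log_2 51 \rceil = 6</math>

So the saving is a handful of iterations per bullet in the typical melee case, and grows only as the movie gets denser.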
The interesting question for me is: "Does this make my bot run faster?"
I do not have that answer; I only know that this helps me avoid skipping turns because of odd worst-case situations.
Well, I think the worst case is not about BFT but about the entire round time. BFT is too small to make you skip a turn, but a bug that most bot authors make could make the worst case round-long.
The catch is: how do you handle data from a different round?
Yeah, it is; my gun was pretty slow in melee. I don't know if I got what you mean. Can you clarify?
E.g. you store the data of the next round right after the first round, and when the scans of the first round aren't enough to get a hit, you continue searching scans from the next round, starting from time = 0 up to time = movie start time + BFT.
If you store time as global time, this only produces an inaccurate result, which KDE may smooth out. But if you store round time, it will cause the data of the entire round to be iterated.
Well, that's only true for large enough n... And for small n, as in our case, the constant factor dominates.
Btw, memory access is WAAAY more expensive than basic calculations, so for small n, the gain from optimized memory access often outperforms textbook algorithms that don't access contiguous memory in order.
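One way to get the global time mentioned here is sketched below. In Robocode, `getTime()` restarts at 0 every round, so this hypothetical `GlobalClock` adds an offset per round to keep stored times increasing across rounds; with monotonic times, a search bounded by "movie start time + BFT" can stop as soon as a scan's time passes the bound instead of running through the next round's data.

```java
/**
 * Hypothetical helper: maps Robocode's per-round tick (getTime() restarts at 0 every round)
 * onto one timeline that keeps increasing across rounds.
 */
final class GlobalClock {
    private long offset;              // global tick at which the current round started
    private long lastRoundTime = -1;  // highest per-round tick seen this round; -1 = none yet

    /** Call once at the start of every round (including the first), before recording scans. */
    void onNewRound() {
        // +1 so tick 0 of the new round lands strictly after the last tick of the previous one
        offset += lastRoundTime + 1;
        lastRoundTime = -1;
    }

    /** Converts a per-round tick into a global tick, to be stored alongside each scan. */
    long toGlobal(long roundTime) {
        lastRoundTime = Math.max(lastRoundTime, roundTime);
        return offset + roundTime;
    }
}
```

For example, if the first round ends at tick 100, tick 0 of the second round maps to global tick 101.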
1. I agree with that; a benchmark is needed here, and I can even provide more than one implementation of this algorithm. I'll try to do this when I'm home. Any idea how I should benchmark this? Real-world melee data or randomly generated data?
2. The gain in theoretical speed grows as the size of our movie grows. So yes, in 1v1 I'm almost sure it is faster in practice, but I can't say the same about melee. Notice, though, that the number of iterations is on the order of <math>\log K</math>, where K is the number of inconsistent scans between 0 and BFT. It just happens that the worst case is when you have BFT scans. You do not need to store the interpolated scans in this array: you can do a single interpolation after the algorithm finishes, and if you are interpolating linearly, finding the impact point is very simple. So in terms of iterations, the difference is still good. But yeah, the difference becomes less and less noticeable as our movie gets sparser, and even less so if we are not careful about cache misses.
3. The data is stored in objects, and I do that in my code as well, but I'd say it is fine to store the needed information in contiguous arrays instead; I just find it ugly.
Feel free to share any other insight about this, and even to post your implementations if you find them useful! :)
Well, using real battles to benchmark often gives a pretty high margin of error if not done properly... Anyway, running 100 seasons against RaikoMicro and looking at the total time seems to say something about overall performance. Using percentage run time, e.g. PIF time / total run time of your bot, may be even better.
Btw, 1000+ highly optimized iterations in the worst case shouldn't make you skip turns, but if you don't use contiguous memory and access it in order, several thousand cache misses in one turn is enough to kill you, imo.
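A minimal sketch of the search described here, under assumptions that are not from the article: actual scans only, kept in hypothetical parallel arrays sorted by tick (measured from the fire time), a bullet fired from `(ox, oy)` at tick 0, and Robocode's exact firing-tick offsets ignored. It relies on Robocode bullets travelling at 20 - 3 × power (at least 11 px/tick) while robots move at most 8 px/tick, so `bulletSpeed * tick - distance` only grows along the movie and can be binary searched in O(log K) steps. The final step interpolates that gap linearly between the two bracketing scans, which is one simple choice and is exact only when the distance changes linearly between them.

```java
/**
 * Hypothetical binary-search impact finder (a sketch, not the article's code).
 * gap(i) = bulletSpeed * ticks[i] - distance(origin, scan i) is assumed to be increasing in i.
 */
final class BinaryImpactSearch {
    private BinaryImpactSearch() {}

    private static double gap(long[] ticks, double[] xs, double[] ys, int i,
                              double ox, double oy, double v) {
        return v * ticks[i] - Math.hypot(xs[i] - ox, ys[i] - oy);
    }

    /** Returns {x, y, tick} of the estimated impact, or null if the bullet never catches up. */
    static double[] impact(long[] ticks, double[] xs, double[] ys,
                           double ox, double oy, double v) {
        int n = ticks.length;
        if (n == 0 || gap(ticks, xs, ys, n - 1, ox, oy, v) < 0) {
            return null; // the movie ends before the bullet reaches the enemy
        }
        int lo = 0;
        int hi = n - 1; // invariant: gap(hi) >= 0
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (gap(ticks, xs, ys, mid, ox, oy, v) >= 0) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        if (lo == 0) {
            return new double[] {xs[0], ys[0], ticks[0]};
        }
        // Single linear interpolation between scan lo - 1 (gap < 0) and scan lo (gap >= 0).
        double ga = gap(ticks, xs, ys, lo - 1, ox, oy, v);
        double gb = gap(ticks, xs, ys, lo, ox, oy, v);
        double f = -ga / (gb - ga); // fraction in (0, 1]
        double x = xs[lo - 1] + f * (xs[lo] - xs[lo - 1]);
        double y = ys[lo - 1] + f * (ys[lo] - ys[lo - 1]);
        double t = ticks[lo - 1] + f * (ticks[lo] - ticks[lo - 1]);
        return new double[] {x, y, t};
    }
}
```

For a stationary enemy 30 px away and a bullet speed of 11, scans at ticks 0, 2, 4 and 6 give an impact at tick 30/11 ≈ 2.73, matching the exact answer because the gap is linear in that case.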
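A minimal sketch of the percentage-run-time idea, assuming you can wrap each PIF call and know how long each whole turn took. `PifTimer` and its methods are hypothetical; `System.nanoTime()` measures wall-clock time, so GC pauses and JIT warm-up will show up in the numbers.

```java
import java.util.function.Supplier;

/** Hypothetical accumulator for "PIF time / total bot time" measurements. */
final class PifTimer {
    private long pifNanos;   // time spent inside wrapped PIF calls
    private long totalNanos; // time spent in whole turns, as reported by the caller

    /** Runs one PIF computation and adds its wall-clock duration to the PIF total. */
    <T> T timePif(Supplier<T> pif) {
        long start = System.nanoTime();
        try {
            return pif.get();
        } finally {
            pifNanos += System.nanoTime() - start;
        }
    }

    /** Adds the measured duration of one whole turn of the bot's processing. */
    void addTurnTime(long nanos) {
        totalNanos += nanos;
    }

    /** Fraction of measured turn time spent in PIF, or 0 if nothing has been measured yet. */
    double pifFraction() {
        return totalNanos == 0 ? 0.0 : (double) pifNanos / totalNanos;
    }
}
```

Comparing `pifFraction()` between two PIF implementations over many seasons keeps the measurement relative to the bot's own total time, which should reduce some of the noise from real battles.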