Great and very detailed article!

Revision as of 3 November 2017 at 23:46.

Thanks for your work! This may be one of the most concise pieces of material about PIF, and it could be a step toward popularizing melee battles, like the well-known GF and WS tutorials.


Anyway, two questions:

1. Is this algorithm faster in the real world than simply looping through all the scans before BFT? Both looping through all of them and branching toward the closer one perform well only if the entire block of scans within BFT is loaded in cache. If those scans do not fit in cache and memory is not accessed in a predictable order, there will be several cache misses that slow the whole process down. This is only a guess, though, so a benchmark is still needed.

2. If I understand this algorithm correctly, it requires the scans at times s + 0, s + 1, ..., s + BFT to all be stored contiguously in an array. But in melee, if we store only the actual scans, looping through all the required scans already has a very small constant factor. If we instead store every scan plus the interpolated scans, it may be harder for the CPU to access that data, e.g. more cache misses because of the increased size. So again, compared to the inconsistent-scan version, is this algorithm still faster in the real world?
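
To make the two questions concrete, here is a minimal sketch of both approaches over a dense movie (one entry per tick). It is not the article's code: the class name, the array layout and the `straightLineMovie` helper are made up for illustration. The binary search assumes that "the bullet has reached the enemy" never flips back to false, which holds in Robocode because the slowest bullet (11 px/tick) outruns the fastest robot (8 px/tick).

```java
/** Hypothetical sketch: linear loop vs. binary search over a dense movie (one entry per tick). */
final class DensePifSketch {
    /** Ticks examined by the most recent search; kept only for comparison. */
    static int steps;

    /** Illustrative movie: the enemy moves along +x from x0 at constant speed. Returns {x, y}. */
    static double[][] straightLineMovie(int n, double x0, double speed) {
        double[] x = new double[n];
        double[] y = new double[n]; // stays at 0
        for (int t = 0; t < n; t++) {
            x[t] = x0 + speed * t;
        }
        return new double[][] {x, y};
    }

    /** True once a bullet fired from (sx, sy) at speed v has covered the enemy's distance at tick t. */
    static boolean reached(double[] x, double[] y, int t, double sx, double sy, double v) {
        return v * t >= Math.hypot(x[t] - sx, y[t] - sy);
    }

    /** Checks every tick in order; returns the first tick reached, or -1. */
    static int linear(double[] x, double[] y, double sx, double sy, double v) {
        steps = 0;
        for (int t = 0; t < x.length; t++) {
            steps++;
            if (reached(x, y, t, sx, sy, v)) {
                return t;
            }
        }
        return -1;
    }

    /** Binary search for the same tick; relies on reached() being monotone in t. */
    static int binary(double[] x, double[] y, double sx, double sy, double v) {
        steps = 0;
        int lo = 0;
        int hi = x.length; // hi == x.length means "never reached"
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            steps++;
            if (reached(x, y, mid, sx, sy, v)) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo < x.length ? lo : -1;
    }

    public static void main(String[] args) {
        double[][] m = straightLineMovie(128, 299.0, 8.0);
        System.out.println("linear: tick " + linear(m[0], m[1], 0, 0, 11) + " in " + steps + " steps");
        System.out.println("binary: tick " + binary(m[0], m[1], 0, 0, 11) + " in " + steps + " steps");
    }
}
```

The step counts only show the number of ticks touched, not wall-clock time; whether fewer but scattered reads beat more but sequential reads is exactly what a benchmark would have to settle.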


Btw, in the pseudocode I can see that the data is actually stored in objects, which are accessible through an array of pointers. One step of the algorithm may then incur a cache miss, whereas a loop through contiguous memory may incur none, although it performs more reads from the CPU cache.
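
As an illustration of the two layouts being compared (all names here are hypothetical, not taken from the article's pseudocode): a Java array of objects holds references, so each element access follows a pointer to wherever that object lives on the heap, while parallel primitive arrays keep the values themselves side by side on typical JVMs.

```java
/** Hypothetical sketch: the same scans stored as objects and as parallel primitive arrays. */
final class ScanLayoutSketch {
    /** One scan as a heap object; a Scan[] stores references to these. */
    static final class Scan {
        final long time;
        final double x;
        final double y;

        Scan(long time, double x, double y) {
            this.time = time;
            this.x = x;
            this.y = y;
        }
    }

    /** The same data as parallel primitive arrays; each array holds its values directly. */
    static final class ScanColumns {
        final long[] time;
        final double[] x;
        final double[] y;

        ScanColumns(Scan[] scans) {
            time = new long[scans.length];
            x = new double[scans.length];
            y = new double[scans.length];
            for (int i = 0; i < scans.length; i++) {
                time[i] = scans[i].time;
                x[i] = scans[i].x;
                y[i] = scans[i].y;
            }
        }
    }

    /** Sums distances to (sx, sy), dereferencing one object per element. */
    static double sumDistanceObjects(Scan[] scans, double sx, double sy) {
        double sum = 0;
        for (Scan s : scans) {
            sum += Math.hypot(s.x - sx, s.y - sy);
        }
        return sum;
    }

    /** Same sum, reading the primitive arrays sequentially. */
    static double sumDistanceColumns(ScanColumns c, double sx, double sy) {
        double sum = 0;
        for (int i = 0; i < c.x.length; i++) {
            sum += Math.hypot(c.x[i] - sx, c.y[i] - sy);
        }
        return sum;
    }

    public static void main(String[] args) {
        Scan[] scans = new Scan[1000];
        for (int i = 0; i < scans.length; i++) {
            scans[i] = new Scan(i, 100 + 8.0 * i, 50);
        }
        ScanColumns columns = new ScanColumns(scans);
        System.out.println(sumDistanceObjects(scans, 0, 0) == sumDistanceColumns(columns, 0, 0));
    }
}
```

Both loops compute the same result; any speed difference between them would come from memory layout alone, which makes this pair a reasonable starting point for a benchmark.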

    Xor (talk)18:59, 3 November 2017

    Algorithmic optimisation will pretty much always outperform low-level optimisations such as optimising memory access to avoid cache misses.

    You really shouldn't need to worry about cache misses in Java!

    Optimising from O(n) to O(log n) will give a big performance benefit!

      Wolfman (talk)20:01, 3 November 2017

      Yes, I agree with that. It just happens that in a scenario where you get on average one scan every 4-5 ticks, and the average BFT is 50 or less, even the theoretical improvement becomes negligible. But I'm a guy who likes to have the worst case situations nicely covered :P

      The interesting question for me is: "does this make my bot run faster?"

      I do not have the answer; I only know that this helps me avoid skipping turns because of odd worst-case situations.

        Rsalesc (talk)21:14, 3 November 2017

        Well, I think the worst case is not about BFT, but about the entire round time. BFT is too small to make you skip a turn, but a bug that most bot authors make could make the worst case round-long.

        The catch is: how do you handle data from different rounds?

          Xor (talk)00:46, 4 November 2017
           

          Well, that’s only true for large enough n... And for small n, as in our case, the constant factor is dominant.

          Btw, memory access is WAAAY more expensive than basic calculations, so for small n the gain from optimized memory access often outperforms algorithms that look better on paper but don’t read contiguous memory in order.

            Xor (talk)00:43, 4 November 2017
             

            1. I agree with that, a benchmark is needed here, and I can even provide more than one implementation of this algorithm. I'll try to do this when I'm home. Any idea how I should benchmark this? Real-world melee data or randomly generated?

            2. The theoretical speed gain increases as the size of our movie increases. So yes, in 1v1 I'm almost sure it is faster in practice, but I can't say the same about melee. Notice, though, that the number of iterations is on the order of <math>\log K</math>, where K is the number of inconsistent scans between 0 and BFT. It just happens that the worst case is when you have BFT scans. You do not need to store the interpolated scans in this array. You can just do a single interpolation after the algorithm is done; if you are interpolating linearly, finding the impact point is very simple. So in terms of iterations, the difference is still good. But yes, the difference becomes less and less noticeable as our movie gets sparser, and even less so if we are not careful about cache misses.

            3. The data is stored in objects, and I do that in my code as well, but I would say it is fine to store the needed information in contiguous arrays too; I just find it ugly.
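
            One possible reading of point 2, as a rough sketch rather than the actual implementation: binary-search the K stored scans for the first one the bullet has already reached, then interpolate linearly once between that scan and the previous one. All names are made up for illustration. It assumes scans sorted by time, the first scan at or after the fire time, and an enemy slower than the bullet between scans.

```java
/** Hypothetical sketch: binary search over sparse scans, then a single linear interpolation. */
final class SparsePifSketch {
    /** Index of the first scan the bullet has reached, or time.length if none. Roughly log2(K) iterations. */
    static int firstReachedIndex(long[] time, double[] x, double[] y,
                                 long fireTime, double sx, double sy, double v) {
        int lo = 0;
        int hi = time.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            double radius = v * (time[mid] - fireTime); // distance travelled by the bullet so far
            if (radius >= Math.hypot(x[mid] - sx, y[mid] - sy)) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    /** Impact time assuming straight-line motion between the two bracketing scans; NaN if never reached. */
    static double impactTime(long[] time, double[] x, double[] y,
                             long fireTime, double sx, double sy, double v) {
        int i = firstReachedIndex(time, x, y, fireTime, sx, sy, v);
        if (i == time.length) {
            return Double.NaN;
        }
        if (i == 0) {
            return time[0]; // already reached at the first stored scan; nothing to interpolate
        }
        double dt = time[i] - time[i - 1];
        double ux = (x[i] - x[i - 1]) / dt; // interpolated enemy velocity
        double uy = (y[i] - y[i - 1]) / dt;
        double dx = x[i - 1] - sx;          // enemy offset from the source at scan i - 1
        double dy = y[i - 1] - sy;
        double r0 = v * (time[i - 1] - fireTime);
        // Solve |d + u * tau| = r0 + v * tau for tau > 0 (a quadratic in tau).
        double a = ux * ux + uy * uy - v * v; // negative while the enemy is slower than the bullet
        double b = 2 * (dx * ux + dy * uy - r0 * v);
        double c = dx * dx + dy * dy - r0 * r0; // positive: not yet reached at scan i - 1
        double tau = (-b - Math.sqrt(b * b - 4 * a * c)) / (2 * a);
        return time[i - 1] + tau;
    }

    public static void main(String[] args) {
        int k = 30; // one scan every 5 ticks, enemy fleeing along +x at 8 px/tick
        long[] time = new long[k];
        double[] x = new double[k];
        double[] y = new double[k];
        for (int i = 0; i < k; i++) {
            time[i] = 5L * i;
            x[i] = 299 + 8.0 * time[i];
        }
        System.out.println(impactTime(time, x, y, 0, 0, 0, 11));
    }
}
```

            Only the K real scans are stored and searched; the interpolation happens once, on the bracketing pair.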

            Feel free to provide any other insight about this, and even to post your implementations of it if you find them useful! :)

              Rsalesc (talk)21:04, 3 November 2017