Difference between revisions of "Talk:Sequential Prediction"
(→Example?: topcoder problem solution) |
(→Example?: nice comments) |
||
Line 245: | Line 245: | ||
:: Thanks, that gave me a chuckle. =) I wish I could say I never come across comments like those... --[[User:Voidious|Voidious]] 13:06, 5 October 2009 (UTC) | :: Thanks, that gave me a chuckle. =) I wish I could say I never come across comments like those... --[[User:Voidious|Voidious]] 13:06, 5 October 2009 (UTC) | ||
::: Don't let PEZ read this, he would probably explode. [[oldwiki:WhyCommentsAreBad]] --[[User:Nat|<span style="color:#099;">Nat</span>]] [[User talk:Nat|<span style="color:#0a5;">Pavasant</span>]] 13:36, 5 October 2009 (UTC) | ::: Don't let PEZ read this, he would probably explode. [[oldwiki:WhyCommentsAreBad]] --[[User:Nat|<span style="color:#099;">Nat</span>]] [[User talk:Nat|<span style="color:#0a5;">Pavasant</span>]] 13:36, 5 October 2009 (UTC) | ||
+ | :: Hehehe those comments really made it easy to understand :-). --[[User:Zyx|zyx]] 14:00, 5 October 2009 (UTC) | ||
: I don't know C(++) very well...or whatever that language is, but it's definitely not Java --[[User:Starrynte|Starrynte]] 04:45, 5 October 2009 (UTC) | : I don't know C(++) very well...or whatever that language is, but it's definitely not Java --[[User:Starrynte|Starrynte]] 04:45, 5 October 2009 (UTC) | ||
Revision as of 15:00, 5 October 2009
Well this is the result of my quest for a good pattern matching algorithm, I hope you like it and please let me know if you have any doubts or comments.
Just for motivational purposes, I want you to know that my development version of this guns scores around 75 in average (with a maximum of 85 so far) against Shadow 3.66d in the TC challenges. I'm still under heavy development so I don't have definite results, but I think this will be a very powerful gun against surfers and it performs decently against random movers too (close to top GF guns but still doesnt outscores them). Hopefully in the next couple of weeks I will have some TC challenge results to post so I can encourage more people look into this. --zyx 07:44, 3 October 2009 (UTC)
While I'm not yet understand the algorithm, shouldn't this better be under Category:Targeting Implementations? Since it is still PM anyway. --Nat Pavasant 08:41, 3 October 2009 (UTC)
- I thought so at first, too (and almost changed it), but now I think it's fine like it is. This page is really describing an algorithm more than a specific gun's implementation of it, and others like Symbolic Pattern Matching are under targeting strategies, even though they're implementations of PM. --Voidious 16:53, 3 October 2009 (UTC)
OK, I understand the algorithm now (how weird am I that I got confused by your explanation but understand with the pseudo-code?). I'm not sure about the category, it is the data management for sure, but not sure about the 'Targeting Strategy'
I think this algorithm can do Single-Tick, just that it will run in O(n2). We need to clone the entire log for this =) --Nat Pavasant 09:21, 3 October 2009 (UTC)
Have you try your gun in Pattern Matcher Challenge? (oldwiki:PatternMatcherChallenge)? When I plug this algorithm into my DC-PIF robot and make it choose only the best match, then entered it to the PMC, I got only ~70% or so. I'm not sure if it is a bug in my implementation or not... --Nat Pavasant 13:02, 3 October 2009 (UTC)
- I ran two matches of the challenge to show you:
- As you can see the index is over 100%, but this is without limitations, not on the size of the log and not on the number of matches to try. It start skipping turns after round 70.
- My configuration is however more complex than the simple idea published. I have 3 different matchers in crowd form:
- Lateral/Approach velocity, using 17 discrete values for each attribute.
- Lateral/Approach velocity, using 34 discrete values for each attribute.
- Velocity/Rotation velocity, using 9 discrete values for each attribute.
- All matches are weighted as they are found by: match_size / DC_like_distance(last entry in pattern, current entry). Then I keep the best 100, play forward until I have the best 50 that stay inside the battle field. Then I run a kernel density function over those angles to find the best. In the DC like distancing I have many attributes, including distance of pattern to the log tail for fast adaptation. This configuration has been tweaked towards making a good AS gun, so I think I could get it to work even better for the PMC if I wanted to, but my goal is an alternative to current AS guns rather than killing simple pattern bots like this one.
- What are you matching on? I found that matching on velocity/rotation gives the best results against PatternBot but is not optimal for top surfers, where Lateral/Approach velocity is much better. --zyx 14:46, 3 October 2009 (UTC)
- I use velocity/deltaHeading (which I believe is as same as Rotation). 9 velocity discrete and 5 rotation discrete, yet it only find the pattern length <100 and it does match across round (I try adding ROUND_BREAK but fail) Perhaps there is bug in my matcher:
private class PMEntry extends Point2D.Double { double heading; PMEntry next, previous; String key; int matchSize = 0; int round; public PMEntry(Point2D pt, double heading) { super(); setLocation(pt); this.heading = heading; } } private class PMSequence { private HashMap<String, LinkedList<PMEntry>> dataMap; private PMEntry last = null; public PMSequence() { dataMap = new HashMap<String, LinkedList<PMEntry>>(); } private LinkedList<PMEntry> getEntries(String key) { LinkedList<PMEntry> entry = dataMap.get(key); if (entry == null) { dataMap.put(key, entry = new LinkedList<PMEntry>()); } return entry; } public void addNewEntry(PMEntry entry) { LinkedList<PMEntry> entries = getEntries(entry.key); if (last != null) { for (PMEntry e : entries) { PMEntry previous = e.previous; if (previous != null && previous.key.equals(last.key)) { e.matchSize = previous.matchSize + 1; } else { e.matchSize = 1; } } last.next = entry; entry.previous = last; } entry.matchSize = 1; entries.add(entry); last = entry; } public PMEntry[] getEntries(PMEntry entry) { LinkedList<PMEntry> entries = getEntries(entry.key); ResultHeap rh = new ResultHeap(entries.size()); for (int i = entries.size() - 1; i >= 0; i--) { rh.addValue(entries.get(i)); } PMEntry[] result = new PMEntry[rh.size]; for (int i = 0; i < result.length; i++) { rh.removeLargest(); result[i] = rh.removedData; } return result; } // Hijacked from Rednaxela's Tree private class ResultHeap { private final int size; private int values; public PMEntry removedData; private PMEntry[] data; public ResultHeap(int size) { this.data = new PMEntry[size]; this.size = size; this.values = 0; } public void addValue(PMEntry value) { // If there is still room in the heap if (values < size) { // Insert new value at the end data[values] = value; upHeapify(values); values++; } } public void removeLargest() { removedData = data[0]; values--; data[0] = data[values]; downHeapify(0); } private void upHeapify(int c) { for (int p = (c - 1) / 2; c != 0 && data[c].matchSize > data[p].matchSize; c = p, p = (c - 1) / 2) { PMEntry pData = data[p]; data[p] = data[c]; data[c] = pData; } } private void downHeapify(int p) { for (int c = p * 2 + 1; c < values; p = c, c = p * 2 + 1) { if (c + 1 < values && data[c].matchSize < data[c + 1].matchSize) { c++; } if (data[p].matchSize < data[c].matchSize) { // Swap the points PMEntry pData = data[p]; data[p] = data[c]; data[c] = pData; } else { break; } } } }
- By the way Nat, your adaptation of my ResultsHeap is notably inefficent. Using a single array of PMEntry instead of keeping the match length stored in a seperate array, makes the operations of reading/comparing/etc the values measurably slower. Also... by not having any limiting on the number of matches returned... it's hugely slow and almost entirely defeats the purpose of using a heap. The heap will only gain you significant performance if you're only keep a size-limited list of the longest matches (and keep the 'else' condition you removed from addValue for no good reason...). Plus, for keeping the LONGEST matches you want a min-heap, not a max-heap like is used in a kd-tree. Honestly your implementation of it looks like somewhat of a performance nightmare. --Rednaxela 18:04, 3 October 2009 (UTC)
- Since that test with tons of skipped turns at round 24, I rewrote the whole gun (yes, really). Just run another match with PatternBot, using single buffer it skipped no turn. Still using while data, heap is just to sort (should use merge sort, but too lazy to write one) --Nat Pavasant 18:37, 3 October 2009 (UTC)
Oh wow! This is really neat! I wonder why it starts skipping turns after round 70 though... Even with three guns running at once, it seems to me it shouldn't start skipping turns that early, unless you're doing something like increasingly heavy multiple-choice throughout the match, without efficent PIF. --Rednaxela 15:02, 3 October 2009 (UTC)
- It need to scroll through ~10,000 records each tick =) Even with ABC's PIF method, my (buggy) PM got skipping turn too. With normal PIF (sin & cos), it start at around round 24 or so. --Nat Pavasant 15:06, 3 October 2009 (UTC)
- Against PatternBot rounds are about 1200 ticks, 70 rounds is a 84000 log size. I have ABC's fast play it forward but I do have heavy multiple choice and weighting. I think it can further optimized, but I actually only care about normal 35 round battles. So I won't tackle many optimizations yet. And is not like a skip forever, but some rounds it skips about 5 turns at the beginning, when PatternBot is positioning itself because that part is a really simplistic pattern so many many matches are found. Besides my clustering is based on Voidious' brute force method, I could try a heap like algorithm later, as Nat did. --zyx 15:31, 3 October 2009 (UTC)
Nat, remember you have to traverse the list in reverse order as they appear in the log. Try changing entries.add(entry);
for entries.add(0, entry);
in the addNewEntry
. That should give better results, but not fix the across rounds matches. You have a static PMSequence? Then try reseting last
to null at the beginning of each round. Although I haven't checked if mine matches across rounds, but I'm quite doubtful it does. --zyx 15:31, 3 October 2009 (UTC)
- Thank you, it works fine now, except that it is awfully slow using HashMap and LinkedList =) Mine start skipping turns at round 20 and skipped turns like hell at round 28. But robot-generated PMC index (I make it collect data and print index every round) tell me that it work. One thing I don't understand, is why is it need to travel in reverse order? --Nat Pavasant 16:06, 3 October 2009 (UTC)
Very cool! Glad to see I'm not the only one in the Targeting Laboratory. =) I need to re-read this page a couple times before I fully grok the code, but I think I get the idea. I think Multiple Choice on the deepest matches is the kind of thing that could really put PM over the edge, and it's much more viable to do that with such a lightning fast PM algorithm. By the way, have you seen the Folded Pattern Matcher (old wiki link)? I believe it's different than what you have here, but it is another very fast PM, so I thought you might want to look for comparison / inspiration. Thanks for posting such an in-depth write-up - good luck with this. --Voidious 16:40, 3 October 2009 (UTC)
- (I know I'm not zyx) I have been planning with PM-kind thing for a while. The problem I got is the speed (now this work, should be better on my project, which have much the same goals and path as zyx) I use to used FoldedPatternMatcher too, but that still O(n^2) because it need to check each case too (it only skips the find-the-head-of-each-match step) This is O(n), and you know how much faster it is (my multiple-choice algorithm make it O(n log n) though, due the heap use) On other note, when I first read this page I think it is just re-invent of FoldedPatternMatcher =) --Nat Pavasant 17:26, 3 October 2009 (UTC)
- Yes I forgot to mention that, I had it in my mind (I wrote the article at 4am). FolderPatternMatcher is very similar, I think if whoever wrote that would have realized that you can keep the match size as you find them instead of going trough them every time then it would have come out as this algorithm.
- HashMap are O(log n), if you know the alphabet size and use integers as key, then using a built in array is much faster O(1). --zyx 17:45, 3 October 2009 (UTC)
One thought that may improve performance somewhat... would be using an increasingly larger alphabet size the number of rounds increases. This will both lead to more precise matches as more data is avaliable, and reduce how many entries will be matching the current 'letter' of the alphabet. --Rednaxela 17:53, 3 October 2009 (UTC)
- But using larger alphabet you will broke the match with old data, or unless you use multiple buffers... --Nat Pavasant 18:37, 3 October 2009 (UTC)
- I have thought of doing something along that lines, but as Nat mentions you need different buffers for that. But you can update the detailed ones without any prediction based on them at early rounds. And later throw away the less detailed buffers when the detailed ones have a decent amount of match size. --zyx 19:34, 3 October 2009 (UTC)
- Experimenting with it and got:
========================= Round 1 of 100 ========================= 0,000 Initializing movement log... java.lang.OutOfMemoryError: Java heap space at java.util.ArrayList.<init>(Unknown Source) at nat.z.pm.PMSequence.<init>(PMSequence.java:34) at nat.z.pm.PMGun2.<init>(PMGun2.java:89) at nat.z.PMTest.init(PMTest.java:18) at nat.base.BotBase.run(BotBase.java:204)
Nat Somewhere you asked why you needed to traverse it in reverse order, try running it in normal order for this case. Assume you have correct values for "AAABA" (ie: 111**) and now try to add a new A to it. --zyx 00:26, 4 October 2009 (UTC)
There are several references here to ABC's fast PIF method, however, I couldn't find anything related to this on the wiki. What exactly is this method? --Navajo 05:31, 4 October 2009 (UTC)
- I don't know where the original post is, but for now you can check Nat's talk page. There is some discussion, codes and a link to the oldwiki for Skilgannon's code with some ABC code as well. I think from there you can get to the original post too, but I may delirious. --zyx 07:15, 4 October 2009 (UTC)
- I believe the original post is at oldwiki:DCBot --Nat Pavasant 07:34, 4 October 2009 (UTC)
Example?
I've reread the main page a few times now and "sort of" get it, but not quite. zyx, would you mind adding a slightly longer example with the binary alphabet? I understand the first step (finding the B or A starting point in the "histogram") but I'm not clear how you continue from there. Does it recurse back into the histogram with all matches for the next letter or ... ? --Darkcanuck 21:50, 3 October 2009 (UTC)
- I added a new section which contains a step by step of the algorithm matching a simple string, hope it helps in understanding how it works. Although I think the implementation pseudocode is the clearest way to do so. And there is no recursion anywhere, the idea is to have a list for each character in the alphabet, so that when traversing it you go trough all the possible ends of a match, the size would tell you where the pattern actually starts (not really needed for most Robocode applications). --zyx 00:07, 4 October 2009 (UTC)
- Thanks! Still wrapping my head around this... The pseudocode seems to add some unexplained tricks and has two mysterious functions. But let's see if I understand this: you're basically matching as you update. So you can't really ask the algorithm for the best match to "X", rather it gives you the best match for the newest data in the sequence. I think that's where I was stuck. In the naive implementation you would have to rescan the entire log (in reverse order) once to update the match lengths, correct? I guess your pseudocode tries to speed up that process by remembering pointers into the log for each letter of the alphabet so you only have to scan those points. This is fascinating, I'm going to have to make a simulator for it before trying it in-game... --Darkcanuck 01:48, 4 October 2009 (UTC)
- Thats exactly what it does. More formally it finds all the substrings that match the current tail of the log. --zyx 01:52, 4 October 2009 (UTC)
- I don't know if it is just me or not, but I understand algorithm better with code than with explanation (I understand Bucket PR k-d tree when I read Simonton's tree. I understand MR and GF when I read Coriantumr's source. I understand DC when I read DrussGT's source. I understand WS when I read BasicSurfer's source. I understand PM when I read WeekendObsession and Waylander/Toorkild source. etc.) --Nat Pavasant 02:32, 4 October 2009 (UTC)
- Well source code is somewhat universal and unambiguous, so it helps a lot in understanding. But with the lack of some explanation, just reading code can become extremely hard to understand, even something you wrote yourself some years ago. Imagine reading BasicSurfer with absolutely no explanation, you can eventually figure it out (especially if you already know GF targeting) but I'm sure it will take many re-reads and a lot of time to do so. --zyx 07:21, 4 October 2009 (UTC)
- Not for me. Actually I know WS before GF. I got headache trying to understand Wave Surfing tutorial, which I got nothing at all. Then I read BasicSurfer, which make me understand the concept of Wave Surfing finally. Even though I admit that the 'self-documentation' code and/or inline comments help a lot. --Nat Pavasant 07:34, 4 October 2009 (UTC)
int i, j, k, x, a; struct LittleSquares { template<class S>typeof*""*winner(S s) { for ( ; i<s.size(); a?k^=a%2^'rf_H'/7>>x:j=!++++i ) a=s[i][j]+s[i+1][j++], x-=a&8?3:x; return k&7 ? "first" : "second"; } };
Tell me what this code does ;-). If you want, assume is called with this parameter s = { "...#", "..##" }. --zyx 15:40, 4 October 2009 (UTC)
- This code could be greatly improved with a few comments:
int i, j, k, x, a; // integer values struct LittleSquares { // structure template<class S>typeof*""*winner(S s) { // template for S for ( ; i<s.size(); a?k^=a%2^'rf_H'/7>>x:j=!++++i ) // loop over i a=s[i][j]+s[i+1][j++], x-=a&8?3:x; // a & x are integer results return k&7 ? "first" : "second"; // return "first" or "second" } };
- There, now its production-ready! =) --Darkcanuck 03:19, 5 October 2009 (UTC)
- Thanks, that gave me a chuckle. =) I wish I could say I never come across comments like those... --Voidious 13:06, 5 October 2009 (UTC)
- Don't let PEZ read this, he would probably explode. oldwiki:WhyCommentsAreBad --Nat Pavasant 13:36, 5 October 2009 (UTC)
- Hehehe those comments really made it easy to understand :-). --zyx 14:00, 5 October 2009 (UTC)
- Thanks, that gave me a chuckle. =) I wish I could say I never come across comments like those... --Voidious 13:06, 5 October 2009 (UTC)
- I don't know C(++) very well...or whatever that language is, but it's definitely not Java --Starrynte 04:45, 5 October 2009 (UTC)
What I mean is regular code =) It is C++ I know, not that it help much. I only write plain old C. By the way, if I simplified correctly.
int i, j, k, x, a; // integer values struct LittleSquares { // structure template<class S> typeof(*"") *winner(S s) { // template for S for ( ; i < s.size(); (a != 0) ? // k = K XOR (a MOD 2) XOR (10 000 010 101 111 100 010 001 111 000 >> x) k ^= (a % 2) ^ (274187384>>x) : j = !++++i // j = 0, i += 2 ) { // loop over i a = s[i][j] + s[i+1][j]; j++; x -= a == 8 ? 3 : x; } return (k == 7) ? "first" : "second"; } };
In this case I would answer, this code didn't do anything since its lack of main() or WinMain() =) OK, seriously, since I work on g++, its result isn't predictable since all variable isn't initialized yet (unlike Java =)), or initialized at an unknown place. I don't know if all this things are on VC++ too. Lastly, I repeat my first sentence. Even decompiled obfuscated code are not as f<censored>g mess as this. --Nat Pavasant 12:08, 5 October 2009 (UTC)
- Yes I know that code is simple madness, but as unlikely as it may seem, that solves this relatively hard topcoder problem. --zyx 13:57, 5 October 2009 (UTC)