# Sequential Prediction

(Sequential Prediction - first review) |
(Added categories, I hope their are the appropriate ones) |
||

Line 82: | Line 82: | ||

* Distance to sequence tail (very good for fast adapting dodgers). | * Distance to sequence tail (very good for fast adapting dodgers). | ||

* Similar weighting as found in [[Dynamic Clustering]] algorithms. | * Similar weighting as found in [[Dynamic Clustering]] algorithms. | ||

+ | |||

+ | [[Category:Advanced Targeting Strategies]] | ||

+ | [[Category:Log-Based Targeting]] |

## Revision as of 07:47, 3 October 2009

Sequential prediction is a very fast Pattern Matching algorithm, it has a linear running time. While it matches it is able to find all matches along with their match size. The main draw back is that the values matched need to be discrete and capped, that can be achieved just like in Symbolic Pattern Matching but we can use integers instead of characters in a String if preferred.

## Contents |

## Credits

I got the idea and basic algorithm from:

- Fri Mommersteeg - Eindhoven University of Technology. Article published in the book Game Programming Wisom, 2002 under article
*Pattern Recognition with Sequential Prediction*.

I tried to keep most of the original author's idea, but I biased it a little bit for Robocode.

## Basics

- As in other pattern matching algorithms, assume we have a sequence of size N called log. For simplicity we will assume it's indexed form 1 to N.
- Let a match be a pair {i, k} such that log(i + j) equals log(N - k + j) for all j from 0 to k-1. We will call k the size of match.
- Let a longest match be a match with maximal size.

If we can find a longest match we already have more information than most current pattern matching algorithms (except for Single-Tick Pattern Matching), which usually have one or more fixed sizes to try. And if we can find it faster, then it's even better.

Consider the binary string *BABBA***BAABA***AABBBBABB***BAABA**, in this case the longest match is **BAABA** and has a size of 5. The pattern starts at index 6 of the log, therefore the match is called {6, 5}. A naive algorithm would look for the previous A, and then check if the value to it's left is a B, and so on. This algorithm takes O(n^2) to find the longest match, which is too slow for even relatively small log sizes.

## Better Idea

The key point is take advantage from the fact that the sequence is being generated incrementally. When a new element is added the complete sequence remains the same, except for the tail. Exploiting this property we can find the longest match very fast.

- Let a match size of occurrence of A be the size of a match ending in A.
- Suppose we know all the match sizes of occurrences of A.
- Suppose the log ends with A and that we are adding a B.

Can we use this information to find all match sizes of occurrences of B? Yes we can, and is very simple too. For all occurrences of B we have one of two cases:

- The B is
*not*preceded by A. - The B is preceded by A.

In the first case it is obvious that the match size is 1. For the second case things could be trickier, but since we now all match sizes of occurrences of A, then is not that hard.

The string is something like "...*A***B**...A", prior to the addition and it will become "...*A***B**...AB". Therefor is easy to see that the match size of **B** is the match size of *A*(we know this value) plus 1. This comes from extending the substring that ended in *A* to include the **B** as we know it will match the new tail.

What if we are adding an A instead of a B, does the algorithm work? The answer is yes as long as you traverse the occurrences of A in reverse order as they appeared.

Now the only problem is the supposition of having all match sizes of occurrences of A, but since we start with an empty log the initialization is trivial. After initialized the algorithm is self-sustaining, so the supposition gives no problems at all.

This results in an O(n) algorithm, than only checks all previous occurrences of the element being added. With clever data structures the algorithm can run very fast.

## Implementation

- Let alphabet be the set of all possible values in the sequence.
- Let histogram be an array of size |alphabet| of lists of Entry.

class Entry { int index; // Index into the sequence. int value; // Discrete value being matched. int match_size; // Match size of this occurrence. Entry getPreviousEntry(); // Method to obtain the Entry that came right before this one. } function SequentialPrediction(int next_element) { List<Entry> ocurrences = histogram[next_element]; int longest_match_size = 0; int longest_match_tail = 0; for each entry in ocurrences do { Entry previous = entry.getPreviousEntry(); // Find the match size if ( previous.value == last_value ) entry.match_size = previous.match_size + 1; else entry.match_size = 1; // Update longest match if ( entry.match_size > longest_match_size ) { longest_match_size = entry.match_size; longest_match_tail = entry.index; } } UpdateSequence(next_element); UpdateHistogram(next_element); last_value = next_element; if ( longest_match_size > 0 ) { Start prediction at Sequence[longest_match_tail + 1]; } }

When iterating trough the entries in the occurrences list, remember to go trough them from last to first. Or you can add new entries to the beginning of the list when updating the histogram.

## Multiple-Choice

Since the algorithm find all partial matches, instead of updating the index and size of the longest match we can use them all (or some of the best ones) for a multiple choice pattern matcher.

## Weighting

For the multiple choice algorithm, choices may be weighted by one or more of the following with minimal complexity added:

- Match size.
- Distance to sequence tail (very good for fast adapting dodgers).
- Similar weighting as found in Dynamic Clustering algorithms.