Gaff/Targeting

Overview

Gaff uses a fairly novel neural network (NN) system for targeting, with very strong results against wave surfers. This system consists of two multi-layer perceptron networks which use back-propagation for training. Each network uses the same input data (e.g. bullet flight time, lateral velocity, ticks since lateral reversal) but the two are trained differently. Each network also has multiple outputs (61 in the current version) which represent different guess factors. The higher an output's value, the more likely Gaff is to hit using that guess factor. The results from both networks are summed together and the best guess factor is used for aiming. Gaff performs this calculation every tick and fires once the gun heat is zero and the current aim is within tolerance.

Many other bots have used neural networks for targeting, but their implementations are quite different from what's explained below. Engineer is probably the most similar, but it uses a Self-Organizing Map (SOM) rather than the standard multi-layer network that Gaff uses.

Neural Networks

I won't go into detail on how neural networks work or how to build one -- there are plenty of other resources on the web for that. But basically, a neural network can be used as a non-linear function approximator. Gaff's two networks try to approximate hit probabilities: given the enemy's current state, which guess factors are most likely to hit. This is the same idea as using hit statistics, but more organic. If done right, the bot should be able to generalize and guess at good firing solutions even for enemy states it has never seen before.

Each network is a simple multi-layer perceptron. There are no hidden layers, since I found they only slowed down learning without adding any significant improvement. There are 61 outputs in each network, each representing a single guess factor between -1 and +1. The output functions are sigmoidal, so each output value will always lie between -1 and +1 (sigmoid outputs can improve training and add non-linearity to the network). I originally started with a single output representing the actual guess factor, but performance improved dramatically once the outputs were split up into probabilities instead -- NNs are very useful for classification problems.
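As a concrete illustration, here is a minimal sketch of such a network: 61 outputs over a flat input layer, trained with the delta rule (which is what back-propagation reduces to when there are no hidden layers). The class and method names are hypothetical, and tanh is an assumption based on the -1 to +1 output range described above; this is a sketch, not Gaff's actual code.

<pre>
// Minimal single-layer perceptron sketch (hypothetical names).
// 61 outputs, one per guess factor bin; tanh activation assumed.
public class GuessFactorNet {
    private final int inputs, outputs;
    private final double[][] w;      // weights [output][input]
    private final double[] bias;
    private final double learningRate;

    public GuessFactorNet(int inputs, int outputs, double learningRate) {
        this.inputs = inputs;
        this.outputs = outputs;
        this.learningRate = learningRate;
        this.w = new double[outputs][inputs];
        this.bias = new double[outputs];
        java.util.Random rnd = new java.util.Random();
        for (double[] row : w)
            for (int i = 0; i < inputs; i++)
                row[i] = rnd.nextGaussian() * 0.1;   // small random init
    }

    // Forward pass: one tanh unit per guess factor bin.
    public double[] feed(double[] x) {
        double[] out = new double[outputs];
        for (int j = 0; j < outputs; j++) {
            double sum = bias[j];
            for (int i = 0; i < inputs; i++) sum += w[j][i] * x[i];
            out[j] = Math.tanh(sum);
        }
        return out;
    }

    // One back-propagation step; with no hidden layer this is the delta rule.
    public void train(double[] x, double[] target) {
        double[] out = feed(x);
        for (int j = 0; j < outputs; j++) {
            // Gradient of squared error through tanh: (target - out) * (1 - out^2)
            double delta = (target[j] - out[j]) * (1.0 - out[j] * out[j]);
            for (int i = 0; i < inputs; i++)
                w[j][i] += learningRate * delta * x[i];
            bias[j] += learningRate * delta;
        }
    }
}
</pre>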

The inputs are the tricky part. If you read up on NNs, you'll find that inputs should be normalized, either from 0 to 1 or from -1 to +1. Some sources recommend scaling so that the average value maps to 0 and one standard deviation away maps to -1/+1. Scaling by the min and max can also work. I've used both in earlier versions of Gaff with good results, but versions 1.42 and later take a different approach.

Gaff currently uses Radial Basis Functions (RBFs) to split up input data into "features": the closer the input value is to a feature's centre, the closer that feature's activation is to 1; further away, it tails off to 0. For example, one feature might be lateral velocity = 0, with further lateral velocity features at other values, say every 2 units. By using features instead of raw data, the network can learn faster since similar situations will generate similar features, plus the RBF adds more non-linearity, which is always good.
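A minimal sketch of what such an encoding could look like, assuming Gaussian kernels and evenly spaced centres (the article doesn't specify Gaff's exact kernel or spacing):

<pre>
// Hypothetical sketch: spread one raw input value across evenly spaced
// Gaussian RBF features. Kernel shape and width are assumptions.
public static double[] rbfFeatures(double value, double min, double max, int count) {
    double[] features = new double[count];
    double spacing = (max - min) / (count - 1);
    for (int i = 0; i < count; i++) {
        double center = min + i * spacing;
        double dist = (value - center) / spacing;
        features[i] = Math.exp(-0.5 * dist * dist);  // ~1 at the centre, tails off to 0
    }
    return features;
}

// e.g. rbfFeatures(3.2, 0, 8, 11) turns a lateral velocity of 3.2
// into 11 activations peaking at the features nearest 3.2.
</pre>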

Gaff 1.42 used the following input data:

Data                            Range      Features
Bullet flight time (BFT)        0 to 105   11
Lateral velocity                0 to 8     11
Lateral acceleration            -2 to +2   11
Approach velocity               -8 to +8   9
Dist. traveled last 10 ticks    0 to 65    6
Ticks since velocity change     0 to BFT   7
Ticks since direction change    0 to BFT   7
Forward radians to wall         0 to 1.5   7
Reverse radians to wall         0 to 1.0   4
Current guess factor            -1 to +1   11

That makes 84 RBF features in total feeding each network.

This data set was inspired by Skilgannon's excellent random mover targeting in DrussGT.
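To make the mapping from the table to the network concrete, here is a hypothetical helper assembling the full input vector, reusing the rbfFeatures sketch above. TargetState and all its field names are invented for illustration, not Gaff's actual code.

<pre>
// Hypothetical: build the 84-element input vector from the table above.
double[] buildInputs(TargetState s) {    // TargetState is an assumed data holder
    java.util.List<double[]> parts = java.util.Arrays.asList(
        rbfFeatures(s.bulletFlightTime,     0, 105,  11),
        rbfFeatures(s.lateralVelocity,      0, 8,    11),
        rbfFeatures(s.lateralAcceleration, -2, 2,    11),
        rbfFeatures(s.approachVelocity,    -8, 8,     9),
        rbfFeatures(s.distLast10Ticks,      0, 65,    6),
        rbfFeatures(s.ticksSinceVelChange,  0, s.bulletFlightTime, 7),
        rbfFeatures(s.ticksSinceDirChange,  0, s.bulletFlightTime, 7),
        rbfFeatures(s.forwardWallRadians,   0, 1.5,   7),
        rbfFeatures(s.reverseWallRadians,   0, 1.0,   4),
        rbfFeatures(s.currentGuessFactor,  -1, 1,    11));
    double[] in = new double[84];        // 11+11+11+9+6+7+7+7+4+11 = 84
    int k = 0;
    for (double[] p : parts)
        for (double v : p) in[k++] = v;
    return in;
}
</pre>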

Training Schemes

Network #1

The two networks differ only in the way they are trained. The first training scheme attempts to learn heavily from the enemy's most recent movements and actually encourages "catastrophic interference" in the network (where old relationships are "overwritten" by new data). This scheme works very well against wave surfers.

Every tick Gaff sends out a targeting wave tagged with the enemy's latest input data set. When the wave reaches the enemy's location, Gaff retrains the network using back-propagation to learn the correct GF for that set of inputs. Training is done incrementally, one wave at a time. Since the network has distinct outputs for 61 different GFs, Gaff uses an RBF to set the "correct" values of the outputs. The RBF uses the enemy's effective bot width at that distance. This results in smoother learning than if the single correct output were set to 1 and the rest switched off.
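The target construction might look like the following sketch, where the observed guess factor is smeared across the 61 output bins. makeTarget and botWidthGF (the enemy's bot width expressed in guess-factor units) are hypothetical names, and the Gaussian kernel is an assumption.

<pre>
// Hypothetical sketch: smear the observed guess factor across the 61 output
// bins with an RBF whose width reflects the enemy's bot width at that range.
double[] makeTarget(double hitGF, double botWidthGF) {
    int bins = 61;
    double[] target = new double[bins];
    for (int j = 0; j < bins; j++) {
        double binGF = -1.0 + 2.0 * j / (bins - 1);   // bin centre in [-1, +1]
        double dist = (binGF - hitGF) / botWidthGF;
        target[j] = Math.exp(-0.5 * dist * dist);     // ~1 near the hit, ~0 elsewhere
    }
    return target;
}
</pre>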

Non-firing waves are trained only once, using a higher learning rate (4x that of firing waves). Firing waves are stored in a 5-wave FIFO buffer and are retrained every time any wave hits. Waves the network already gets right are not retrained.
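A hypothetical sketch of that buffer logic, under the stated rules; Wave is an assumed holder of (inputs, target), and trainAtHigherRate / predictsCorrectly are invented helper names standing in for the one-shot 4x update and the "already correct" check:

<pre>
// Training scheme #1 sketch: non-firing waves train once at a higher rate;
// firing waves sit in a 5-wave FIFO and retrain on every wave hit.
java.util.Deque<Wave> firingWaves = new java.util.ArrayDeque<>();

void onWaveHit(Wave wave, GuessFactorNet net) {
    if (wave.isFiringWave) {
        firingWaves.addLast(wave);
        if (firingWaves.size() > 5) firingWaves.removeFirst();  // keep the 5 newest
    } else {
        net.trainAtHigherRate(wave.inputs, wave.target);  // one-shot, 4x learning rate
    }
    // Retrain every buffered firing wave, skipping ones the net already gets right.
    for (Wave w : firingWaves)
        if (!net.predictsCorrectly(w.inputs, w.target))
            net.train(w.inputs, w.target);
}
</pre>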


Network #2

This network is tuned to hit random movers and uses the same basic setup as above (waves every tick, back-propagation using RBF of correct GF). But this one keeps a buffer of 200 waves in memory. Every time a wave hits (firing or non-firing) it is added to the buffer. Once the buffer is full, one wave will be removed at random to make room for the new one.

Every time a wave hits, five waves are selected from the buffer and are retrained. If the last wave was a firing wave, it automatically gets included as one of the five; the rest are chosen at random. There is no difference in learning rates between firing and non-firing waves.
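A sketch of scheme #2 under the same hypothetical Wave/GuessFactorNet names as above; the random-eviction and five-wave replay rules follow the description, but the details are illustrative:

<pre>
// Training scheme #2 sketch: a 200-wave buffer with random eviction;
// five waves retrained per hit, the latest firing wave always included.
java.util.List<Wave> buffer = new java.util.ArrayList<>();
java.util.Random rnd = new java.util.Random();

void onWaveHit2(Wave wave, GuessFactorNet net) {
    if (buffer.size() >= 200)
        buffer.remove(rnd.nextInt(buffer.size()));   // evict one wave at random
    buffer.add(wave);

    if (wave.isFiringWave)
        net.train(wave.inputs, wave.target);         // latest firing wave always trains
    int remaining = wave.isFiringWave ? 4 : 5;
    for (int i = 0; i < remaining; i++) {
        Wave w = buffer.get(rnd.nextInt(buffer.size()));
        net.train(w.inputs, w.target);               // same learning rate for all waves
    }
}
</pre>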


Putting it all together

The training is the complicated part. When it comes time to aim, Gaff calculates the outputs of both networks and sums the corresponding outputs together. Gaff also makes a decent approximation of the angles reachable by the enemy (taking walls into account) and then looks within this range for the largest sum. That gives the firing GF, and the gun is moved to that angle.
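A hypothetical sketch of that selection step, reusing the GuessFactorNet sketch from earlier; the reachable-angle approximation is reduced here to a precomputed GF range for brevity:

<pre>
// Sum both networks' outputs and pick the best guess factor
// within the range the enemy can actually reach.
double chooseGuessFactor(GuessFactorNet net1, GuessFactorNet net2,
                         double[] inputs, double minReachableGF, double maxReachableGF) {
    double[] a = net1.feed(inputs);
    double[] b = net2.feed(inputs);
    int bins = a.length;                 // 61
    double bestGF = 0, bestScore = Double.NEGATIVE_INFINITY;
    for (int j = 0; j < bins; j++) {
        double gf = -1.0 + 2.0 * j / (bins - 1);
        if (gf < minReachableGF || gf > maxReachableGF) continue;  // wall-limited range
        double score = a[j] + b[j];      // the simple sum resolves which net to trust
        if (score > bestScore) { bestScore = score; bestGF = gf; }
    }
    return bestGF;
}
</pre>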

Simple, right? It took a long time to figure this out -- I had two great networks but could never figure out how to choose the best one. This method magically solves that problem.


Additional Notes

If you want to try and implement something similar, go right ahead! However, NNs can be difficult to debug since it's not at all obvious what they're doing. I'd strongly suggest starting really, really simple. I spent a lot of time perfecting the basic NN classes, including doing back-propagation calculations by hand and then comparing them to the actual results using unit tests. It's tedious work but a single bug in your network or training system can really screw things up and may go undetected for a long, long time.

Some key stats:

  • development started in June 2008 (or possibly earlier)
  • the described design is used in Gaff 1.34 through 1.42, and dates from May 2009
    • it's also been re-used in Holden and Pris until I come up with something more unique for them.
  • 333 iterations of this design were tested in various challenges, typically 5-15 seasons each
    • the first ~75 tested using shell scripts; I had to record the scores by hand
    • then I discovered RoboResearch which is the only reason I've kept my sanity

To see the network outputs in action, add the following line to Gaff's properties file (Gaff.data/Gaff.properties) and enable painting:

<pre>
debug.gunnery = 1
</pre>