BeepBoop/Understanding BeepBoop

From Robowiki
< BeepBoop
Revision as of 02:23, 8 June 2021 by Kev (talk | contribs) (some more details)
Jump to navigation Jump to search

This page may eventually become a full explanation of BeepBoop along the lines of Understanding DrussGT, but for now it just contains some ideas I used in BeepBoop that I haven't seen in other bots. Hopefully, it will give you some inspiration for improving your own MegaBot!

Correct bullet shadowing

Bots I've looked at implement passive bullet shadowing something like this:

  1. Iterate through various (my_bullet, enemy_wave) pairs. These typically consist of a new enemy wave with all your current bullets or a new bullet with current enemy waves.
  2. For each pair, iterate through various timesteps t when the bullet might intersect with the wave.
  3. Add a shadow (if there is any) for the bullet at t and the wave at t, found by determining where the line segment (bullet_location_t, bullet_location_t+1) intersects the circles wave_t and wave_t+1.

However, step 3 isn't quite right. Robocode checks for bullet collisions by iterating through bullets in a random order, updating the line segment for the current bullet, and then checking if it intersects any of the other bullet line segments. Depending on this random order, a bullet can collide with another bullet's "previous" line segment. This means the correct way of doing step 3 is really.

  1. Add a shadow for the bullet at t and the wave at t. Then add 50%-weighted shadows for the bullet at t and wave at t-1 and for the bullet at t-1 and wave at t.

Bullet power selection

Having good Energy Management can improve scores a lot. BeepBoop uses a bullet power selection algorithm that tries to predict the expected score at the round if the enemy keeps firing with their last bullet power and BeepBoop keeps firing with a candidate bullet power. I won't go into full detail about how it works for now, but a few things I've noticed are:

  • Against most surfers, you generally lose energy over your opponent by shooting (i.e., the energy loss from shooting is not worth the damage and energy gain from hitting the opponent). This sometimes makes firing 0.1-powered bullets a good strategy for winning the round (not shooting at all is worse because you don't get bullet shadows), but of course, you won't get much bullet-damage score either.
  • However, against a bot that is better than you, you should fire higher-power bullets and hope you get lucky. If you and your opponent both just fire 0.1-powered bullets, the opponent's higher hit rate is almost guaranteed to pay off over the many shots fired in the round. I suspect this is one reason why BeepBoop crushes DrussGT, Diamond, Firestarter, etc. but doesn’t do as well against WaveSerpent, Shadow, and Ascendant, which use higher bullet powers.
  • DrussGT exclusively uses bullet powers that exploit a bug BasicSurfer to get easy 90%+ scores against some bots. However, this also means it never fires 0.1-power bullets, which I suspect hurts it against strong opponents. BeepBoop instead only uses anti-basic-surfer powers when it has a high hit rate.

Wall features

Wall features are really important for aiming. BeepBoop uses three: a standard “orbital” feature as seen in RaikoMicro, Thorn, etc., one which takes the ratio of Precise Position MAE and regular MAE to see how much the wall is reducing the MAE, and a stick wall feature which computes how much Wall Smoothing would change the enemy heading from orbital movement.

Anti-Surfer targeting

If you hit a wave surfer, they will try to avoid that angle in the future. BeepBoop’s anti-surfer gun accounts for this by using a “did-hit” feature. Waves that pass the enemy shortly before/after a bullet hit have did-hit set to 1, but when aiming did-hit is always set to 0. As a result, BeepBoop is less likely to use those waves when finding nearest neighbors.

Crowd surfing

Inspired by Diamond, BeepBoop combines multiple KNN-based danger systems used at different enemy hit rates. BeepBoop dynamically weights these systems so ones assigning high danger scores to where the enemy is firing have higher weights. It only uses bullet collisions (rather than bullet hits) to determine these weights to avoid the selection bias of BeepBoop moving to what it thinks are low-danger areas.

Simulated Enemy Targeting

BeepBoop runs HOT, LinearTargeting, CircularTargeting, Averaged Linear targeting, and Current-GF targeting against itself from the enemy’s perspective, so it can dodge these simple targeting strategies almost perfectly. These are treated as additional members of the “crowd” of danger computers used for surfing.

Randomizing velocity

Instead of using setMaxVelocity(8) and stopping with setAhead(0), BeepBoop uses setMaxVelocity(8 - epsilon) and setAhead(epsilon) where epsilon is a small random number selected every tick. The varying velocity (1) confuses some bots that exact match on velocity and (2) makes it look like BeepBoop is always accelerating or decelerating.

Ram bonus

Bots get a slightly higher score from finishing off opponents by ramming them, so BeepBoop rams disabled/low-energy enemies that have stopped shooting. I don't know if this helps APS much, but it is fun to watch!

Gradient-based learning of KNN models

Background

KNN/Dynamic Clustering methods map observations about a bot into an embedding space where nearest-neighbor lookups happen. For example, one data point might be [0.01 * distance, 0.1 * abs(velocity), 2 * wall_ahead, …]. Most KNN bots use hand-tuned “formulas” for these weights multiplying the features. However, it can work better to write out data from your bot using something like WaveSim and then learn the embedding function. DrussGT and (I think?) ScalarR use genetic algorithms for this. BeepBoop (and possibly Raven as well?) instead learn the embedding function offline using stochastic gradient descent.

How it works

BeepBoop embeds an observation [x1, …, xn] (where the xs are features such as distance, velocity, etc.) as [w1*(x1 + b1)^a1, …, wn*(xn + bn)^an] where the ws, as, and bs are learned parameters. Adding the powers instead of just having a linear transformation [w1*x1, …, w_n*x_n] makes the model a bit more expressive and seems to improve results a bit. I explored fancier formulas, such as embedding points as NN(x) where NN is a learned neural network, but it didn’t seem to work any better. Aiming is done in a pretty standard way:

  1. Given the current observation x, find the k nearest previous observations [y_1, …, y_n] under the current embedding function.
  2. Weight them according to distance(embed(x), embed(y_i)), with closer neighbors getting higher weight. BeepBoop uses the softmax function so the weights sum to 1.
  3. Apply (weighted) kernel density estimation over the guess factors of the neighbors to get a distribution of where the other bot is likely to be when the wave hits them.
  4. Point the gun towards the argmax GuessFactor of this distribution.

Changing the embedding function changes the weights in 2, which in turn changes the distribution in 3. The embedding function can be learned as follows:

  1. Create a target distribution of where the enemy bot actually ended up, basically a uniform distribution between the min and max GFs that would have hit the bot.
  2. Compute a loss function (I used cross-entropy) which measures the distance between the distribution produced by the kernel density estimation and the target distribution.
  3. Take a small gradient step minimizing the loss. In other words slightly adjust the w, a, and b parameters so the aim distribution puts more mass on where the enemy bot actually ended up.

These updates don’t take into account that changing the embedding may change what the k nearest neighbors actually are, but in practice this didn’t seem to matter much. Overall, the method works great and gives interpretable results! When I trained a KNN model to dodge a bot without wall features (e.g., Grinnik), it put 0 weight on wall features, whereas it put a high weight on them when learning to dodge Thorn. The biggest surprise for me was that it didn’t put much weight on recency for the anti-surfer gun whereas most anti-surfing guns use only recent scans or high rolling averages.

Is this worth all the effort?

Diamond uses pretty simple KNN formulas and it is still super strong! I'd guess having perfectly-tuned KNN models is helping BeepBoop less than 0.5 APS points, and almost all of its high APS comes from getting all the other aspects of the bot right.