Difference between revisions of "BeepBoop/Understanding BeepBoop"

From Robowiki
Jump to navigation Jump to search
(some more details)
m
 
(5 intermediate revisions by the same user not shown)
Line 10: Line 10:
 
<li>Add a shadow for the bullet at ''t'' and the wave at ''t''. Then add 50%-weighted shadows for the bullet at ''t'' and wave at ''t-1'' and for the bullet at ''t-1'' and wave at ''t''.</li>
 
<li>Add a shadow for the bullet at ''t'' and the wave at ''t''. Then add 50%-weighted shadows for the bullet at ''t'' and wave at ''t-1'' and for the bullet at ''t-1'' and wave at ''t''.</li>
 
</ol>
 
</ol>
 +
 +
=== Active bullet shadowing ===
 +
BeepBoop is unique among top bots for picking firing angles that produce helpful bullet shadows to protect itself.
 +
* First, it produces a set of 20 or so candidate firing angles. These include the most likely angle to hit the enemy according to its aim model, angles that shield its current movement plan, and angles sampled at random from the distribution over guessfactors produced by its aim model.
 +
* Then, it generates the bullet shadows resulting from those angles and evaluates how much they (1) reduce the danger of its current move plan and (2) reduce the minimum reachable danger on the wave.
 +
* The final score for an angle combines the predicted probability of hitting the opponent with the danger reductions from (1)/(2).
 +
Against a strong opponent, around 40% of its shots will not use the most likely-to-hit angle because they create useful shadows.
  
 
=== Bullet power selection ===
 
=== Bullet power selection ===
Having good [[Energy Management]] can improve scores a lot. BeepBoop uses a bullet power selection algorithm that tries to predict the expected score at the round if the enemy keeps firing with their last bullet power and BeepBoop keeps firing with a candidate bullet power. I won't go into full detail about how it works for now, but a few things I've noticed are:
+
BeepBoop’s [[Energy Management]] is maybe my favorite component. Rather than using hand-tuned rules, BeepBoop selects the bullet power that maximizes the expected score at the end of the round. Estimating the expected score for a bullet power uses lots of assumptions and approximations. The biggest one is that both bots will keep using the same bullet power for the round. I think this works in practice because BeepBoop keeps “staying ahead” of the enemy’s current bullet power. To predict bullet damage, it assumes a constant damage rate of bullet_damage * hit_rate / gun_heat. To predict survival chance, it first uses Lagrange multipliers to solve for hit rates for itself and its opponent that would cause it to lose the round. Then, it estimates the probability of itself and its opponent achieving those hit rates over the round’s remaining shots using the normal distribution approximation to the binomial cdf. The method produces interesting emergent behavior, such as firing high-power bullets when losing in the hope of getting a lucky hit.
* Against most surfers, you generally lose energy over your opponent by shooting (i.e., the energy loss from shooting is not worth the damage and energy gain from hitting the opponent). This sometimes makes firing 0.1-powered bullets a good strategy for winning the round (not shooting at all is worse because you don't get bullet shadows), but of course, you won't get much bullet-damage score either.
 
* However, against a bot that is better than you, you should fire higher-power bullets and hope you get lucky. If you and your opponent both just fire 0.1-powered bullets, the opponent's higher hit rate is almost guaranteed to pay off over the many shots fired in the round. I suspect this is one reason why BeepBoop crushes [[DrussGT]], [[Diamond]], [[Firestarter]], etc. but doesn’t do as well against [[WaveSerpent]], [[Shadow]], and [[Ascendant]], which use higher bullet powers.
 
* [[DrussGT]] exclusively uses bullet powers that exploit a bug BasicSurfer to get easy 90%+ scores against some bots. However, this also means it never fires 0.1-power bullets, which I suspect hurts it against strong opponents. BeepBoop instead only uses anti-basic-surfer powers when it has a high hit rate.
 
  
 
=== Wall features ===
 
=== Wall features ===
Line 22: Line 26:
 
=== Anti-Surfer targeting ===
 
=== Anti-Surfer targeting ===
 
If you hit a wave surfer, they will try to avoid that angle in the future. BeepBoop’s anti-surfer gun accounts for this by using a “did-hit” feature. Waves that pass the enemy shortly before/after a bullet hit have did-hit set to 1, but when aiming did-hit is always set to 0. As a result, BeepBoop is less likely to use those waves when finding nearest neighbors.  
 
If you hit a wave surfer, they will try to avoid that angle in the future. BeepBoop’s anti-surfer gun accounts for this by using a “did-hit” feature. Waves that pass the enemy shortly before/after a bullet hit have did-hit set to 1, but when aiming did-hit is always set to 0. As a result, BeepBoop is less likely to use those waves when finding nearest neighbors.  
 +
 +
=== Path surfing ===
 +
 +
For [[Wave Surfing]], BeepBoop tries to find a good path, which is a list of 1s, -1s and 0s corresponding to driving forward, driving backward, and halting.
 +
Its heading is usually a deterministic function of its position and movement direction, but against [[Ramming_Movement|rambots]] it also considers turning left/right as different move options.
 +
First it considers paths containing all 1s/-1s/0s like in [[Wave_Surfing/True_Surfing|True Surfing]].
 +
Then, like in [[Wave_Surfing/GoTo_Surfing|GoTo Surfing]], it finds low-danger areas along the forward/backward paths and generates new paths to go there.
 +
The end velocity of generated paths is randomized, which means over time it will consider different options like halting and then going to a low-danger area vs. going to the low-danger and then halting (the end velocity can affect which areas of the next wave are reachable).
 +
It always reconsiders the best path found the previous tick, so it will stay on a good path once it is discovered.
 +
The search procedure over paths is basically A* search.
 +
It surfs the first two waves with [[Precise Prediction]] and [[Waves/Precise_Intersection|Precise_Intersection]] and a third wave with approximate danger estimation.
  
 
=== Crowd surfing ===
 
=== Crowd surfing ===
Inspired by [[Diamond]], BeepBoop combines multiple [[KNN]]-based danger systems used at different enemy hit rates. BeepBoop dynamically weights these systems so ones assigning high danger scores to where the enemy is firing have higher weights. It only uses bullet collisions (rather than bullet hits) to determine these weights to avoid the selection bias of BeepBoop moving to what it thinks are low-danger areas.  
+
Inspired by [[Diamond]], BeepBoop combines multiple [[KNN]]-based danger estimation systems used at different enemy hit rates. BeepBoop dynamically weights these systems so ones assigning high danger scores to where the enemy is firing have higher weights. It only uses bullet collisions (rather than bullet hits) to determine these weights to avoid the selection bias of BeepBoop moving to what it thinks are low-danger areas. The hit rate thresholds and base weights for the ensemble of danger estimators was learned offline. 
  
===  Simulated Enemy Targeting ===
+
===  Simulated enemy targeting ===
BeepBoop runs [[HOT]], [[LinearTargeting]], [[CircularTargeting]], Averaged Linear targeting, and Current-GF targeting against itself from the enemy’s perspective, so it can dodge these simple targeting strategies almost perfectly. These are treated as additional members of the “crowd” of danger computers used for surfing.  
+
BeepBoop runs [[HOT]], [[LinearTargeting]], [[CircularTargeting]], Averaged Linear targeting, and Current-GF targeting against itself from the enemy’s perspective, so it can dodge these simple targeting strategies almost perfectly. These are treated as additional members of the ensemble of danger estimators used for surfing.
 +
 
 +
=== Far-future gun heat waves ===
 +
[[Gun Heat Waves]] can give you a few extra ticks to dodge an incoming bullet. BeepBoop takes gun heat waves further against [[Ramming_Movement|rambots]] by creating gun heat waves ''during surf prediction'' up to 15 ticks in the future. As a result, BeepBoop can move in ways that will make future shots easier to dodge, for example it will sometimes do a little “wiggle” so circular targeters miss it.
  
 
=== Randomizing velocity ===
 
=== Randomizing velocity ===
Instead of using setMaxVelocity(8) and stopping with setAhead(0), BeepBoop uses setMaxVelocity(8 - epsilon) and setAhead(epsilon) where epsilon is a small random number selected every tick. The varying velocity (1) confuses some bots that exact match on velocity and (2) makes it look like BeepBoop is always accelerating or decelerating.  
+
Instead of using setMaxVelocity(8) and stopping with setAhead(0), BeepBoop uses setMaxVelocity(8 - epsilon) and setAhead(epsilon) where epsilon is a small random number selected every tick. The varying velocity (1) confuses some bots that exact-match on velocity and (2) makes it look like BeepBoop is always accelerating or decelerating.  
  
 
=== Ram bonus ===
 
=== Ram bonus ===
Bots get a slightly higher [[score]] from finishing off opponents by [[Ramming_Movement|ramming]] them, so [[BeepBoop]] rams disabled/low-energy enemies that have stopped shooting. I don't know if this helps APS much, but it is fun to watch!
+
Bots get a slightly higher [[score]] from finishing off opponents by [[Ramming_Movement|ramming]] them, so [[BeepBoop]] rams disabled/low-energy enemies that have stopped shooting. In some cases, it also tries to disable rather than destroy the opponent so it can ram them. I don't know if this helps APS much, but it is fun to watch!
  
 
=== Gradient-based learning of [[KNN]] models ===
 
=== Gradient-based learning of [[KNN]] models ===
Line 51: Line 69:
 
# Take a small gradient step minimizing the loss. In other words slightly adjust the w, a, and b parameters so the aim distribution puts more mass on where the enemy bot actually ended up.
 
# Take a small gradient step minimizing the loss. In other words slightly adjust the w, a, and b parameters so the aim distribution puts more mass on where the enemy bot actually ended up.
 
These updates don’t take into account that changing the embedding may change what the ''k'' nearest neighbors actually are, but in practice this didn’t seem to matter much.
 
These updates don’t take into account that changing the embedding may change what the ''k'' nearest neighbors actually are, but in practice this didn’t seem to matter much.
Overall, the method works great and gives interpretable results! When I trained a KNN model to dodge a bot without wall features (e.g., [[Grinnik]]), it put 0 weight on wall features, whereas it put a high weight on them when learning to dodge [[Thorn]]. The biggest surprise for me was that it didn’t put much weight on recency for the anti-surfer gun whereas most anti-surfing guns use only recent scans or high rolling averages
+
Overall, the method works great and gives interpretable results! When I trained a KNN model to dodge a bot without wall features (e.g., [[Grinnik]]), it put 0 weight on wall features, whereas it put a high weight on them when learning to dodge [[Thorn]]. The biggest surprise for me was that it didn’t put much weight on recency for the anti-surfer gun whereas most anti-surfing guns use only recent scans or high rolling averages.
 
 
==== Is this worth all the effort? ====
 
[[Diamond]] uses pretty simple KNN formulas and it is still super strong! I'd guess having perfectly-tuned KNN models is helping [[BeepBoop]] less than 0.5 APS points, and almost all of its high APS comes from getting all the other aspects of the bot right.
 

Latest revision as of 18:55, 11 January 2024

This page may eventually become a full explanation of BeepBoop along the lines of Understanding DrussGT, but for now it just contains some ideas I used in BeepBoop that I haven't seen in other bots. Hopefully, it will give you some inspiration for improving your own MegaBot!

Correct bullet shadowing

Bots I've looked at implement passive bullet shadowing something like this:

  1. Iterate through various (my_bullet, enemy_wave) pairs. These typically consist of a new enemy wave with all your current bullets or a new bullet with current enemy waves.
  2. For each pair, iterate through various timesteps t when the bullet might intersect with the wave.
  3. Add a shadow (if there is any) for the bullet at t and the wave at t, found by determining where the line segment (bullet_location_t, bullet_location_t+1) intersects the circles wave_t and wave_t+1.

However, step 3 isn't quite right. Robocode checks for bullet collisions by iterating through bullets in a random order, updating the line segment for the current bullet, and then checking if it intersects any of the other bullet line segments. Depending on this random order, a bullet can collide with another bullet's "previous" line segment. This means the correct way of doing step 3 is really.

  1. Add a shadow for the bullet at t and the wave at t. Then add 50%-weighted shadows for the bullet at t and wave at t-1 and for the bullet at t-1 and wave at t.

Active bullet shadowing

BeepBoop is unique among top bots for picking firing angles that produce helpful bullet shadows to protect itself.

  • First, it produces a set of 20 or so candidate firing angles. These include the most likely angle to hit the enemy according to its aim model, angles that shield its current movement plan, and angles sampled at random from the distribution over guessfactors produced by its aim model.
  • Then, it generates the bullet shadows resulting from those angles and evaluates how much they (1) reduce the danger of its current move plan and (2) reduce the minimum reachable danger on the wave.
  • The final score for an angle combines the predicted probability of hitting the opponent with the danger reductions from (1)/(2).

Against a strong opponent, around 40% of its shots will not use the most likely-to-hit angle because they create useful shadows.

Bullet power selection

BeepBoop’s Energy Management is maybe my favorite component. Rather than using hand-tuned rules, BeepBoop selects the bullet power that maximizes the expected score at the end of the round. Estimating the expected score for a bullet power uses lots of assumptions and approximations. The biggest one is that both bots will keep using the same bullet power for the round. I think this works in practice because BeepBoop keeps “staying ahead” of the enemy’s current bullet power. To predict bullet damage, it assumes a constant damage rate of bullet_damage * hit_rate / gun_heat. To predict survival chance, it first uses Lagrange multipliers to solve for hit rates for itself and its opponent that would cause it to lose the round. Then, it estimates the probability of itself and its opponent achieving those hit rates over the round’s remaining shots using the normal distribution approximation to the binomial cdf. The method produces interesting emergent behavior, such as firing high-power bullets when losing in the hope of getting a lucky hit.

Wall features

Wall features are really important for aiming. BeepBoop uses three: a standard “orbital” feature as seen in RaikoMicro, Thorn, etc., one which takes the ratio of Precise Position MAE and regular MAE to see how much the wall is reducing the MAE, and a stick wall feature which computes how much Wall Smoothing would change the enemy heading from orbital movement.

Anti-Surfer targeting

If you hit a wave surfer, they will try to avoid that angle in the future. BeepBoop’s anti-surfer gun accounts for this by using a “did-hit” feature. Waves that pass the enemy shortly before/after a bullet hit have did-hit set to 1, but when aiming did-hit is always set to 0. As a result, BeepBoop is less likely to use those waves when finding nearest neighbors.

Path surfing

For Wave Surfing, BeepBoop tries to find a good path, which is a list of 1s, -1s and 0s corresponding to driving forward, driving backward, and halting. Its heading is usually a deterministic function of its position and movement direction, but against rambots it also considers turning left/right as different move options. First it considers paths containing all 1s/-1s/0s like in True Surfing. Then, like in GoTo Surfing, it finds low-danger areas along the forward/backward paths and generates new paths to go there. The end velocity of generated paths is randomized, which means over time it will consider different options like halting and then going to a low-danger area vs. going to the low-danger and then halting (the end velocity can affect which areas of the next wave are reachable). It always reconsiders the best path found the previous tick, so it will stay on a good path once it is discovered. The search procedure over paths is basically A* search. It surfs the first two waves with Precise Prediction and Precise_Intersection and a third wave with approximate danger estimation.

Crowd surfing

Inspired by Diamond, BeepBoop combines multiple KNN-based danger estimation systems used at different enemy hit rates. BeepBoop dynamically weights these systems so ones assigning high danger scores to where the enemy is firing have higher weights. It only uses bullet collisions (rather than bullet hits) to determine these weights to avoid the selection bias of BeepBoop moving to what it thinks are low-danger areas. The hit rate thresholds and base weights for the ensemble of danger estimators was learned offline.

Simulated enemy targeting

BeepBoop runs HOT, LinearTargeting, CircularTargeting, Averaged Linear targeting, and Current-GF targeting against itself from the enemy’s perspective, so it can dodge these simple targeting strategies almost perfectly. These are treated as additional members of the ensemble of danger estimators used for surfing.

Far-future gun heat waves

Gun Heat Waves can give you a few extra ticks to dodge an incoming bullet. BeepBoop takes gun heat waves further against rambots by creating gun heat waves during surf prediction up to 15 ticks in the future. As a result, BeepBoop can move in ways that will make future shots easier to dodge, for example it will sometimes do a little “wiggle” so circular targeters miss it.

Randomizing velocity

Instead of using setMaxVelocity(8) and stopping with setAhead(0), BeepBoop uses setMaxVelocity(8 - epsilon) and setAhead(epsilon) where epsilon is a small random number selected every tick. The varying velocity (1) confuses some bots that exact-match on velocity and (2) makes it look like BeepBoop is always accelerating or decelerating.

Ram bonus

Bots get a slightly higher score from finishing off opponents by ramming them, so BeepBoop rams disabled/low-energy enemies that have stopped shooting. In some cases, it also tries to disable rather than destroy the opponent so it can ram them. I don't know if this helps APS much, but it is fun to watch!

Gradient-based learning of KNN models

Background

KNN/Dynamic Clustering methods map observations about a bot into an embedding space where nearest-neighbor lookups happen. For example, one data point might be [0.01 * distance, 0.1 * abs(velocity), 2 * wall_ahead, …]. Most KNN bots use hand-tuned “formulas” for these weights multiplying the features. However, it can work better to write out data from your bot using something like WaveSim and then learn the embedding function. DrussGT and (I think?) ScalarR use genetic algorithms for this. BeepBoop (and possibly Raven as well?) instead learn the embedding function offline using stochastic gradient descent.

How it works

BeepBoop embeds an observation [x1, …, xn] (where the xs are features such as distance, velocity, etc.) as [w1*(x1 + b1)^a1, …, wn*(xn + bn)^an] where the ws, as, and bs are learned parameters. Adding the powers instead of just having a linear transformation [w1*x1, …, w_n*x_n] makes the model a bit more expressive and seems to improve results a bit. I explored fancier formulas, such as embedding points as NN(x) where NN is a learned neural network, but it didn’t seem to work any better. Aiming is done in a pretty standard way:

  1. Given the current observation x, find the k nearest previous observations [y_1, …, y_n] under the current embedding function.
  2. Weight them according to distance(embed(x), embed(y_i)), with closer neighbors getting higher weight. BeepBoop uses the softmax function so the weights sum to 1.
  3. Apply (weighted) kernel density estimation over the guess factors of the neighbors to get a distribution of where the other bot is likely to be when the wave hits them.
  4. Point the gun towards the argmax GuessFactor of this distribution.

Changing the embedding function changes the weights in 2, which in turn changes the distribution in 3. The embedding function can be learned as follows:

  1. Create a target distribution of where the enemy bot actually ended up, basically a uniform distribution between the min and max GFs that would have hit the bot.
  2. Compute a loss function (I used cross-entropy) which measures the distance between the distribution produced by the kernel density estimation and the target distribution.
  3. Take a small gradient step minimizing the loss. In other words slightly adjust the w, a, and b parameters so the aim distribution puts more mass on where the enemy bot actually ended up.

These updates don’t take into account that changing the embedding may change what the k nearest neighbors actually are, but in practice this didn’t seem to matter much. Overall, the method works great and gives interpretable results! When I trained a KNN model to dodge a bot without wall features (e.g., Grinnik), it put 0 weight on wall features, whereas it put a high weight on them when learning to dodge Thorn. The biggest surprise for me was that it didn’t put much weight on recency for the anti-surfer gun whereas most anti-surfing guns use only recent scans or high rolling averages.