Talk:BeepBoop/Understanding BeepBoop
Contents
| Thread title | Replies | Last modified |
|---|---|---|
| BeepBoop vs Yatagan | 3 | 13:36, 16 August 2025 |
| Property of gradient of cross-entropy loss with kernel density estimation | 3 | 16:26, 13 March 2022 |
I've noticed that versus Yatagan, BeepBoop never moves. The bullet shielding is effective enough that BeepBoop wins comfortably, but I'm curious whether this behaviour is a bug or not. So far I haven't seen any other bots that BeepBoop responds to in this way.
A quick way to find these bots is to filter for bots against which BeepBoop wins at nearly 100% APS while they themselves have a high average APS. Another quick way is to look at bots with a positive KNNPBI against BeepBoop.
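The first filter can be sketched in a few lines of Python. The record layout, score numbers, and thresholds below are made-up illustrations, not real RoboRumble data:

```python
# Hedged sketch of the "near-100% APS against an otherwise strong bot" filter.
# Each record: (opponent name, BeepBoop's APS vs that opponent, opponent's
# overall average APS). All numbers here are invented for illustration.
records = [
    ("Yatagan", 99.8, 74.0),  # strong bot, yet BeepBoop nearly always wins
    ("WeakBot", 99.5, 25.0),  # hypothetical weak bot: a high score is expected
    ("MidBot",  70.0, 65.0),  # hypothetical mid-tier bot: a normal result
]

# Suspicious pattern: near-100% score against an otherwise strong opponent,
# which suggests bullet shielding rather than ordinary outperformance.
suspects = [name for name, vs_score, overall in records
            if vs_score >= 99.0 and overall >= 60.0]
print(suspects)  # ['Yatagan']
```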
Also, BeepBoop is open source, so you can have a look at the whitelist of shieldable bots ;)
Interesting observations! The scale invariance of K actually seems like a good property to me. It means that K doesn't really need to be normalized, or more precisely that multiplying K by a constant multiplies the gradient by that constant, which seems like the behavior you'd want. Most loss functions (e.g., mean squared error) learn more from far data points than from close ones. That might be good for surfing, but I could imagine you may want the opposite for targeting, where you "give up" on hard data points and focus on the ones you might score a hit on. So perhaps BeepBoop's loss is a middle ground that works decently well for both.
The thoughts on surfing & targeting are quite inspiring. And even if no data points are near, i.e. within the kernel's width (the hard case), that case is still valuable, since some data point may exist just outside the kernel's width. Repeating the training process iteratively with the new weights may eventually turn that hard case into an easy one ;) Are you doing something similar as well?
One more finding: you actually don't need to take an integral or use bins at all; you can compute the loss from each data point separately and sum the losses. Although the loss values aren't equal, the gradients are exactly the same. This yields one more insight: the absolute predicted values aren't important at all; all that matters is how close they are to the target distribution relative to each other. As a result, the data points in the cluster used for one prediction don't need to be in the same batch; they can be shuffled entirely without affecting the result (theoretically).
Oops, the calculation above is wrong.