BeepBoop seems to be the new king

Congratulations (again) from me too ;) Since 1.2, BeepBoop has had very surprising results (nearly 95!!!). And yet nothing worked when I tried to use gradient descent to train models. Would you mind sharing a little more about this part? E.g. initialization, learning rate, how to prevent getting a zero or negative exponent in the x^a formula…

Xor (talk)‎

I’ve been meaning to release the code for the training, but it’s currently a huge mess and I’m pretty busy! In the meantime, here are some details that might help:

I initialized the powers to 1, biases to 0, and multipliers to a simple hand-made KNN formula.
I constrained the powers to be positive, so I guess the formula should really be written as w(x+b)^abs(a).
I used Adam with a learning rate of 1e-3 for optimization.
Changing the KNN formula of course changes the nearest neighbors, so I alternated between training for a couple thousand steps and rebuilding the tree and making new examples.
For simplicity/efficiency, I used binning to build a histogram over GFs for an observation. Simply normalizing the histogram so it sums to 1 to get an output distribution doesn’t work that well (for one thing, it can produce very low probabilities if the kernel width is small). Instead, I used the output distribution softmax(t * log(histogram + abs(b))) where t and b are learned parameters initialized to 1 and 1e-4.
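To make the two formulas above concrete, here is a minimal sketch in plain Python. It is not the actual BeepBoop training code: the function names, the example histogram, and the assumption that x + b is non-negative are made up for illustration. Only the formulas and the initial values (t = 1, b = 1e-4) come from the list above.

```python
import math

def feature_term(x, w, b, a):
    """One weighted term of the KNN formula: w * (x + b) ** abs(a)."""
    # abs(a) keeps the exponent non-negative whatever value the optimizer picks.
    # Assumption: x + b >= 0, so the power is always real-valued.
    return w * (x + b) ** abs(a)

def output_distribution(histogram, t=1.0, b=1e-4):
    """softmax(t * log(histogram + abs(b))) over the GF bins."""
    # abs(b) > 0 keeps every bin strictly positive, so log() is defined
    # even for empty bins and no bin gets probability exactly zero.
    logits = [t * math.log(h + abs(b)) for h in histogram]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-bin histogram with two empty bins.
probs = output_distribution([0.0, 3.0, 1.0, 0.0])
```

With t = 1 this reduces to plain normalization of histogram + abs(b), i.e. additive smoothing; t > 1 sharpens the distribution and t < 1 flattens it.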

--Kev (talk)‎

Thanks for the detailed explanation! It is not easy to get so many details right, which explains how mighty BeepBoop is, not to mention the innovations.

Xor (talk)‎
