BeepBoop seems to be the new king

Fragment of a discussion from User talk:Kev

Congratulations (again) from me too ;) BeepBoop has had very surprising results since 1.2 (nearly 95!!!). And yet nothing worked when I tried to use gradient descent to train models. Would you mind sharing a little more about this part? E.g. initialization, learning rate, how to prevent getting a zero or negative exponent in the x^a formula…

Xor (talk) 03:07, 26 December 2022

I’ve been meaning to release the code for the training, but it’s currently a huge mess and I’m pretty busy! In the meantime, here are some details that might help:

  • I initialized the powers to 1, biases to 0, and multipliers to a simple hand-made KNN formula.
  • I constrained the powers to be positive, so I guess the formula should really be written as w(x+b)^abs(a).
  • I used Adam with a learning rate of 1e-3 for optimization.
  • Changing the KNN formula of course changes the nearest neighbors, so I alternated between training for a couple thousand steps and rebuilding the tree / generating new examples.
  • For simplicity/efficiency, I used binning to build a histogram over GFs for an observation. Simply normalizing the histogram so it sums to 1 to get an output distribution doesn’t work that well (for one thing, it can produce very low probabilities if the kernel width is small). Instead, I used the output distribution softmax(t * log(histogram + abs(b))), where t and b are learned parameters initialized to 1 and 1e-4. (See the rough sketch after this list.)
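To make that a bit more concrete, here is a rough PyTorch-style sketch of how those pieces could fit together. This is not my actual training code (that's the mess mentioned above): the sizes, the Gaussian kernel, and the single-example cross-entropy loss are placeholders, but it shows the w(x+b)^abs(a) weighting, Adam at 1e-3, and the softmax(t * log(histogram + abs(b))) output.

  import torch

  n_features, n_bins = 8, 61  # placeholder sizes, not the real ones

  # Learned parameters, initialized as in the list above.
  w  = torch.ones(n_features, requires_grad=True)   # multipliers (really initialized from the hand-made formula)
  b  = torch.zeros(n_features, requires_grad=True)  # biases
  a  = torch.ones(n_features, requires_grad=True)   # powers, kept positive via abs()
  t  = torch.ones((), requires_grad=True)           # output temperature
  hb = torch.tensor(1e-4, requires_grad=True)       # histogram smoothing (the "b" in the softmax formula)

  opt = torch.optim.Adam([w, b, a, t, hb], lr=1e-3)

  def knn_weights(x):
      # Per-feature weight w * (x + b)^abs(a); the clamp keeps fractional powers real.
      return w * (x + b).clamp_min(1e-9) ** a.abs()

  def gf_histogram(x, data_x, data_bins):
      # Kernel-weighted histogram over stored points; the dense Gaussian kernel
      # here is just a stand-in for the actual neighbor weighting.
      d2 = ((knn_weights(x) * (data_x - x)) ** 2).sum(dim=-1)
      k = torch.exp(-d2)
      return k @ torch.nn.functional.one_hot(data_bins, n_bins).float()

  def gf_distribution(hist):
      # softmax(t * log(hist + abs(b))) instead of plain normalization.
      return torch.softmax(t * torch.log(hist + hb.abs()), dim=-1)

  # One illustrative training step: cross-entropy toward the GF bin that was
  # actually observed. In the real setup this alternates with rebuilding the
  # KNN tree and regenerating examples every couple thousand steps.
  x = torch.rand(n_features)                     # one observation's features
  data_x = torch.rand(200, n_features)           # stored situations (placeholder data)
  data_bins = torch.randint(0, n_bins, (200,))   # their recorded GF bins
  target_bin = 30                                # bin the enemy actually reached

  loss = -torch.log(gf_distribution(gf_histogram(x, data_x, data_bins))[target_bin] + 1e-12)
  opt.zero_grad()
  loss.backward()
  opt.step()

This skips the k-d tree lookup and batching entirely, but it shows where each of the learned parameters enters and why the power has to stay positive.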
--Kev (talk) 17:10, 3 January 2023

Thanks for the detailed explanation! It is not easy to get so many details right, which explains why BeepBoop is so mighty, not to mention the innovations.

Xor (talk) 05:57, 4 January 2023