Rolling Average vs Softmax & Cross Entropy
If each Guess Factor bin is treated as an output unit before Softmax (a logit), and the loss is Cross Entropy, then the gradient of the loss with respect to each logit is:
- qi - 1, if bin i is hit
- qi, otherwise
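
Below is a minimal sketch (not from the original post) that checks this gradient numerically: with q = softmax(z) and a one-hot target on the hit bin, the analytic gradient q - target matches a finite-difference estimate of -log(q[hit]). The bin count, logit values, and hit bin index are arbitrary example values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

z = np.array([0.2, -1.0, 0.5, 0.0])      # logits, one per Guess Factor bin (example values)
hit_bin = 2                              # index of the bin that was actually hit

q = softmax(z)
target = np.zeros_like(q)
target[hit_bin] = 1.0

# Analytic gradient of cross-entropy(softmax(z), target) w.r.t. z:
# q_i - 1 for the hit bin, q_i for every other bin.
grad = q - target

# Numerical check via central finite differences on the loss -log(q[hit_bin]):
eps = 1e-6
num_grad = np.zeros_like(z)
for i in range(len(z)):
    zp = z.copy(); zp[i] += eps
    zm = z.copy(); zm[i] -= eps
    num_grad[i] = (-np.log(softmax(zp)[hit_bin]) + np.log(softmax(zm)[hit_bin])) / (2 * eps)

print(np.allclose(grad, num_grad, atol=1e-5))  # True
```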
If the gradient step is applied not to the logit as usual, but directly to qi itself, then:
- qi := qi - eta * (qi - 1) = (1 - eta) * qi + eta * 1, if bin i is hit
- qi := qi - eta * qi = (1 - eta) * qi + eta * 0, otherwise
This is essentially a rolling average, where eta (the learning rate) equals alpha (the decay rate) of the exponential moving average.
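
As a quick illustration (a sketch under assumed example values, not from the original post), the following compares the two updates over a sequence of observed hit bins: applying the gradient step directly to q, versus an explicit exponential moving average of one-hot hit vectors with alpha = eta. The bin count, eta, and hit sequence are arbitrary.

```python
import numpy as np

n_bins = 5
eta = 0.1                                # learning rate, equal to the EMA decay rate alpha
hits = [2, 2, 3, 2, 1, 2]                # bins the enemy was observed in, in order

q_grad = np.full(n_bins, 1.0 / n_bins)   # "gradient applied to q" version
q_ema = np.full(n_bins, 1.0 / n_bins)    # explicit rolling-average version

for hit in hits:
    target = np.zeros(n_bins)
    target[hit] = 1.0

    # Gradient step applied to q itself: q := q - eta * (q - target)
    q_grad = q_grad - eta * (q_grad - target)

    # Exponential moving average with alpha = eta: q := (1 - alpha) * q + alpha * target
    q_ema = (1 - eta) * q_ema + eta * target

print(np.allclose(q_grad, q_ema))        # True: the two updates are identical
```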