Rolling Average vs Softmax & Cross Entropy

If each GuessFactor bin is treated as an output unit before the Softmax (a logit), and the loss is Cross Entropy, then the gradient of the loss with respect to each logit is (writing qi for the softmax output of bin i):

qi - 1, if bin i is the hit bin
qi, otherwise
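
As a concrete illustration (not from the original post), here is a minimal Java sketch of this setup: the bin logits go through a softmax, and the cross-entropy gradient with respect to each logit is qi - yi, where yi is the one-hot indicator of the hit bin. The class name, the 31-bin count, and the hit-bin index are arbitrary choices for the example.

    public class SoftmaxGradientSketch {
        // Softmax over the bin logits, shifted by the max logit for numerical stability.
        static double[] softmax(double[] logits) {
            double max = Double.NEGATIVE_INFINITY;
            for (double z : logits) max = Math.max(max, z);
            double sum = 0;
            double[] q = new double[logits.length];
            for (int i = 0; i < logits.length; i++) {
                q[i] = Math.exp(logits[i] - max);
                sum += q[i];
            }
            for (int i = 0; i < q.length; i++) q[i] /= sum;
            return q;
        }

        // Gradient of the cross-entropy loss w.r.t. each logit:
        // q_i - 1 for the hit bin, q_i for every other bin.
        static double[] crossEntropyGradient(double[] q, int hitBin) {
            double[] grad = q.clone();
            grad[hitBin] -= 1.0;
            return grad;
        }

        public static void main(String[] args) {
            double[] logits = new double[31];            // 31 GuessFactor bins, all logits zero
            double[] q = softmax(logits);                // uniform: each q_i = 1/31
            double[] grad = crossEntropyGradient(q, 15); // suppose the wave hit bin 15
            System.out.println(grad[15] + " " + grad[0]); // roughly -0.9677 and 0.0323
        }
    }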

If the gradient step is not applied to the logits as usual, but instead applied directly to qi itself, then:

qi := qi - eta * (qi - 1) = (1 - eta) * qi + eta * 1, if bin i is the hit bin
qi := qi - eta * qi = (1 - eta) * qi + eta * 0, otherwise
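
A hedged Java sketch of that step (the class name and the toy numbers are mine, not from the post): the same gradient is subtracted from qi directly, and both cases reduce to a single convex-combination update toward a 0/1 target.

    public class UpdateOnQSketch {
        // Apply the cross-entropy gradient step directly to the bin values q_i
        // instead of to the logits. Both branches collapse to
        // q_i := (1 - eta) * q_i + eta * target_i, with target_i = 1 for the
        // hit bin and 0 for every other bin.
        static void updateOnQ(double[] q, int hitBin, double eta) {
            for (int i = 0; i < q.length; i++) {
                double target = (i == hitBin) ? 1.0 : 0.0;
                q[i] -= eta * (q[i] - target); // same value as (1 - eta) * q[i] + eta * target
            }
        }

        public static void main(String[] args) {
            double[] q = {0.2, 0.5, 0.3};                      // toy 3-bin distribution
            updateOnQ(q, 1, 0.1);                              // the wave hit bin 1, eta = 0.1
            System.out.println(java.util.Arrays.toString(q));  // roughly [0.18, 0.55, 0.27]
        }
    }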

This is essentially a rolling average, where eta (the learning rate) plays the role of alpha (the decay rate) in an exponential moving average.
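
To check that equivalence numerically, the following sketch (bin count, hit sequence, and eta are arbitrary) runs the gradient-on-q update and a standard exponential moving average of one-hot hit vectors side by side; with alpha = eta they give the same bin values, up to floating-point rounding.

    public class RollingAverageCheck {
        public static void main(String[] args) {
            double eta = 0.1;                         // learning rate, used as the EMA decay rate alpha
            int bins = 31;                            // illustrative GuessFactor bin count
            double[] q = new double[bins];            // updated by the gradient-on-q step
            double[] avg = new double[bins];          // updated by the exponential moving average
            java.util.Arrays.fill(q, 1.0 / bins);     // start both from a uniform distribution
            java.util.Arrays.fill(avg, 1.0 / bins);
            int[] hitBins = {15, 16, 15, 14, 15};     // a few example waves
            for (int hit : hitBins) {
                for (int i = 0; i < bins; i++) {
                    double target = (i == hit) ? 1.0 : 0.0;
                    q[i] -= eta * (q[i] - target);              // gradient step applied to q_i
                    avg[i] = (1 - eta) * avg[i] + eta * target; // rolling average with alpha = eta
                }
            }
            // The two values agree up to floating-point rounding.
            System.out.println(q[15] + "  vs  " + avg[15]);
        }
    }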

    Xor (talk) 05:49, 27 July 2021