Rolling Average vs Gradient Descent with Softmax & Cross Entropy




If each Guess Factor bin is considered an output unit before Softmax (a logit), and the loss is cross entropy, then the gradient of the loss with respect to each logit is:

qi - 1, if bin i is hit
qi, otherwise

where qi is the softmax probability of bin i.
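
As a quick numeric check (a minimal sketch, not from the original post; the logit values and the hit bin index are purely illustrative), the softmax and its cross-entropy gradient can be computed directly:

  // Sketch: softmax over per-bin logits, then the cross-entropy gradient
  // dL/dz_i = q_i - 1 for the hit bin, and q_i otherwise.
  public class SoftmaxGradientSketch {
      public static void main(String[] args) {
          double[] logits = {0.2, -1.0, 0.5, 0.0}; // one logit per Guess Factor bin (illustrative)
          int hitBin = 2;                          // bin the enemy actually ended up in (illustrative)

          // softmax: q_i = exp(z_i) / sum_j exp(z_j)
          double[] q = new double[logits.length];
          double sum = 0.0;
          for (int i = 0; i < logits.length; i++) {
              q[i] = Math.exp(logits[i]);
              sum += q[i];
          }
          for (int i = 0; i < q.length; i++) {
              q[i] /= sum;
          }

          // cross-entropy gradient w.r.t. each logit
          double[] grad = new double[q.length];
          for (int i = 0; i < q.length; i++) {
              grad[i] = q[i] - (i == hitBin ? 1.0 : 0.0);
          }
          System.out.println(java.util.Arrays.toString(grad));
      }
  }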

If the gradient step is applied not to the logit as usual, but directly to qi itself, then:

qi := qi - eta * (qi - 1) = (1 - eta) * qi + eta * 1, if bin i is hit
qi := qi - eta * qi = (1 - eta) * qi + eta * 0, otherwise

This is essentially a rolling average, where eta (the learning rate) plays the role of alpha (the decay rate) in an exponential moving average.
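
For illustration (again a hypothetical sketch; the bin count, eta, and the hit sequence are made up), applying that update once per wave is exactly an exponential moving average of the one-hot hit vectors:

  // Sketch: updating q directly with the cross-entropy gradient,
  // q_i := (1 - eta) * q_i + eta * target_i, is an exponential moving
  // average of the one-hot hit vectors with decay rate alpha = eta.
  public class RollingAverageSketch {
      public static void main(String[] args) {
          int bins = 5;
          double eta = 0.1;                // learning rate == EMA decay rate
          int[] hits = {2, 2, 3, 2, 1, 2}; // which bin was hit on each wave (illustrative)

          double[] q = new double[bins];   // start from an all-zero estimate
          for (int hit : hits) {
              for (int i = 0; i < bins; i++) {
                  double target = (i == hit) ? 1.0 : 0.0;
                  q[i] = (1.0 - eta) * q[i] + eta * target; // rolling average step
              }
          }
          System.out.println(java.util.Arrays.toString(q));
      }
  }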

Xor (talk) 05:49, 27 July 2021
