Thread history

Fragment of a discussion from Talk:BeepBoop/Understanding BeepBoop

One more finding: you don't actually need to take an integral or use bins at all. You can compute the loss for each data point separately and sum the losses. The loss value isn't the same, but the gradients are exactly the same. This yields one more insight: the absolute predicted value doesn't matter at all. All that matters is how close the predictions are to the target distribution relative to each other. As a result, the cluster used for one prediction doesn't have to be in the same batch. The points can be shuffled entirely without affecting the result (in theory).
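A minimal sketch of the simplest case only. It assumes the binned loss is a cross-entropy against a hard histogram of observed target bins. The post does not say how its bins or integral were set up, and its author withdrew the calculation in the next reply. In this hard-bin case, cross-entropy is linear in the target counts, so the binned form and the per-point sum agree exactly, and the per-point sum does not depend on the order of the points. The function names and sample numbers are hypothetical.

```python
import math

def binned_cross_entropy(targets, probs):
    # Hypothetical helper: build a hard histogram of observed bins,
    # then take the cross-entropy of the predicted bin probabilities against it.
    counts = [0] * len(probs)
    for t in targets:
        counts[t] += 1
    return -sum(c * math.log(p) for c, p in zip(counts, probs) if c)

def per_point_loss(targets, probs):
    # Hypothetical helper: sum of per-observation negative log-likelihoods,
    # computed without any histogram.
    return -sum(math.log(probs[t]) for t in targets)

probs = [0.1, 0.2, 0.4, 0.2, 0.1]   # illustrative predicted bin probabilities
targets = [2, 2, 3, 1, 2, 4]        # illustrative observed bin indices
print(binned_cross_entropy(targets, probs), per_point_loss(targets, probs))
```

The two printed values match up to floating-point rounding, and feeding `per_point_loss` a shuffled copy of `targets` gives the same value. Whether this carries over to smoothed targets or to the integral in the original post is not shown here.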

Oops, the calculation is wrong.

Xor (talk) 15:21, 13 March 2022