# Difference between revisions of "Thread:Talk:BeepBoop/Understanding BeepBoop/Property of gradient of cross-entropy loss with kernel density estimation/reply (4)"



## Latest revision as of 17:26, 13 March 2022

~~One more finding: you don't actually need to take an integral or use bins at all. You can compute the loss for each data point separately and sum the results. Although the loss value differs, the gradients are exactly the same. This yields one more insight: the absolute predicted values don't matter at all; all that matters is how close they are to the target distribution relative to each other. As a result, the cluster used for one prediction doesn't need to be in the same batch; the points can be shuffled entirely without affecting the result (theoretically).~~

Oops, the calc is wrong.