Thread:Talk:BeepBoop/Understanding BeepBoop/Property of gradient of cross-entropy loss with kernel density estimation/reply (Kev, 2022-02-10)<p>Reply to <a href="/wiki/Thread:Talk:BeepBoop/Understanding_BeepBoop/Property_of_gradient_of_cross-entropy_loss_with_kernel_density_estimation" title="Thread:Talk:BeepBoop/Understanding BeepBoop/Property of gradient of cross-entropy loss with kernel density estimation">Property of gradient of cross-entropy loss with kernel density estimation</a></p>
<div>Interesting observations! The scale invariance of K actually seems like a good property to me. It means that K doesn't really need to be normalized, or, more precisely, that multiplying K by a constant multiplies the gradient by that constant, which seems like the behavior you'd want. Most loss functions (e.g., mean squared error) learn more from far-off data points than from close ones. That might be good for surfing, but I could imagine you'd want the opposite for targeting, where you "give up" on hard data points and focus on the ones you might actually score a hit on. So perhaps BeepBoop's loss is a middle ground that works decently well for both.</div>Kev
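The linear-scaling property Kev describes can be checked numerically. The sketch below assumes one plausible form of the loss (my assumption, not necessarily BeepBoop's exact formulation): cross-entropy of a softmax prediction against unnormalized KDE target weights, L = -Σᵢ wᵢ log pᵢ with wᵢ = c·K(angleᵢ - hit_angle). Because L is linear in the weights, scaling K by c scales the gradient by exactly c, so K's normalization only acts like a learning-rate factor.

```python
import numpy as np

def gaussian_kernel(x, h=0.1):
    # Unnormalized Gaussian kernel; its overall scale is deliberately arbitrary.
    return np.exp(-0.5 * (x / h) ** 2)

def loss_and_grad(logits, angles, hit_angle, kernel_scale=1.0):
    """Hypothetical loss: L = -sum_i w_i * log softmax(logits)_i,
    with KDE target weights w_i = kernel_scale * K(angle_i - hit_angle).
    Returns the loss and its gradient w.r.t. the logits."""
    w = kernel_scale * gaussian_kernel(angles - hit_angle)
    z = logits - logits.max()          # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    loss = -np.sum(w * np.log(p))
    # dL/dlogit_j = -w_j + p_j * sum_i w_i  (standard softmax cross-entropy,
    # except the weights w do not sum to 1)
    grad = w.sum() * p - w
    return loss, grad

angles = np.linspace(-1.0, 1.0, 9)
logits = np.random.default_rng(0).normal(size=9)
_, g1 = loss_and_grad(logits, angles, hit_angle=0.2, kernel_scale=1.0)
_, g3 = loss_and_grad(logits, angles, hit_angle=0.2, kernel_scale=3.0)
print(np.allclose(g3, 3.0 * g1))  # gradient scales linearly with K's scale
```

Under a properly normalized loss (e.g., -log of a normalized KDE) the constant would cancel entirely instead; the linear behavior shown here is specific to losses that are linear in K.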