
To bring things full circle: the cross-entropy loss is the KL divergence. So intuitively, when you're minimizing cross-entropy loss, you're trying to minimize the "divergence" between the true distribution and your model distribution.

This intuition really helped me understand CE loss.
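
A toy sketch of that intuition (the distribution, learning rate, and step count are made up): minimizing cross-entropy against a fixed true distribution p pulls the model distribution q toward p.

    import numpy as np

    # Gradient descent on a model distribution q (logits passed through a
    # softmax) to minimize the cross-entropy against a fixed p.
    p = np.array([0.7, 0.2, 0.1])   # made-up "true" distribution
    logits = np.zeros(3)            # model starts at the uniform distribution

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    for _ in range(2000):
        q = softmax(logits)
        # Gradient of H(p, q) = -sum(p * log q) w.r.t. the logits is q - p.
        logits -= 0.1 * (q - p)

    print(softmax(logits))  # ~ [0.7, 0.2, 0.1]: q has moved onto p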




Cross-entropy is not the KL divergence: cross-entropy contains an additional term, the entropy of the data distribution, which is independent of the model. But because that extra term is a constant, you're right that minimizing one is equivalent to minimizing the other.

https://stats.stackexchange.com/questions/357963/what-is-the...
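
For concreteness, a minimal NumPy check of the decomposition being described (the distributions are arbitrary examples): cross-entropy H(p, q) is the data entropy H(p) plus the KL divergence D_KL(p || q).

    import numpy as np

    # Made-up discrete distributions: p is the "true" one, q is the model's.
    p = np.array([0.7, 0.2, 0.1])
    q = np.array([0.5, 0.3, 0.2])

    cross_entropy = -np.sum(p * np.log(q))       # H(p, q)
    entropy       = -np.sum(p * np.log(p))       # H(p), independent of the model
    kl_divergence =  np.sum(p * np.log(p / q))   # D_KL(p || q)

    # Cross-entropy = entropy of the data distribution + KL divergence.
    assert np.isclose(cross_entropy, entropy + kl_divergence)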


Yes, you are totally correct, but doesn't that extra term effectively drop out of the cross-entropy loss as used in machine learning? Since it is a constant, it does not contribute to the optimization.

Please correct me if I'm wrong.
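
A quick numeric sanity check of that (distributions made up): since the two losses differ only by H(p), any change to the model shifts them by exactly the same amount, and with a one-hot label H(p) = 0, so they coincide.

    import numpy as np

    p  = np.array([0.7, 0.2, 0.1])     # fixed data distribution (made up)
    q1 = np.array([0.5, 0.3, 0.2])     # model output before an update (made up)
    q2 = np.array([0.6, 0.25, 0.15])   # model output after an update (made up)

    def cross_entropy(p, q):
        return -np.sum(p * np.log(q))

    def kl(p, q):
        return np.sum(p * np.log(p / q))

    # The two losses differ only by the constant H(p), so a model update
    # changes them by exactly the same amount.
    assert np.isclose(cross_entropy(p, q1) - cross_entropy(p, q2),
                      kl(p, q1) - kl(p, q2))

    # With a one-hot label (the usual classification target), H(p) = 0 and the
    # two losses are literally the same number: -log q[true_class].
    one_hot = np.array([1.0, 0.0, 0.0])
    assert np.isclose(cross_entropy(one_hot, q2), -np.log(q2[0]))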





