Thanks a lot, Tivadar Danka, for this article.

Softmax shouldn't be regarded as a probability distribution anymore.

My guess is that softmax was originally inspired by statistical physics, but without taking into account the energy behaviour that allows for effective differentiation.
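For reference, here is the statistical-physics connection. Softmax over logits z_i at temperature T is exactly a Boltzmann distribution with energies E_i = -z_i. Its normaliser is the partition function Z, and -T log Z is the corresponding free energy:

```latex
p_i = \frac{e^{-E_i/T}}{Z}, \qquad Z = \sum_j e^{-E_j/T}, \qquad E_i = -z_i
\;\Rightarrow\;
p_i = \frac{e^{z_i/T}}{\sum_j e^{z_j/T}}, \qquad
F = -T \log Z = -T \log \sum_j e^{z_j/T}.
```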

A recent paper applies a new kind of softmax based on energy functions (which are more natural) and gets very good results, although in an out-of-distribution setting. Do you think we should replace softmax with this new energy-based function?

https://arxiv.org/pdf/2010.03759.pdf
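Here is a minimal NumPy sketch of the difference. It assumes the energy score is E(x) = -T * logsumexp(f(x)/T) over the classifier's logits f(x), which is my reading of the linked paper. The function names and example logits are hypothetical. The sketch shows one piece of information that softmax throws away: shifting all logits by the same constant leaves the softmax probabilities unchanged, but it changes the energy.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax with temperature T, i.e. a Boltzmann distribution
    with energies E_i = -logits_i."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def energy_score(logits, T=1.0):
    """Free energy E(x) = -T * logsumexp(logits / T).
    Assumption: lower energy is read as 'more in-distribution',
    as in the energy-based OOD score of the linked paper."""
    z = np.asarray(logits, dtype=float) / T
    m = z.max()
    return -T * (m + np.log(np.exp(z - m).sum()))

# Hypothetical logits: identical up to a constant shift of 9.
confident = np.array([10.0, 0.0, 0.0])
weak = np.array([1.0, -9.0, -9.0])

print(softmax(confident), softmax(weak))            # same probabilities
print(energy_score(confident), energy_score(weak))  # energies differ by 9
```

Because softmax is invariant to a common shift of the logits, it cannot tell these two inputs apart. The energy keeps the overall logit magnitude, and that magnitude is what the energy-based score relies on.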
