Apr 20, 2021
Thanks a lot Tivadar Danka for this article.
Softmax shouldn't be regarded as a probability distribution anymore.
My guess is that it was initially inspired by statistical physics, but without considering the energy behaviour that allows effective differentiation.
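To make the statistical-physics reading concrete: softmax has the same form as a Boltzmann distribution if each logit z_i is read as a negative energy, E_i = -z_i, with a temperature T, so that p_i = exp(-E_i / T) / Z and Z = Σ_j exp(-E_j / T) is the partition function. The sketch below is plain Python. The function name `boltzmann_softmax` is hypothetical and only spells out this correspondence; it is not taken from the article.

```python
import math


def boltzmann_softmax(logits, temperature=1.0):
    """Softmax written as a Boltzmann distribution (illustrative sketch).

    Each logit z_i is treated as a negative energy E_i = -z_i, so
    p_i = exp(-E_i / T) / Z with partition function Z = sum_j exp(-E_j / T).
    """
    energies = [-z for z in logits]
    # Shift by the lowest energy for numerical stability; the shift cancels
    # in the ratio, so the probabilities are unchanged.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    partition = sum(weights)  # Z, up to the constant factor removed by the shift
    return [w / partition for w in weights]


print(boltzmann_softmax([2.0, 1.0, 0.0]))
print(boltzmann_softmax([2.0, 1.0, 0.0], temperature=10.0))  # flatter
print(boltzmann_softmax([2.0, 1.0, 0.0], temperature=0.1))   # sharper
```

Raising T pushes the output toward uniform and lowering it pushes it toward one-hot, exactly as a physical temperature would. Note that the normalization by Z discards the absolute level of the energies, which is the part an energy-based view keeps.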
A recent paper applies a new kind of softmax based on energy functions (which are more natural) and reports very good results, although in an out-of-distribution context. Do you think we should replace softmax with this new energy-based function?
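As a rough illustration, assuming the energy function meant here is the free energy of the logits, E(x) = -T · log Σ_i exp(z_i / T), which is how energy-based out-of-distribution scores are commonly defined, the difference from softmax can be shown in a few lines. The helper names below are hypothetical. The point of the sketch is that softmax is invariant to adding a constant to all logits, while the energy is not.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) using the max-shift trick."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def free_energy(logits, temperature=1.0):
    """Assumed energy score: E = -T * log sum_i exp(z_i / T).

    Lower energy corresponds to larger logits overall.
    """
    return -temperature * logsumexp([z / temperature for z in logits])


def max_softmax(logits):
    """Largest softmax probability, the usual confidence baseline."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


a = [2.0, 1.0, 0.0]
b = [12.0, 11.0, 10.0]  # same logits shifted up by 10
print(max_softmax(a), max_softmax(b))  # identical: softmax ignores the shift
print(free_energy(a), free_energy(b))  # differ by exactly 10
```

Both inputs produce the same softmax confidence, but the second has an energy lower by 10. That magnitude information, which softmax normalizes away, is what an energy-based score can use to separate in-distribution from out-of-distribution inputs.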