Derivative of softmax

Disclaimer: Copied from this, linked from here


Definition, for an input vector $a$ with components $a _k$:

 * $\varphi \paren {a _j} = \dfrac {e ^{a _j}} {\sum _k e ^{a _k}} $
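As a concrete reference for the definition above, here is a minimal sketch in plain Python. The function name `softmax` and the max-subtraction step are assumptions of this sketch and not part of the original note. Subtracting the maximum leaves the result mathematically unchanged and only guards against overflow in `exp`.

```python
import math

def softmax(a):
    """Return [e^{a_j} / sum_k e^{a_k}] for a list of floats a."""
    m = max(a)  # shifting by the max cancels in the ratio; it only avoids overflow
    exps = [math.exp(x - m) for x in a]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three positive values, largest for the largest input
print(sum(probs))  # sums to 1 up to floating-point rounding
```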

Quotient Rule:


 * $ \dfrac{\partial{\varphi \paren {a _j}}}{\partial{a _j}} = \dfrac{ \paren{\frac{\partial}{\partial a _j} e ^{a _j}} \paren{\sum _k e ^{a _k}} - e ^{a _j} \paren{ \frac{\partial}{\partial a _j} \paren {\sum _k e ^{a _k} } } }{ \paren{\sum _k e ^{a _k}} ^2 } $

Exponential Rule (only the $k = j$ term of the sum depends on $a _j$, so both inner derivatives equal $e ^{a _j}$):


 * $ \dfrac{\partial{\varphi \paren {a _j}}}{\partial{a _j}} = \dfrac{ e ^{a _j} \paren{\sum _k e ^{a _k}} - e ^{a _j} e ^{a _j} }{ \paren{\sum _k e ^{a _k}} ^2 } = \dfrac{ e ^{a _j} }{ \sum _k e ^{a _k} } - \paren{ \dfrac{ e ^{a _j} }{ \sum _k e ^{a _k}} } ^2$


Substituting the definition of $\varphi$:

 * $ \dfrac{\partial{\varphi \paren {a _j}}}{\partial{a _j}} = \varphi \paren {a _j} - \varphi \paren {a _j} ^2 = \varphi \paren {a _j} \paren{1 - \varphi \paren {a _j}}$
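The closed form above can be sanity-checked numerically. The sketch below compares $\varphi \paren {a _j} \paren{1 - \varphi \paren {a _j}}$ with a central finite difference of the softmax output with respect to the same component $a _j$. The helper names (`softmax`, `analytic_diag`, `numeric_diag`), the step size `h`, and the test vector are all assumptions chosen for illustration. Note that this only covers the case derived here, where the output index and the differentiated input share the same $j$.

```python
import math

def softmax(a):
    """Return [e^{a_j} / sum_k e^{a_k}] for a list of floats a."""
    m = max(a)  # shift by the max for numerical stability; the ratio is unchanged
    exps = [math.exp(x - m) for x in a]
    total = sum(exps)
    return [e / total for e in exps]

def analytic_diag(a, j):
    """Closed form from the derivation: phi(a_j) * (1 - phi(a_j))."""
    p = softmax(a)[j]
    return p * (1.0 - p)

def numeric_diag(a, j, h=1e-6):
    """Central difference of softmax(a)[j] with respect to a[j]."""
    up = list(a)
    up[j] += h
    down = list(a)
    down[j] -= h
    return (softmax(up)[j] - softmax(down)[j]) / (2.0 * h)

a = [0.5, -1.0, 2.0]
for j in range(len(a)):
    # the two columns should agree to several decimal places
    print(j, analytic_diag(a, j), numeric_diag(a, j))
```

If the two columns disagree beyond rounding noise, the likely culprit is the step size `h` rather than the formula: too large a step adds truncation error, too small a step amplifies floating-point cancellation.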