
Proving Monotonicity of Softmax Layer

Submitted by: @import:stackexchange-cs
Tags: proving, layer, softmax, monotonicity

Problem

In the book here: http://neuralnetworksanddeeplearning.com/chap3.html

If you scroll down to Exercise 2 in the Softmax Section, it says


Show that $\partial a^L_{j}/\partial z^L_{k}$ is positive if $j=k$ and negative if $j \neq k$. As a consequence, increasing $z^L_j$ is guaranteed to increase the corresponding output activation, $a^L_j$, and will decrease all the other output activations.

Here,
$$a^L_j = \frac{e^{z^L_{j}}}{\sum_{k}{e^{z^L_{k}}}}$$

I managed to prove the part where $j \neq k$ by differentiating as usual to get
$$-\frac{e^{z^L_{j}}\,e^{z^L_{k}}}{\left(\sum_{k}{e^{z^L_{k}}}\right)^2},$$

which is obviously always negative. However, I'm having trouble with the case $j=k$: when I differentiated, I got an inequality which simplified to proving
$$\sum_{k}{e^{z^L_k}}>1,$$
and I am unsure how to do this.
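
Before proving this, the claimed sign pattern can be checked numerically. Below is a minimal NumPy sketch (not part of the original question; the `softmax` helper and the finite-difference setup are just illustrative) that estimates the Jacobian $\partial a_j/\partial z_k$ with central differences and confirms that diagonal entries are positive and off-diagonal entries are negative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))          # shift by the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
z = rng.normal(size=5)

# Estimate the Jacobian d a_j / d z_k with central differences.
eps = 1e-6
J = np.zeros((5, 5))
for k in range(5):
    dz = np.zeros(5)
    dz[k] = eps
    J[:, k] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

diag = np.eye(5, dtype=bool)
print(np.all(J[diag] > 0), np.all(J[~diag] < 0))   # expected: True True
```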

Solution

Let's drop the $L$ superscripts. By the quotient rule, the derivative with respect to $z_j$ is
$$
\frac{\partial a_j}{\partial z_j} =
\frac{e^{z_j} \sum_k e^{z_k} - e^{z_j} e^{z_j}}{\left(\sum_k e^{z_k}\right)^2} =
\frac{e^{z_j} \sum_{k \neq j} e^{z_k}}{\left(\sum_k e^{z_k}\right)^2} > 0.
$$
Every exponential is strictly positive, so both factors in the numerator and the denominator are positive; there is no need to show $\sum_{k}{e^{z_k}}>1$.
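
For reference, this closed form equals $a_j(1 - a_j)$, and the $j \neq k$ derivative is $-a_j a_k$, so the full Jacobian can be written as $\operatorname{diag}(a) - a a^{\top}$. The following NumPy sketch (an illustrative check, not part of the original answer) compares that analytic Jacobian against a central-difference approximation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))          # stabilized softmax; the shift cancels out
    return e / e.sum()

z = np.array([0.5, -1.2, 2.0, 0.1])
a = softmax(z)

# Analytic Jacobian implied by the derivation above:
#   diagonal:     d a_j / d z_j = a_j * (1 - a_j)  > 0
#   off-diagonal: d a_j / d z_k = -a_j * a_k       < 0
J_analytic = np.diag(a) - np.outer(a, a)

# Central-difference approximation for comparison.
eps = 1e-6
n = len(z)
J_numeric = np.zeros((n, n))
for k in range(n):
    dz = np.zeros(n)
    dz[k] = eps
    J_numeric[:, k] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.max(np.abs(J_analytic - J_numeric)))   # tiny (~1e-10): the two agree
```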

Context

StackExchange Computer Science Q#97812, answer score: 2
