Different neural network activation functions and gradient descent

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

I've implemented a bunch of activation functions for neural networks, and I just want to validate that they are mathematically correct. I implemented sigmoid, tanh, relu, arctan, step function, squash, and gaussian, and I use their implicit derivatives (expressed in terms of the output) for backpropagation.

```
import numpy as np

def sigmoid(x, derivative=False):
    if (derivative == True):
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

def tanh(x, derivative=False):
    if (derivative == True):
        return (1 - (x ** 2))
    return np.tanh(x)

def relu(x, derivative=False):
    if (derivative == True):
        for i in range(0, len(x)):
            for k in range(len(x[i])):
                if x[i][k] > 0:
                    x[i][k] = 1
                else:
                    x[i][k] = 0
        return x
    for i in range(0, len(x)):
        for k in range(0, len(x[i])):
            if x[i][k] > 0:
                pass # do nothing since it would be effectively replacing x with x
            else:
                x[i][k] = 0
    return x

def arctan(x, derivative=False):
    if (derivative == True):
        return (np.cos(x) ** 2)
    return np.arctan(x)

def step(x, derivative=False):
    if (derivative == True):
        for i in range(0, len(x)):
            for k in range(len(x[i])):
                if x[i][k] > 0:
                    x[i][k] = 0
        return x
    for i in range(0, len(x)):
        for k in range(0, len(x[i])):
            if x[i][k] > 0:
                x[i][k] = 1
            else:
                x[i][k] = 0
    return x

def squash(x, derivative=False):
    if (derivative == True):
        for i in range(0, len(x)):
            for k in range(0, len(x[i])):
                if x[i][k] > 0:
                    x[i][k] = (x[i][k]) / (1 + x[i][k])
                else:
                    x[i][k] = (x[i][k]) / (1 - x[i][k])
        return x
    for i in range(0, len(x)):
        for k in range(0, len(x[i])):
            x[i
```

Solution

Your code is correct. What can be improved is readability and speed.

First things first: don't mix the computation of the function with the computation of its derivative. I understand that passing True to the function to get the derivative can be convenient, but consider doing something less confusing.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is the sigmoid output here, not the raw input
    return x * (1 - x)
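Since these derivatives are written in terms of the activation's output, one way to validate them mathematically is to compare them against a finite-difference estimate. The following is only a minimal sketch: `numerical_derivative`, `tanh_derivative` and `arctan_derivative` are hypothetical helper names, and the tanh and arctan formulas are taken from the question's code.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(y):
    # y is the sigmoid output, i.e. y = sigmoid(x)
    return y * (1 - y)

def tanh_derivative(y):
    # y is the tanh output
    return 1 - y ** 2

def arctan_derivative(y):
    # y is the arctan output; cos(y)**2 equals 1 / (1 + x**2)
    return np.cos(y) ** 2

def numerical_derivative(f, x, eps=1e-6):
    # Central difference approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = np.linspace(-3, 3, 13)
checks = {
    "sigmoid": (sigmoid, sigmoid_derivative),
    "tanh": (np.tanh, tanh_derivative),
    "arctan": (np.arctan, arctan_derivative),
}
for name, (f, df) in checks.items():
    analytic = df(f(x))               # derivative evaluated on the output
    numeric = numerical_derivative(f, x)
    assert np.allclose(analytic, numeric, atol=1e-6), name
```

Passing the raw input instead of the output to one of these derivative functions would make the corresponding assertion fail, which is a cheap way to catch that mistake.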

You can always pass both versions of the function to your gradient descent function, or create a Sigmoid class with forward/backward methods. This is what PyBrain and Torch do, for example. Let's look at the PyBrain ReLU since it's done in Python:

from pybrain.structure.modules.neuronlayer import NeuronLayer

class ReluLayer(NeuronLayer):
    """ Layer of rectified linear units (relu). """

    def _forwardImplementation(self, inbuf, outbuf):
        outbuf[:] = inbuf * (inbuf > 0)

    def _backwardImplementation(self, outerr, inerr, outbuf, inbuf):
        inerr[:] = outerr * (inbuf > 0)

I think looking at this is a good way to review your code. First, what you did is essentially the same, congratulations! The differences are mostly about optimization, because speed is really important in neural networks. Let's look at the differences:

They did not conflate the function and its derivative, as I explained earlier.

They set up their network to avoid allocations during training: they fill outbuf from inbuf in place. Once the neural network is created, nothing new needs to be created; only the weights are updated, so training goes as fast as possible. It probably wasn't your main concern here, and that's perfectly OK if you're doing this to learn.

They don't use loops and conditions, but stick to vectorized operations. Again, it's an optimization concern: vectorized operations get executed much faster. Maybe you did not think about the inbuf > 0 trick (which creates a boolean matrix that behaves like zeros and ones when multiplied), but it will make your code faster.

They group all computation: the equivalent of inerr[:] = outerr * (inbuf > 0) in your code is layerDelta = layerError * relu(layers[i-1], True). I don't think this is a crucial difference.
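Applied to the functions in the question, the points above could look roughly like this. It is a sketch rather than a drop-in replacement: the `*_derivative` and `relu_into` names are hypothetical, and, following the question, the derivatives take the activation output. Unlike the loop versions, these functions return new arrays instead of overwriting their argument.

```python
import numpy as np

def relu(x):
    # (x > 0) is a boolean mask that acts as 0/1 when multiplied
    return x * (x > 0)

def relu_derivative(y):
    # y is the relu output: 1 where it is positive, 0 elsewhere
    return (y > 0).astype(y.dtype)

def step(x):
    return (x > 0).astype(x.dtype)

def step_derivative(y):
    # The step function is flat everywhere except at the jump
    return np.zeros_like(y)

def relu_into(inbuf, outbuf):
    # Writes the result into a preallocated buffer, in the spirit of
    # the PyBrain layer; the mask itself is still a temporary array.
    np.multiply(inbuf, inbuf > 0, out=outbuf)
    return outbuf

x = np.array([[-2.0, 0.0, 3.0], [1.5, -0.5, 0.0]])
y = relu(x)                      # x itself is left untouched
layer_error = np.ones_like(x)    # placeholder error signal
layer_delta = layer_error * relu_derivative(y)
```

Keeping the input untouched matters in backpropagation, because the same layer output is usually needed again after the activation has been applied.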

Context

StackExchange Code Review Q#132023, answer score: 3
