Checking convergence of 2-layer neural network in python
Problem
I am working with the following code:
```
import numpy as np

def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x)*(1.0 - sigmoid(x))

def tanh(x):
    return np.tanh(x)

def tanh_prime(x):
    return 1.0 - x**2

class NeuralNetwork:

    def __init__(self, layers, activation='tanh'):
        if activation == 'sigmoid':
            self.activation = sigmoid
            self.activation_prime = sigmoid_prime
        elif activation == 'tanh':
            self.activation = tanh
            self.activation_prime = tanh_prime

        # Set weights
        self.weights = []
        # layers = [2,2,1]
        # range of weight values (-1,1)
        # input and hidden layers - random((2+1, 2+1)) : 3 x 3
        for i in range(1, len(layers) - 1):
            r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) - 1
            self.weights.append(r)
        # output layer - random((2+1, 1)) : 3 x 1
        r = 2*np.random.random((layers[i] + 1, layers[i+1])) - 1
        self.weights.append(r)

    def fit(self, X, y, learning_rate=0.2, epochs=100000):
        # Add column of ones to X
        # This is to add the bias unit to the input layer
        ones = np.atleast_2d(np.ones(X.shape[0]))
        X = np.concatenate((ones.T, X), axis=1)

        for k in range(epochs):
            if k % 10000 == 0:
                print('epochs:', k)

            i = np.random.randint(X.shape[0])
            a = [X[i]]

            for l in range(len(self.weights)):
                dot_value = np.dot(a[l], self.weights[l])
                activation = self.activation(dot_value)
                a.append(activation)

            # output layer
            error = y[i] - a[-1]
            deltas = [error * self.activation_prime(a[-1])]

            # we need to begin at the second to last layer
            # (a layer before the output layer)
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T) *
                              self.activation_prime(a[l]))
```
Solution
The reasons for the speed discrepancy
- The reason for the difference in timing is that evaluating sigmoid_prime() takes far longer than tanh_prime(). You can see this if you use a line profiler such as the line_profiler module.
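Outside IPython, where %timeit is unavailable, the same comparison can be reproduced with the standard timeit module. This is a minimal sketch using the definitions from the question (not code from the original answer); absolute numbers will differ by machine:

```python
import timeit

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # factored form from the question: evaluates sigmoid() twice
    return sigmoid(x) * (1.0 - sigmoid(x))

def tanh_prime(x):
    # formula from the question
    return 1.0 - x**2

foo = np.random.rand(10000)

# Average seconds per call over 100 runs of each derivative
t_sigmoid = timeit.timeit(lambda: sigmoid_prime(foo), number=100) / 100
t_tanh = timeit.timeit(lambda: tanh_prime(foo), number=100) / 100
print('sigmoid_prime: %.1f us' % (t_sigmoid * 1e6))
print('tanh_prime:    %.1f us' % (t_tanh * 1e6))
```

On a typical machine sigmoid_prime() comes out markedly slower, since it calls np.exp() twice per evaluation while tanh_prime() only squares its input.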
- Is tanh_prime() supposed to be the derivative of tanh()? If so, you might want to double-check your formula. The derivative of tanh(x) is 1 - tanh(x)**2, not 1 - x**2.
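A quick numerical sanity check (not from the original answer): a central-difference estimate of d/dx tanh(x) agrees with 1 - tanh(x)**2 but not with 1 - x**2 away from the origin. Note, though, that the question's fit() applies activation_prime to the layer *outputs* a, and for a = tanh(z) the chain-rule factor 1 - a**2 is correct in that context; the two formulas only disagree when fed the pre-activation input:

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 9)
h = 1e-6

# Central-difference estimate of the derivative of tanh at x
numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)

exact = 1 - np.tanh(x)**2   # true derivative of tanh
questionable = 1 - x**2     # formula from the question, applied to x

print(np.max(np.abs(numeric - exact)))         # tiny
print(np.max(np.abs(numeric - questionable)))  # large, e.g. ~3 at x = 2
```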
- In fact, if you use the actual definition of the derivative of tanh(), the timings become much more similar.

```
def tanh_prime_alt(x):
    return 1 - tanh(x)**2

foo = np.random.rand(10000)

%timeit -n 100 tanh_prime(foo)
%timeit -n 100 tanh_prime_alt(foo)
%timeit -n 100 sigmoid_prime(foo)

100 loops, best of 3: 10.2 µs per loop
100 loops, best of 3: 116 µs per loop
100 loops, best of 3: 279 µs per loop
```

So with this alternate tanh_prime(), the sigmoid method is now only 2× slower, not 20× slower. I should emphasize that (a) I don't know enough about neural networks to know whether 1 - x**2 is an appropriate expression or approximation to the actual derivative of tanh(), but if it is in fact OK, then (b) the reason that activation='tanh' is so much faster is because of this approximation/error.
- The remaining 2× difference is because in your factored expression of sigmoid_prime(), you are needlessly evaluating sigmoid() twice. I'd instead do this:

```
def sigmoid_prime_alt(x):
    sig_x = sigmoid(x)
    return sig_x - sig_x**2
```

As expected, this speeds things up two-fold relative to your original definition:

```
foo = np.random.rand(10000)

%timeit -n 100 sigmoid_prime(foo)
%timeit -n 100 sigmoid_prime_alt(foo)

100 loops, best of 3: 248 µs per loop
100 loops, best of 3: 132 µs per loop
```
- Since the sigmoid() function and the tanh() function are related by tanh(x) = 2 sigmoid(2x) - 1, i.e. sigmoid(x) = (1 + tanh(x/2))/2, then if you are OK with the weird 1 - x**2 approximation for tanh_prime(), you should be able to work out a similar approximation for sigmoid_prime().
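To make the suggested exercise concrete (a sketch, not part of the original answer): the identity can be checked numerically, and the sigmoid analogue of tanh_prime(a) = 1 - a**2 is y*(1 - y), where y is the sigmoid *output* rather than its input:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 13)

# The identity from the answer: tanh(x) = 2*sigmoid(2x) - 1
assert np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1)

# Sigmoid's derivative expressed in terms of its output y = sigmoid(x),
# mirroring how 1 - a**2 expresses tanh's derivative via its output a
y = sigmoid(x)
deriv_from_output = y * (1 - y)
deriv_from_input = sigmoid(x) * (1 - sigmoid(x))
assert np.allclose(deriv_from_output, deriv_from_input)
```

Like the 1 - a**2 form, y*(1 - y) avoids recomputing the activation during backpropagation.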
- You might be interested in the autograd module, which provides a generalized capability to automatically compute derivatives of most NumPy code.

Other comments
These comments aren't a thorough review, but just some things I noticed.
- Why are your weights Python lists instead of NumPy arrays? If you're already using NumPy, you might as well use it wherever you can.

- You probably don't need the for l in range(len(self.weights)): loop, do you? Can't you use NumPy array slicing and the matrix capabilities of np.dot() to replace this loop?

- If you are going to loop, you don't need to do for l in range(len(self.weights)) and then reference self.weights[l]. You can do for weight in self.weights: and then reference weight in your loop code, for example.

- Write some docstrings for your functions, please!
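The loop advice above can be illustrated on the question's forward pass. This is a hedged sketch, with stand-in weight matrices in place of self.weights; both loops compute the same activations:

```python
import numpy as np

# Stand-ins for self.weights in a [2, 2, 1] network: 3x3 and 3x1 matrices
weights = [np.random.rand(3, 3), np.random.rand(3, 1)]
x = np.random.rand(3)  # one bias-augmented input row, as built in fit()

# Original style: index-based loop over weight positions
a = [x]
for l in range(len(weights)):
    a.append(np.tanh(np.dot(a[l], weights[l])))

# Suggested style: iterate over the weight matrices directly
a_alt = [x]
for weight in weights:
    a_alt.append(np.tanh(np.dot(a_alt[-1], weight)))
```

The second loop needs no index bookkeeping, since each step only uses the most recent activation a_alt[-1].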
Context
StackExchange Code Review Q#127449, answer score: 2