Checking convergence of 2-layer neural network in python
Problem
I am working with the following code:
```
import numpy as np

def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x)*(1.0 - sigmoid(x))

def tanh(x):
    return np.tanh(x)

def tanh_prime(x):
    return 1.0 - x**2

class NeuralNetwork:

    def __init__(self, layers, activation='tanh'):
        if activation == 'sigmoid':
            self.activation = sigmoid
            self.activation_prime = sigmoid_prime
        elif activation == 'tanh':
            self.activation = tanh
            self.activation_prime = tanh_prime

        # Set weights
        self.weights = []
        # layers = [2,2,1]
        # range of weight values (-1,1)
        # input and hidden layers - random((2+1, 2+1)) : 3 x 3
        for i in range(1, len(layers) - 1):
            r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) - 1
            self.weights.append(r)
        # output layer - random((2+1, 1)) : 3 x 1
        r = 2*np.random.random((layers[i] + 1, layers[i+1])) - 1
        self.weights.append(r)

    def fit(self, X, y, learning_rate=0.2, epochs=100000):
        # Add column of ones to X
        # This is to add the bias unit to the input layer
        ones = np.atleast_2d(np.ones(X.shape[0]))
        X = np.concatenate((ones.T, X), axis=1)

        for k in range(epochs):
            if k % 10000 == 0:
                print('epochs:', k)

            i = np.random.randint(X.shape[0])
            a = [X[i]]

            for l in range(len(self.weights)):
                dot_value = np.dot(a[l], self.weights[l])
                activation = self.activation(dot_value)
                a.append(activation)

            # output layer
            error = y[i] - a[-1]
            deltas = [error * self.activation_prime(a[-1])]

            # we need to begin at the second to last layer
            # (a layer before the output layer)
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T) *
                              self.activation_prime(a[l]))
```
Solution
The reasons for the speed discrepancy
- The reason for the difference in timing is that evaluating sigmoid_prime() takes far longer than tanh_prime(). You can see this if you use a line profiler such as the line_profiler module.
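Outside IPython, where %timeit is unavailable, the same comparison can be reproduced with the standard timeit module. This is a minimal sketch using the definitions from the question (not code from the original answer); absolute numbers will differ by machine:

```python
import timeit

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # factored form from the question: evaluates sigmoid() twice
    return sigmoid(x) * (1.0 - sigmoid(x))

def tanh_prime(x):
    # formula from the question
    return 1.0 - x**2

foo = np.random.rand(10000)

# Average seconds per call over 100 runs of each derivative
t_sigmoid = timeit.timeit(lambda: sigmoid_prime(foo), number=100) / 100
t_tanh = timeit.timeit(lambda: tanh_prime(foo), number=100) / 100
print('sigmoid_prime: %.1f us' % (t_sigmoid * 1e6))
print('tanh_prime:    %.1f us' % (t_tanh * 1e6))
```

On a typical machine sigmoid_prime() comes out markedly slower, since it calls np.exp() twice per evaluation while tanh_prime() only squares its input.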
- Is tanh_prime() supposed to be the derivative of tanh()? If so, you might want to double-check your formula. The derivative of tanh(x) is 1 - tanh(x)**2, not 1 - x**2.
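A quick numerical sanity check (not from the original answer): a central-difference estimate of d/dx tanh(x) agrees with 1 - tanh(x)**2 but not with 1 - x**2 away from the origin. Note, though, that the question's fit() applies activation_prime to the layer *outputs* a, and for a = tanh(z) the chain-rule factor 1 - a**2 is correct in that context; the two formulas only disagree when fed the pre-activation input:

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 9)
h = 1e-6

# Central-difference estimate of the derivative of tanh at x
numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)

exact = 1 - np.tanh(x)**2   # true derivative of tanh
questionable = 1 - x**2     # formula from the question, applied to x

print(np.max(np.abs(numeric - exact)))         # tiny
print(np.max(np.abs(numeric - questionable)))  # large, e.g. ~3 at x = 2
```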
- In fact, if you use the actual definition of the derivative of tanh(), the timings become much more similar.

```
def tanh_prime_alt(x):
    return 1 - tanh(x)**2

foo = np.random.rand(10000)

%timeit -n 100 tanh_prime(foo)
%timeit -n 100 tanh_prime_alt(foo)
%timeit -n 100 sigmoid_prime(foo)

100 loops, best of 3: 10.2 µs per loop
100 loops, best of 3: 116 µs per loop
100 loops, best of 3: 279 µs per loop
```

So with this alternate tanh_prime(), the sigmoid method is now only 2× slower, not 20× slower. I should emphasize that (a) I don't know enough about neural networks to know whether 1 - x**2 is an appropriate expression or approximation to the actual derivative of tanh(), but if it is in fact OK, then (b) the reason that activation='tanh' is so much faster is because of this approximation/error.
- The remaining 2× difference is because in your factored expression of sigmoid_prime(), you are needlessly evaluating sigmoid() twice. I'd instead do this:

```
def sigmoid_prime_alt(x):
    sig_x = sigmoid(x)
    return sig_x - sig_x**2
```

As expected, this speeds things up two-fold relative to your original definition:

```
foo = np.random.rand(10000)

%timeit -n 100 sigmoid_prime(foo)
%timeit -n 100 sigmoid_prime_alt(foo)

100 loops, best of 3: 248 µs per loop
100 loops, best of 3: 132 µs per loop
```
- Since the sigmoid() function and the tanh() function are related by tanh(x) = 2 sigmoid(2x) - 1, i.e. sigmoid(x) = (1 + tanh(x/2))/2, then if you are OK with the weird 1 - x**2 approximation for tanh_prime(), you should be able to work out a similar approximation for sigmoid_prime().
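To make the suggested exercise concrete (a sketch, not part of the original answer): the identity can be checked numerically, and the sigmoid analogue of tanh_prime(a) = 1 - a**2 is y*(1 - y), where y is the sigmoid *output* rather than its input:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 13)

# The identity from the answer: tanh(x) = 2*sigmoid(2x) - 1
assert np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1)

# Sigmoid's derivative expressed in terms of its output y = sigmoid(x),
# mirroring how 1 - a**2 expresses tanh's derivative via its output a
y = sigmoid(x)
deriv_from_output = y * (1 - y)
deriv_from_input = sigmoid(x) * (1 - sigmoid(x))
assert np.allclose(deriv_from_output, deriv_from_input)
```

Like the 1 - a**2 form, y*(1 - y) avoids recomputing the activation during backpropagation.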
- You might be interested in the autograd module, which provides a generalized capability to automatically compute derivatives of most NumPy code.

Other comments
These comments aren't a thorough review, but just some things I noticed.
- Why are your weights Python lists instead of NumPy arrays? If you're already using NumPy, you might as well use it wherever you can.

- You probably don't need the for l in range(len(self.weights)): loop, do you? Can't you use NumPy array slicing and the matrix capabilities of np.dot() to replace this loop?

- If you are going to loop, you don't need to do for l in range(len(self.weights)) and then reference self.weights[l]. You can do for weight in self.weights: and then reference weight in your loop code, for example.

- Write some docstrings for your functions, please!
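The loop advice above can be illustrated on the question's forward pass. This is a hedged sketch, with stand-in weight matrices in place of self.weights; both loops compute the same activations:

```python
import numpy as np

# Stand-ins for self.weights in a [2, 2, 1] network: 3x3 and 3x1 matrices
weights = [np.random.rand(3, 3), np.random.rand(3, 1)]
x = np.random.rand(3)  # one bias-augmented input row, as built in fit()

# Original style: index-based loop over weight positions
a = [x]
for l in range(len(weights)):
    a.append(np.tanh(np.dot(a[l], weights[l])))

# Suggested style: iterate over the weight matrices directly
a_alt = [x]
for weight in weights:
    a_alt.append(np.tanh(np.dot(a_alt[-1], weight)))
```

The second loop needs no index bookkeeping, since each step only uses the most recent activation a_alt[-1].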
Context
StackExchange Code Review Q#127449, answer score: 2