Different neural network activation functions and gradient descent
Problem
I've implemented a bunch of activation functions for neural networks, and I just want validation that they work correctly mathematically. I implemented sigmoid, tanh, relu, arctan, step function, squash, and gaussian, and I use their implicit derivatives (in terms of the output) for backpropagation.
```
import numpy as np

def sigmoid(x, derivative=False):
    if (derivative == True):
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

def tanh(x, derivative=False):
    if (derivative == True):
        return (1 - (x ** 2))
    return np.tanh(x)

def relu(x, derivative=False):
    if (derivative == True):
        for i in range(0, len(x)):
            for k in range(len(x[i])):
                if x[i][k] > 0:
                    x[i][k] = 1
                else:
                    x[i][k] = 0
        return x
    for i in range(0, len(x)):
        for k in range(0, len(x[i])):
            if x[i][k] > 0:
                pass # do nothing since it would be effectively replacing x with x
            else:
                x[i][k] = 0
    return x

def arctan(x, derivative=False):
    if (derivative == True):
        return (np.cos(x) ** 2)
    return np.arctan(x)

def step(x, derivative=False):
    if (derivative == True):
        for i in range(0, len(x)):
            for k in range(len(x[i])):
                if x[i][k] > 0:
                    x[i][k] = 0
        return x
    for i in range(0, len(x)):
        for k in range(0, len(x[i])):
            if x[i][k] > 0:
                x[i][k] = 1
            else:
                x[i][k] = 0
    return x

def squash(x, derivative=False):
    if (derivative == True):
        for i in range(0, len(x)):
            for k in range(0, len(x[i])):
                if x[i][k] > 0:
                    x[i][k] = (x[i][k]) / (1 + x[i][k])
                else:
                    x[i][k] = (x[i][k]) / (1 - x[i][k])
        return x
    for i in range(0, len(x)):
        for k in range(0, len(x[i])):
            x[i
```
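The output-form derivatives in this listing work because each one rewrites f'(z) in terms of y = f(z), so backpropagation can reuse the stored activations. For the smooth cases and ReLU (using the common convention of derivative 0 at z = 0), the identities are:

$$
\begin{aligned}
y = \sigma(z) = \frac{1}{1+e^{-z}} &\implies \sigma'(z) = \sigma(z)\bigl(1-\sigma(z)\bigr) = y(1-y) \\
y = \tanh z &\implies \tanh'(z) = 1-\tanh^2 z = 1-y^2 \\
y = \arctan z &\implies \arctan'(z) = \frac{1}{1+z^2} = \frac{1}{1+\tan^2 y} = \cos^2 y \\
y = \max(0, z) &\implies \frac{dy}{dz} = \begin{cases} 1 & y > 0 \\ 0 & y = 0 \end{cases}
\end{aligned}
$$

A consequence is that the `derivative=True` branches are only correct when they are given the activation's output, not its input.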
Solution
Your code is correct. What can be improved is readability and speed.
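One way to check claims like this for the smooth activations is a finite-difference gradient check: compare the output-form derivative against a central difference of the forward function. The sketch below is not part of the original answer; `max_gradient_error`, the step size `eps`, and the sample range are illustrative assumptions. ReLU and step are left out because they are not differentiable at 0.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Output-form derivatives, as in the question: each takes y = f(z) and returns f'(z).
def sigmoid_derivative(y):
    return y * (1 - y)

def tanh_derivative(y):
    return 1 - y ** 2

def arctan_derivative(y):
    return np.cos(y) ** 2

def max_gradient_error(f, f_prime_of_output, z, eps=1e-6):
    """Largest gap between the analytic derivative and a central difference."""
    numeric = (f(z + eps) - f(z - eps)) / (2 * eps)
    analytic = f_prime_of_output(f(z))
    return np.max(np.abs(numeric - analytic))

# 2-D input, matching the x[i][k] indexing used in the question.
z = np.linspace(-5, 5, 101).reshape(1, -1)
for name, f, df in [("sigmoid", sigmoid, sigmoid_derivative),
                    ("tanh", np.tanh, tanh_derivative),
                    ("arctan", np.arctan, arctan_derivative)]:
    print(name, max_gradient_error(f, df, z))
```

Errors far below the derivative values themselves (roughly the size of `eps**2` plus rounding) suggest the formula is right; a wrong derivative typically shows up as an error of the same order as the derivative.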
First things first: don't mix up the computation of the function and the computation of its derivative. I understand that passing `True` to the function to get the derivative can be convenient, but consider doing something less confusing:

```
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is the sigmoid's output, as in the question
    return x * (1 - x)
```

You can always pass both versions of the function to your gradient descent function, or create a `Sigmoid` class with forward/backward methods. This is what PyBrain and Torch do, for example. Let's look at the PyBrain ReLU, since it's written in Python:

```
from pybrain.structure.modules.neuronlayer import NeuronLayer

class ReluLayer(NeuronLayer):
    """ Layer of rectified linear units (relu). """

    def _forwardImplementation(self, inbuf, outbuf):
        outbuf[:] = inbuf * (inbuf > 0)

    def _backwardImplementation(self, outerr, inerr, outbuf, inbuf):
        inerr[:] = outerr * (inbuf > 0)
```

I think looking at this is a good way to review your code. First, what you did is not very different, congratulations! The differences are mostly about optimization, because speed is really important in neural networks. Let's look at the differences:
- They did not conflate the two functions, as I explained earlier.
- They set up their network to avoid allocations during training: they fill `outbuf` in place from `inbuf`, so once the network is created they don't need to allocate anything and can just update weights as fast as possible. It probably wasn't your main concern here, and that's perfectly OK if you're doing this to learn.
- They don't use loops and conditions, but stick to vectorized operations. Again, it's an optimization concern: vectorized operations run much faster than Python loops. Maybe you did not think about the `inbuf > 0` trick (which creates a boolean matrix that acts as zeros and ones in arithmetic), but it will make your code faster.
- They group all computation: the equivalent of `inerr[:] = outerr * (inbuf > 0)` in your code is `layerDelta = layerError * relu(layers[i-1], True)`. I don't think this is a crucial difference.
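Put together, these points can be sketched in plain NumPy: separate forward and derivative functions, the `> 0` mask instead of nested loops, and a layer that writes into buffers allocated once. This is an illustrative sketch, not PyBrain's API; `relu_derivative`, `step_derivative`, and the standalone `ReluLayer` class (with its `forward`/`backward` methods) are hypothetical names modeled on the PyBrain snippet above.

```python
import numpy as np

# Forward/derivative pairs kept separate; the derivatives take the OUTPUT y.
def relu(z):
    return z * (z > 0)                 # boolean mask acts as 0/1, no Python loops

def relu_derivative(y):
    return (y > 0).astype(y.dtype)     # y > 0 exactly where z > 0

def step(z):
    return (z > 0).astype(z.dtype)

def step_derivative(y):
    return np.zeros_like(y)            # zero wherever the derivative is defined

class ReluLayer:
    """Hypothetical minimal layer: buffers are allocated once and reused in place."""

    def __init__(self, shape):
        self.out = np.empty(shape)
        self.grad_in = np.empty(shape)

    def forward(self, inbuf):
        # Same idea as PyBrain's outbuf[:] = inbuf * (inbuf > 0), without a new array.
        np.multiply(inbuf, inbuf > 0, out=self.out)
        return self.out

    def backward(self, outerr, inbuf):
        # Derivative and chain-rule product combined in one vectorized step.
        np.multiply(outerr, inbuf > 0, out=self.grad_in)
        return self.grad_in

x = np.array([[-1.0, 0.5, 2.0], [3.0, -4.0, 0.0]])
layer = ReluLayer(x.shape)
print(layer.forward(x))
print(layer.backward(np.ones_like(x), x))
```

Unlike the loop versions in the question, `relu` and `step` here return new arrays instead of overwriting their argument. The layer's preallocated buffers assume the batch shape stays fixed between calls.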
Context
StackExchange Code Review Q#132023, answer score: 3