How can an artificial neural net change the sign of a weight?
Problem
My neural net is having trouble switching the sign of a weight. The issue is that the deltas applied to the weight are proportional to that weight, so when it gets closer to zero, the deltas become smaller and are never sufficient to get it past that point. I tried adding a momentum term with 5% of the previous iteration contributing to the current iteration without success.
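For reference, a minimal sketch of a momentum scheme of the kind described, where each step carries over 5% of the previous step (the variable names and the gradient sequence here are made up purely for illustration). Note that if the raw deltas themselves shrink toward zero, the accumulated velocity shrinks with them, which is why this alone cannot carry a weight across zero:

```python
# Sketch of a momentum update as described above; the names and the
# gradient sequence are hypothetical, chosen only for illustration.
lr = 0.05      # learning rate
beta = 0.05    # 5% of the previous step carries over
velocity = 0.0
w = 0.6
for grad in [0.4, 0.3, 0.2]:   # made-up per-step gradients
    velocity = beta * velocity + lr * grad
    w += velocity
print(w)  # the velocity stays small because beta is small
```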
In my example I have a very simple neural net with one input and one output and no hidden layer. The transfer function in the neural net is a sigmoid, and the function I am trying to learn is also a sigmoid. Specifically: $y = S(10 x - 5)$ where S is the sigmoid function. The neural net gets to a point where its internal weights are essentially generating a function very close to $y = S(2x + 0)$. At this point it gets stuck because the fitted function is sometimes above and sometimes below the desired function, but the weight on $x$ needs to go up and the weight on the constant term needs to go down. I ran it for 1,000 epochs and 100,000 epochs and it stays in the same place. The initial weights were 0.6 and 0.2, so it made improvements over that.
How can a net get over this hump? Should I not be scaling the weight delta by the weight value?
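To see why an update that is scaled by the weight's own value can never flip the weight's sign, here is a small illustration (not the actual training code; the step sizes are arbitrary):

```python
# Hypothetical illustration: if each update to a weight is proportional
# to that weight, the step size vanishes as the weight approaches zero,
# so the weight decays geometrically but never crosses zero.
w_scaled = 0.2
for _ in range(1000):
    w_scaled -= 0.1 * w_scaled
print(w_scaled)   # tiny positive number, never negative

# A true gradient step does not depend on the weight's own magnitude
# this way, so it can push a weight straight through zero.
w_grad = 0.2
grad = 1.0        # assume the loss gradient stays positive
for _ in range(10):
    w_grad -= 0.1 * grad
print(w_grad)     # the sign has flipped
```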
Here's the reference I'm using for computing the weights for back-propagation. I am using a learning rate of 0.05 and apply the deltas after each training example. The x-values (input) used in training are 0.00, 0.01, 0.02, ... 0.99, 1.00. I always play them in order.
Thanks for your help.
Note: I originally posted this on stack overflow and was directed here.
Solution
Your problem stems from the fact that the equations in the link you provide refer to linear neural networks. Your network is not linear, because it has a sigmoid activation function, so you need to include the sigmoid's gradient in the weight correction. Wikipedia explains how to do this pretty well. Here is an example Python script, which learns your target function in 1000 epochs, achieving an average error on the order of $10^{-3}$ under exactly the conditions you describe. Increasing the number of epochs makes the error smaller.
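Concretely, for a single sigmoid unit $y = S(wx + b)$ trained on a target $t$ with squared error $E = \frac{1}{2}(t - y)^2$, the chain rule gives the gradient-descent updates (this is the delta rule the script implements, with learning rate $\eta = 0.05$):

$$\Delta w = \eta\,(t - y)\,y\,(1 - y)\,x, \qquad \Delta b = \eta\,(t - y)\,y\,(1 - y).$$

The factor $y(1 - y)$ is the derivative of the sigmoid; the linear-network equations omit it.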
Note: this is NOT a recommended implementation of a neural network - just a demonstration script written on a whim.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class Network:
    def __init__(self, w, b):
        self.w = w
        self.b = b

    def __call__(self, x):
        return sigmoid(self.w * x + self.b)

    def propagateError(self, x, error, output):
        # The factor output * (1 - output) is the sigmoid's derivative;
        # this is the term missing from the linear-network equations.
        dEdW = error * output * (1 - output) * x
        dEdB = error * output * (1 - output)
        self.w += 0.05 * dEdW
        self.b += 0.05 * dEdB

def target(x):
    return sigmoid(10 * x - 5)

n = Network(0.6, 0.2)

# Training: update after each example, inputs 0.00, 0.01, ..., 0.99 in order
for it in range(1000):
    for i in range(100):
        x = 1.0 * i / 100
        output = n(x)
        t = target(x)
        error = t - output
        n.propagateError(x, error, output)

# Testing
te = 0
for i in range(100):
    x = 1.0 * i / 100
    output = n(x)
    t = target(x)
    error = t - output
    print("Desired:", t)
    print("Actual:", output)
    print("Error:", error)
    te += error
    print("---")
print("Average error:", te / 100.0)

Output:
Desired: 0.0066928509242848554
Actual: 0.012425919415559344
Error: -0.005733068491274489
---
Desired: 0.007391541344281971
Actual: 0.013550016530277079
Error: -0.006158475185995108
...
Desired: 0.9918374288468401
Actual: 0.9855607114050042
Error: 0.006276717441835888
---
Desired: 0.9926084586557181
Actual: 0.9867575913715647
Error: 0.005850867284153405
---
Average error: -0.0013496485567271077
Context
StackExchange Computer Science Q#23209, answer score: 4