patternModerate
Must Neural Networks always converge?
Viewed 0 times
mustneuralalwaysnetworksconverge
Problem
Introduction
Step One
I wrote a standard backpropegating neural network, and to test it, I decided to have it map XOR.
It is a 2-2-1 network (with tanh activation function)
For testing purposes, I manually set up the top middle neuron (M1) to be an AND gate and the lower neuron (M2) to be an OR gate (both output 1 if true and -1 if false).
Now, I also manually set up the connection M1-O1 to be -.5, M2-O1 to be 1, and
B2 to be -.75
So if M1 = 1 and M2 = 1, the sum is (-0.5 +1 -0.75 = -.25) tanh(0.25) = -0.24
if M1 = -1 and M2 = 1, the sum is ((-0.5)*(-1) +1 -0.75 = .75) tanh(0.75) = 0.63
if M1 = -1 and M2 = -1, the sum is ((-0.5)*(-1) -1 -0.75 = -1.25) tanh(1.25) = -0.8
This is a relatively good result for a "first iteration".
Step Two
I then proceeded to modify these weights a bit, and then train them using error propagation algorithm (based on gradient descent). In this stage, I leave the weights between the input and middle neurons intact, and just modify the weights between the middle (and bias) and output.
For testing, I set the weights to be and .5 .4 .3 (respectively for M1, M2 and bias)
Here, however, I start having issues.
My Question
I set my learning rate to .2 and let the program iterate through training data (A B A^B) for 10000 iterations or more.
Most of the time, the weights converge to a good result. However, at times, those weights converge to (say) 1.5, 5.7, and .9 which results in a +1 output (even) to an input of {1, 1} (when the result should be a -1).
Is it possible for a relatively simple ANN which has a solution to not converge at all or is there a bug in my implementation?
Step One
I wrote a standard backpropegating neural network, and to test it, I decided to have it map XOR.
It is a 2-2-1 network (with tanh activation function)
X1 M1
O1
X2 M2
B1 B2For testing purposes, I manually set up the top middle neuron (M1) to be an AND gate and the lower neuron (M2) to be an OR gate (both output 1 if true and -1 if false).
Now, I also manually set up the connection M1-O1 to be -.5, M2-O1 to be 1, and
B2 to be -.75
So if M1 = 1 and M2 = 1, the sum is (-0.5 +1 -0.75 = -.25) tanh(0.25) = -0.24
if M1 = -1 and M2 = 1, the sum is ((-0.5)*(-1) +1 -0.75 = .75) tanh(0.75) = 0.63
if M1 = -1 and M2 = -1, the sum is ((-0.5)*(-1) -1 -0.75 = -1.25) tanh(1.25) = -0.8
This is a relatively good result for a "first iteration".
Step Two
I then proceeded to modify these weights a bit, and then train them using error propagation algorithm (based on gradient descent). In this stage, I leave the weights between the input and middle neurons intact, and just modify the weights between the middle (and bias) and output.
For testing, I set the weights to be and .5 .4 .3 (respectively for M1, M2 and bias)
Here, however, I start having issues.
My Question
I set my learning rate to .2 and let the program iterate through training data (A B A^B) for 10000 iterations or more.
Most of the time, the weights converge to a good result. However, at times, those weights converge to (say) 1.5, 5.7, and .9 which results in a +1 output (even) to an input of {1, 1} (when the result should be a -1).
Is it possible for a relatively simple ANN which has a solution to not converge at all or is there a bug in my implementation?
Solution
(I assume by "error propagation" you mean what I call "error back-propagation.")
On page 231 of Neural Networks (by Haykin), he states that back propagation always converges, although the rate can be (in his words) "excruciatingly slow."
I think what you are asking though is not whether the algorithm will always converge, but whether it will always converge to the optimal answer. And unfortunately, it won't. Even in simple cases like yours, it's entirely possible that there are local minima which are not global minima.
Dealing with local extrema is a hugely important topic in optimization, and you can find a plethora of advice on how to deal with it. One of the most common is what it sounds like you're doing: random restarts (i.e. just run the algorithm multiple times, each starting from a random place).
To figure out if there's a bug in your code, I would print out the error term and verify that it decreases at each iteration. If so, then you're probably just hitting a local minima.
On page 231 of Neural Networks (by Haykin), he states that back propagation always converges, although the rate can be (in his words) "excruciatingly slow."
I think what you are asking though is not whether the algorithm will always converge, but whether it will always converge to the optimal answer. And unfortunately, it won't. Even in simple cases like yours, it's entirely possible that there are local minima which are not global minima.
Dealing with local extrema is a hugely important topic in optimization, and you can find a plethora of advice on how to deal with it. One of the most common is what it sounds like you're doing: random restarts (i.e. just run the algorithm multiple times, each starting from a random place).
To figure out if there's a bug in your code, I would print out the error term and verify that it decreases at each iteration. If so, then you're probably just hitting a local minima.
Context
StackExchange Computer Science Q#2406, answer score: 13
Revisions (0)
No revisions yet.