Machine Learning: how to correctly calculate gradient descent for a simple linear problem
Problem
So, I was trying to learn machine learning, and after watching a couple of Andrew Ng's lectures I decided to try to write a simple piece of code that determines what someone's salary would be based on the number of years they have worked at a company, using this simple linear function:
salary = w1 * numberOfYears + w0
I then used the least-squares method to calculate the error between the predicted results and the actual results, and tried to minimize that error by following the gradient of the error function:
Error = (1/2) [ sum of (predicted - actual)^2 for all samples (x, y) ]
The predicted value is the one calculated by the linear equation above, and actual = y. w1 and w0 are weights of the function that are set to some initial values (0 and 0, for example) and are changed to decrease the error throughout the process of gradient descent.
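For reference (the post gives only the verbal form), differentiating that error function with respect to each weight gives the standard least-squares gradient, written here in the same notation as the formulas above:

dError/dw1 = sum of (predicted - actual) * x for all samples (x, y)
dError/dw0 = sum of (predicted - actual) for all samples (x, y)

Step 2 below evaluates exactly these two sums.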
Steps to the algorithm:
1) Choose some (random) initial values for the model parameters (w).
2) Calculate the gradient G of the error function with respect to each model parameter.
3) Change the model parameters so that we move a short distance in the direction of the greatest rate of decrease of the error, i.e., in the direction of -G.
4) Repeat steps 2 and 3 until G gets close to zero.
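To make those four steps concrete, here is a minimal sketch of the batch version in Java; the names (fit, xs, ys, RATE, EPSILON) are mine, not from the post, and the learning rate would need tuning for real data:

static double[] fit(double[] xs, double[] ys) {
    double w0 = 0.0, w1 = 0.0;      // step 1: initial parameters
    final double RATE = 0.01;       // step size (learning rate)
    final double EPSILON = 1e-6;    // threshold for "G close to zero"
    double g0, g1;
    do {
        g0 = 0.0;                   // step 2: gradient summed over ALL samples
        g1 = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double error = (w1 * xs[i] + w0) - ys[i];  // predicted - actual
            g0 += error;                               // dError/dw0
            g1 += error * xs[i];                       // dError/dw1
        }
        w0 -= RATE * g0;            // step 3: move a short distance along -G
        w1 -= RATE * g1;
    } while (Math.abs(g0) + Math.abs(g1) > EPSILON);   // step 4: repeat until G ~ 0
    return new double[] { w0, w1 }; // fitted intercept and slope
}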
In my implementation I have this code:
for (int i = 0; i < numSamples; i++) {
    double error = prediction(data.X[i]) - data.Y[i];
    while (nearsZero(error)) {
        for (int j = 0; j < w.length; j++) {
            w[j] -= 0.1 * error * data.X[i][j];
        }
        error = data.Y[i] - prediction(data.X[i]);
    }
}

If I run this with only one sample (meaning I ignore the outermost loop), then the weights keep subtracting a value that grows towards infinity (or -infinity). If I change the w[j] -= ... to w[j] += ..., then the function works for the first value and sets weights that solve it for that one sample (but that isn't what I was supposed to do, as far as I can tell). Also, when I add multiple samples, the numbers just do the same.
Solution
First off, your gradient is error*x1 with respect to w1, but not to w0. The gradient for w0 is just error.

Second, you're computing error with the wrong sign. It should be predicted - actual, not the other way around. What you're doing is essentially adding the gradient at every step.
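Applied to the posted loop, both fixes might look like this; this sketch keeps the asker's identifiers, assumes data.X[i][1] holds the years value and w[0] is the intercept, and also negates the nearsZero test, since the loop presumably should run until the error is small:

for (int i = 0; i < numSamples; i++) {
    double error = prediction(data.X[i]) - data.Y[i];   // predicted - actual
    while (!nearsZero(error)) {                         // iterate until error is near zero
        w[0] -= 0.1 * error;                            // gradient for w0 is just error
        w[1] -= 0.1 * error * data.X[i][1];             // gradient for w1 is error * x1
        error = prediction(data.X[i]) - data.Y[i];      // recompute with the same sign
    }
}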
Context
StackExchange Computer Science Q#6972, answer score: 3