HiveBrain v1.2.0

Machine Learning: how to correctly calculate gradient descent for simple linear problem

Submitted by: @import:stackexchange-cs

Problem

So, I was trying to learn machine learning, and after watching a couple of Andrew Ng's lectures I decided to try writing a simple piece of code to predict someone's salary from the number of years they have worked at a company, using this simple linear function:

salary = w1 * numberOfYears + w0

I then used the least-squares method to measure the error between the predicted results and the actual results, and tried to minimize that error by following the gradient of the error function:

Error = (1/2) [ sum of (predicted - actual)^2 for all samples (x, y) ]

The predicted value is the one calculated by the linear equation above, and actual = y. w1 and w0 are the weights of the function; they are set to some initial values (0 and 0, for example) and are updated to decrease the error over the course of gradient descent.

Steps to the algorithm:

1) Choose some (random) initial values for the model parameters (w).

2) Calculate the gradient G of the error function with respect to each model parameter.

3) Change the model parameters so that we move a short distance in the direction of the greatest rate of decrease of the error, i.e., in the direction of -G.

4) Repeat steps 2 and 3 until G gets close to zero.
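For reference, a minimal sketch of these four steps as batch gradient descent, in the same Java-style code as the snippet below. The data (salary = 2*years + 1), the learning rate, and the iteration cap are illustrative assumptions, not from the question:

```java
public class BatchGradientDescent {
    // Illustrative data (not from the question): salary = 2*years + 1
    static double[] X = {1, 2, 3, 4};
    static double[] Y = {3, 5, 7, 9};
    static double w0 = 0, w1 = 0;          // step 1: initial parameter values

    static double prediction(double x) { return w1 * x + w0; }

    public static void main(String[] args) {
        double rate = 0.01;                // hypothetical learning rate
        for (int iter = 0; iter < 10000; iter++) {
            // step 2: gradient of the error with respect to each parameter
            double g0 = 0, g1 = 0;
            for (int i = 0; i < X.length; i++) {
                double error = prediction(X[i]) - Y[i];  // predicted - actual
                g0 += error;               // dError/dw0
                g1 += error * X[i];        // dError/dw1
            }
            // step 4: stop once G is close to zero
            if (Math.abs(g0) < 1e-9 && Math.abs(g1) < 1e-9) break;
            // step 3: move a short distance in the direction of -G
            w0 -= rate * g0;
            w1 -= rate * g1;
        }
        System.out.println(w1 + " " + w0); // approaches 2 and 1 for this data
    }
}
```

Note the gradient is accumulated over all samples before the weights move, which is the textbook version of the steps above.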

In my implementation I have this code:

for(int i = 0; i < numSamples; i++) {
    double error = prediction(data.X[i]) - data.Y[i];
    while(nearsZero(error)) {
        for(int j = 0; j < w.length; j++) {
            w[j] -=  0.1 * error * data.X[i][j];
        }
        error = data.Y[i] - prediction(data.X[i]);
    }
}


If I run this with only one sample (meaning ignore the outermost loop), the weights diverge: each iteration subtracts a value that keeps growing toward infinity (or -infinity). If I change the w[j] -= ... to a w[j] += ..., then the loop works for the first sample and finds weights that solve it for that one sample (but that isn't what I was supposed to do, as far as I can tell). Also, when I add multiple samples the numbers just do the same thing.

Solution

First off, the gradient of the error with respect to w1 is error * x1, but that is not the gradient with respect to w0. The gradient for w0 is just error, because w0's input is the constant 1.
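Spelled out by differentiating the error function from the question (treating w0's input as the constant 1):

dError/dw1 = sum of (predicted - actual) * x for all samples (x, y)

dError/dw0 = sum of (predicted - actual) for all samples (x, y)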

Second, inside the while loop you're computing error with the wrong sign. It should be predicted - actual, not the other way around. With the flipped sign, w[j] -= ... is essentially adding the gradient at every step instead of subtracting it.
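Putting both fixes together, a corrected version of the question's loop might look like the sketch below. The data, learning rate, and the convention of storing a constant 1 in X[i][0] as the bias input for w[0] are assumptions for illustration; nearsZero and prediction are kept from the question but given concrete bodies here:

```java
public class FixedDescent {
    // Illustrative data (salary = 2*years + 1); X[i][0] = 1 is the bias input for w[0]
    static double[][] X = {{1, 1}, {1, 2}, {1, 3}};
    static double[] Y = {3, 5, 7};
    static double[] w = {0, 0};                       // w[0] = w0, w[1] = w1

    static double prediction(double[] x) {
        double p = 0;
        for (int j = 0; j < w.length; j++) p += w[j] * x[j];
        return p;
    }

    static boolean nearsZero(double e) { return Math.abs(e) < 1e-6; }

    public static void main(String[] args) {
        for (int pass = 0; pass < 2000; pass++) {      // repeat over all samples
            for (int i = 0; i < X.length; i++) {
                double error = prediction(X[i]) - Y[i];  // predicted - actual (sign fixed)
                if (nearsZero(error)) continue;
                for (int j = 0; j < w.length; j++) {
                    // X[i][0] = 1 makes the update for w[0] just "error",
                    // matching the gradient for the bias term
                    w[j] -= 0.1 * error * X[i][j];
                }
            }
        }
        System.out.println(w[1] + " " + w[0]);
    }
}
```

Unlike the original, this updates each sample once per pass and repeats passes over the whole dataset, rather than spinning a while loop on a single sample until it is solved exactly.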

Context

StackExchange Computer Science Q#6972, answer score: 3
