
Backpropagation in simple Neural Network

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·


neural, simple, network, backpropagation

Problem

I've been working on a simple neural network implemented in Python. It seems to be learning, but not effectively. The graph below shows the output of my neural network when trained over about 15,000 iterations with 1000 training examples (it's trying to learn x²). The cost of the network also always seems to bottom out just over 0.41, and I can't get it any lower.

EDIT: I tried leaving the network to learn for a few hours, over probably several million iterations or more. The two lines on the graph did seem to come closer together, but the cost refused to budge below about 0.41.

I suspect the issue is with my implementation of the backpropagation algorithm, since the high value for cost given by my implementation seems to correspond with the seeming inaccuracy when the network is plotted on a graph.

However, my full code is below, in case I've missed something in other parts of the implementation.

```
import numpy as np


class neural_network:

    def __init__(self, dimensions, nonlinear_function, nonlinear_function_derivative, seed=None):
        # Compare against None so that seed=0 is still honoured.
        if seed is not None:
            np.random.seed(seed)
        self.g = nonlinear_function
        self.g_dash = nonlinear_function_derivative
        # Weights start uniformly in [-1, 1); theta_two has one extra row for the hidden bias unit.
        self.theta_one = 2 * np.random.random((dimensions[0], dimensions[1])) - 1
        self.theta_two = 2 * np.random.random((dimensions[1] + 1, dimensions[2])) - 1

    def predict(self, x):
        self.a1 = x
        self.z2 = np.dot(x, self.theta_one)
        # Append a column of ones to the hidden activations as the bias unit.
        self.a2 = np.concatenate((self.g(self.z2), np.ones((np.shape(self.z2)[0], 1))), axis=1)

        self.z3 = np.dot(self.a2, self.theta_two)
        self.a3 = self.g(self.z3)
        return self.a3

    def backprop(self, y, alpha, reg):
        self.y = y
        self.delta_three = self.a3 - y
        # Drop the bias column before propagating the error back to the hidden layer.
        self.delta_two = np.dot(self.delta_three, self.theta_two.T)[:, :-1] * self.g_dash(self.z2)

        self.Delta2 = np.dot(self.a2.T, self.delta_three)
        self.Delta1 = np.dot(self.a1.T, self.delta_two)

        self.theta_one_gradie
```

Solution

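For a regression task like this one, you shouldn't be using an activation function on the output layer of your network. Because you are using a sigmoidal activation, `predict` applies `self.g` to `self.z3`, which restricts the output of your network to (0, 1). Any target outside that range can never be matched, however long you train. Try removing the output activation (return `self.z3` directly) and retraining.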
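To illustrate the point, here is a minimal sketch, not your exact code. It assumes a sigmoid hidden layer, targets y = x² on [-2, 2], a column of ones appended to the input as a bias, and a plain half mean squared error cost; the layer size, learning rate, and iteration count are arbitrary choices of mine. It first shows that a sigmoid output can never reach targets above 1, then trains the same architecture with a linear output (`a3 = z3`).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


rng = np.random.default_rng(0)

# Assumed toy data: y = x^2 on [-2, 2]; the ones column acts as the input bias.
x = np.linspace(-2.0, 2.0, 50).reshape(-1, 1)
a1 = np.hstack([x, np.ones_like(x)])
y = x ** 2
m = len(x)

hidden = 8  # arbitrary hidden layer size
theta_one = 2 * rng.random((2, hidden)) - 1
theta_two = 2 * rng.random((hidden + 1, 1)) - 1


def forward(theta_one, theta_two):
    """Return hidden pre-activations, hidden activations (with bias), and z3."""
    z2 = a1 @ theta_one
    a2 = np.hstack([sigmoid(z2), np.ones((z2.shape[0], 1))])
    z3 = a2 @ theta_two
    return z2, a2, z3


def cost(theta_one, theta_two):
    """Half mean squared error with a linear output layer."""
    _, _, z3 = forward(theta_one, theta_two)
    return 0.5 * np.mean((z3 - y) ** 2)


# With a sigmoid on the output, every prediction lies strictly inside (0, 1),
# while the targets go up to 4.
_, _, z3 = forward(theta_one, theta_two)
squashed = sigmoid(z3)
print("sigmoid output range:", squashed.min(), squashed.max())
print("target range:", y.min(), y.max())

# Train with a linear output: a3 = z3, so delta_three = a3 - y is the exact
# gradient of the squared error cost with respect to z3 (up to the 1/m factor).
initial_cost = cost(theta_one, theta_two)
alpha = 0.05  # arbitrary learning rate
for _ in range(5000):
    z2, a2, z3 = forward(theta_one, theta_two)
    delta_three = z3 - y
    delta_two = (delta_three @ theta_two.T)[:, :-1] * sigmoid_prime(z2)
    theta_two -= alpha * (a2.T @ delta_three) / m
    theta_one -= alpha * (a1.T @ delta_two) / m
final_cost = cost(theta_one, theta_two)

print("cost before training:", initial_cost)
print("cost after training:", final_cost)
```

In your class, the equivalent change would be to set `self.a3 = self.z3` in `predict`. The `self.delta_three = self.a3 - y` line in `backprop` can then stay as it is, provided your cost is the squared error.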

Context

StackExchange Code Review Q#145954, answer score: 2
