
Siamese neural network

Tags: neural-network, siamese

Problem

I have been studying the architecture of the Siamese neural network introduced by Yann LeCun and his colleagues in 1994 for the recognition of signatures ("Signature verification using a Siamese time delay neural network", NIPS 1994).

I had some problems understanding the general architecture of this Siamese neural network model, and discussed it with a friend on Cross Validated. I think I finally understood it, so now I have moved on to the next step: implementing it.

We ended up stating that the global algorithm should be something like:

  • Create the convolutional neural network convNetA for the 1st signature.

  • Create the convolutional neural network convNetB for the 2nd signature.

  • Tie the convNetA weights to the convNetB weights.

  • Set the cosine similarity function to compute the loss.

  • Run the training (forwards and backwards).

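The steps above can be sketched in plain Python/numpy for illustration (the linear-plus-tanh `embed` is a placeholder for the real convolutional branches, and the shapes are made up): using one shared weight matrix for both branches is exactly what "tying" the weights achieves.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix: evaluating both branches with the same W is
# equivalent to tying convNetA's weights to convNetB's.
W = rng.standard_normal((4, 8)) * 0.1  # hypothetical 8-dim input -> 4-dim embedding

def embed(x):
    # Stand-in for a convolutional branch: a single linear layer + tanh.
    return np.tanh(W @ x)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x1 = rng.standard_normal(8)  # "signature" 1
x2 = rng.standard_normal(8)  # "signature" 2
score = cosine_similarity(embed(x1), embed(x2))
assert -1.0 <= score <= 1.0  # cosine similarity is bounded
```

Training would then push `score` towards +1 for matching pairs and away from it for forgeries.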
I'm new to Torch so I do not really know how to implement this algorithm. Here's my first version:

```
-- training loop
function gradientUpdate(perceptron, dataset, target, learningRate, max_iterations)
   for i = 1, max_iterations do
      -- forward(input) computes the module's output for the given input;
      -- when the top module is nn.CosineDistance, that output is the
      -- cosine similarity (a 1-element Tensor, hence the [1])
      local predictionValue = perceptron:forward(dataset)[1]
      io.write(" pre-predictionValue = " .. predictionValue .. "\n")

      -- the minus sign turns gradient descent into ascent on the
      -- similarity for matching pairs
      local gradientWrtOutput = torch.Tensor({-target})

      -- zeroGradParameters() clears the gradients accumulated by earlier
      -- accGradParameters(input, gradOutput, scale) calls
      perceptron:zeroGradParameters()

      -- backward(input, gradOutput) backpropagates through the module
      perceptron:backward(dataset, gradientWrtOutput)

      -- updateParameters(learningRate) takes one gradient step:
      -- w <- w - learningRate * dL/dw
      perceptron:updateParameters(learningRate)
   end
end
```

Solution

I think it's a great project! But it could do with a few improvements:

Neuron Type (1)

Suppose we have a network of perceptrons that we'd like to use to learn to solve some problem. For example, the inputs to the network might be the raw pixel data from a scanned image of a signature. And we'd like the network to learn weights and biases so that the output from the network correctly classifies the image. To see how learning might work, suppose we make a small change in some weight (or bias) in the network. What we'd like is for this small change in weight to cause only a small corresponding change in the output from the network.

If it were true that a small change in a weight (or bias) causes only a small change in output, then we could use this fact to modify the weights and biases to get our network to behave more in the manner we want. For example, suppose the network was mistakenly classifying an image as an "c" when it should be a "o". We could figure out how to make a small change in the weights and biases so the network gets a little closer to classifying the image as a "o". And then we'd repeat this, changing the weights and biases over and over to produce better and better output. The network would be learning.

The problem is that this isn't what happens when our network contains perceptrons. In fact, a small change in the weights or bias of any single perceptron in the network can sometimes cause the output of that perceptron to completely flip, say from 0 to 1. That flip may then cause the behavior of the rest of the network to completely change in some very complicated way. So while your "o" might now be classified correctly, the behavior of the network on all the other images is likely to have completely changed in some hard-to-control way. That makes it difficult to see how to gradually modify the weights and biases so that the network gets closer to the desired behavior. Perhaps there's some clever way of getting around this problem. But it's not immediately obvious how we can get a network of perceptrons to learn.
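The flip described above is easy to see numerically (a toy Python perceptron with a step activation; the weights are chosen purely for illustration):

```python
def perceptron(x, w, b):
    # Step-function neuron: the output flips abruptly when w.x + b crosses 0.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

x = [1.0, 1.0]
print(perceptron(x, [0.50, 0.50], -1.0))  # 0.50 + 0.50 - 1.0 = 0    -> output 0
print(perceptron(x, [0.51, 0.50], -1.0))  # tiny weight change (+0.01) -> output 1
```

A 0.01 nudge to one weight flipped the output from 0 all the way to 1, which is exactly what makes gradient-based learning on raw perceptrons awkward.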

We can overcome this problem by introducing a new type of artificial neuron called a sigmoid neuron. Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output. That's the crucial fact which will allow a network of sigmoid neurons to learn.

Just like a perceptron, the sigmoid neuron has inputs \$ x_1 \$, \$ x_2 \$, ... But instead of being just 0 or 1, these inputs can also take on any value between 0 and 1. So, for instance, 0.638 is a valid input for a sigmoid neuron.

The Sigmoid Neuron is defined as:

$$ \sigma(z) = \dfrac{1}{1 + e^{-z}} $$

Torch implements this neuron type as nn.Sigmoid.
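To see the smoothness the excerpt is talking about, here is the same function in plain Python (for illustration; Torch's module computes the same thing element-wise):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Small changes in z produce small changes in the output, unlike a
# step-function perceptron, whose output can flip from 0 to 1.
print(sigmoid(0.0))    # 0.5
print(sigmoid(0.1))    # ~0.525, a small nudge for a small input change
print(sigmoid(100.0))  # saturates near 1.0 for large inputs
```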

(1) Excerpt with minor edits from Neural Networks and Deep Learning

Cost Function

I don't see any use of a cost function in your code. I'm going to recommend you read this section in Neural Networks and Deep Learning to get a good reason why you should be using one.

In short, the cost function returns a number representing how well the neural network mapped training examples to the correct output. The basic idea is that the more "wrong" our network is at achieving the desired results, the higher the cost and the more we'll want to adjust the weights and biases to achieve a lower cost. We try to minimize this cost using methods such as gradient descent.

There are certain properties that you look for in a cost function, such as convexity (so gradient descent finds the global optimum instead of getting stuck in a local one). As the book suggests, I would lean towards using the cross-entropy cost function.

The way we implement this in Torch is with Criterions. Torch has implemented a bunch of these cost functions, and I encourage you to try different ones and see how they affect your neural net's accuracy.
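As a concrete instance, here is the binary cross-entropy cost for a single example in plain Python (Torch's BCE-style criterion computes the batched equivalent; the `eps` clamp is just a numerical-safety detail of this sketch):

```python
import math

def cross_entropy(y, a, eps=1e-12):
    # Binary cross-entropy for one example: y is the target (0 or 1),
    # a is the network's sigmoid output in (0, 1).
    a = min(max(a, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

# The more "wrong" the output, the higher the cost:
print(cross_entropy(1.0, 0.9))  # ~0.105 (close to the target -> low cost)
print(cross_entropy(1.0, 0.1))  # ~2.303 (far from the target -> high cost)
```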

Over-fitting

It is possible to fit your training data too well, to the point where the model doesn't generalize. An example of this is given in the picture:

Noisy, linear-ish data is fitted to both linear and polynomial functions. Although the polynomial function is a perfect fit, the linear version generalizes the data better.

I don't know Lua very well, but looking at your code I don't see any attempt to reduce over-fitting. A common approach is regularization. It's too large a topic to cover in depth here, so I'll leave it to you to study; it is quite simple to use once its concepts are understood, and Torch provides an implementation.
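The most common flavour, L2 regularization ("weight decay"), just adds a penalty on large weights to the cost. A plain-Python sketch (the `lam`/`n` scaling follows the convention in Nielsen's book; the weights here are dummies):

```python
import numpy as np

def l2_regularized_cost(base_cost, weights, lam, n):
    # L2 regularization adds a penalty proportional to the sum of squared
    # weights, discouraging the large weights that fit noise.
    return base_cost + (lam / (2 * n)) * sum(float(np.sum(w ** 2)) for w in weights)

weights = [np.ones((2, 2)), np.ones(3)]  # dummy weight tensors, sum of squares = 7
print(l2_regularized_cost(1.0, weights, lam=0.1, n=10))  # 1.0 + 0.1/20 * 7 = 1.035
```

Because the penalty grows with the squared weights, gradient descent on this cost shrinks weights that don't earn their keep on the training data.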

Another way to reduce over-fitting is by introducing dropout. At each training stage, individual nodes are "dropped out" of the net so that a reduced network is left, and only the reduced network is trained on the data in that stage. The removed nodes are then reinserted into the network with their original weights. The nodes become somewhat less sensitive to the weights of the other nodes, which makes the network more robust.
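In code, dropout is little more than a random mask (a plain-Python/numpy sketch of the "inverted dropout" variant; the 1/(1-p) rescaling keeps the expected activation unchanged at test time):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(activations, p):
    # Zero each activation with probability p, and scale the survivors by
    # 1/(1-p) so the expected value of the layer's output is unchanged.
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

a = np.ones(1000)
dropped = dropout(a, p=0.5)
print(np.mean(dropped == 0.0))  # roughly half of the units are dropped
```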

Context

StackExchange Code Review Q#93690, answer score: 30
