patternMinor
Unsupervised Learning: BCM or Oja's Rule
unsupervisedrulelearningbcmoja
Problem
I am learning about unsupervised machine learning and am a bit confused about the different rules for updating weights. I understand that both Oja's rule and BCM can be used.
In Oja's rule:
dw/dt = k*x*y - w*y^2
where x is the value at the input neuron, y is the value at the output neuron and w is the connection strength between the two. The idea is that this prevents the weights from growing out of proportion.
In BCM:
dw/dt = k*(y-theta)*x
where the idea is that unless my postsynaptic strength exceeds a threshold theta, I don't want my connection to be strengthened.
While studying competitive learning, which is yet another type of unsupervised learning, I came across another rule:
dw/dt = n*(x-y)
In this case, however, x is the full input vector and y is the vector representation of the output vector. The idea is that we move the prototype that responded most strongly to a given input closer to it, making the two more similar.
However, I don't understand when I should use which rule. For example, why couldn't I use a rule that combines both Oja's and BCM, only increasing connection weights when the output exceeds a given threshold while preventing the weights from growing out of proportion?
Solution
Oja's learning rule and the BCM rule share the same underlying generative model, a linear perceptron: $$y = w^T x,$$ where $w$ and $x$ are vectors of the same dimension.
But they differ in their goals:
Oja's rule (run with a sufficiently small learning rate) extracts the first principal component of the data, i.e. the leading eigenvector of the covariance matrix (for zero-mean inputs) $$C = \langle xx^T \rangle_{p(x)}$$ so that $ \lambda_1 w_1 = C w_1$, where $\lambda_1$ is the largest eigenvalue.
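To make the claim about Oja's rule concrete, here is a minimal sketch in plain Python. It uses the common discrete form $\Delta w = \eta\, y\,(x - y\,w)$, which is an assumption on my part; the question writes the rule with a separate constant $k$. The function name, the toy data and the learning rate are all illustrative choices, not part of the original post.

```python
import math
import random


def oja_first_component(samples, eta=0.01, w0=(1.0, 0.0)):
    """Run one pass of Oja's rule, w += eta * y * (x - y * w), over the samples."""
    w = list(w0)
    for x in samples:
        # Linear neuron output y = w^T x.
        y = sum(wi * xi for wi, xi in zip(w, x))
        # Hebbian term eta*y*x plus the decay term -eta*y^2*w that keeps |w| bounded.
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


# Toy zero-mean data (assumed for illustration): standard deviation 2.0 along
# the direction (1, 1)/sqrt(2) and 0.5 along the orthogonal direction.
rng = random.Random(0)
c = math.sqrt(0.5)
samples = []
for _ in range(5000):
    a = rng.gauss(0.0, 2.0)
    b = rng.gauss(0.0, 0.5)
    samples.append((c * (a - b), c * (a + b)))

w = oja_first_component(samples)
norm = math.hypot(w[0], w[1])
# |cosine| between w and the known leading eigenvector (1, 1)/sqrt(2).
alignment = abs(w[0] * c + w[1] * c) / norm
print(w, norm, alignment)
```

With a small learning rate the weight vector should end up close to unit length and nearly parallel (up to sign) to the direction of largest variance.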
The BCM rule maximizes the input selectivity with respect to a specific input pattern. The selectivity is defined as $$ s(w) = 1 - \frac{\langle w^T x^{(i)} \rangle_i}{ \max_i w^T x^{(i)} }, $$ where the average in the numerator runs over all input patterns $x^{(i)}$. The selectivity is maximized if the output neuron responds strongly to one of the input patterns (e.g. the $k$th stimulus $x^{(k)}$) but barely responds to all others.
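The selectivity measure can be evaluated directly for a finite set of patterns. The sketch below is only an illustration: the function name and the three orthogonal toy patterns are assumptions, and it presumes the largest response is positive so that the ratio is well defined.

```python
def selectivity(w, patterns):
    """Return s(w) = 1 - mean_i(w^T x_i) / max_i(w^T x_i) for a list of patterns."""
    responses = [sum(wi * xi for wi, xi in zip(w, x)) for x in patterns]
    mean_response = sum(responses) / len(responses)
    # Assumes max(responses) > 0; otherwise the ratio is not meaningful.
    return 1.0 - mean_response / max(responses)


# Three orthogonal toy patterns (assumed for illustration).
patterns = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

s_selective = selectivity((1.0, 0.0, 0.0), patterns)  # responds to one pattern only
s_uniform = selectivity((1.0, 1.0, 1.0), patterns)    # responds equally to all
print(s_selective, s_uniform)
```

A neuron that responds to exactly one of $N$ patterns reaches $1 - 1/N$ (here $2/3$), while a neuron that responds equally to every pattern has selectivity $0$.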
The third rule doesn't make sense to me (I'll use explicit notation for vectors now). Assuming that $n$ is a scalar (e.g. a learning rate), $\vec{x}$, $\vec{y}$ and $\vec{w}$ must have the same dimensions. Then, what's your generative model? It cannot be $y = \vec{w}^T \vec{x}$ (because $\vec{y}$ is a vector), nor can it be $\vec{y} = W \vec{x}$ (because your weights are a vector and not a matrix).
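For comparison, the standard winner-take-all form of competitive learning does keep one prototype vector per output unit (so the weights form a matrix) and moves only the winning prototype toward the input, $\Delta \vec{w}_k = n\,(\vec{x} - \vec{w}_k)$. This may or may not be the rule the question had in mind; the sketch below just shows that form, with a hypothetical function name and toy numbers.

```python
def competitive_step(prototypes, x, eta=0.1):
    """Move the prototype closest to x a fraction eta of the way toward x.

    prototypes is a list of weight vectors, one per output unit; it is
    updated in place. Returns the index of the winning unit.
    """
    # Squared Euclidean distance from each prototype to the input.
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in prototypes]
    k = dists.index(min(dists))
    # Only the winner is updated: w_k += eta * (x - w_k).
    prototypes[k] = [wi + eta * (xi - wi) for wi, xi in zip(prototypes[k], x)]
    return k


prototypes = [[0.0, 0.0], [1.0, 1.0]]
winner = competitive_step(prototypes, [0.9, 0.7], eta=0.5)
print(winner, prototypes)
```

Here the second prototype is closer to the input, so it wins and moves halfway toward it, while the first prototype is left untouched.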
Context
StackExchange Computer Science Q#42611, answer score: 3