
Let's begin by asking: what can a linear recurrent network do?

Here's the equation for a linear recurrent network.

If there are N output neurons, then the output vector v is going to be N by 1,

and the feedforward input to these output neurons is given by W times U, and that's

again going to be an N by 1 vector. And we can call this N by 1 vector h, so

we don't have to write W times U each time.

And the feedback to the output neurons is given by M times v, where M is the

recurrent connection matrix. What we want to find out is how the

output of the network v(t) behaves for different values of the recurrent

connection matrix M. This is where eigenvectors come to our

rescue. Here is the differential equation that we

are trying to solve, to understand how v(t) behaves.

And this equation, as you can see, contains a mix of vectors and this matrix

times a vector. So that's a pretty complicated equation

to solve. Fortunately, we can use eigenvectors to

solve this particular differential equation.
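Before turning to eigenvectors, here's a minimal numerical sketch of these dynamics, tau dv/dt = -v + h + M v, using simple Euler integration (the values of N, tau, and the random symmetric M are my own illustrative choices, not values from the lecture):

```python
import numpy as np

def simulate_linear_network(M, h, v0, tau=0.01, dt=0.001, steps=2000):
    """Euler-integrate tau dv/dt = -v + h + M v."""
    v = v0.copy()
    for _ in range(steps):
        v = v + (dt / tau) * (-v + h + M @ v)
    return v

# Illustrative example: a random symmetric M, rescaled so all of its
# eigenvalues lie below 1 in magnitude, which keeps the network stable.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
S = (A + A.T) / 2
M = 0.9 * S / np.max(np.abs(np.linalg.eigvalsh(S)))

h = rng.normal(size=5)                      # feedforward input h = W u
v = simulate_linear_network(M, h, v0=np.zeros(5))
# With all eigenvalues of M below 1, v(t) converges to the steady
# state (I - M)^-1 h, as the eigenvector analysis below will show.
v_ss = np.linalg.solve(np.eye(5) - M, h)
assert np.allclose(v, v_ss, atol=1e-4)
```

The eigenvector machinery that follows explains why this simulation converges, and when it would not.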

How do we do that? Well, suppose the connection matrix, the

N by N recurrent connection matrix M, is symmetric.

What does that mean? It means that for any particular pair of

output neurons, say neuron number one and neuron number two, the fact that the

recurrent connection matrix is symmetric just means that if one connects to two

with some particular value or strength A, then two connects to one also with the

same value A. So in other words, M(1, 2) is equal to M(2, 1), which is equal to the value A.

And that's what it means for this matrix M, to be symmetric.

Now why is it useful to have the connection matrix M symmetric?

Well it turns out that if M is symmetric, then M has N different orthogonal

eigenvectors, and the corresponding eigenvalues satisfy the standard

eigenvector-eigenvalue equation shown here.

Now what does it mean for these eigenvectors to be orthogonal?

Well, if you take any two of these eigenvectors ei and ej, as long as i is

not equal to j, the fact that they're orthogonal just means that the dot product

of these two eigenvectors is going to be, you guessed it, 0.

Now we can further make these eigenvectors orthonormal, so, orthonormal

means, that these eigenvectors are not only orthogonal, but also, they have a

length of 1. And we can do that by dividing each of

these eigenvectors by their length; then we have the fact that ei dot ei is going to

equal 1. And if that's satisfied, then we say that

we have a set of vectors, these eigenvectors, which are orthonormal to

each other. Why is it useful to have these

eigenvectors, which are orthonormal to each other?

Well, it turns out that we can now write any n-dimensional vector, including our

output vector, v(t), as simply a linear combination of our orthonormal

eigenvectors. So these eigenvectors now form a new

basis or a new coordinate system for expressing n-dimensional vectors such as

v(t). To drive home the point, let's look at the

special case of a three dimensional space.

So here's x, y, and z, and let's suppose that this is our vector v(t).

All we're doing now is expressing this vector v(t) in a new coordinate system

given by our orthonormal eigenvectors e1, e2, and e3.

And in the xyz system we were writing v(t), as simply the linear combination of

the first component of v, v1, times (1,0,0). This was our vector for x.

And v2 times (0,1,0), this is our y component.

And finally for the z component, v3 times (0,0,1).

So all we are doing now is instead of expressing v(t) in the coordinate system

given by the x, y, and z vectors, we are now writing v as a different linear

combination, c1 times e1, plus c2 times e2, plus c3 times e3.
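These two claims, that a symmetric matrix has orthonormal eigenvectors and that any vector can be expanded in that basis, are easy to verify numerically. A sketch using an arbitrary symmetric matrix of my own choosing (numpy's `eigh` routine returns orthonormal eigenvectors for symmetric matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
M = (A + A.T) / 2                      # symmetric connection matrix

lam, E = np.linalg.eigh(M)             # columns of E are eigenvectors e_i
# Orthonormality: e_i . e_j is 1 if i == j and 0 otherwise, i.e. E^T E = I.
assert np.allclose(E.T @ E, np.eye(4))
# Eigenvector equation: M e_i = lambda_i e_i for each i.
assert np.allclose(M @ E, E * lam)
# Any vector v can be written as sum_i c_i e_i, with coefficients c_i = e_i . v.
v = rng.normal(size=4)
c = E.T @ v
assert np.allclose(E @ c, v)
```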

5:14

Now why go through all this trouble? Well it turns out that if you substitute

the equation for v(t) in terms of the ei's into the differential equation for

v, and then further use the eigenvector equation, as well as the

orthonormality of ei, then we can solve for ci as a function of time.

And so here is the equation for ci as a function of time, and once you have a

closed form expression for ci as a function of time, we can substitute that

value for ci into our equation for v. And therefore, we have solved the

differential equation, and we now have a complete expression, that characterizes

how v changes as a function of time. And if you want to get into all the

mathematical detail of how we derived this expression for ci(t), I would

encourage you to go to the supplementary materials on the course website.
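The closed form referred to here (derived in the supplementary materials) is ci(t) = (h.ei)/(1 - lambda_i) * (1 - exp(-t(1 - lambda_i)/tau)) + ci(0) * exp(-t(1 - lambda_i)/tau). Here's a quick numerical sanity check of that expression against a direct Euler integration of tau dci/dt = -(1 - lambda_i) ci + h.ei, with illustrative values of tau, lambda_i, and the input projection of my own choosing:

```python
import numpy as np

def c_closed_form(t, h_i, lam_i, c0, tau):
    """Closed-form ci(t) for the linear recurrent network mode i."""
    decay = np.exp(-t * (1.0 - lam_i) / tau)
    return h_i / (1.0 - lam_i) * (1.0 - decay) + c0 * decay

# Euler-integrate tau dc/dt = -(1 - lam) c + h_i and compare.
tau, lam_i, h_i, c0 = 0.01, 0.5, 2.0, 1.0   # illustrative values
dt, steps = 1e-5, 5000
c = c0
for _ in range(steps):
    c += (dt / tau) * (-(1.0 - lam_i) * c + h_i)
assert np.isclose(c, c_closed_form(steps * dt, h_i, lam_i, c0, tau), atol=1e-3)
```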

We can now show that the eigenvalues of the recurrent connection matrix,

determine whether the network is stable or not.

To see this, suppose one of the lambda i's is bigger than 1.

Well, what happens to the output of the network, given by v(t), which is a linear

combination of the eigenvectors weighted by these coefficients ci?

Well, if one of the lambda i's is bigger than 1, let's say that this lambda i here

is equal to 2, which is bigger than 1. Then this term ends up being a growing

exponential function of time. And so as time goes on you're going to

have this term becoming larger and larger, and therefore ci of t is also

going to become larger and larger. And so the output of the network then,

also grows without any bound, which means that v(t) explodes, and so what you end

up getting is an unstable network. On the other hand if all the eigenvalues

are less than 1, then you should be able to convince yourself, by plugging in

values of lambda i less than 1, in our equation for ci(t), that the network is

stable because v(t) is going to converge to some steady state value.

Which is given simply by the linear combination of all of these coefficients

which have now converged to this particular value, multiplied by each of

the corresponding eigenvectors. Now we can answer the question that we

posed earlier in the lecture. What can a recurrent network do?

One thing that a linear recurrent network can do, is amplify its inputs.

To see this, suppose that all the eigenvalues lambda i are less than 1.

So we showed in the previous slide that the output of the network in the steady

state is going to look like this. And if one of these eigenvalues, let's

say lambda 1 is very close to 1, and all the other eigenvalues are much much

smaller. Then the lambda 1 term, is going to

dominate the sum, and so the steady state output of the network, is going to be

basically the projection of the input onto the first eigenvector, divided by

1 minus lambda 1, multiplied by e1. So what we have then is a network that

is amplifying its input projection. So if lambda 1, for example, is equal to

0.9, which is close to 1, then 1 over 1 minus lambda 1 is going to be 10.

And so, we have an amplification factor of this projection of the input on to e1

of 10. Now let's look at an example of a linear

recurrent network. So let's assume that each of these

output neurons codes for some angle between minus 180 degrees and plus 180

degrees. So instead of labeling these neurons with

1, 2, 3, 4 and 5, we can label them according to some angles.

So for example, this could be minus 180 degrees, this neuron could be minus 90.

This neuron could be labeled with 0, this with plus 90, and this with 180.

Now, why are we labeling neurons with angles?

It's because we can now define the connection matrix M, as a cosine

function, for example, of the relative angle labeling the neurons.

So in other words, M of theta, theta prime could be proportional to cosine of

theta minus theta prime. What does this type of connectivity look

like? Well it results in neurons exciting other

neurons that are nearby, and inhibiting other neurons that are further away.

And here's a graphical depiction of the cosine based connectivity function.

So, for neurons that are close to any given neuron, you have excitation, and

for neurons that are further away, you have inhibition.

Now let's ask the question: is M, defined by such a connectivity function,

symmetric? In other words, is M theta, theta prime

equal to M theta prime theta? Well, that's the same as asking whether

cosine of x is equal to cosine of minus x, which we know is true.

Which means that yes, the connectivity matrix is indeed symmetric.
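Both the symmetry and the local-excitation, distant-inhibition structure are easy to check numerically. A small sketch (the number of neurons is my own illustrative choice):

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 8, endpoint=False)   # neuron angle labels
M = np.cos(theta[:, None] - theta[None, :])   # M(theta, theta') = cos(theta - theta')
# cos(x) = cos(-x), so M(theta, theta') = M(theta', theta): M is symmetric.
assert np.allclose(M, M.T)
# Nearby neurons excite each other (positive entries), distant ones inhibit:
assert M[0, 0] > 0 > M[0, 4]   # same angle vs. angle difference of 180 degrees
```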

Now this type of connectivity function is interesting because there's

some evidence that such connectivity is also found in the cerebral cortex.

Neurons in the cerebral cortex tend to excite other neurons that are near them,

and inhibit neurons that are further away.


Now suppose we choose the connectivity matrix of a linear recurrent network to

be proportional to the cosine function, such that all the eigenvalues are 0

except one eigenvalue, which is equal to 0.9.

Then as we showed earlier, we would expect to see amplification.

And we'd expect to see an amplification of the input by a factor of 10.

So, let's see if that really happens when we simulate such a network.
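Here's a sketch of such a simulation (my own choices of N, noise level, and scaling; with equally spaced angles the sine mode survives alongside the cosine mode, but it's the cosine projection that matters for the amplification factor):

```python
import numpy as np

N = 64
theta = np.linspace(-np.pi, np.pi, N, endpoint=False)   # neuron labels in radians
# Cosine connectivity, scaled so the largest eigenvalue of M is 0.9.
M = np.cos(theta[:, None] - theta[None, :])
M *= 0.9 / np.max(np.linalg.eigvalsh(M))

rng = np.random.default_rng(2)
h = np.cos(theta) + 0.3 * rng.normal(size=N)            # noisy bump-like input
v_ss = np.linalg.solve(np.eye(N) - M, h)                # steady-state output

# The projection of h onto the cosine eigenvector is amplified
# by 1 / (1 - 0.9) = 10, just as the analysis predicts.
e1 = np.cos(theta) / np.linalg.norm(np.cos(theta))
assert np.isclose(e1 @ v_ss, 10 * (e1 @ h))
```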

And, not surprisingly, the answer is yes. When we present the network with a noisy

input, we do get an output that is an amplified version of the input, where the

peak of this noisy input has been amplified, and the smaller peaks have

been suppressed. So what else can a linear recurrent

network do? Well, recall our earlier remark that if all the

eigenvalues are less than 1, then the network is stable.

Now suppose one of these eigenvalues, let's say lambda 1, is exactly equal to 1.

In that case, we can show that we have a different kind of equation for how the

coefficient c1 evolves. It's given by this differential equation.
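Assuming the slide's equation is the general c1 equation with lambda 1 set to 1, the decay term proportional to 1 minus lambda 1 vanishes, leaving tau dc1/dt = h.e1, a pure integrator. A minimal Euler sketch (tau and the pulse input are my own illustrative choices):

```python
import numpy as np

tau, dt = 0.01, 1e-4
T = 3000                      # total steps (0.3 s of simulated time)
h_e1 = np.zeros(T)
h_e1[500:1500] = 2.0          # input projection h.e1: off, then on, then off

c1 = 0.0
trace = []
for t in range(T):
    c1 += (dt / tau) * h_e1[t]    # tau dc1/dt = h.e1 when lambda_1 = 1
    trace.append(c1)

# After the input is turned off, c1 holds the integral of the past input,
# (1/tau) * 2.0 * (1000 steps * dt) = 20.0, with no decay back to zero.
assert np.isclose(trace[-1], 20.0, rtol=1e-6)
assert np.isclose(trace[-1], trace[1600], rtol=1e-6)
```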

And here's something interesting that happens.

So suppose that the input was initially 0, and then it was turned on and then it

was turned off. So we have the input h, which was

initially 0, and then it was turned on to some value and then turned off again.

Then here's what happens, even after the input has been turned off, so even after

h is equal to 0, the network maintains an output.

So the network now maintains a memory of the integral of the past inputs, as given

by this integral shown here. Interestingly there's evidence for

integrator neurons in the brain. In particular in the medial vestibular

nucleus, there are these neurons that maintain a memory for eye position.

So when the input to these neurons comes in the form of bursts, so here's one

burst of spikes that changes the eye position.

Here's another burst of spikes from a different neuron, that decreases the eye

position. We note that the integrator neuron

maintains persistent activity, or a memory of the eye position, by changing its

firing rate. And this is very similar to what we had

in the previous slide. Where we had the neuron maintaining a

memory of the integral of past inputs. So what this goes to show, once again, is

that the brain can do calculus. In this case, we've shown that it can do

integration. And we already showed that it can do

differentiation in the previous lecture. So once again, sorry Newton and Leibniz,

looks like the brain has beaten you to the punch.


Let's conclude our tour of recurrent networks, by looking at nonlinear

recurrent networks. And we can make the network nonlinear by

applying a nonlinear function F to the sum of the input and recurrent feedback.

And perhaps the simplest kind of non-linearity is the rectification

non-linearity, which takes any input x and sets it equal to x, if x is greater

than 0 and sets it equal to 0 otherwise. This non-linearity is quite useful

because if you recall, the vector v represents the firing rates of neurons.

And so the rectification non-linearity makes sure that the firing rates never go

below 0. So what can non-linear recurrent networks

do? They can perform amplification, similar

to linear recurrent networks. So here is the input to the non-linear

network. Which is a noisy input with a peak near

0. And here is the output of the nonlinear

network. And you can see how the network has

amplified the input, but it has also cleaned up the input, and it has

suppressed some of the other peaks in the input.
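Here's a sketch of that nonlinear simulation, with the dynamics tau dv/dt = -v + [h + M v]_+ (the rectification applied to the summed input and feedback, as described above; N, tau, and the noise level are my own illustrative choices):

```python
import numpy as np

N = 64
theta = np.linspace(-np.pi, np.pi, N, endpoint=False)
M = np.cos(theta[:, None] - theta[None, :])
M *= 1.9 / np.max(np.linalg.eigvalsh(M))      # largest eigenvalue now 1.9

rng = np.random.default_rng(3)
h = np.cos(theta) + 0.2 * rng.normal(size=N)  # noisy input peaked near theta = 0

tau, dt = 0.01, 1e-3
v = np.zeros(N)
for _ in range(5000):                         # tau dv/dt = -v + [h + M v]_+
    v = v + (dt / tau) * (-v + np.maximum(h + M @ v, 0.0))

# Rectification keeps the firing rates non-negative and the network stable,
# and the output is an amplified, cleaned-up bump near the input's peak.
assert np.all(v >= 0) and np.all(np.isfinite(v))
assert abs(theta[np.argmax(v)]) < np.pi / 4
```

Note that with lambda 1 = 1.9, the purely linear version of this network would diverge; the rectification is what keeps the simulation bounded, as discussed next.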


Now the interesting thing here is that the recurrent connections, although they

were again the cosine-type recurrent connections, with excitation nearby and

inhibition further away, had eigenvalues that were all 0 except

one, which was actually bigger than 1: lambda 1 was 1.9.

So in the linear recurrent network case, this would have led to an unstable

network. But since we have the rectification

non-linearity, it saves the day, and the network is in fact, stable and gives us

this kind of amplification. Now here's something else that the

non-linear recurrent network can do. It can perform selective attention, which

means it can select one part of the input, and suppress the other part.

So here's an input that contains 2 peaks. And if you look at the output of the

non-linear network, it has essentially focused only on the peak at minus 90

degrees, and it has suppressed the other peak.

So the network is performing a type of winner-take-all input selection.

Some might say that the network is implementing the capitalist credo, of the

rich get richer, and the poor get poorer. And some people might even say that the

moral of the story here, is that you have to be non-linear to be a capitalist.

But I think we digress. The same non-linear network can also

perform something called gain modulation. What does that mean?

Well, if the inputs look like this, where you're adding a constant amount to

a particular input, which basically means you're shifting the

input additively from one level to the other.

The effect on the output is multiplicative.

So the change in the input multiplies the output, and so you get this type of

modulation. Also called, Gain Modulation, of the

output firing rate of the neuron. Now, this is interesting because, this

type of gain modulation of neural responses, has also been observed in the

brain. Specifically in area 7A of the parietal

cortex. Finally, the same non-linear network also

maintains a memory of past inputs, just like the linear recurrent network that we

considered a while ago. Here is the input for the non-linear

network, it's basically a bump centered around 0.

That's the local input, along with some background input, which is about 0.

The output of the network, as you might expect, is just an amplified version of

the input, with the background suppressed.

What happens to this output, when we turn off the local input?

Here's what we get. So when the local input is turned off,

you still have an output in this network, and the output has a peak at 0, which is

exactly where the peak of the local input was.

So this memory of the input is being maintained in this network by recurrent

activity. So what we have here then, is a network

that maintains a memory of past activity, when the input has been turned off.

And this is quite similar to the short term memory or working memory of past

inputs, that is maintained by neurons in the pre-frontal cortex in the brain.

We have been so far looking at networks with symmetric recurrent connections,

what about non-symmetric recurrent networks?

Well the simplest form of non-symmetric recurrent networks, would be a network of

excitatory and inhibitory neurons. So for example, if you had one excitatory

neuron, and one inhibitory neuron. You could have the excitatory neuron

exciting the inhibitory neuron, and the inhibitory neuron then inhibiting, the

excitatory neuron. And perhaps there is also connection from

the neuron onto itself. These are called autapses, and so this

will again be excititory, this will be inhibitory.

So you can see why the connections cannot be symmetric: you cannot

have the excitatory connection be plus, and the inhibitory connection also be plus.

It has to be a negative, or inhibitory, connection.


Here are the differential equations for our two neurons.

So here is the differential equation for the firing rate of the excitatory neuron,

here is the differential equation for the firing rate of the inhibitory neuron.

And these are all the different parameters.

Here is the excitatory connection from the neuron onto itself.

Here is the connection from the inhibitory neuron onto the excitatory

neuron, and so on. You also see that we've added these

threshold parameters, and then that in turn is passed through a

non-linearity, which is the rectification non-linearity.

And just to make things concrete, let's assign some values.

So these are some values for each of these parameters, for the connections and

the threshold. And then finally we will leave one

particular parameter. We're calling that tau i.

That is the time constant for the inhibitory neuron.

We will leave that unassigned, and we will vary this parameter to study the

behavior of this non-linear and non-symmetric recurrent network.

So how do we analyze the dynamics of such a non-linear and non-symmetric network?

Well, hold on to your eigenhats, because we're going to need to use eigenvectors

and eigenvalues again. To understand the dynamic behavior of

this network, we can perform linear stability analysis.

What does that mean? It means we can determine how stable the network is

near a fixed point. The fixed point is basically obtained by

looking at the values for vE and vI that make dvE/dt and dvI/dt go to 0.

So, when both of these are 0, then we have values for vE and vI which are

fixed, and which do not change as a function of time, and that would give you

a fixed point for this network. So, how do we perform Linear Stability

Analysis? Well we take the derivatives of the

right-hand side of both of these equations, with respect to vE and vI.

What we get then is a matrix, which is called the stability matrix, or if you

want to be cool, you can call it the Jacobian matrix.

Since the Jacobian matrix is not symmetric, the eigenvalues of the matrix

can have both real and imaginary parts. So the eigenvalues can be complex, and

these real and imaginary parts of the eigenvalues, in turn, determine the

dynamics of the nonlinear network near a fixed point.

So they determine whether the network is stable or not.

Now we've assigned values for all of the parameters except for tau I.

So what we can do now is choose different values for tau I, and this will in turn

cause different eigenvalues for J. And then we can look at the effect of the

different eigenvalues for J, on the stability and the behavior of this

nonlinear network. First, let's look at what happens when we

set tau I equal to 30 milliseconds. This makes the real parts of the two

eigenvalues of the stability matrix negative.

And, as we show in the supplementary materials for this lecture, on the course

website, the real part being negative causes the network to be stable near the

fixed point. So here's a pictorial depiction of what

happens when we set tau I equal to 30 milliseconds.

So the x axis is vE, the y axis is vI. And so if we start out at some particular

location, which is some particular value for vE and vI.

Then the network essentially converges to the fixed point, which is the point at

which dvE/dt equals 0 and dvI/dt equals 0.

So both vE and vI are not changing at this location, in this particular plot.

Now if we look at what's happening as a function of time, you can see that both

vE and vI oscillate. And the oscillations are damped, and

eventually the oscillations are no longer there, and the network has converged to a

specific value for vE, and a specific value for vI, and that is the stable

fixed point of the network. This stable fixed point is also called a

point attractor in the terminology of dynamical systems.

Now look at what happens when you choose tau I to be 50 milliseconds.

That makes the real parts of the eigenvalues of the stability matrix

positive. And as we show in the supplementary

materials for this lecture, when the real part of the eigenvalues turn out to be

positive, then the network is unstable. And so if you start out, in this plot of

vE and vI at some location, near the fixed point, so here is the fixed point.

And if you start out here with some value for vE and vI, then the network moves

away from the fixed point, and so the network is unstable, and diverges away

from the fixed point. But luckily, the rectification

non-linearity comes to the rescue. How is that?

Well, as the value for vE tends to go negative, the rectification non-linearity

stops it from going negative, and it puts it back on track.

And so we have the network looping around on this limit cycle.


Here's another way to look at this limit cycle.

So if you plot vE and vI as a function of time, then you'll observe that initially

the vE and vI values start to increase. But then, once you hit this rectification

non-linearity, then you have a stable oscillation.

So, both vE and vI start to oscillate in a stable manner, and that corresponds to

going around on this limit cycle. So let's summarize what we saw in the

previous slide and in this slide. So when you change the parameter tau I

from 30 to 50 milliseconds, the nonlinear network made a transition, from having a

stable fixed point, to becoming unstable and resulting in a limit cycle.

In dynamical systems theory, such a transition is known as a Hopf

bifurcation. Well, I think it's time now for our own

Hopf bifurcation. That wraps up our journey into the land

of networks. Next week, we learn about how the brain

learns, by changing the connections between neurons in its networks.

Until then, adios and goodbye.