0:03

We'll finish up this week's material by

considering a final couple of model upgrades.

Let's go back and stare at this data again.

There are a couple of issues that we haven't yet addressed.

One is that we modeled only a time-varying firing rate.

And of course this data is in the form of spike times.

In what sense the precise patterns of these spike trains might be

meaningful is something that we'll return to in a couple of weeks.

But in this section we'll address directly the hidden assumptions

of models, like the ones we've been developing, about the relationship between

that time-varying firing rate, r(t), and the occurrence of single spikes.

And we'll try to deal with the fact that there does appear to be some

fine structure, here maybe, in the spike

trains that a smooth function r(t) can miss.

0:46

But first we'll talk about the fact that

this data was produced by showing the retina a

natural movie and not white noise, which was

the stimulus that we used in our previous discussion.

In real life, neurons aren't living in a world

of white noise and it turns out that the

statistics of the stimulus that you use to sample a model do affect the model that you arrive at.

So we choose to use white noise rather than

some more natural stimulus because no matter how you filter

it, it's always Gaussian, which means that there's no

special structure, no special directions in the stimulus set itself.

1:17

Since it's already come up and it will be coming up again let me just remind

you what a Gaussian function is. So it's defined as follows.

It's some coefficient multiplied by an exponential factor, which includes (x − x₀)² divided by 2σ²:

p(x) = A exp( −(x − x₀)² / (2σ²) )

So here, x₀ is the center of this function, and σ is a measure of its width.

1:41

So if we think about this function p(x) as a Gaussian probability distribution over x, then its mean, x-bar, which is the average of x, is x₀.

1:55

And its variance, defined as the average of (x − x-bar)², is equal to σ².

So the standard deviation is just the square root of that, which is σ.
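If you want to check these definitions numerically, here's a minimal sketch in Python (my own illustration, assuming NumPy; not part of the lecture) that samples from a Gaussian and confirms the mean, variance, and standard deviation come out near x₀, σ², and σ.

```python
import numpy as np

x0, sigma = 2.0, 0.5   # center x0 and width sigma of the Gaussian
samples = np.random.default_rng(0).normal(x0, sigma, size=100_000)

print(samples.mean())  # close to x0 (the mean)
print(samples.var())   # close to sigma**2 (the variance)
print(samples.std())   # close to sigma (the standard deviation)
```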

2:11

Now, if you add together two or more Gaussian random numbers, the new random

number also has a Gaussian distribution and

that's just what you're doing by filtering: taking linear combinations of the values of the white noise at different time points.

So with white noise, when we're using geometrical

techniques like PCA, we're making sure that we

have a stimulus that's as symmetric as possible

with respect to those coordinate transformations that filtering gives us.

There are no special stimulus dimensions that are built

into the prior, into the stimulus ensemble itself.
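Here's a quick sketch (my own illustration, assuming NumPy) of that claim: filtering Gaussian white noise leaves you with Gaussian values, so the filtered samples still have the vanishing skewness and excess kurtosis of a Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.normal(0.0, 1.0, size=200_000)   # Gaussian white noise
kernel = np.array([0.2, 0.5, 0.2, -0.1])     # an arbitrary linear filter
filtered = np.convolve(noise, kernel, mode="valid")

# A Gaussian has skewness 0 and excess kurtosis 0; filtering preserves that.
z = (filtered - filtered.mean()) / filtered.std()
print("skewness:", (z**3).mean())            # close to 0
print("excess kurtosis:", (z**4).mean() - 3) # close to 0
```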

2:46

Let's go back to the question that we posed last time.

When have we found a good feature?

When have we identified a good filter, f?

We answered that by looking for the response function with respect to that

stimulus component f, an input output curve

that is interesting or has some structure.

3:04

So, recall, I showed you these two cases.

In this one, here is the Gaussian prior, the distribution of the filtered stimulus.

3:15

And the conditional distribution, so those values of the filtered stimulus

that are conditional on the arrival time of the spike.

In this case, those two distributions are very similar, so when we take

their ratio to compute the input output function, we get just a flat curve.

3:40

So, instead of taking the average or doing PCA

to find that filter, could we just go directly to

these quantities, to these distributions, the prior and the conditional distribution, and ask: can I find an f, a choice of filter, such that when I project the stimulus onto it, the conditional distribution and the prior are as different as possible?

So, what would it mean to be

as different as possible?

There's a standard measure that we use for evaluating the

difference between two probability distributions,

and that's called the Kullback-Leibler divergence.

So here is the definition of the Kullback-Leibler divergence, D_KL, between two distributions P(s) and Q(s). It's given by integrating over all the random variables, in this case s:

D_KL(P || Q) = ∫ ds P(s) log[ P(s) / Q(s) ]

That is, P(s) multiplied by the logarithm of the ratio of those two distributions.
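As a rough illustration (my own sketch, not from the lecture), here's one way to estimate D_KL from samples by binning them into a shared histogram; the small eps guarding empty bins is an arbitrary choice of mine.

```python
import numpy as np

def dkl(p_samples, q_samples, bins=50, eps=1e-12):
    """Histogram estimate of D_KL(P || Q) from two sets of samples."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, edges = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=edges)
    p = p / p.sum() + eps   # normalize to probabilities; eps guards log(0)
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 100_000)   # samples from P
b = rng.normal(1.0, 1.0, 100_000)   # samples from Q
print(dkl(a, b))                    # analytic answer for these Gaussians: 0.5
```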

4:38

So what do we get if we use this DKL

between the prior and the spike conditional distribution as a measure

of the success of the choice of f, and just try to find an f that maximizes this quantity directly?

5:00

So now, I'm taking some arbitrary stimulus distribution.

And here I've drawn it in a, you know, pseudo high dimensional space.

P(s) is the distribution of all possible stimuli.

We're going to take some filter, again a vector, in this high dimensional space,

that's f1, and project all of the stimuli onto it to compute the prior here in gray.

And now we'll project the spike-triggering stimuli, which here we've pictured in yellow, to compute the spike-conditional distribution.

And now, one can vary f around.

Right, so we can take different directions of this f.

And repeat this procedure and compute the DKL

between this prior and the spike conditional distribution.

Here's another example of a different choice of f, f2.

5:47

In that case our prior has a slightly different shape because

the stimulus distribution has a different shape in that direction and

the spike conditional distribution also has a different shape.

You can see that these two distributions are much more similar than these two are.

And so, we would prefer f1 as a better choice of our filter than we would f2.

And so one can move around in this space, keep evaluating these two distributions, and search for an f that maximizes the difference between those two distributions.
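To make that search concrete, here's a toy sketch (my own, with made-up numbers): a 2-D stimulus, a hidden preferred direction, and an exhaustive scan over candidate directions f, picking the one with the largest D_KL between the spike-conditional and prior projections. A real application would use high-dimensional stimuli and gradient-based optimization rather than a grid scan; the dkl estimator from the previous sketch is repeated here so this runs standalone.

```python
import numpy as np

def dkl(p_samples, q_samples, bins=50, eps=1e-12):
    """Histogram estimate of D_KL(P || Q), as in the previous sketch."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, edges = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(2)
stim = rng.normal(size=(50_000, 2))                  # 2-D stimuli, one per row
true_f = np.array([1.0, 0.0])                        # hidden preferred direction
p_spike = 1 / (1 + np.exp(-(stim @ true_f - 1.0)))   # sigmoidal nonlinearity
spiked = rng.random(50_000) < p_spike

best_theta, best_d = 0.0, -np.inf
for theta in np.linspace(0.0, np.pi, 60):            # candidate directions f
    f = np.array([np.cos(theta), np.sin(theta)])
    prior = stim @ f                                 # prior: all projections
    cond = prior[spiked]                             # spike-conditional ones
    d = dkl(cond, prior)
    if d > best_d:
        best_theta, best_d = theta, d

print("best angle:", best_theta)                     # near 0, along true_f
```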

6:21

Now, this turns out to be equivalent to maximizing

the mutual information between the spike and the stimulus.

So we're trying to find a stimulus

component that is as informative as possible.

So observing a spike pins down our estimate for the stimulus much better

for the f1 component, in this case, than it does for the f2 component.

6:45

So notice that the stimulus here is no longer Gaussian, we mentioned that.

It's no longer a nice, symmetric ball, and I've drawn it like that

because there's nothing about this technique that

demands that our stimulus be white noise.

Since this is a stimulus with some arbitrary distribution, you can see both the prior and the spike-conditional distributions varying with the direction of f, but that is okay.

7:06

The fact that this method can be applied to arbitrary inputs

means that this technique has been applied to derive models using natural stimuli.

So one can then take this to the next step and compute

the input-output function from the ratio

of the conditional distribution and the prior.

So it's a powerful technique.

It generalizes to complex stimuli.

7:40

So to summarize, we saw how to build a model

with a single filter, by taking the spike triggered average.

We saw that we could generalize that to multiple filters using PCA.

And finally, we introduced an information-theoretic method that uses the whole distribution of stimuli to compute an optimal filter, and this last method removed the requirement for Gaussian stimuli.

8:16

So to go from r of t to spikes, the assumption that we'll be making is that every spike is generated independently, with a probability that's scaled by that time-varying r of t.

What does this mean and how can we test it?

8:32

Let's start from the most elementary random process, the flip of a coin.

It has probability 1/2 of landing heads, probability 1/2 of landing tails.

Now, let's take a biased coin: it only has some small probability, p, of landing heads up, and that's when the system spikes.

8:50

So now we can think of the arrival times of spikes as obeying something as simple as that.

We have some time, t.

We divide it into many time bins of size delta t.

Let's say there are n of them; n is t over delta t.

9:16

Now we'd like to know: how many spikes will occur in the total time t?

This is, of course, a random number.

It will vary on every trial.

This random number has what's called a binomial distribution.

Binomial meaning two-valued, and those two values have the probability of firing, p, and the probability of not firing, 1 − p.

9:45

How do we compute this?

All we need to do is count. What's the probability that there's a spike in exactly k bins? It's the probability, bin by bin, that a spike occurred, so we need p to the power k.

9:59

And then the probability that a spike didn't occur in the remaining bins, so 1 − p. How many bins did a spike not happen in? That's n − k.

And we don't really care which of the k bins it occurred in, so we need to count up the number of different ways that we could arrange those k spikes among the n bins. That's a quantity often called n choose k, and we can write it as n factorial over k factorial times (n − k) factorial. Altogether:

P(k spikes in n bins) = [ n! / (k! (n − k)!) ] p^k (1 − p)^(n − k)

Where factorial, to give an example: 3 factorial is 3 times 2, times 1. So n factorial is n times (n − 1), times (n − 2), all the way down to 1.
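Here's that counting argument transcribed directly into code (a minimal sketch using only Python's standard library):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k spikes in n bins, each spiking with prob p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)  # (n choose k) p^k (1-p)^(n-k)

print(binomial_pmf(3, 100, 0.05))  # e.g. P(exactly 3 spikes in 100 bins)
```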

11:11

Now, in the limit that there are many time bins and the probability of a spike in any bin becomes very small, one can show that the binomial distribution has a limit.

That's the following form.

So we go from that distribution that we just derived, in the limit of very small time bins, and we set a parameter r, which is the probability in a time bin divided by the size of the time bin.

The probability for a given time bin is going to become very small as the bin size becomes very small. So what we want to do is set some parameter r such that that parameter stays finite as the time bin gets very small.

And so that's the rate, or probability per unit time.

11:54

So now that becomes our parameter in this distribution.

So one can start with that previous binomial distribution, do some calculations, and end up with an expression like this.

Some of you might like to try that for

yourself or perhaps look it up on, on Wikipedia.

12:12

This new distribution is called the Poisson distribution:

P_t(k) = (rt)^k e^(−rt) / k!

I've subscripted it now, not by the number of bins but by the total time, t, as we again assume that we've taken the limit where delta t becomes very small.
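If you'd rather check that limit numerically than derive it, here's a quick sketch (my own, standard-library Python) comparing the binomial with more and more, smaller and smaller bins to the Poisson formula:

```python
from math import comb, exp, factorial

r, t = 20.0, 1.0                  # rate (spikes/s) and total time (s)
k = 15                            # spike count we ask about

for n in (100, 1_000, 10_000):    # number of bins; delta t = t/n shrinks
    p = r * t / n                 # probability of a spike per bin
    print(n, comb(n, k) * p**k * (1 - p)**(n - k))  # binomial probability

print("Poisson:", (r * t)**k * exp(-r * t) / factorial(k))  # the limit
```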

12:26

So what are the properties

of the Poisson distribution?

It has a mean of r times t, which hopefully feels intuitive.

The number of spikes is the rate times the total time. Slightly less intuitively, it also has a variance that's given by r times t.

So you might notice that that's the same as the mean.

That is a very unusual property, and because of that, a quantity called the Fano factor, which is the ratio of the variance to the mean, has become a way to test whether a distribution is Poisson or not.

If it has a value of one, then it's Poisson.
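Here's a tiny simulation (my own sketch, assuming NumPy) confirming that for Poisson counts the mean and variance agree, so the Fano factor is about 1:

```python
import numpy as np

rng = np.random.default_rng(3)
r, t = 30.0, 0.5                          # rate (Hz) and window length (s)
counts = rng.poisson(r * t, size=10_000)  # spike counts over many trials

print("mean:", counts.mean())             # close to r*t
print("variance:", counts.var())          # also close to r*t
print("Fano factor:", counts.var() / counts.mean())  # close to 1
```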

13:01

Finally, if spikes have been generated through a Poisson process, which fundamentally expresses the idea we started from, which is that they're generated in every time bin, delta t, as though they were independent with probability r times delta t,

13:16

Then they'll also have the property that the intervals between successive spikes have an exponential distribution.

You can get some intuition for why this is by considering the distribution above, but evaluated just for one spike, as a function now of the time, t. You'll see the appearance of the exponential, and the factorial goes away.

So comparing between them, the interval distribution, p(t) = r e^(−rt), doesn't have the factor t out the front, because it has to be normalized over all time while the expression above doesn't.
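You can see this in a simulation too. Here's a sketch (mine, assuming NumPy) that generates spikes by independent coin flips in tiny bins and checks that the interspike intervals look exponential, with mean 1/r and, as an exponential should have, a standard deviation equal to the mean:

```python
import numpy as np

rng = np.random.default_rng(4)
r, dt = 20.0, 1e-4                       # rate (Hz) and bin size (s)
spikes = rng.random(2_000_000) < r * dt  # independent coin flip per bin
times = np.nonzero(spikes)[0] * dt       # spike times in seconds
isis = np.diff(times)                    # interspike intervals

print("mean ISI:", isis.mean(), "expected:", 1 / r)
print("std ISI:", isis.std())            # equals the mean for an exponential
```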

13:46

Now, the probability of seeing k spikes in a chunk of time, t, depends on the firing rate in this way; this is the Poisson distribution.

13:56

So these are two strong characteristics of a Poisson distribution.

One, that the Fano factor is 1. And second, that the interval distribution should look like an exponential distribution of times.

14:10

So here are some examples of the Poisson distribution

for a few different choices of the firing rate.

For low firing rate, the distribution is almost exponential, whereas as the rate

gets higher, the Poisson distribution looks more and more Gaussian.

Now in general, the rate is varying as a function of time.

So if we want to see if this idea is

reasonable by looking at data, we need to allow r,

the rate, to vary in time.

Here is data from a neuron in monkey MT cortex, which is sensitive to motion.

The monkey is watching the variable patterns drift across the screen

and we're going to look in more detail at this experiment next week.

The same pattern is being shown over and over again.

15:08

Now if you split the data up into these little windows of time and plot the mean number of spikes in a time bin against the variance in that time bin, what would you expect to see?

In every bin, if the spikes are Poisson but with a different rate, you could plot the rate against the variance.

What would you expect?

Remember that the slope of that plot would be the Fano factor.

So we'd expect, if it were Poisson, a constant slope of about one. And in the data you see that that is very close to being true. Here is the line of slope 1; you see that the data lies very close to it.

So, even though the firing rate is changing in

each short time chunk, the cell's response looks Poisson.
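Here's a sketch of that test (my own illustration; the trial-structured counts are simulated stand-ins for real data, assuming NumPy): the slope of variance against mean across time windows estimates the Fano factor.

```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in for trial-structured data: the rate differs across time windows,
# but within each window the counts are Poisson (shape: trials x windows).
rates = rng.uniform(5.0, 50.0, size=40)   # a firing rate per window (Hz)
window = 0.1                              # window length (s)
counts = rng.poisson(rates * window, size=(200, 40))

means = counts.mean(axis=0)               # mean count in each window
variances = counts.var(axis=0)            # variance across trials per window
slope = np.polyfit(means, variances, 1)[0]
print("Fano factor (slope):", slope)      # close to 1 if Poisson holds
```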

15:53

Where does this kind of variability come from?

It's likely that while the neuron is receiving a mean input that's

proportional to the stimulus, it's also receiving a barrage of background input.

Remember that a cortical neuron gets inputs from around 10,000 other neurons.

If that input is balanced, that is, if it varies around zero, to

be both positive and negative, it won't add much to the average firing rate.

But it will jitter the spikes.

16:22

For example, here's the behavior of a

neuron model, that's driven by white noise.

It also looks very close to Poisson, in the sense

that the interspike interval distribution looks very close to exponential.

I've emphasized that by plotting the log of the number of intervals against the interval itself, which, if it's an exponential distribution, should look like a straight line with a negative slope given by the firing rate.

17:00

At short intervals, the distribution stops looking exponential.

This is for the very good reason that

a neuron is unable to fire arbitrarily rapidly.

There are biophysical processes that prevent a neuron from firing immediately after an action potential, and you see here that's caused a gap of maybe a minimum of 10 milliseconds, in this case, between successive spikes.

17:21

So we're going to talk about those processes in a few weeks from now.

So, we might want to improve our model yet

more, by taking these intrinsic limitations in firing seriously.

This can be very helpful, as these intrinsic processes going on inside the neuron might add quite a bit of structure to the spike trains.

For example, there may be some resonance such that the neuron likes to fire at a certain frequency, independent of the fluctuations of the stimulus.

17:48

So these intrinsic effects can be built into coding models.

They're elaborations of the ones we've

been looking at called generalized linear models.

Here the setup is very similar: the stimulus comes in, is filtered through some feature, and is processed through a nonlinearity.

Here the nonlinearity is drawn as exponential.

I'll talk about that in a minute.

And then there's an explicit spike generation step, an explicit Poisson spike generation step.

18:16

If the random process generates a spike, then a so-called post-spike filter, drawn here, is injected back into the input that's going into the nonlinearity.

18:30

So, for example, if the system is refractory, what you'd want for this waveform is that it would quickly move you away from threshold and hold you away from it for some time. So you'd want to add in something like a big negative pulse that decays back over time.

18:57

The one that's drawn here, taken from this very nice paper, is a little bit more sophisticated.

It first draws the neuron away from spiking, with

a big initial dip, so it has the refractory property

built in, but then it becomes positive, which is going

to promote spiking at some time after the previous spike.

So that could give a neuron that has a

slight tendency to fire periodically which is very nice.

So the spiking probability is now proportional to an exponential of the filtered stimulus, as before, plus the filtered spiking activity, as we've written out right here.
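To make the moving parts concrete, here's a minimal GLM-style simulator (my own sketch, assuming NumPy; the filter shapes and constants are placeholders, not the ones from the paper): a filtered stimulus plus post-spike feedback, an exponential nonlinearity, and Poisson spiking bin by bin.

```python
import numpy as np

rng = np.random.default_rng(5)
dt, n = 1e-3, 20_000                       # 1 ms bins, 20 s of simulation
stim = rng.normal(size=n)                  # white-noise stimulus

k = np.exp(-np.arange(50) / 10.0)          # placeholder stimulus filter
h = -5.0 * np.exp(-np.arange(50) / 5.0)    # placeholder post-spike filter:
                                           # a negative dip = refractoriness
drive = np.convolve(stim, k)[:n]           # filtered stimulus
spikes = np.zeros(n, dtype=bool)

for t in range(n):
    recent = spikes[max(0, t - 50):t][::-1]     # most recent spikes first
    feedback = float(h[:recent.size] @ recent)  # post-spike filter feedback
    rate = np.exp(drive[t] + feedback - 3.0)    # exponential nonlinearity
    spikes[t] = rng.random() < min(rate * dt, 1.0)  # Poisson spiking per bin

print("spikes:", spikes.sum(), "mean rate:", spikes.sum() / (n * dt), "Hz")
```

A post-spike filter that dips and then goes positive, like the one in the paper, would instead give the slight tendency toward periodic firing described above.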

19:33

So why this exponential non-linearity?

In the models that I've shown before, we've allowed the nonlinearity to be something that we've computed directly from the data, whereas here the nonlinearity is fixed.

Liam Paninski showed that by fixing the nonlinearity to be exponential, or to be in the exponential family, you become able to find all the parameters of this model, all the values of these filters, using an optimization scheme that's globally convergent.

So you've sacrificed some generality for a model that's more complete in another way.

You get more power in that you can add more filters, and it's guaranteed to be solved reliably and repeatably. So if we're going on adding additional factors that can influence the spiking probability, why stop there?

As Emery Brown and colleagues pointed out, one

can also include many other intrinsic and extrinsic factors.

In this paper, the group included the influence, not

only of refractory effects, but also of the firing of

other neurons in the network and applied this to

the type of data that you saw from the retina.

So including both self-firing, the output of the neuron itself, and also the effects of the firing of other neurons allowed them to predict the spike patterns. They were able to capture the detailed spike interval patterns that we saw in the retinal data.

21:07

So, I'll finish up with another beautiful idea from Emery Brown's group.

We can use this Poisson nature of firing to test whether we

have captured everything that we can about the inputs in our model.

Let's say we have a model like the GLM, where the output depends on many influences: on the stimulus, on the history of firing in the neuron we're recording from, and on the history of firing in other neurons as well.

Then we can take our output spike intervals and scale them by the firing rate that's predicted by the model.

So we take these intervals, the times between successive spikes, and we scale them by the firing rate that our model predicted, given all the interactions that we've incorporated.

21:49

If this predicted rate does truly account for all the influences on the firing, even ones due to previous spiking, then these new scaled intervals should be distributed like a pure Poisson process with an effective rate of one, that is, as a single clean exponential.

So this is called the time-rescaling theorem, and it's used as a way to test how well one has done in capturing all the influences on spiking with one's model.
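Here's what that test can look like in code (my own sketch, assuming NumPy; the sinusoidal rate is a made-up example): integrate the model's rate between successive spikes, and check that the rescaled intervals have the mean of a unit-rate exponential.

```python
import numpy as np

def rescaled_intervals(spike_times, rate, dt):
    """Integrate the model rate between successive spikes (time rescaling)."""
    cum = np.concatenate(([0.0], np.cumsum(rate) * dt))  # integral of r(t)
    idx = (np.asarray(spike_times) / dt).astype(int)     # bin of each spike
    return np.diff(cum[idx])

# Demo: an inhomogeneous Poisson process whose rate we know exactly.
rng = np.random.default_rng(7)
dt, n = 1e-3, 100_000
rate = 20.0 + 15.0 * np.sin(2 * np.pi * np.arange(n) * dt)  # rate in Hz
spikes = rng.random(n) < rate * dt
times = np.nonzero(spikes)[0] * dt

z = rescaled_intervals(times, rate, dt)
print("mean rescaled interval:", z.mean())  # close to 1, as the theorem says
```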

22:18

So, we've reached the end of this stretch.

We've looked at some classical, and some more modern ways, of thinking

about what spikes represent and how one can predict them from data.

I'd like to emphasize that some of these models and methods

are a very powerful way of thinking about the neural code.

But there is a lot that they ignore.

These models, in particular, give the impression that

neurons represent a particular thing, and that's it.

In fact,

neural responses are modulated by many other influences: by how the animal is using its body to deploy its senses, by what it expects to see in the environment, and by the context in which the stimulus appears.

We'll have a look at one example of such influences in a later lecture.

But you should always keep in mind that while I'm trying to give you an overview of current approaches to understanding the brain, and these methods have made huge progress in allowing us to make sense of a lot of data, even if under rather limited circumstances, it's likely that some of these ideas might be overturned completely by a much more general approach.

So the field is still really wide open to new ideas

and concepts that will provide a richer and more powerful understanding.

23:24

So to wrap up, I know this week has started

to exercise maybe some math muscles that might be rusty.

So please refer to the supplementary materials online to see if there's

anything that can help you, and do hit the forums.

There are a lot of knowledgeable people among you, and

it's great to see questions being answered and discussions developing there.

And, of course, our team is standing by, ready to pitch in and to help, as well.

For next week, I hope you'll join us again as

we start to learn how to use decoding to read minds.

Back next week.