[MUSIC] Hello, and welcome back to Computational Neuroscience. This is week three, and we'll be discussing decoding: how well can we learn what the stimulus is by looking at neural responses? We'll be covering a few different approaches, starting with some very simple cases in which one has to decide between one of two choices given the output of a single neuron, then moving to the case where one has a range of choices and a few neurons that might be taking a vote on what the stimulus is, and finally thinking about how we decode in real time to try to reconstruct the whole time-varying, complex input that the brain might be absorbing, or even ultimately the imagery of plans that the brain is concocting on its own. So, let's say you're walking in the park, and you hear a rustle. The rustle could be the breeze, or there could be a tiger or a rabid raccoon hidden there. You have to choose: do you stay or do you go? What does that look like mathematically? Let's say we can arrange all possible bush-rustling sounds along some axis, s. Some of them are clearly the breeze, but many of them lie somewhere in the middle. So on the basis of this evidence, on the basis of the sound that you heard, how should you decide what to do? Now imagine that all you had to listen to was your neurons. Actually, that is the case, but what if you only had one neuron, or a small group of neurons? That's the problem we'll be starting with today. Here's a classic experiment that set out to probe how noisy sensory information is represented by noisy sensory neurons, and how the animal's decision relates to the neuronal representation. So here is the setup. A monkey would fixate on the center of a screen and watch a pattern of random dots move across the screen.
The monkey's been trained that if the dots move, for example, upward, he should move his eyes, or make a saccade, upward to a target location, and he'll get a reward whenever he moves his eyes in the same direction as the dot pattern is moving. So here's the difficulty. The dot pattern is noisy, and sometimes it's rather hard to tell which way the dots are going. Moreover, the experimenters wanted to change the difficulty of the task by making the dot pattern more noisy. They did that by varying the number of dots that are actually moving in the chosen direction. The rest are made to move randomly. So, at one extreme, you have a stimulus like this one, for which the dots are all moving together, so no noise: that's 100% coherence. At the other extreme, all the dots are moving randomly. And in this case there's, in fact, no correct answer; they're moving neither upward nor downward. So let's take a look at what the neuron tells us. This data is taken from a neuron in MT, a region in the monkey brain that's sensitive to visual motion. The experiment was repeated many times with different particular patterns of dots, and on every trial the number of spikes that the neuron produced was counted. The experimenters then made these histograms of the results. In black, you see a histogram of the number of trials in which that number of spikes was counted from the neuron and the monkey made a saccade in one direction. And then in white, you see the number of trials in which that number of spikes was generated by the neuron and the monkey made a saccade in the other direction. So these would be, say, for upward choices, and these would be for downward choices. Now the experimenters changed the coherence, and what you see is that, as one might expect, these two distributions for upward versus downward choices are moving closer together. There's less visual information that discriminates between the two directions and, correspondingly, the firing rates are more similar in response to those two different kinds of trials.
If we look at another example where the coherence is almost zero, the motion signal discriminating one direction from the other is very small, and those two distributions are almost overlapping. So, given that one sees a firing rate, one response on one trial from this neuron, when trying to make a decision, how should one decode that firing rate in order to get the best guess about whether the stimulus was moving upward or downward? Here's how the monkey does on the task. The fraction that he or she gets right is a function of the noise level; it's a function of the coherence in the dot pattern. Those are the dots here in black. The open circles show how a single neuron does. And amazingly, it's very similar. So how does one go from the distributions of firing rates that we saw in the last step to this measure of performance? This requires a decoding step. Here we have distributions of responses, so let's take a cartoon of the data we just saw. As a function of r, the probability of the response given that the stimulus was upward moving is shown in red, and the probability of the response given that it was downward moving is shown here in blue. And these are the averages, r- and r+. Decoding means that we'd like a policy that tells us, if we see some value r, whether to map it onto an upward-going or a downward-going stimulus. So what should we do in this case? We'd like to map some range of r to upward-going stimuli, and that means setting a threshold. I'll let you guess where it should go. Hopefully you intuitively chose here. Why? This choice of threshold, z, is the one that maximizes the probability that you get it right. With that threshold, how are you going to do? The probability of a false alarm, of calling it upward when it was in fact downward, is going to be the area under this curve. These are all the cases where the stimulus was in fact going downward, but the response was larger than our threshold, z.
So this is the probability of a response being greater than or equal to z when in fact the stimulus was going down. Whereas the probability that you got the call upward right is going to be the area under this curve. These are all the cases where the stimulus was going upward and you in fact answered upward, because r was greater than that threshold. So this is the probability of r being greater than or equal to z, given that the stimulus was, in fact, plus. So this choice of z maximizes the total probability of being correct. P correct is the probability that the stimulus was in fact upward, multiplied by the probability that you called it upward, that is, the probability that the response was greater than or equal to z given that it was going upward; plus the probability that it was in fact going downward, multiplied by the probability that you called it downward, which is 1 minus the probability of the response being larger than or equal to z given that it was minus. So the false alarm rate is this probability, the rate of good calls is this probability, and this choice of z maximizes the total probability correct. The conditional probabilities p(r|-) and p(r|+) are also known as the likelihoods; they measure how likely we are to observe our data r, our firing rate, given the cause, the stimulus. So notice that what we're doing by choosing z this way is choosing the value of the stimulus for which the likelihood is largest. We walk along these curves, and where the probability of the response given downward is the larger, we'll map those values of r to minus. And once we've crossed over this point, the probability of the response given plus is larger, and we'll map all of these values to plus. Alternatively, we can think about this as putting a threshold on the likelihood ratio itself. So what we are really doing is saying that we want the likelihood ratio p(r|+)/p(r|-) to be greater than 1 whenever we choose plus.
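The threshold argument above can be checked numerically. What follows is only a minimal sketch: it assumes Gaussian response distributions with equal width and equal prior probability for the two directions, and the means and standard deviation are made-up numbers, not values from the experiment. The function names are hypothetical.

```python
import math

def gauss_pdf(r, mean, sd):
    """Gaussian density, used here as a stand-in for p(r|+) or p(r|-)."""
    return math.exp(-0.5 * ((r - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def gauss_tail(z, mean, sd):
    """P(r >= z) for a Gaussian response distribution."""
    return 0.5 * math.erfc((z - mean) / (sd * math.sqrt(2)))

def p_correct(z, mean_minus, mean_plus, sd, p_plus=0.5):
    """P(correct) = P(+) P(r>=z|+) + P(-) (1 - P(r>=z|-))."""
    hit = gauss_tail(z, mean_plus, sd)           # good calls
    false_alarm = gauss_tail(z, mean_minus, sd)  # called '+' when it was '-'
    return p_plus * hit + (1 - p_plus) * (1 - false_alarm)

def decode(r, mean_minus, mean_plus, sd):
    """Likelihood ratio test: answer '+' when p(r|+)/p(r|-) > 1."""
    ratio = gauss_pdf(r, mean_plus, sd) / gauss_pdf(r, mean_minus, sd)
    return "+" if ratio > 1 else "-"

# Hypothetical numbers (spikes per trial), chosen only for illustration.
MEAN_MINUS, MEAN_PLUS, SD = 20.0, 30.0, 5.0

# Scan candidate thresholds from 10.0 to 40.0 and keep the best one.
candidates = [z / 10 for z in range(100, 401)]
best_z = max(candidates, key=lambda z: p_correct(z, MEAN_MINUS, MEAN_PLUS, SD))
print(best_z, p_correct(best_z, MEAN_MINUS, MEAN_PLUS, SD))
```

Under these assumptions the best threshold found by the scan sits at the crossing point of the two curves, halfway between the two means, which is also where the likelihood ratio passes through 1.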
So it turns out that the likelihood ratio test is the most efficient statistic we can use to analyze our data, in that it has the most power for a given size. This is called the Neyman-Pearson lemma. The data showed that there's a close correspondence between the decoded neural response and the monkey's behavior itself. Which of course raises the question: why then do we need all these neurons, especially in cortex, when many of them seem to be doing approximately the same thing? I think that's a good mystery for you to ponder, and we'll hit on one of the answers later on. Now let's say we're able to observe outputs from the unknown source for quite a while. We should be able to use that extra information to set our confidence threshold quite high, assuming that in every time bin we're getting an approximately independent sample. We can now accumulate evidence in favor of one hypothesis over the other. So let's say we observe some particular noise, say here. It's most likely, as we can see, to be due to the breeze. So, what's our evidence in favor of a tiger? That's going to be the likelihood ratio l of that observation s, which is the probability of s given a tiger over the probability of s given that it was the breeze. In this case, this likelihood ratio is going to be less than 1. Now when we accumulate this evidence over time, and every sample is independent, we're really multiplying these probabilities together. So, instead, let's take the log and sum them. We'll start here at zero: at time t equals 0, we don't have any evidence either way, and we start to take observations. Now, if the likelihood of the breeze is higher, that makes our sum go down. In this case the likelihood ratio was less than 1, so the log of that likelihood ratio is negative, less than 0. And so that's going to give us a negative blip in this sum.
Now we get another observation that also has a negative log likelihood ratio. Then we might hear some growly sound that suddenly takes us in favor of a tiger, but then no, it was just a rustle. And so, similarly, we'll just keep taking observations until at some point we're confident enough, given our sequence of observations, that we hit a bound and say: at this point, given all of my observations, I'm willing to say for sure that that's the breeze. So here is some evidence for such a process taking place in the brain. In this task, the monkeys are doing almost the same task that we saw earlier. They fixate and they start to see a pattern of moving dots, and they have to indicate which direction the dots are moving in. Here the directions are left and right. What's different about this task is that the monkeys can respond whenever they want. They are under some time pressure to respond quickly, because they get a reward when they answer, and if they take too long they get a time out where they can't get any juice for a while. Now, the recordings in this case were made not from MT but from the lateral intraparietal area, or LIP. This area is part of the circuitry for planning and executing eye movements. When a neuron was found, the region in space to which it was sensitive was located, and that was chosen as the place to which the monkey had to move his eyes, or saccade, to show that he understood which direction the dots were moving in. Now, let's look at the neuron's firing rate. Aligned to the onset of the trial, the firing rate gradually ramps up through the course of the trial. The different colored curves correspond to experiments with different coherences, or motion strengths. So which would you guess is the strongest motion strength? I'm sure you chose the dark brown one, and that's correct.
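The evidence-accumulation idea described above (sum the log likelihood ratios of independent samples until the running total hits a bound) can be sketched in a few lines. This is an illustration under stated assumptions, not the model fitted to the LIP data: the sounds are assumed to be Gaussian with equal width under both hypotheses, the bound and the means are made-up numbers, and all function names are hypothetical.

```python
import random

def log_likelihood_ratio(s, mean_breeze, mean_tiger, sd):
    """log[p(s|tiger) / p(s|breeze)] for equal-variance Gaussian likelihoods."""
    return ((s - mean_breeze) ** 2 - (s - mean_tiger) ** 2) / (2 * sd ** 2)

def accumulate_to_bound(samples, mean_breeze, mean_tiger, sd, bound):
    """Sum log likelihood ratios until the total crosses +bound or -bound.

    Returns (decision, samples_used); decision is None if the samples
    run out before either bound is reached.
    """
    total = 0.0  # at t = 0 there is no evidence either way
    n = 0
    for s in samples:
        n += 1
        total += log_likelihood_ratio(s, mean_breeze, mean_tiger, sd)
        if total >= bound:
            return "tiger", n
        if total <= -bound:
            return "breeze", n
    return None, n

def mean_samples_to_decision(mean_tiger, bound, trials, rng, max_samples=10000):
    """Average number of samples used when the source really is the breeze."""
    mean_breeze, sd = 0.0, 1.0  # assumed values, for illustration only
    counts = []
    for _ in range(trials):
        samples = (rng.gauss(mean_breeze, sd) for _ in range(max_samples))
        _, n = accumulate_to_bound(samples, mean_breeze, mean_tiger, sd, bound)
        counts.append(n)
    return sum(counts) / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
easy = mean_samples_to_decision(mean_tiger=1.0, bound=3.0, trials=200, rng=rng)
hard = mean_samples_to_decision(mean_tiger=0.25, bound=3.0, trials=200, rng=rng)
print(easy, hard)
```

When the two hypotheses are well separated (the `easy` case), each sample carries more evidence and the bound is reached in fewer samples than in the `hard` case, which is the same qualitative pattern as the faster ramp for stronger motion.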
It seems that when the evidence is strong, the firing rate increases fastest, suggesting that the firing rate represents that integrated evidence, in this case for rightward motion. When the evidence is weak, it accumulates more slowly. So now let's see the trials aligned to the time of the saccade. It's really cool to see that these firing rates seem to peak at a common point. And this has been interpreted as exactly that bound that we talked about before: the firing rates ramp up until they reach some threshold of confidence, at which point the monkey is willing to make a move to signal his decision. So let's go back to our single-neuron, single-trial readout case. We used the likelihood ratio to tell us what value of the sound should be interpreted as a tiger, but straight away you probably realize that this is not the smartest way to go. After all, the probability that there actually is a tiger is very small. So, if we're thinking correctly, we should include in our criterion the fact that these distributions don't generally have the same weight. They should be scaled up and down by the probability of the breeze and by the probability that there was, in fact, a tiger. That means that we need to take into account the role of priors, the prior probabilities that these stimuli were in fact present. Here's a very specific example where biology seems to build in that knowledge of the prior explicitly. This is work from the lab of Fred Rieke, who will be presenting a guest lecture this week about this intriguing and beautiful result. But I'll summarize it very briefly for you now with some cartoons. Some rods in the retina, the cells that collect light, are capable of responding to the arrival of single photons. So what you're seeing here is a current recorded from a photoreceptor. And you can see these photon arrival events here as these large fluctuations in that current. You also see that there's a lot of background noise.
Here's a sample of noise. If we make a distribution of that noise, it has some width. If we now consider cases in which there was a photon arrival event, that has another distribution that's separated somewhat from the noise, but not entirely, because the amplitude of this noise is quite large compared to the signal. So if you sit downstream from the photoreceptor and want to know when one of these events occurred, how should you set a threshold so that you can catch as many of these events as you can without being overwhelmed by the background noise? Our signal detection theory understanding suggests that we should put the threshold at this crossing point of the distributions. However, what does biology do? Biology, that is, in the form of the synapse that takes the signal from the photoreceptor to the bipolar cell. Instead, it sets the threshold way over here. What's going on is that, at this light level, these photon responses in any one photoreceptor are very rare, so that most of the time the fluctuations are due to noise. If one takes that into account, that is, the prior probability of signal and of noise, then the two distributions now look more like this. Now the crossing point is way over, and the response properties start to look a lot more sensible. This cover of Nate Silver's book neatly summarizes what's true for many important decisions. There's a small amount of signal in the world, as in the case of the photoreceptor current, and an awful lot of noise relative to any particular decision, for the same reasons as we discussed in our last lecture. A given choice establishes a certain set of relevant stimulus aspects, and all other information, which may be very useful information for other purposes, becomes noise. In deciding whether to invest energy in reacting, whether that's running away from the tiger, calling in the bomb squad to detonate a shopping bag, or asking a girl for a date, the prior probability isn't the only factor.
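To see how far a rare signal pushes the crossing point, here is a small sketch. It assumes Gaussian noise and signal distributions with equal width, which is a simplification chosen for illustration and not a claim about the measured photoreceptor currents; the numbers and the function names are hypothetical.

```python
import math

def weighted_density(r, mean, sd, prior):
    """Prior-weighted Gaussian density, prior * p(r|cause)."""
    pdf = math.exp(-0.5 * ((r - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    return prior * pdf

def crossing_threshold(mean_noise, mean_signal, sd, p_signal):
    """Response value where P(signal) p(r|signal) equals P(noise) p(r|noise).

    For equal-variance Gaussians this is the midpoint between the means,
    shifted by sd^2 * ln(P(noise)/P(signal)) / (mean_signal - mean_noise).
    """
    p_noise = 1.0 - p_signal
    midpoint = 0.5 * (mean_noise + mean_signal)
    shift = sd ** 2 * math.log(p_noise / p_signal) / (mean_signal - mean_noise)
    return midpoint + shift

# Made-up amplitudes in arbitrary units, for illustration only.
MEAN_NOISE, MEAN_SIGNAL, SD = 0.0, 2.0, 1.0

equal_prior = crossing_threshold(MEAN_NOISE, MEAN_SIGNAL, SD, p_signal=0.5)
rare_signal = crossing_threshold(MEAN_NOISE, MEAN_SIGNAL, SD, p_signal=0.01)
print(equal_prior, rare_signal)
```

With equal priors the threshold sits halfway between the two means. When the signal is assumed to be present only 1% of the time, the crossing point moves well past the mean of the signal distribution, in the same direction as the shifted threshold described above.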
One also might want to take into account the cost of acting or not acting. So now let's assume there is a cost, or a penalty, for getting it wrong: you get eaten, the shopping bag explodes. And there's a cost for getting it wrong in the other direction: your photo gets spoiled, you miss meeting the love of your life. So how do we additionally take these costs into account in our decision? Let's calculate the average cost for a mistake. For calling it plus when it is in fact minus, we get a loss, a penalty weight, which we'll call L+, and for the opposite mistake, calling it minus when it is in fact plus, we get L-. Our goal is to cut our losses and make the plus choice when the average loss for that choice is less than in the other case. So we can write this as a balance of those average losses. The average, or expected, loss from choosing minus when it's in fact plus is the weight for making that wrong decision, L-, multiplied by the probability that that occurs, P(+|r). Likewise, the expected loss from choosing plus is L+ P(-|r). And now we can make the decision to answer plus when the loss for making the plus choice is less than the loss for the minus choice, that is, when L+ P(-|r) is less than L- P(+|r). So now, let's use Bayes' rule to write these out. We now have L+ P(r|-) P(-) divided by P(r), all that to be less than the opposite case, L- P(r|+) P(+) divided by P(r). You can see that we can cancel out the common factor, the probability of the response, P(r), and rearrange this in terms of our likelihood ratio, because we have here the likelihood, the probability of the response given minus, and on this side the likelihood for the probability of the response given plus. We can pull those factors out as the likelihood ratio, and now we have a new criterion for our likelihood ratio test: answer plus when P(r|+)/P(r|-) is greater than L+ P(-) divided by L- P(+). This is one that takes these loss factors into account. That's where we are going to stop for this section.
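As a rough sketch of the loss-weighted rule, the snippet below computes the criterion on the likelihood ratio and applies it. It assumes the convention that `loss_plus` is the penalty for answering plus when the truth is minus and `loss_minus` is the penalty for answering minus when the truth is plus; the function names, priors, and penalty values are hypothetical and chosen only to illustrate the effect.

```python
def lr_criterion(p_plus, loss_plus, loss_minus):
    """Threshold on p(r|+)/p(r|-): L+ P(-) / (L- P(+)).

    loss_plus:  assumed penalty for answering '+' when the truth is '-'.
    loss_minus: assumed penalty for answering '-' when the truth is '+'.
    """
    p_minus = 1.0 - p_plus
    return (loss_plus * p_minus) / (loss_minus * p_plus)

def choose(likelihood_ratio, p_plus, loss_plus, loss_minus):
    """Answer '+' when the likelihood ratio exceeds the loss-weighted criterion."""
    criterion = lr_criterion(p_plus, loss_plus, loss_minus)
    return "+" if likelihood_ratio > criterion else "-"

# Illustrative case: '+' is a tiger (rare), '-' is the breeze.
# Fleeing needlessly costs 1; staying put when there is a tiger costs 1000.
print(lr_criterion(p_plus=0.01, loss_plus=1.0, loss_minus=1000.0))
print(choose(0.5, p_plus=0.01, loss_plus=1.0, loss_minus=1000.0))
print(choose(0.5, p_plus=0.01, loss_plus=1.0, loss_minus=1.0))
```

With equal priors and equal losses the criterion reduces to 1, the plain likelihood ratio test. In the illustrative tiger case the large penalty for a miss pulls the criterion below 1, so a sound that slightly favors the breeze still leads to the plus answer, whereas with equal losses the rare prior alone would push the criterion far above 1.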
In the next few sections of this lecture we'll be talking about methods for decoding from populations. The very next point that we'll pick up is what is called the population vector, a way of allowing many neurons to vote for a given stimulus outcome.