So we've now seen that our branch and bound algorithm is not going to perform well on noisy data sets. And as soon as we go from 18 amino acid masses to 144, we're going to really struggle with this algorithm. We're not going to give up on it; we just need to figure out whether there is more information in that spectrum. Is there something the spectrum is telling us that we just haven't been listening to? So our goal here is to get the number of amino acid masses that we consider down. Instead of 144, wouldn't it be nice to knock that down to 50, or 20, or even 10 in a great scenario?

Let's recall this hypothetical experimental spectrum that we had, and let's see what we notice about it. This is a great problem-solving strategy: if you're stuck on a problem, one of the most common things to do is consider a small example. See what that small example is telling you, and see whether it generalizes.

So here's our small example. It's just going to be for a peptide of length four. We can't detect the mass of E; it's not in this experimental spectrum, so we're not going to identify E as one of our amino acids that way. But…hold on. We do detect 257, which is the mass of QE. And we do detect 128, which is the mass of Q. If you subtract these two masses, you should get the mass of E, because E is what's left over when you take Q away from QE. So when you subtract them, you get 129. So that's good.

Can we find it anywhere else? The answer is yes. ELN is another subpeptide of this cyclic peptide NQEL, as is LN, and the masses of these two are in the spectrum as well. We get 356 minus 227, and that gives us 129. We even get one more. The mass of the entire peptide is 484, and the mass of LNQ, the subpeptide that doesn't include E, is 355. When we subtract those, we get 129. So 129 is still there, right? It may not be in the spectrum, but if we start subtracting masses that are in the spectrum, then we get three occurrences of 129.

This is going to motivate our definition of what's called a “spectral convolution”. The convolution of a spectrum is the collection of positive differences between every pair of masses in the spectrum. So here is the same spectrum that we had. We represent the masses along here, and along here, and we go through and subtract every pair. We have 227 here, which is the mass of LN. We don't know it's the mass of LN, but we have 227. And here we have the detected mass of 370, and when we subtract them, we get 143. We can do this for every pair, and we form this table.

Now, I've highlighted certain masses, and I'll tell you why I've done that. Remember that with E, we picked it up three different times. So we want to ask ourselves, what are the most frequent elements between 57 and 200? There's a good chance that's going to give us what the amino acids should be. The five most frequent elements that I highlighted are 99, 113, 114, 128, and 129, and when we look those up in the integer mass table, we see that they correspond to V, L, N, Q, and E. So if we take the five most frequent elements, four of them give us the amino acids of NQEL. This is really promising on this small example.

So now we have the outline of an algorithm called ConvolutionCyclopeptideSequencing. The first thing it's going to do is say, OK, given our experimental spectrum, let's form its spectral convolution. Then we're going to look at that spectral convolution and ask, “what are the most frequent elements in it?”
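To make the subtraction idea concrete, here is a minimal sketch of the spectral convolution in Python. The function names and the toy spectrum are assumptions made for illustration: the toy list contains only the masses named out loud in this example (128, 227, 257, 355, 356, 370, 484), not the full experimental spectrum from the slide.

```python
from collections import Counter


def spectral_convolution(spectrum):
    """Return every positive difference between a pair of masses."""
    masses = sorted(spectrum)
    return [b - a for i, a in enumerate(masses) for b in masses[i + 1:] if b > a]


def frequent_masses(spectrum, low=57, high=200):
    """Count convolution elements in [low, high], most frequent first."""
    counts = Counter(d for d in spectral_convolution(spectrum) if low <= d <= high)
    return counts.most_common()


# Hypothetical toy input: only the masses mentioned in the lecture,
# not the complete experimental spectrum.
toy_spectrum = [128, 227, 257, 355, 356, 370, 484]

# 129 (the mass of E) comes out on top with three occurrences,
# from 257 - 128, 356 - 227, and 484 - 355.
print(frequent_masses(toy_spectrum)[0])
```

Even though 129 never appears in the toy list itself, it is the most frequent difference in the 57 to 200 range, which is exactly the signal the convolution is meant to recover.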
We choose a parameter, capital M, and take the M most frequent elements between 57 and 200. In practice, M may be 50 or 20. And then we're going to run our leaderboard algorithm using only those top M masses that we found between 57 and 200 in the spectral convolution, so we form our peptides only from those integer masses.

The question then is, does this algorithm really work? That's the question that we constantly ask, right? So let's consider Spectrum10. Remember, when we expanded our alphabet, the leaderboard method didn't work for Spectrum10. So let's see if it works for Spectrum10 now that we're dealing with the convolution. The first step is to take the convolution. Then we pick the ten most frequent elements in that convolution, which I'm showing here on the right. Most of these correspond to standard amino acids: 145 was nonstandard, but the other ones were identified as standard. Then we run the algorithm on only these amino acids, so these are the only amino acids that will be allowed in our peptide, and the winning peptide is Tyrocidine B1, and we should be happy. It reconstructed the correct peptide just from taking the ten most frequent elements in the spectral convolution, so it worked really, really well.

Well, then we ask ourselves, OK, what about Spectrum25? What about this noisier spectrum, from which none of our algorithms has been able to correctly reconstruct Tyrocidine B1? What happens if we apply ConvolutionCyclopeptideSequencing to that really noisy spectrum? And when we do it, it's good news. We can reconstruct Tyrocidine B1 again, and so now we can start the party. But I do want to mention one more thing about practical considerations, and I'll do that in the next section.
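The whole pipeline described above can be sketched in a few dozen lines of Python. This is a rough sketch under stated assumptions, not a reference implementation: the function names, the leaderboard size parameter `n`, the choice to keep ties when trimming the leaderboard, and the choice to break ties arbitrarily when picking the top `m` convolution masses are all assumptions made here for illustration. Peptides are represented as lists of integer masses.

```python
from collections import Counter


def convolution_alphabet(spectrum, m, low=57, high=200):
    """The m most frequent convolution masses in [low, high] (ties broken arbitrarily)."""
    counts = Counter(a - b for a in spectrum for b in spectrum if low <= a - b <= high)
    return sorted(mass for mass, _ in counts.most_common(m))


def linear_spectrum(peptide):
    """Masses of all contiguous subpeptides of a linear peptide, plus 0."""
    prefix = [0]
    for mass in peptide:
        prefix.append(prefix[-1] + mass)
    n = len(peptide)
    return [0] + [prefix[j] - prefix[i] for i in range(n) for j in range(i + 1, n + 1)]


def cyclic_spectrum(peptide):
    """Masses of all subpeptides of a cyclic peptide, plus 0 and the full mass."""
    n = len(peptide)
    doubled = peptide + peptide
    spec = [0, sum(peptide)]
    for start in range(n):
        running = 0
        for length in range(1, n):
            running += doubled[start + length - 1]
            spec.append(running)
    return spec


def score(theoretical, experimental):
    """Number of masses the two spectra share, counting multiplicity."""
    return sum((Counter(theoretical) & Counter(experimental)).values())


def convolution_cyclopeptide_sequencing(spectrum, m, n):
    """Leaderboard search restricted to the top-m convolution masses."""
    alphabet = convolution_alphabet(spectrum, m)
    parent_mass = max(spectrum)
    leaderboard = [[]]
    leader, leader_score = [], 0
    while leaderboard:
        # Expand every peptide on the board by one mass from the alphabet.
        candidates = [p + [a] for p in leaderboard for a in alphabet]
        leaderboard = []
        for peptide in candidates:
            mass = sum(peptide)
            if mass == parent_mass:
                s = score(cyclic_spectrum(peptide), spectrum)
                if s > leader_score:
                    leader, leader_score = peptide, s
            elif mass < parent_mass:
                leaderboard.append(peptide)  # still room to grow
        # Trim to the n best linear scores, keeping ties at the cutoff.
        if len(leaderboard) > n:
            scored = [(score(linear_spectrum(p), spectrum), p) for p in leaderboard]
            scored.sort(key=lambda pair: pair[0], reverse=True)
            cutoff = scored[n - 1][0]
            leaderboard = [p for s, p in scored if s >= cutoff]
    return leader


# Usage on a noise-free toy input: the theoretical spectrum of cyclic NQEL.
nqel = [114, 128, 129, 113]
toy = cyclic_spectrum(nqel)
print(convolution_alphabet(toy, m=4))
print(convolution_cyclopeptide_sequencing(toy, m=4, n=10))
```

On this noise-free toy input the returned peptide should be a rotation or reversal of NQEL written as integer masses. Real spectra such as Spectrum10 or Spectrum25 would need larger values of `m` and `n`, and the results there depend on how ties are handled.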