At the end of the first section, we posed a number of questions about antibiotics, the main one being: how do we sequence antibiotics? We're going to get to that, but let's take a detour to another question: how do bacteria make antibiotics? If we can learn about this, maybe we'll gain some insight into how to sequence them. We described an antibiotic peptide as a “mini-protein”, so the question then is, how does the cell produce proteins?
It starts at the level of DNA. This is one of the many reasons why DNA is so important. DNA is a double-stranded molecule, and it gets transcribed into RNA, which is a single-stranded molecule. This is a complex molecular procedure that for the most part we're not going to get into, but in terms of strings, the way we think about it is that RNA has almost the same bases as DNA. The G for guanine, the A for adenine, and the C for cytosine are not going to change. However, the T, which represents thymine in DNA, is replaced by the base U, or uracil, in RNA.
After this process, which is called transcription, has turned DNA into RNA, the RNA is then translated into peptides, or proteins. So here we have four nucleotide bases, adenine, cytosine, guanine, and uracil, and I mentioned that there are 20 common amino acids. Here is the table of amino acids, along with their three-letter codes. Unlike the transcription of DNA into RNA, where we had four letters in each case, there's no natural one-to-one correspondence here. So our question is, can we translate two nucleotides at a time? Could we take all 2-mers, AA, AC, AG, AU, and so on, and map those to amino acids? The answer is going to be no, right? There are only 4 × 4 = 16 of these 2-mers, which is not enough to cover 20 amino acids. So we're going to need at least three nucleotides at a time, and that gives us 4 × 4 × 4 = 64 combinations, which is more than enough.
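To make the string view of transcription and the counting argument concrete, here is a minimal Python sketch. The function names `transcribe` and `kmer_count` are made up for illustration, and the sketch treats transcription purely as a string operation (swap T for U), ignoring all of the molecular detail.

```python
# Illustrative sketch only: transcription viewed as a string operation.
# `transcribe` and `kmer_count` are hypothetical helper names.

def transcribe(dna: str) -> str:
    """Return the RNA string for a DNA string by replacing T with U."""
    return dna.upper().replace("T", "U")


def kmer_count(k: int, alphabet_size: int = 4) -> int:
    """Number of distinct strings of length k over the nucleotide alphabet."""
    return alphabet_size ** k


NUM_AMINO_ACIDS = 20  # the 20 common amino acids mentioned in the lecture

print(transcribe("GTGAAACTT"))
for k in (1, 2, 3):
    enough = kmer_count(k) >= NUM_AMINO_ACIDS
    print(f"k={k}: {kmer_count(k)} k-mers, enough for 20 amino acids: {enough}")
```

Running the loop shows that k = 3 is the smallest word length whose k-mer count reaches 20.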
And so this motivates the definition of a codon, which is just a triplet of nucleotides, and the “genetic code” is the term that we use for the assignment of codons to single amino acids, in order to make proteins. Here's an example of an amino acid that is encoded by just one codon: tryptophan is always encoded by UGG. More often, though, an amino acid is encoded by multiple different codons, not necessarily six, but more than one. So here we have six different codons going to leucine, or L. You also have three special codons called stop codons. These serve as messages to halt translation. They say, "stop translation here and don't continue translating".
These processes of transcription and translation, which are beautiful molecular processes that we encourage you to learn more about but that we're interested in on a computational level, form what is called the “Central Dogma of Molecular Biology”. So we start with DNA, it gets transcribed into RNA, and that gets translated into protein. This term, the “Central Dogma of Molecular Biology”, was coined in 1958 by Francis Crick, of Watson & Crick fame. I always thought “dogma” is kind of a strange word to use, because “dogma” in English means “a belief that cannot be questioned or overturned in any circumstance”. It's usually used in a religious context, and it's not a word that you usually see in science. And Crick admitted years later that he had made a mistake. He realized that he should have used a more appropriate word, like “principle”, but he used the wrong word and now it's something that we're stuck with. And we should take it as a hint that, because we've used this strange word “dogma”, there may be some irony there… maybe this dogma doesn't necessarily hold all the time.
So, assuming that it does hold, our goal is this: we know a lot about genomes, and if we have sequenced genomes, can we look for the “mini-protein” of Tyrocidine B1 in the genome?
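The genetic code described above can be written down as a lookup table from codons to amino acids. The following is a sketch in Python, assuming the standard genetic code and one-letter amino acid codes, with `*` standing for a stop codon; the names `GENETIC_CODE` and `translate` are chosen here for illustration.

```python
# Illustrative sketch: the standard genetic code as a codon -> amino acid table.
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons, listed in the order
# UUU, UUC, UUA, UUG, UCU, ... (last base varies fastest); '*' marks a stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon, halting at the first stop codon."""
    peptide = []
    for i in range(0, len(rna) - 2, 3):
        aa = GENETIC_CODE[rna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


codons_per_amino_acid = Counter(GENETIC_CODE.values())
print(codons_per_amino_acid["W"])  # tryptophan: only UGG
print(codons_per_amino_acid["L"])  # leucine: six codons
print(sorted(c for c, aa in GENETIC_CODE.items() if aa == "*"))  # stop codons
print(translate("GUGAAACUUUGAGGG"))  # translation halts at UGA
```

Counting the table's values is a quick way to see how uneven the code is: some amino acids have one codon, others have up to six.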
So, can we find a 30-mer in the Bacillus brevis genome that transcribes into RNA and then translates into the peptide of length 10 that represents Tyrocidine B1? Unfortunately, there are going to be thousands of different 30-mers that could do this. I'll show three here. Notice that each of these three DNA strings gets translated into the same string of amino acids. And if we look at the three DNA strings, we can notice that they're not very similar. So there are thousands of them, and they're not very similar, which means we can't really draw conclusions by trying to reverse the process of translation.
Another thing to notice is that translation can start pretty much anywhere. By “anywhere” I mean this: if you look at this DNA string here, it gets transcribed into an RNA string. Similarly, its reverse complement gets transcribed into an RNA string, running in the opposite direction. When you look at translation, though, it can start at GUG, in which case the first translated amino acid is valine. It could also start one position later, at the U, reading off UGA, which is a stop codon. And thirdly, it could start at GAA, which is translated into glutamic acid. So we wind up with six different “reading frames”, or ways of reading off the same molecule of DNA: three from this strand, running in this direction, and three from the reverse complementary strand, running in the opposite direction.
We also have the additional fact that the Tyrocidine B1 peptide, like many peptides of its kind, is actually cyclic. So, rather than just the one linear representation as a string of ten amino acids that I've given you, we get ten different representations, depending on where we start on the cycle. If we start at valine, we get one representation; if we start at lysine, we get a completely different one.
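These three sources of ambiguity (many DNA strings per peptide, six reading frames, ten cyclic rotations) can be sketched in a few lines of Python. Several things here are assumptions that the transcript itself does not spell out: the one-letter sequence `VKLFPWFNQY` for Tyrocidine B1, the example DNA string (chosen so that its frames begin GUG, UGA, GAA as in the lecture), the codon counts per amino acid taken from the standard genetic code, and all function names.

```python
# Illustrative sketch of why reversing translation is ambiguous.
from math import prod

# ASSUMPTION: Tyrocidine B1 in one-letter codes (not stated in the transcript).
TYROCIDINE_B1 = "VKLFPWFNQY"
# ASSUMPTION: number of codons per amino acid in the standard genetic code,
# listed only for the residues that occur in the peptide above.
CODONS_PER_AA = {"V": 4, "K": 2, "L": 6, "F": 2, "P": 4,
                 "W": 1, "N": 2, "Q": 2, "Y": 2}


def count_encodings(peptide: str) -> int:
    """Number of distinct codon strings that translate into the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide)


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def reading_frames(dna: str) -> list[list[str]]:
    """Codon lists for the six frames: three per strand, offsets 0, 1, 2."""
    frames = []
    for strand in (dna, reverse_complement(dna)):
        rna = strand.replace("T", "U")
        for offset in range(3):
            frames.append([rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)])
    return frames


def rotations(peptide: str) -> list[str]:
    """All linear representations of a cyclic peptide."""
    return [peptide[i:] + peptide[:i] for i in range(len(peptide))]


example_dna = "GTGAAACTTTTTCCTTGGTTTAATCAATAT"  # hypothetical example string
print(count_encodings(TYROCIDINE_B1))
for frame in reading_frames(example_dna):
    print(frame[:3])
print(rotations(TYROCIDINE_B1)[:2])
```

Under these assumptions the count of encodings comes out in the thousands, consistent with the claim above, and the first three frames begin with GUG, UGA, and GAA respectively.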
So then we come back to our question and we say, how many 30-mers in the genome of Bacillus brevis are actually going to encode one of these linear representations? This is a question that we could go off and program, and have the computer run our algorithm. And then our computer would say, well, actually there aren't any 30-mers in this genome that encode one of these linear representations. So that's weird… and our conclusion then is that something strange is going on with this dogma. Somehow the peptide is not getting translated from the DNA of Bacillus brevis.
To learn a little bit more about this, recall that I mentioned that transcription and translation are beautiful molecular processes. Transcription is carried out by an enzyme called RNA polymerase, and translation is carried out by a molecular machine called the ribosome. So an obvious way to test whether an antibiotic peptide is made outside of this dogma is to say, well, let's shut down the ribosome. If the ribosome is in charge of translation, let's shut it down and see if these mini-proteins are still getting produced. In 1963, that's what Edward Tatum does. He inhibits the ribosome in Bacillus brevis, and he hypothesizes that all translation will stop. But instead, the production of some peptides, including the tyrocidines (like Tyrocidine B1), keeps going. All right, so this is a major sign that the dogma doesn't necessarily hold in all cases.
Six years later, Fritz Lipmann comes along and says, well, these tyrocidines are actually non-ribosomal peptides. They're synthesized not by the ribosome, but by a completely different process that relies on a giant protein called NRP synthetase. Now this giant protein is made up of a number of different modules, ten of them, actually, and each module is responsible for adding a single amino acid to this small peptide of length ten.
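The search described at the start of this passage, looking for substrings of a genome that encode a given peptide on either strand, might be sketched as follows. This is only one straightforward way to write it, not necessarily the algorithm used in the course: it assumes the standard genetic code, uses made-up names (`encoding_substrings`, `cyclic_encoding_substrings`), and is demonstrated on a short toy string rather than the Bacillus brevis genome.

```python
# Illustrative brute-force sketch: find genome substrings encoding a peptide.
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")  # '*' marks a stop codon
GENETIC_CODE = {"".join(c): aa
                for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna: str) -> str:
    """Translate codon by codon; stop codons appear as '*' in the output."""
    return "".join(GENETIC_CODE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def encoding_substrings(genome: str, peptide: str) -> list[str]:
    """Substrings of the genome that encode the peptide on either strand."""
    k = 3 * len(peptide)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        # Check the window itself and its reverse complement.
        for strand in (window, reverse_complement(window)):
            if translate(strand.replace("T", "U")) == peptide:
                hits.append(window)
                break  # report each window at most once
    return hits


def cyclic_encoding_substrings(genome: str, cyclic_peptide: str) -> list[str]:
    """Hits for every linear representation (rotation) of a cyclic peptide."""
    hits = []
    for i in range(len(cyclic_peptide)):
        rotation = cyclic_peptide[i:] + cyclic_peptide[:i]
        hits.extend(encoding_substrings(genome, rotation))
    return hits


toy_genome = "ATGGCCATGGCCCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA"  # toy example
print(encoding_substrings(toy_genome, "MA"))
```

Sliding a window across every position covers all three reading frames of the forward strand automatically, and checking the reverse complement of each window covers the other three.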
So it's a huge protein made up of a number of different pieces, and each piece is responsible for just one amino acid. Once the amino acids are added, then the peptide is circularized.