In this segment, I want to give a few examples of stream ciphers that are used in practice. I'm gonna start with two old examples that actually are not supposed to be used in new systems. But nevertheless, they're still fairly widely used, and so I just want to mention the names so that you're familiar with these concepts. The first stream cipher I want to talk about is called RC4, designed back in 1987. And I'm only gonna give you the high-level description of it, and then we'll talk about some weaknesses of RC4 and leave it at that. So RC4 takes a variable size seed, here I just gave as an example where it would take 128 bits as the seed size, which would then be used as the key for the stream cipher. The first thing it does, is it expands the 128-bit secret key into 2,048 bits, which are gonna be used as the internal state for the generator. And then, once it's done this expansion, it basically executes a very simple loop, where every iteration of this loop outputs one byte of output. So, essentially, you can run the generator for as long as you want, and generate one byte at a time. Now RC4 is actually, as I said, fairly popular. It's used in the HTTPS protocol quite commonly actually. These days, for example, Google uses RC4 in its HTTPS. It's also used in WEP as we discussed in the last segment, but of course in WEP, it's used incorrectly and it's completely insecure the way it's used inside of WEP. So over the years, some weaknesses have been found in RC4, and as a result, it's recommended that new projects actually not use RC4 but rather use a more modern pseudo-random generator as we'll discuss toward the end of the segment. So let me just mention two of the weaknesses. So the first one is, it's kind of bizarre basically, if you look at the second byte of the output of RC4. It turns out the second byte is slightly biased. If RC4 was completely random, the probability that the second byte happens to be equal to zero would be exactly one over 256. There are 256 possible bytes, the probability that it's zero should be one over 256. It so happens that for RC4 the probability is actually two over 256, which means that if you use the RC4 output to encrypt a message the second byte is likely to not be encrypted at all. In other words it'll be XOR-ed with zero with twice the probability that it's supposed to. So two over 256, instead of one over 256. And by the way I should say that there's nothing special about the second byte. It turns out the first and the third bytes are also biased. And in fact it's now recommended that if you're gonna use RC4, what you should do is ignore basically the first 256 bytes of the output and just start using the output of the generator starting from byte 257. The first couple of bytes turned out to be biased, so you just ignore them. The second attack that was discovered is that in fact if you look at a very long output of RC4 it so happens that you're more likely to get the sequence 00. In other words, you're more likely to get sixteen bits, two bytes of zero, zero, than you should. Again, if RC4 was completely random the probability of seeing zero, zero would be exactly 1/256 squared. It turns out RC4 is a little biased and the bias is 1/256 cubed. It turns out this bias actually starts after several gigabytes of data are produced by RC4. But nevertheless, this is something that can be used to predict the generator and definitely it can be used to distinguish the output of the generator from a truly random sequence. Basically the fact that zero, zero appears more often than it should is the distinguisher. And then in the last segment we talked about related-key attacks that were used to attack WEP, that basically say that if one uses keys that are closely related to one another then it's actually possible to recover the root key. So these are the weaknesses that are known of RC4 and, as a result, it's recommended that new systems actually not use RC4 and instead use a modern pseudo-random generator. Okay, second example I want to give you is a badly broken stream cipher that's used for encrypting DVD movies. When you buy a DVD in the store, the actual movie is encrypted using a stream cipher called the content scrambling system, CSS. CSS turns out to be a badly broken stream cipher, and we can very easily break it, and I want to show you how the attack algorithm works. We're doing it so you can see an example of an attack algorithm, but in fact, there are many systems out there that basically use this attack to decrypt encrypted DVDs. So the CSS stream cipher is based on something that hardware designers like. It's designed to be a hardware stream cipher that is supposed to be easy to implement in hardware, and is based on a mechanism called a linear feedback shift register. So a linear feedback shift register is basically a register that consists of cells where each cell contains one bit. Then basically what happens is there are these taps into certain cells, not all the cells, certain positions are called taps. And then these taps feed into an XOR and then at every clock cycle, the shift register shifts to the left. The last bit falls off and then the first bit becomes the result of this XOR. So you can see that this is a very simple mechanism to implement, and in hardware takes very few transistors. Just the shift right, the last bit falls off and the first bit just becomes the XOR of the previous bits. So the seed for this LFSR basically, is the initial state of the LFSR. And it is the basis of a number of stream ciphers. So here are some examples. So, as I said, DVD encryption uses two LFSRs. I'll show you how that works in just a second. GSM encryption, these are algorithms called A51 and A52. And that uses three LFSRs. Bluetooth encryption is an algorithm called, E zero. These are all stream ciphers, and that uses four LFSRs. Turns out all of these are badly broken, and actually really should not be trusted for encrypting traffic, but they're all implemented in hardware, so it's a little difficult now to change what the hardware does. But the simplest of these, CSS, actually has a cute attack on it, so let me show you how the attack works. So, let's describe how CSS actually works. So, the key for CSS is five bytes, namely 40 bits, five times eight is 40 bits. The reason they had to limit themselves to only 40 bits is that DVD encryption was designed at a time where U.S. Export regulations only allowed for export of crpyto algorithms where the key was only 40 bits. So the designers of CSS were already limited to very, very short keys. Just 40 bit keys. So, their design works as follows. Basically, CSS uses two LFSR's. One is a 17-bit LFSR. In other words, the register contains 17 bits. And the other one is a 25-bit LFSR, it's a little bit longer, 25-bit LFSR. And the way these LFSRs are seeded is as follows. So the key for the encryption, basically looks as follows. You start off with a one, and you concatenate to it the first two bytes of the key. And that's the initial state of the LFSR. And then the second LFSR basically is intitialized the same way. One concatenated the last three bytes of the key. And that's loaded into the initial state of the LFSR. You can see that the first two bytes are sixteen bits, plus leading one, that's seventeen bits overall, whereas the second LFSR is 24 bits plus one which is 25 bits. And you notice we used all five bits of the key. So then these LFSRs are basically run for eight cycles, so they generate eight bits of output. And then they go through this adder that does basically addition modulo 256. Yeah so this is an addition box, modulo 256. There's one more technical thing that happens. In fact let's actually—also added is the carry from the previous block. But that is not so important. That's a detail that's not so relevant. OK, so every block, you notice we're doing addition modulo 256 and we're ignoring the carry, but the carry is basically added as a zero or a one to the addition of the next block. Okay? And then basically this output one byte per round. Okay, and then this byte is then of course used, XOR-ed with the appropriate byte of the movie that's being encrypted. Okay, so it's a very simple stream cipher, it takes very little hardware to implement. It will run fast, even on very cheap hardware and it will encrypt movies. So it turns out this is easy to break in time roughly two to the seventeen. Now let me show you how. So suppose you intercept the movies, so here we have an encrypted movie that you want to decrypt. So let's say that this is all encrypted so you have no idea what's inside of here. However, it so happens that just because DVD encryption is using MPEG files, it so happens if you know of a prefix of the plaintext, let's just say maybe this is twenty bytes. Well, we know if you XOR these two things together, so in other words, you do the XOR here, what you'll get is the initial segment of the PRG. So, you'll get the first twenty bytes of the output of CSS, the output of this PRG. Okay, so now here's what we're going to do. So we have the first twenty bytes of the output. Now we do the following. We try all two to the seventeen possible values of the first LFSR. Okay? So two to the seventeen possible values. So for each value, so for each of these two to the seventeen initial values of the LFSR, we're gonna run the LFSR for twenty bytes, okay? So we'll generate twenty bytes of outputs from this first LFSR, assuming—for each one of the two to the seventeen possible settings. Now, remember we have the full output of the CSS system. So what we can do is we can take this output that we have. And subtract it from the twenty bites that we got from the first LFSR, and if in fact our guess for the initial state of the first LFSR is correct, what we should get is the first twenty-byte output of the second LFSR. Right? Because that's by definition what the output of the CSS system is. Now, it turns out that looking at a 20-byte sequence, it's very easy to tell whether this 20-byte sequence came from a 25-bit LFSR or not. If it didn't, then we know that our guess for the 17-bit LFSR was incorrect and then we move on to the next guess for the 17-bit LFSR and the next guess and so on and so forth. Until eventually we hit the right initial state for the 17-bit LFSR, and then we'll actually get, we'll see that the 20 bytes that we get as the candidate output for the 25-bit LFSR is in fact a possible output for a 25-bit LFSR. And then, not only will we have learned the correct initial state for the 17-bit LFSR, we will have also learned the correct initial state of the 25-bit LFSR. And then we can predict the remaining outputs of CSS, and of course, using that, we can then decrypt the rest of the movie. We can actually recover the remaining plaintext. Okay. This is things that we talked about before. So, I said this a little quick, but hopefully, it was clear. We're also going to be doing a homework exercise on this type of stream ciphers and you'll kind of get the point of how these attack algorithms work. And I should mention that there are many open-source systems now that actually use this method to decrypt CSS-encrypted data. Okay, so now that we've seen two weak examples, let's move on to better examples, and in particular the better pseudo-random generators come from what's called the eStream Project. This is a project that concluded in 2008, and they qualify basically five different stream ciphers, but here I want to present just one. So first of all the parameters for these stream ciphers are a little different than what we're used to. So these stream ciphers as normal they have a seed. But in addition they also have, what's called a nonce and we'll see what a nonce is used for in just a minute. So they take two inputs a seed and a nonce. We'll see what the nonce is used for in just a second. And the of course they produce a very large output, so n here is much, much, much bigger than s. Now, when I say nonce, what I mean is a value that's never going to repeat as long as the key is fixed. And I'll explain that in more detail in just a second. But for now, just think of it as a unique value that never repeats as long as the key is the same. And so of course once you have this PRG, you would encrypt, you get a stream cipher just as before, except now as you see, the PRG takes as input both the key and the nonce. And the property of the nonce is that the pair, k comma r, so the key comma the nonce, is never—never repeats. It's never used more than once. So the bottom line is that you can reuse the key, reuse the key, because the nonce makes the pair unique, because k and r are only used once. I'll say they're unique. Okay so this nonce is kind of a cute trick that saves us the trouble of moving to a new key every time. Okay, so the particular example from the eStream that I want to show you is called Salsa twenty. It's a stream cipher that's designed for both software implementations and hardware implementations. It's kind of interesting. You realize that some stream ciphers are designed for software, like RC4. Everything it does is designed to make software implementation run fast, whereas other stream ciphers are designed for hardware, like CSS, using an LFSR that's particularly designed to make hardware implementations very cheap. It's also, the nice thing about that is that it's designed so that it's both easy to implement it in hardware and its software implementation is also very fast. So let me explain how Salsa works. Well, Salsa takes either 128 or 256-bit keys. I'll only explain the 128-bit version of Salsa. So this is the seed. And then it also takes a nonce, just as before, which happens to be 64 bits. And then it'll generate a large output. Now, how does it actually work? Well, the function itself is defined as follows. Basically, given the key and the nonce, it will generate a very long, well, a long pseudorandom sequence, as long as necessary. And it'll do it by using this function that I'll denote by H. This function H takes three inputs. Basically the key. Well, the seed k, the nonce r, and then a counter that increments from step to step. So it goes from zero to one, two, three, four as long as we need it to be. Okay? So basically, by evaluating this H on this k r, but using this incrementing counter, we can get a sequence that's as long as we want. So all I have to do is describe how this function H works. Now, let me do that here for you. The way it works is as follows. Well, we start off by expanding the states into something quite large which is 64 bytes long, and we do that as follows. Basically we stick a constant at the beginning, so there's tao zero, these are four bytes, it's a four byte constant, so the spec for Salsa basically gives you the value for tao zero. Then we put k in which is sixteen bytes. Then we put another constant. Again, this is four bytes. And as I said, the spec basically specifies what this fixed constant is. Then we put the nonce, which is eight bytes. Then we put the index. This is the counter zero, one, two, three, four, which is another eight bytes. Then we put another constant tau two, which is another four bytes. Then we put the key again, this is another sixteen bytes. And then finally we put the third constant, tau three, which is another four bytes. Okay so as I said, if you sum these up, you see that you get 64 bytes. So basically we've expanded the key and the nonce and the counter into 64 bytes. Basically the key is repeated twice I guess. And then what we do is we apply a function, I'll call this functional little h. Okay, so we apply this function, little h. And this is a function that's one to one so it maps 64 bytes to 64 bytes. It's a completely invertible function, okay? So this function h is, as I say, it's an invertable function. So given the input you can get the output and given the output you can go back to the input. And it's designed specifically so it's a- easy to implement in hardware and b- on an x86, it's extremely easy to implement because x86 has this SSE2 instruction set which supports all the operations you need to do for this function. It's very, very fast. As a result, Salsa has a very fast stream cipher. And then it does this basically again and again. So it applies this function h again and it gets another 64 bytes. And so on and so forth, basically it does this ten times. Okay so the whole thing here, say repeats ten times, so basically apply h ten times. And then by itself, this is actually not quite random. It's not gonna look random because like we said, H is completely invertable. So given this final output, it's very easy to just invert h and then go back to the original inputs and then test that the input has the right structure. So you do one more thing, which is to basically XOR the inputs and the final outputs. Actually, sorry. It's not an XOR. It's actually an addition. So you do an addition word by word. So if there are 64 bytes, you do a word-by-word addition four bytes at a time, and finally you get the 64-byte output, and that's it. That's the whole pseudo-random generator. So that, that's the whole function, little h. And as I explained, this whole construction here is the function big H. And then you evaluate big H by incrementing the counter I from zero, one, two, three onwards. And that will give you a pseudo-random sequence that's as long as you need it to be. And basically, there are no signifigant attacks on this. This has security that's very close to two to the 128. We'll see what that means more precisely later on. It's a very fast stream cipher, both in hardware and in software. And, as far as we can tell, it seems to be unpredictable as required for a stream cipher. So I should say that the eStream project actually has five stream ciphers like this. I only chose Salsa 'cause I think it's the most elegant. But I can give you some performance numbers here. So you can see, these are performance numbers on a 2.2 gigahertz, you know, x86 type machine. And you can see that RC4 actually is the slowest. Because essentially, well it doesn't really take advantage of the hardware. It only does byte operations. And so there's a lot of wasted cycles that aren't being used. But the E-Stream candidates, both Salsa and other candidate called Sosemanuk. I should say these are eStream finalists. These are actually stream ciphers that are approved by the eStream project. You can see that they have achieved a significant rate. This is 643 megabytes per second on this architecture, more than enough for a movie and these are actually quite impressive rates. And so now you've seen examples of two old stream ciphers that shouldn't be used, including attacks on those stream ciphers. You've seen what the modern stream ciphers look like with this nonce. And you see the performance numbers for these modern stream ciphers so if you happen to need a stream cipher you could use one of the eStream finalists. In particular you could use something like Salsa.