In this segment we will look at how to use block ciphers to encrypt multiple messages using the same key. This comes up in practice for example in file systems where the same key's used to encrypt multiple files. It comes up in networking protocols where the same key is used to encrypt multiple packets. So let's see how to do it. The first thing we need to do is to define what is mean for a cipher to be secure when the same key is used to encrypt multiple messages. When we use the key more than once the result of that is that the adversary gets to see many cyber text encrypted using the same key. As a result, when we define security, we're gonna allow the adversary to mount what's called a chosen plain text attack. In other words, the adversary can obtain the encryption of arbitrary messages of his choice. So, for example, if the adversary's interacting with Alice. The adversary can ask Alice to encrypt arbitrary messages of the adversary's choosing. And Alice will go ahead and encrypt those messages and give the adversary the resulting cipher texts. You might wonder why would Alice ever do this. How could this possibly happen in real life? But it turns out this is actually very common in real life. And in fact, this modeling is quite a conservative modeling of real life. For example, the adversary might send Alice an email. When Alice receives the email, the writes it to her encrypted disk, thereby encrypting the adversary's email using her secret key. If later the adversary steals this disc, then he obtains the encryption of an email that he sent Alice under Alice's secret key. So that's an example of a chosen plain text attack, where the adversary provided Alice with a message and she encrypted that message using her own key. And then later the attacker was able to obtain the resulting cipher text. So that's the adversary's power. And then the adversary's goal is basically to break semantic security. So let's define this more precisely. As usual, we're gonna define semantic security under a chosen plain text attack using two experiments, experiment zero and experiment one, that are modeled as a game between a challenger and an adversary. When the game begins, the challenger is gonna choose a random key K. And now the adversary basically gets to query the challenger. So the adversary now begins by submitting a semantic security query, namely, he submits two messages, M zero and M one. I added another index, but let me ignore that extra index for a while. So the adversary submits two messages, M zero and M one, that happen to be of the same length. And then the adversary receives the encryption of one of those messages, either of M zero or of M one. In experiment zero, he receives the encryption of M zero. In experiment one, he receives the encryption of M one. So, so far this would look familiar this looks exactly like a standard semantic security [inaudible]. However, plain text attack the adversary can now repeat this query again. So now you can issue a query with two other plain texts, again of the same length, and again you would receive the encryption of one of them. In experiment zero you would receive the encryption of M zero. In experiment one you would receive the encryption of M one. And the attacker can continue issuing queries like this. In fact we'll say that he can issue up to Q queries of this type. And then, remember, every time he issues a pair of messages. That happen to be of the same length and every time he either gets the encryption of the left side or the right side again in experiment zero he will always get the encryption of the left message in experiment one he will always get the encryption of the left message. And, then adversary's goal is, basically, to figure out whether he's in experimental zero or in experiment one. In other words, whether he was constantly receiving the encryption of the left message or the encryption of the right message. So, in some sense, this is a standard semantic security game just iterated over many queries that the attacker can issue to adaptively one after the other. Now the chosen plain text attack is captured by the fact that if the attacker wants the encryption of a particular message m. What he could do is, for example, use query J for sum J, where in this query J he'll set both the zero message and the one message to be the exactly same message M. In other words, both the left message and the right message are the same, and both are set to the message M. In this case, what he will receive, since both messages are the same, he knows that he's gonna receive the encryption of this message M that he was interested in. So this is exactly what we meant by a chosen [inaudible] attack. Where the advisory can submit a message m and receive the encryption of that particular message m of his choice. So some of his queries might be of this chose plain text flavor where the message on the left is equal to the message on the right, but some of the queries might be standard semantic security queries where the two messages are distinct and that actually gives him information on whether he's in experiment zero or in experiment one. Now by now you should be used to this definition where we say that the system is semantically secure under a chosen plain text attack. If, for all efficient adversaries, they cannot distinguish experiment zero from experiment one. In other words, the probability that, at the end, the output, B Prime, which we're gonna denote by the output of experiment B. This output will be the same whether [inaudible] experiment zero or experiment one. So the attacker couldn't distinguish between always receiving encryptions of the left messages, versus always receiving encryptions of the right messages. So in your mind, I'd like you to be thinking of an adversary that is able to mount a chosen plaintext attack, namely, be given the encryption of arbitrary messages of his choice, and his goal is to break semantic security for some other challenge cipher texts. And as I said in this [inaudible] model of the real world the attacker is able to fool Alice into encrypting for him messages of his choice and then the attacker's goal is to somehow break some challenge cypher text. So I claim that all the ciphers that we've seen up until now, namely deterministic counter mode or the one time pad, are insecure under a chosen plain text attack. More generally, suppose we have an encryption scheme that always outputs the same cipher text for a particular message M. In other words, if I ask the encryption scheme to encrypt the message M once. And then I ask the encryption scheme to encrypt the message m again. If in both cases the encryption scheme outputs the same cypher text, then that system cannot possibly be secure under a chosen plain text attack. And both deterministic counter mode and the one time pad were of that flavor. They always output the same cipher text, given the same message. And so let's see why that cannot be chosen plain text secure. And the attack is fairly simple, what the attacker is gonna do, is he's gonna output the same message twice. This just says. That he really wants the encryption of M0. So here the attacker is given C0 which is the encryption of M0. So this was his chosen plain text query where he actually received the encryption of the message M0 of his choice. And now he's going to break semantic security. So what he does is he outputs two messages, M0 and M1 of the same length, and he's going to be given the encryption of MB. But low and behold, we said that the encryption system. Always outputs the same cipher text when its encrypting the message, M0. Therefore, if B is=to zero, we know that C, this challenged cipher text, is simply=to CO, because it's the encryption of M0. However, if B is=to one. Then we know that this challenge cypher text is the encryption of M1 which is something other than C zero so all the attacker does is he just checks his C is = to C0 the output's zero in other words he outputs one. So, in this case, the attacker is able to perfectly guess this bit B, so he knows exactly [inaudible] given the encryption of M0, or the encryption of M1. And as a result, his advantage in winning this game is one. Meaning that the system cannot possibly be CPA secure. One is not a negligible number. So this shows that the deterministic encryption schemes cannot possibly be CPA-secure, but you might wonder well, what does this mean in practice? Well in practice this means again that every message is always encrypted to the same cipher text. What this means is if you're encrypting files on disk, and you happen to be encrypting two files that happen to be the same, they will result in the same cipher text and then the attacker by looking at the encrypted disk, will learn that these two files actually contain the same content. The attacker might not learn what the content is, but he will learn that these two encrypted files are an encryption of the same content and he shouldn't be able to learn that. Similarly, if you send two encrypted packets on the network that happen to be the same, the attacker will not learn the content of those packets, but he will learn that those two packets actually contain the same information. Think for example of an encrypted voice conversation. Every time there's quiet on the line, the system will be sending encryptions of zero. But since encryption of zero are always mapped to the same cypher text. An attacker looking at the network will be able to identify exactly the points in the conversation where there's quiet because he will always see those exact same cypher text every time. So these are examples where deterministic encryption cannot possibly be secure. And as I say formerly we say that the terministic encryption can not be semantically secure under a chosen plain text attack. So what do we do, well the lesson here is if the secret keys gonna be used to encrypt multiple messages, it had better be the case that given the same plain text to encrypt twice. The encryption algorithm must produce different cipher texts. And so there are two ways to do that. The first method is what's called randomized encryption. Here, the encryption algorithm itself is going to choose some random string during the encryption process and it is going to encrypt the message M using that random string. So what this means is that a particular message, M0 for example, isn't just going to be mapped to one cipher text but it's going to be mapped to a whole ball of cipher texts. Whereon every encryption, basically, we output one point in this ball. So every time we encrypt, the encryption algorithm chooses a random string, and that random string leads to one point in this ball. Of course, the decryption algorithm, when it takes any point in this ball, will always map the result to M zero. Similarly cipher text M one will be mapped to a ball, and every time we encrypt M one, we basically output one point in this ball. And these balls have to be disjoint, so that the encryption algorithm, when it obtains a point in the ball corresponding to M one, will always output the message M one. In this way, since the encryption algorithm uses randomness, if we encrypt the same message twice, with high probability we'll get different cipher texts. Unfortunately this means that the cipher text necessarily has to be longer than the plain text because somehow the randomness that was used to generate the cipher text is now encoded somehow in the cipher text. So the cipher text takes more space. And roughly speaking, the cipher text size is going to be larger than the plain text. By basically the number of random bits that were used during encryption. So if the plain texts are very big, if the plain texts are gigabytes long, the number of random bits is going to be on the order of 128. So maybe this extra space doesn't really matter. But if the plain texts are very short, maybe they themselves are 128 bits, then adding an extra 128 bits to every cipher text is going to double the total cipher text size. And that could be quite expensive. So as I say randomized encryption is a fine solution but in some cases it actually introduces quite a bit of costs. So let's look at a simple example. So imagine we have a pseudorandom function that takes inputs in a certain space r which is gonna be called a nonce space. And outputs, outputs in the message space. And, now, let's define the following randomize encryption scheme where we want to encrypt the message m with the encryption of whatever it's gonna do is first it's gonna generate a random r in this nonce space R. And then it's going to open a cypher text that consist of two components, the first component is going to be this value R and the second component is going to be an evaluation of the pseudo-random function at the point R XOR with the message M. And my question to you is, is this encryption system semantically secure under a chosen plain text attack. So the correct answer is yes. But only if the nonce space R is large enough so that little R never repeats with very, very high probability. And let's quickly argue why that's true. So first of all, because F is a secure pseudo-random function, we might as well replace it with a truly random function. In other words, this is indistinguishable from the case where we encrypt the message M, using the truly random function little F, evaluated to point R, and then XOR with M. But since this little r never repeats every cypher text uses a different little r what this means is that the values of F(r) are random uniform independent strings every time. So every time we encrypt a message, we encrypt it essentially using a new uniform random one time pad. And since XORing a uniform string with any string simply generates a new uniform string, the resulting cipher text is distributed as simply two random uniform strings. I'll call them r and r prime. And so both in experiment zero and in experiment one, all the attacker gets to see are truly uniform random strings r, r', and since in both experiments the attacker is seeing the same distribution, he cannot distinguish the two distributions. And so since security holds completely when we're using a truly random function it's also gonna hold when we're using a pseudorandom function. Okay, so this is a nice example of how we use the fact that the pseudo random function behaves like a random function to argue security of this particular encryption scheme. Okay, so now we have a nice example of randomized encryption. The other approach to building chosen plain text secure encryption schemes is what's called a nonce based encryption. Now, in a non-spaced encryption system, the encryption algorithm actually takes three inputs rather than two. As usual it takes the key and the message. But it also takes an additional input called a nonce. And similarly, the decryption algorithm also takes the nonce as input, and then produces the resulting decrypted plain text. And what is this nonce value n. This nonce is a public value. It does not need to be hidden from the adversary but the only requirement is that the pair (k,n) is only used to encrypt a single message. In other words, this pair (k,n) must change from message to message. And there are two ways to change it. One way to change it is by choosing a new random key for every message. And the other way is to keep using the same key all the time but then we must choose a new nonce for every message. And, and as I said, I wanna emphasize again, this nonce need not be secret, and it need not be random. The only requirement is the nonce is unique. And in fact, we're gonna use this term throughout the course. A nonce for us, means a unique value that doesn't repeat. It does not have to be random. So let's look at some examples of choosing an nonce, well the simplest option is simply to make the nonce of the accounter so for example the networking protocol you can imagine the nonce being a packet counter that's incremented every time a packet is sent by a sender or received by the receiver this means that the encrypter has to keep state from message to message mainly that he has to keep this counter around and increment it after every message is transmitted. Interestingly, if the decrypter actually has the same state then there is no need to include the nuance in the cipher text since the nuance is implicit. Let's look at an example. The https protocol is run over a reliable transport mechanism which means that packets sent by the sender are assumed to be received in order at a recipient. So if the sender sends packet #5 and then packet #6, the recipient will receive packet #5 and then packet #6 in that order. This means that if the sender maintains a packet counter, the recipient can also maintain a packet counter and two counters basically increment in sync. In this case there is no reason to include the nonce in the packets because the nonce is implicit between the two sides. However, in other protocols, for example, in IPsec, IPsec has a protocol designed to encrypt the IP layer. The IP layer does not guarantee in order delivery. And so the sender might send packet #5 and then packet #6, but those will be received in reverse order at the recipient. In this case it's still fine to use a packet counter as a nonce but now the nonce has to be included in the packet so that the recipient knows which nonce to use to decrypt the received packet. So as I say, nonce based encryption is a very efficient way to achieve CPA security. In particular if the nonce is implicit, it doesn't even increase the cipher text length. Of course another method to generate a unique nonce is simply to pick the nonce at random assuming the nonce space is sufficiently large so that with high probability the nonce will never repeat for the life of the key. Now in this case, nonce based encryption simply reduces to randomized encryption. However, the benefit here is that the sender does not need to maintain any state from message to message. So this is very useful for example if encryption happens to take place on multiple devices. For example, I might have both a laptop and a smart phone. They might both use the same key. But in this case if I require state full encryption, then my laptop and the smartphone would have to coordinate to make sure that they never reuse the same nonces. Whereas if both of them simply take nonces at random, they don't need to coordinate because it was very high probability they'll simply never choose the same nonce. Again assuming the nonce space is big enough. So there are some cases where stateless encryption is quite important, in particular where the same key is used by multiple machines. So I wanted to find, more precisely, what security means for nonce based encryption. And in particular, I want to emphasize that the system must remain secure when the nonce are chosen by the adversary. The reason it's important to allow the adversary to choose the nonces is because the adversary can choose which cipher text it wants to attack. So imagine the nonce happens to be a counter and it so happens that when the couter hits the value fifteen, maybe at that point it's easy for the adversary to break semantic security. So the adversary will wait until the fifteenth packet is sent and only then he will ask to break semantic security. So when we talk about nonce based encryption, we generally allow the adversary to choose the nonce and the system should remain secure even under those settings. So let's define the CPA game in this case and it's actually very similar to the game before. Basically the attacker gets to submit pairs of messages MI, MI0, and MI1. Obviously they both have to be of the same length. And he gets to supply the nonce. And in response, the adversary is given the encryption of either MI0, or MI1. But using the nonce that the adversary chose. And of course, as usual, the adversary's goal is to tell whether he was given the encryption of the left plain text or the right plain text. And as before the adversary gets to iterate these queries and he can issue as, as many queries as he wants, we usually let q denote the number of queries that the adversary issues. Now the only restriction of course, which is crucial, is that although the adversary gets to choose the nonces, he's restricted to choosing distinct nonces. The reason we force him to choose distinct nonces is because that's the requirement in practice. Even if the adversary fools Alice into encrypting multiple messages for him, Alice will never use the same nonce again. As a result, the adversary will never see messages encrypted using the same nonce and therefore, even in the game, we require that all nonce be distinct. And then as usual we say that the system is a nonce based encryption system that's, semantically secure under a chosen plain text attack if the adversary cannot distinguish experiment zero where he's given encryptions of the left messages from experiment one where he's given encryptions of the right messages. So let's look at an example of a nonce based encryption system. As before, we have a secure PRF that takes inputs in the nonce space R and outputs strings in the message space M. Now when a new key is chosen, we're going to reset our counter R to be zero. And now we encrypt the particular message M, what we will do is we will increment our counter R, and then encrypt the message M using the pseudorandom function applied to this value R. And as before, the cipher text is going to contain two components, our current value of the counter and then the one time pad encryption of the message M. And so my question to you is whether this is a secure, non-spaced encryption system. So the answer as before is yes, but only if the nuance space is large enough. So as we increment the counter R, it will never cycle back to zero so that the nuances will always, always be unique. We argue security the same way as before. Because the PRF is secure, we know that this encryption system is indistinguishable from using a truly random function. In other words, if we apply a truly random function to the counter and XOR the results with, the plain text M. But now since the nuance R never repeats, every time we compute this F of R, we get a truly random uniform and independent string so that we're actually encrypting every message using the one time pad. And as a result, all the adversary gets to see in both experiments are basically just a pair of random strings. So both the experiment zero and experiment one the adversary get's to see exactly the same distribution, namely, the responses to all this chosen plain text queries are just pairs of strings that are just uniformly distributed and this is basically the same in experiment zero and experiment one and, therefore, the attacker cannot distinguish the two experiments. And since he cannot win the semantic security game with a truly random function he, also, cannot win the semantics security game with the secure PRF, and, therefore, the scheme is secure. So now we understand what it means for a symmetric system to be secure when the keys used to encrypt multiple messages the requirement is that it be secure under a chosen plan of attack. And we said that basically, the only way to be secure under a chosen plain text attack is either to use randomized encryption, or to use, use nonce spaced encryption where the nonce never repeats. And then in the next two segments, we're gonna build two classic encryption systems that are secure when the key is used multiple times.