In the previous segment we saw how to build public key encryption from trapdoor functions, in this segment we're going to build a classic trapdoor function called RSA. But first let's quickly review what a trapdoor function is. So recall that a trapdoor function is made up of three algorithms. There is a key generation algorithm, the function itself, and the inverse of the function. The key generation algorithm outputs a public key and a secret key. And in this case, in this lecture the public key is going to define a function that maps the set X onto itself. Which is why I call these things trapdoor permutations, as opposed to trapdoor functions, simply because the function maps X onto itself, whereas a trapdoor function might map the set X to some arbitrary set Y. Now given the public key, the function, as we say, basically defines this function from the set X to the set X. And then given the secret key, we can invert this function. So this function F evaluates the function in the forward direction, and this function F inverse, which means the secret key SK, computes F in the reverse direction. Now we say that the trapdoor permutation is secure if the function defined by the public key is in fact a one-way function, which means that it's easy to evaluate, but without the trapdoor, the secret trapdoor, it's difficult to invert. Before we look at our first example of a trapdoor permutation, I want to do a quick review of some necessary arithmetic facts that we're gonna need. And in particular, let's look at some arithmetic facts modulo composites. So here we have our modulus N, which happens to be a product of two primes, and you should be thinking of P and Q as roughly equal size numbers, since particular P and Q are both roughly on the size of square root of N. So both are roughly the same size. Recall that we denoted by ZN the set of integers from zero to N minus one, and we said that we can do addition and multiplication module N. We denoted by ZN star the set of invertible elements in ZN. So these are all the elements, which have a modular inverse. And we said that actually X is invertible if and only if it's relatively prime to N. Moreover, I also told you that the number of invertible elements in ZN star is denoted by this function phi(N). So phi(N) is the number of invertible elements in ZN star, And I told you that when N is a product of two distinct primes, then in fact phi(N) is equal to (P - 1) times (Q - 1) and if you extend that out, you see that this is really equal to (N - P - Q + 1). Now remember that I said that P and Q are on the order of square root of N which means that P + Q is also on the order of square root of N. Which means that really phi(N) is on the order of N minus two square root of N. So, in other words, it's very, very close to N. Basically, subtracting the square root of N from a number, this is from, N is going to be a huge number in our case, like 600 digits. And so subtracting from a 600 digit number, the square root of that 600 digit number, namely a 300 digit number, hardly affects the size of the number. Which means that phi(N) is really, really, really close to N, and I want you to remember that, because we will actually be using this now and again. And so the fact that phi(N) is really close to N, means that if we choose a random element modulo N It's very, very, very likely to be in ZN star. So it's very unlikely that by choosing a random element in ZN, we will end up choosing an element that is not invertable. Okay. So just remember that, that in fact almost all elements in ZN are actually invertible. And the last thing that we'll need is Euler's theorem, which we covered last week, which basically says that for any element X in ZN star, if I raise X to the power of phi(N), I get one, in ZN. So in other words, I get 1 modulo N. I'll say it one more time because this is gonna be critical to what's coming. Again X to the phi(N) is equal to 1 modulo N. So now that we have the necessary background we can actually describe the RSA trapdoor permutation. This is a classic, classic, classic construction in cryptography that was first published in Scientific American back in 1977, this is a very well known article in cryptography. And ever since then, this was 35 years ago, the RSA trapdoor permutation has been used extensively in cryptographic applications. For example, SSL and TLS use RSA both for certificates and for key exchange. There are many secure email systems and secure file systems that use RSA to encrypt emails and encrypt files in the file system. And there are many, many, many other applications of this system. So this is a very, very classic, crypto construction, and I'll show you how it works. I should say that RSA is named for its inventors, Ron Rivest, Adi Shamir and Len Adleman, who were all at MIT at the time they made this important discovery. So now we're ready to describe the RSA trapdoor permutation. To do that, I have to describe the key generation algorithm, the function ƒ and the function ƒ−1. So let's see. So the way the key generation algorithm works is as follows: What we do is we generate two primes, P and Q, each would be, say on the order of 1000 bits, so, you know, roughly 300 digits, and then the RSA modulus is simply going to be the product of those two primes. The next thing we do is we pick two exponents, e and d, such that e times d is 1 modulo phi(N). What this means is that e and d first of all are relatively prime to phi(N) and second of all they're actually modular inverses of one another, modulo phi(N). And then we output the public key as the pair N,e and the secret key is the secret key N,d. I should mention that the exponent e, that the number e is sometimes called the encryption exponent and the exponent d is sometimes called the decryption exponent. And you'll see why I keep referring to these as exponents in just a second. Now the way the RSA function itself is defined is really simple. For simplicity I'm gonna define it as the function from ZN star to ZN star. And the way the function is defined is simply given an input X, all we do is we simply take X and raise it to the power of e in ZN. So we just compute X to the e, modulo N. That's it. And then to decrypt, what we do is we simply, given an input Y, we simply raise Y to the power of d, modulo N. And that's it. So now you can see why as I refer to e and d as exponents. They're the things that X and Y are being raised to. So let's quickly verify that F inverse really does invert the function F. So let's see what happens when we compute Y to the d. So suppose Y itself happens to be the RSA function of some value X. In which case, what Y to the d is, is really RSA of X to the power of d. While X by itself is simply gonna be X to the e modulo N, And therefore, Y to the d is simply X to the e times d modulo N. Again, just using rules of exponentiation, the exponents multiply, so we get X to the e times d. But what do we know about e times d? e times d we said is one modulo phi(N). And what that means is there exists some integer such that e times d is equal to K times phi(N) plus one. This is what it means that e times d is one modulo phi(N). So, we can simply replace e times d by K times phi(N)+1. So, that's what I wrote here. So, we have X to the power of K times phi(N)+1. But now again just using rules of exponentiation, we can re-write this as X to the power of phi(N) to the power of K times X. This is really the same thing as K times phi(N)+1 as the exponent. I just kind of separated the exponent into it's different components. Now, X to the phi(N) by Euler's theorem, we know that X to the phi(N) is equal to one. So what is this whole product equal to? Well since X to the phi(N) is equal to one, one to the K is also equal to one, so this whole thing over here is simply equal to one. And what we're left with is X. So what we just proved is that if I take the output of the RSA function and raise it to the power of 'd', I get back X. Which means that raising to the power of 'd' basically inverts the RSA function, which is what we wanted to show. So that's it, that's the whole description of the function, we've described the key generation. The function itself, which is simply raising things to the power of e modulo N, and the inversion function which is simply raising things to the power of d, also modulo N. The next question is, why is this function secure? In other words, why is this function one way if all I have is just a public key, but I don't have the secret key? And so to argue that this function is one way we basically state the RSA assumption which basically says that the RSA function is a one way permutation. And formally the way we state that is that, basically for all efficient algorithms, A, it so happens that if I generate two primes p and q, random primes p and q. I multiply them to get to modulus N and then I choose a random 'y' in ZN star. And now I give the modulus, the exponent and this 'y' to algorithm A, the probability that I'll get the inverse of RSA at the point Y, mainly I'll get Y to the power of one over E. That's really what the inverse is. This probability is negligible. So this assumption is called the RSA assumption. It basically states that RSA is a one way permutation just given the public [key?]. And therefore, it is a trapdoor permutation because it has a trapdoor. And makes this easy to invert for anyone who knows the trap door. So now that we have a secure trap door permutation, we can simply plug that into our construction for public key encryption, and get our first real world public key encryption system. And so what we'll do is we'll simply plug the, the RSA trapdoor permutation into the iso standard construction that we saw in the previous segment. So, if you recall, that construction was based on a symmetric encryption system that had to provide authenticated encryption. And it was also based on a hash function. Then mapped while transferring it into the world of RSA, it maps elements in ZN, into secret keys for the symmetric key system. And now the way the encryption scheme works is really easy to describe. Basically algorithm G just runs the RSA key generation algorithm and produces a public key and a secret key. Just as before. So you notice the public key contains the encryption exponent and the, secret key contains the decryption exponent. And the way we encrypt is as follows. Well, we're going to choose a random X in ZN. We're going to apply the RSA function to this X, we're going to deduce a symmetric key from this X by hashing it, so we compute H of X to obtain the key K, and then we output this Y along with the encryption of the message under the symmetric key K. And in practice, the hash function H would be just implemented just using SHA-256, and you would use the output of SHA-256 to generate a symmetric key that could then be used for the symmetric encryption assistant. So that's how we would encrypt and then the way we would decrypt is pretty much as we saw in the previous segment, where the first thing we would do is we would use the secret key to invert the header of the ciphertext. So we would compute RSA invert of Y, that would give us the value X. Then we would apply the hash function to X so that this would give us the key K. And then we would run the decryption algorithm for the symmetric system on the ciphertext and that would produce the original message m. And then we stated a theorem in the previous segment to say that if the RSA trapdoor permutation is secure, Es and Ds, the symmetric encryption scheme [inaudible] provides authenticated encryption. And as we said, H is just random oracle. It's, you know, kind of a random function from ZN to the key space. Then, in fact, this system is chosen cipher text secure, and is a good public key system to use. So now that we have our first example of a good public key system to use, I wanna quickly warn you about how not to use RSA for encryption. And this again something that we said in the previous segment. And I'm going to repeat it here, except I'm going to make it specific to RSA. So I'd like to call this, textbook RSA. Which basically is the first thing that comes to mind when you think about encrypting using RSA, namely, the secret key and the public key will be as before. But now instead of running through, a hash function to generate a symmetric key, what we would do is we would directly use RSA to encrypt the given message M. And then we would directly use the decryption exponent to decrypt the cipher text and obtain the plain text M. I'm going to call this textbook RSA, because there are many textbooks that describe RSA encryption in this way. And this is completely wrong. This is not how RSA encryption works. It's an insecure system. In particular, it's deterministic encryption, and so it can't possibly be semantically secure, but in fact there are many attacks exist that I'm going to show you an attack in just a minute, but the message that I want to make clear here, is that RSA, all it is, is a trap door permutation. By itself it's not an encryption system. You have to embellish it with this ISO standard for example, to make it into an encryption system. By itself, all it is, is just a function. So let's see what goes wrong if you try to use textbook RSA, In other words, if you try to encrypt a message using RSA directly. And I'll give you an example attack from the world of the web. So imagine we have a web server. The web server has an RSA secret key. Here's it's denoted by N and D. And here we have a web browser who's trying to establish a secure session, a secure SSL session, with the web server. So the way SSL works is that the web browser starts off by sending this client hello message saying, hey, I want to set up a secure session with you. The web server responds with a server hello message that contains the server's public key And then the web browser will go ahead and generate a random what's called a premaster secret K, it will encrypt the premaster secret using K and send the result in ciphertext over to the web server. The web server will decrypt and then the web server will also get K, so now the two have a shared key that they can use to then secure a session between them. So I want to show you what goes wrong if we directly use the RSA function for encrypting K. In other words, if directly K is encrypted as K to the e modulo N. Just for the sake of argument let's suppose that K is a 64-bit key. We're going to treat K as an integer in the range as zero to 2 to the 64. More precisely two to the 64 minus one, and now what we're going to do is the following. First of all, suppose it so happens that K factors into a product of roughly equal sized numbers. So K we can write as K1 times K2, where K1 and K2 are integers. And both are say less than two to the 34. So, it turns out this happens with probability roughly twenty percent so one in five times K can actually be written in this way. But, now if we plug this K, K=K1 x K2 if we plug that into the equation that defines the cipher text you see that we can simply substitute K by K1 x k2 and then we can move k1 to the other side. So then we end up with this equation here, namely C over K1 to the e is equal to K2 to the e. You notice if I multiply both sides by K1 to the e, I get that C is equal to K1 times K2 to the e, which is precisely this equation here. Okay, so all I did is I just replaced K by K1 times K2 and then divided by K1 to the e. So by now this should look familiar to you. So now we have two variables in this equation, of course. C is known to the attacker, E is known to the attacker, and N is known to the attacker. The two variables in this equation are K1 and K2, and we've separated them into a different side of the equation, and as a result we can do now a meet in the middle attack on this equation. So let's do the meet in the middle attack. What we would do is we would build a table of all possible values Of the left-hand side. So all possible values of C over K1 to the E, there are 2 to the 34 of them. And then, for all possible values on the right-hand side, [inaudible] for all possible values of K2 to the e, we're gonna check whether this value here lives in the table that we constructed in step one. And if it is then we found a collision, and basically we have a solution to this equation. So as soon as we find an element of the form K2 to the E that lives in the table that we built in step one, we've solved this equation and we found K1 and K2. And of course once we found K1 and K2, we can easily recover K simply by multiplying them. So then we multiply K1 and K2 and we get, the secret key K. Okay? So, we've broken, basically, this, this encryption system. And how long did it take? Well, by brute force, we could have broken it in time 2 to the 64, simply by trying all possible keys. But this attack, you notice, it took 2 to the 34 time for step number one. Well, it took a little bit more than 2 to the 34, 'cause we had to do these exponentiations. It took 2 to the 34 time for step two against slightly more than two to the 34 because of the exponentiations. So let's say that the whole algorithm took time two to the 40. The point is that this is much, much less time due to the 64. So here you have an example. Where if you encrypt using RSA directly, in other words you directly compute, K to the E, mod N, instead of going through the ISO standard, for example, then you get an attack that runs much faster than exhaustive search. You get an attack that runs in time two to the 40, rather than time two to the 64. Okay, so this is a cute example of how things can break if you use the RSA trapdoor permutation directly to encrypt a message. So the message to remember here, is never, ever, ever use RSA directly to encrypt. You have to use to go through an encryption mechanism. For example, like the ISO standard. And in fact, in the next segment we are going to see other ways of using RSA to build public key encryption.