In this segment, I want to tell you about another form of encryption, called format preserving encryption. This is again something that comes up in practice quite often, especially when it comes to encrypting credit card numbers. And, so, let's see how it works. Remember that a typical credit card number is sixteen digits, broken into four blocks of four digits each. You've probably heard before that the first six digits are what's called the BIN number, which represent the issuer ID. For example, all credit cards issued by VISA always start with the digit four; all credit cards issued by MasterCard start with digits 51 to 55, and so on and so forth. The next nine digits are actually the account number that is specific to the, to the particular customer and the last digit is a check sum that's computed from the previous fifteen digits. So there are about 20,000 BIN numbers and so if you do the calculation it turns out there are about 2 to the 42 possible credit card numbers which corresponds to about 42 bits of information that you need to encode if you want to represent a credit card number compactly. Now the problem is this. Suppose we wanted to encrypt credit card numbers, so that when the user swipes the credit card number at the point of sale terminal, namely at the, you know, the merchant's cash register. The cash register, this is called a point of sale terminal, goes ahead and encrypts the credit card number using a key k and it's encrypted all the way until it goes to the acquiring bank or maybe even beyond that, maybe even all the way when it goes to Visa. Now, the problem is that these credit card numbers actually pass through many, many, many processing points. All of them expect to get basically something that looks like a credit card number as a result. So if we simply wanted to encrypt something at the end point, at one end point, and decrypt it at the other end point, it's actually not that easy because if all we did was encrypt it using AES, even if we just used deterministic AES, the output of the encrypted credit card number would be 128 bit block. Sixteen bytes that would be, that would need to be sent from one system to the next, until it reaches its destination. But each one of these processors, then, would have to be changed, because they're all expecting to get credit card numbers. And so the question is, can we encrypt at the cash register, such that the resulting encryption itself looks like a credit card number. And as a result, none of these intermediate systems would have to be changed. Only the endpoints would have to be changed. And in doing so we would actually obtain end-to-end encryption without actually having to change a lot of software along the way. So the question then is, again, can we have this mechanism called format preserving encryption, where encrypting a credit card itself produces something that looks like a credit card? So that's our goal. The encrypted credit card number should look like a regular valid credit card number. So this is the goal of format preserving encryption. More abstractly what it is we're trying to do, is basically build a pseudo random permutation on the set zero to S minus one for any given S. So for the set of credit card numbers, S would be roughly, you know, two to the 42. In fact, it's gonna be, not exactly two to the 42. It's gonna be some funny numbers that's around two to the 42. And our goal is to build a PRP that acts exactly on the interval, zero to S minus one. And what we're given as input is some PRF, say, something like AES, that acts on much larger blocks, say, acts on sixteen byte blocks. So we're trying to, in some sense, shrink the domain of the PRF to make it fit the data that we're given. And the question is basically how to do that. Well, once we have such a construction we can easily use it to encrypt credit card numbers. What we would do is we would take the, a given credit card number. We would encode it such that now it's represented as a number between zero and the total number of credit card numbers. Then we would apply our PRP so we would encrypt this credit card number, you know, using construction number two from the deterministic encryption segment. And then we would map the final number back to be a regular, to look like val--, regular, valid credit card number and then send this along the way. So this is, this is again non expanding encryption. The best we can do, as we said before is to encrypt using a PRP except again the technical challenge is we need a PRP that acts on this particular funny looking set from zero to S-1 for this prespecified value of S. So my goal is to show you how to construct this and once we see how to do it, you will also know how to encrypt credit card number so that the resulting cipher text is itself a credit card number. So the construction works in two steps. The first thing we do is we shrink our PRF from the set {0,1} to the n, say {0,1} to the 128 in the case of AES, to {0,1} to the t where t is the closest power of two to the value S. So say S is some crazy number around two to the 41. T would basically be then 42 because that's the closest power of two that's just above the value S. So the construction is basically gonna use the Luby-Rackoff construction. What we need is a PRF F prime that acts on blocks of size t over two. So imagine for example in the credit card case, t would be 42 because two to the 42 we said is the closest power of two that's bigger than, than, than S. S is the number of credit, total number of credit cards. Then we would need a PRF now that acts on 21-bit inputs. So one way to do that is simply to take the AES block cipher, treat it as a PRF on 128-bit inputs, and then simply truncate the output to the least significant 21 bits. Okay, so we were given a 21 bit value. We would append zeros to it so that we get 128 bits as a result. We would apply AES to that and then we would truncate back to 21 bits. Now I should tell you that that's actually a simple way to do it but it's actually not the best way to do it. There are actually better ways to truncate a PRF on n bits to a PRF on t over two bits. But for our pur--, for the purposes of this segment, the truncation method I just said is good enough. So let's just use that particular method. Okay, so now, we've converted our AES block cipher into a PRF on t over two bits, say, on 21 bits. But what we really want is a PRP. And so what we'll do is we'll plug our new PRF, F prime, directly into the Luby-Rackoff construction. If you remember, this is basically a Feistel construction. We saw this when we talked about DES. It's a, Luby-Rackoff used three rounds, and we know that this converts a secure PRF into a secure PRP on twice the block size. In other words, we started from a secure PRF on 21 bits, and that will give us, and Luby-Rackoff will give us a secure PRF on 42 bits, which is what we wanted. Now, I should tell you that the error parameters in the Luby-Rackoff construction are not as good as they could be. And, in fact, a better thing to do is to use seven rounds of Feistel. So in other words, we'll do this seven times; every time we'll use a different key. So you notice, here, we're only using three keys. We should be using, we should be doing this seven times using seven different keys. And then there's a nice result, due to Patarin, that shows that the seven-round construction basically has as good an error terms as possible. So the error terms in the security theorem will basically be two to the T. Okay. So now we have a pseudo random permutation that operates on close power of two to the value of S. But that's not good enough. We actually want to get a pseudo random permutation on the set zero to S minus one. So step two will take us down from {0,1} to the T, to the actual set zero to the S minus one that we're interested in. And this construction is, again, really, really cute, so let me show you how it works. So, again, we're given this PRP that operates on a power of two. And we wanna go down to a PRP that operates on something slightly smaller. Okay, so here's on the construction works. Basically we're given input X in the set zero to S minus one that we want. And what we're going to do is we're going to iterate the following procedure again and again. So first of all we map X into this temporary variable Y. And now we're just gonna encrypt the input X and put the result into Y. If Y is inside of our target set, we stop and we output Y. If not we iterate this again and again and again and again and again until finally Y falls into our target set and then we output that value. So in a picture, let me explain how this works. So we start from a point in our target set. And now, now we apply the, the function E and we might be mapped into this point outside our target set, so we're not gonna stop. So now we apply the function E again and we might be mapped into this point and now we apply the function E again and now all of a sudden we map into this point, and then we stop, and that's our output. Okay, so that's how we map points between from zero to S minus one, to other points in the zero to S minus one. It should be pretty clear that this is invertible. To invert, all I'll do is I'll just decrypt and walk, kind of, in the opposite direction. So I'll decrypt, and then decrypt, and then decrypt, until finally, I fall into the set, zero to S minus one. And that gives me the inverse of the point that I wanted to. So this is, in fact, a PRP. The question is whether it's a secure PRP, and we'll see that in just a minute. But before that, let me ask you, how many iterations do you expect that we're gonna need? And I wanna remind you again, before I ask you that question, that, in fact, S is less than two to the T, but is more than two to the T-1. So in particular S is greater than two to the T over two. And my question to you now is how many iterations are we gonna need, on average, until this process converges? So the answer is we're going to need two iterations, so really just a small constant number of iterations. And so this will give us a very, very efficient mechanism that will move us down from {0,1} to the T to zero to the S-1. So now when we talk about security, it turns out this step here is tight. In other words, there is really no additional security cost to going down from two to the T to zero to S-1. And the reason that's true, it's actually again a very cute theorem to prove, but I, I won't do it here. Maybe I'll leave it as an exercise for you guys to argue. Which says that if you give me any two sets Y and X, where Y is contained inside of X, then applying the transformation that we just saw to a random permutation from X to X actually gives a random permutation from Y to Y. So this means that if we started with a truly random permutation on X, you'll end up with a truly random permutation on Y. And since, well, the permutation we're starting with is indistinguishable from random on X, we'll end up with a permutation that's indistinguishable from random on Y. Okay, so this is a very tight construction and as I said, the first step actually is basically, the analysis is the same as the Luby-Rackoff analysis. In fact, it's better to use as I said, the Patarin analysis using seven rounds. So actually here, there's a bit of cost in security. But, overall, we get a construction that is a secure PRP for actually, not such good security parameters, but nevertheless, this is a good secure PRP that we can construct that actually will operate on the range zero to S minus one. This in turn will allow us to encrypt credit card numbers such that the cipher text looks like a credit card number. And again, I want to emphasize that there's no integrity here. The best we can achieve here is just deterministic CPA security. No cipher text integrity, and no randomness at all. Okay. So this concludes this module. And as usual I want to point to a few reading materials that you can take a look at if you're interested in learning more about anything that we discussed in this module. So first of all, the HKDF construction that we talked about for key derivation is described in a paper from 2010 by Hugo Krawczyk. Deterministic encryption, The SIV mode that we described is discussed in this paper over here. The EME mode that we described that allows us to build a, a wider block pseudo random permutation is described in this paper here by Halevi and Rogaway. Tweakable block ciphers that actually led to the XDS mode of operation that's used for disk encryption is described in this paper here. And finally, a format preserving encryption is described in this last paper that I point to over here. Okay so this concludes this module. And in the next module we gonna start looking at how to do key exchange.