Now that we understand what is deterministic encryption, let's see some

constructions that provide security against deterministic chosen plaintext

attacks. So, first let me remind you that the deterministic encryption is needed,

for example, when encrypting a data base index and later we wanna look up records

using the encrypted index. Because the encryption is deterministic we're

guaranteed that when we do the lookup the encrypted index is gonna be identical to the

encrypted index that was sent to the database when the record was written to

the database. And so, this deterministic encryption allows us a very simple or fast

way to do lookups based on encrypted indices. The problem was that we said the

deterministic encryption can't possibility be secure against a general chosen

plaintext attack because if the attacker sees two cipher texts that are equal

it learns that the underlying encrypted messages are the same. So, then we defined

this concept of deterministic chosen plain text security which means that we have

security as long as the encryptor never encrypts the same message more than once

using a given key. In particular, this key, message pair is only used once, for

every encryption, either the key changes, or the message changes. And then as I

said, formally we define this CPA, deterministic CPA security game, and our

goal in this segment is to actually give constructions that are deterministic CPA

secure. So the first construction we're gonna look at is what's called SIV,

synthetic IV's. And the way this construction works is as follows. Imagine

we have a general CPA secure encryption system. So here's the key and here's the

message and we are gonna denote by R the randomness that's used by the encryption

algorithm. Remember that a CPA secure system that doesn't use nonsense has to be

randomized and so we're explicitly gonna write down this variable R to denote the

random string that's used by the encryption algorithm as it's doing the

encryption. For example if we're using randomized counter mode, r would be the

random IV that's used by randomized counter mode. And of course C is the resulting

ciphertext. Now, in addition, we're also going to need a pseudo random function, f,

that what it does, is it takes our arbitrary messages in the message space

and outputs a strings, R, that can be used as randomness for the CPA secure

encryption scheme. So, the little, r, over here is actually a member of the big R set. Okay.

And we're going to assume this is a, f is a pseudo random function that maps messages

to random strings. Now the way SIV works is as follows. It's going to use two keys

K1 and K2 to encrypt the message M. And what it does, is the first thing is going

to apply the pseudo random function f to the message M to derive randomness for the

CPA secure encryption scheme E. And then it's going to encrypt the message M using

the randomness that it just derived. This is going to give us a cipher text C and

then it's going to output this cipher text C Okay. So that's how this SIV mode works.

Basically it first derives the randomness from the message being encrypted, and then

it uses this derived randomness to actually encrypt the message to obtain the

cipher text. Now I'd like to point out for example, if the encryption scheme E

happened to be randomized counter mode. Then the randomness R would just be the

random IV which would actually be output along with the cipher text. So this means

that the cipher text is a little bit longer than the plain text. But the point

here isn't to generate a short cipher text. Rather the point here is to make

sure that the encryption scheme is deterministic, so if we encrypt the same

message multiple times, every time we should obtain the same cipher text. And

indeed every time, we'll obtain the same randomness, R, and as a result, every time

we'll obtain the same cipher text C. So it's fairly easy to show that this

encryption scheme really is semantically secure under the deterministic chosen

plaintext attack. The reason that's so is because we apply the pseudo random

function F to distinct messages. Well if we apply F to distinct messages then the

random string that F generates is going to look like just truly random strings. A

different random string for every message. And as a result the CPA secure encryption

scheme E is always applied using truly random strings. And that's exactly the

situation where it is CPA secure. So because these R's are just random

indistinguishable from brand new strings, the resulting system is in fact gonna be

CPA secure. So this is just the intuition for why this works and it's actually

fairly straightforward to actually formalize this into a complete proof. Now,

I should emphasize that this is actually well suited for messages that are more

than one AES block. In fact, for short messages, we're gonna see a slightly

different encryption scheme that's actually better suited for these short

messages. Okay, now the really cool thing about SIV is that, actually, we get cipher

text integrity for free. In fact we don't have to use a special Mac if we want to

add integrity. In a sense SIV already has a built in integrity mechanism. And let me

explain what I mean by that. First of all, our goal was to build what's called

deterministic authenticated encryption. Dae. Deterministic Authenticated

Encryption. Which basically means that deterministic CPA security and cipher text

integrity. Remember cipher text integrity means that the attacker gets to ask for

the encryptions of messages of his choice and then he shouldn't be able to produce

another cipher text that decrypts to a valid message. Okay, so I want to claim

that in fact SIV automatically gives a cipher text integrity without the need for

an embedded MAC or anything else. So let's see why. In particular, let's look at a

special case of SIV when the underlying encryption scheme is randomized counter

mode. Okay, so we'll call this SIV-CTR again to denote SIV where we're using

randomized counter mode. Alright. So let me remind you again how SIV works in this

case. Well, so we take our message, here, we take our message, and then we apply a

PRF to it. And that derives an IV. And then that IV is going to be used to

encrypt the message using randomized counter mode. Okay. So in particular,

we're gonna use this PRF FTCRr for F counter, for randomized counter mode and

essentially evaluate this FCTR at Iv, IV plus one up to IV plus L. And, then, we

had sorta that with the message. And that gives us the final ciphertext. Okay. So,

this is SIV with a randomized counter mode. Now, let's see how decryption is

gonna work. And during decryption we're gonna add one more check, and that's

going to provide ciphertext integrity. So let's see how decryption is going to work. So here

we have our input cipher text that contains the IV and the cipher text. Well,

the first thing we're going to do is we're going to decrypt the cipher text using the

given IV, and that will give us candidate plain text. Now we're gonna reapply the

PRF F from the definition of SIV to this message. Now if a message is valid, we

should be getting the same IV that the [adversary??] supplied as part of the cipher

text. If we get a different IV, then we know that this message is not a valid

message and we simply reject the cipher text. So this is really cute. It's a very

simple kinda built in check to make sure that the cipher text is valid, right. We

simply check that after decryption if we re-derive the IV we would actually get

the correct IV. And if not we reject the message. And I claim that this simple check

during decryption is enough to [actually??] provide ciphertext integrity. And therefore,

deterministic authenticated encryption. So I'll state this in a simple theorem.

Basically to say, that if F is a secure PRF, and in counter mode that's derived

from FCTR is CPA secure, then the result is in fact a deterministic authenticated

encryption system. Now the proof for this is not too difficult. In two sentences,

let me just show you the rough intuition for why this is true. So, all we have to

argue is just cipher text integrity. So we already argued before that the system has

deterministic CPA security, all we have to argue is just cipher text integrity. So to

argue that the system has cipher text integrity, let's think again how the

cipher text integrity game works. Adversary asks for the encryption of a

bunch of messages of his choice. And he gets the resulting cipher text. Then, his

goal is to produce a new valid cipher text. Well, if that valid cipher text upon

decryption, decrypts to some completely new message. Then when we plug this new

message into the PRF F, we're just going to get some random IV and it's very unlikely

to hit the IV that the adversary supplied in the cipher text that he output. So

therefore that says that when the advisory outputs his forged cipher text, the

message in that forged cipher text basically should be equal to one of the

messages in his chosen message queries. Otherwise, again the IV that you get is

simply not gonna be equal to the IV in the forged safe index with very high

probability. But if the message is equal to one of the messages that the adversary

queried us on, well then, the cipher text that we're gonna get is also gonna be

equal to one of the cipher texts that we supplied to the adversary. But then his

forgery is simply one of the cipher texts that we gave to him. And therefore, this

is not a valid forgery. He has to give us a new cipher text to win the cipher text

integrity game. But he gave us one of our old cipher texts. So, this is the rough

intuition. I hope, I kinda went through it quickly. I hope it kinda makes sense. If

it doesn't then it's not the end of the world. I'm gonna reference the paper that

describes SIV at the end of the module. And if you wanna see this in more detail

you can read and take a look inside of that paper. But, regardless, this is a

very cute idea that I wanted to show you where kinda the fact that we derive the

randomness for deterministic counter mode using a PRF. Also, gives us an integrity

check when we actually do the decryption. Okay. So this SIV is a good mode for doing

deterministic encryption when you need to, particularly if the messages are long. If

the messages are very short, say they're less than sixteen bytes, in fact there's a

better way to do it, and that's the method that I wanna show you now. So the second

construction is actually trivial. All we're gonna do is we're gonna use a PRP

directly. So here's what we do. So suppose (E, D) is a secure PRP. So E will encrypt, and

D will decrypt. And I claim that if we use E directly, that already gives us

deterministic CPA security. So let me show you very quickly why. So suppose F is a truly random invertible function from X

to X. Okay. So remember a PRP is indistinguishable from a truly random

invertible function, so let's pretend that we actually do have a truly random

invertible function. Now in experiment zero what the attacker is gonna see while

he submits a bunch of messages, you know the messages on the left. And what he's

gonna see is basically the evaluation of f on the messages on the left that he

supplied. Well, in the deterministic CPA game, all these messages are distinct, and

so all he's gonna get back are just q distinct random values in X. That's all he

sees. Yes, there's Q distinct random values in X. Now let's think about

experiment one. In experiment one, he gets to see the encryptions of messages on the

right, okay, the M11 to MQ1. Well, again, all these messages are all distinct so all

he gets to see are just Q random distinct values in X. Well these two distributions,

in experiment zero and experiment one, therefore are identical. Basically, in

both cases what he gets to see are just Q distinct random values in X. And as a

result, he can't distinguish experiment zero from experiment one. And since he

can't do this for a truly random function, he also can't do this for the PRP. Ok,

so that explains why directly encrypting with a PRP already gives us a CPA secure

system very, very, very simple to use. That says that if all we wanna do is

encrypt short messages, say, less than sixteen bytes, then all we need to do is

to directly encrypt them using AES and the result will, in fact, be deterministic CPA

secure. So, if your indices are less than sixteen bytes directly using AES is a fine

thing to do. Notice however, that's not gonna provide any integrity. And we're

gonna see how to add integrity in just a minute. But the question for you is, what

do we do if we have longer than sixteen byte messages? Well, one option is to use

SIV. But what if we actually wanted to use this construction too. It's actually an

interesting question. Can we construct PRP's that have message spaces that are

bigger than just sixteen bytes? If you remember in the past we constructed PRFs

on that had large message spaces from PRFs that had small message spaces. Here,

we're going to construct PRPs with large message spaces from PRFs that have small

message spaces. So, here's how we do it. So suppose E D is a secure PRP that

operates on N bit blocks. There's a standard mode called EME that actually

will construct a PRP that operates on capital N bit blocks, where capital N is

much, much bigger than little n. So this would allow us to do deterministic

encryption on much larger messages than just sixteen bytes. In fact it could be as

long as we want. So let's see how EME works. It's a bit daunting at first but

it's not a difficult construction. So, let's see how it works. So, EME uses two

keys, K and L, and in fact in the real EME L is derived from K. But for our purposes,

let's just pretend that K and L are two distinct keys. The first thing we do, is

we take our message X and we break it up into blocks. And then we XOR each block

with a certain padding function, okay? So we use the secret key L to derive a pad

using this, you know function P that I'm not gonna explain here. We derive a

different pad for each one of the blocks and we XOR that pad. Into the block. The

next thing we do is we apply the PRP E using the Key K, to each of these blocks,

and we're gonna call these outputs PP0, PP1, and PP2. The next thing we do is we

XOR all these ppp's together and we call the result MP. Actually this XOR

doesn't need to be there. We call the result of the XOR MP. The next thing we

do we XOR all these ppp's together and we call the result MP. We encrypt this MP

using E and the key K. We call the outputs of this encryption MC. Okay. So then we use

the XOR MP and MC and this gives us another PM, which we use to derive one

more pad, and then we XOR this output of this pad with all the PPP's to get

these CCC's, [laugh], and then we XOR all these C C C's together that gives us

a value of C C C zero, which we then encrypt one more time with all these E's,

we do one more padding with all these P's and that actually finally gives us the

output of EME. Okay, so like I said, this may look a little daunting, but in

fact there's a theorem that shows that if the underlying block cipher E is a secure

PRP, then in fact. This construction, EME, is a secure PRP on this larger block, you

know, zero, one to the capital N. The nice thing about this construction is you

notice that it's very parallel and actually that's why it's a little

complicated. Counted every block. It's encrypted in parallel, so if you have a

multi-core processor, you can use all your cores to encrypt all the blocks at the

same time and then there would be one kind of sequential step to compute this, check

sum of, all the outputs and then one more parallel around to encrypt one more time

and then finally you get the outputs. These function Ps, by the way, that

generate the pads, are very, very simple functions. They take constant time so

we're just going to ignore them in terms of performance. So the bottom line is

performance here. You notice it requires two applications of E per input block. And

it turns out this can actually be slower than SIV, if SIV is implemented properly

with a very fast PRF to derive the randomness. Then in fact SIV can be twice

as fast, as this particular mode of operation. For this reason I say that the

PRP construction is very good for short messages, whereas SIV is good if you h, if

you want to encrypt longer messages in a deterministic fashion. But now what if we

wanted to add integrity to this PRP-based mechanism? So, can we achieve

deterministic authenticated encryption using the PRP mechanism where we directly

encrypt the message using a PRP? And it turns out the answer is yes and it's

actually again, a very simple encryption scheme that you should know

about. Basically what we're going to do is we're going to take our message and we're

going to append a bunch of zeros to this message and then we're going to apply the

PRP and that's it. And that's going to give us the cipher text. Now, very, very

simple. Just append zeros and encrypt using a PRP. When we decrypt, we're going

to look at these least significant bits of the resulting plain text and if they're

not equal to zero, we're just going to reject the cipher text. And if they are

equal to zero, we're going to output the message as valid. Okay, so that's it,

that's the whole system, very, very simple. Simply append zeroes during

encryption, and then check that the zeroes are there when you decrypt. And I claim

that this very simple mechanism actually provides deterministic authenticated

encryption, assuming, of course, that you've appended enough zeroes. And in

particular, if you append N zeroes, you need one over two to the N to be

negligible. And if so, then, in fact, this direct PRP based encryption, in fact,

provides deterministic authenticated encryption. So let me see why. Well, we

already argued that the system is CPA secure, so all we have to argue is that it

provides cipher text integrity. And again, that's easy to see. Let's look at the

cipher text game. So the adversary is gonna choose let's say a truly random

permutation. So a truly random, invertible function, on the input space. In this case

the input space is the message space and the N zero bits. Now what does the

adversary get to do? Well he gets to submit q messages, and then he receives

the encryption of those Q messages. Mainly he receives the PRP at his chosen points

concatenated with N zeros. Okay, that's what it basically it means to query the

CPA challenger. So in case of a random permutation, he simply gets to see, the

value of this permutation at Q points of his choice, but only concatenated with N

zeroes. And now, what's his goal in the cipher text integrity game? Well, his goal

is to produce some new cipher text that's different from the cipher text that he was

given, that's gonna decrypt properly. Well, what does it mean that it decrypts

properly? It means that if, when we apply, Pi inverse To the cipher text C, it had

better be the case that the N least significant bytes of C are all zeros. And

the question is how likely is that to happen? Well, so let's think about this.

So basically we have a truly random permutation and the adversary got to see

the value of this permutation as Q points. How likely is he to produce a new point

that, when inverted, happens to have N zeroes as the least significant bits? What we're

doing here is basically we're evaluating the permutation Pi inverse at the point c.

And since Pi inverse is a random permutation, how likely is it to have its

n least significant bits be equal to zero? So it isn't hard to see that the answer

is, is, at most, the probability's at most, one over two to the N. Because,

again, basically, the permutation is gonna output a random element inside of, X times

0 1 to the N. And that element is gonna end with N zeroes, but basically

with the probability one over two to the N. And as a result, the adversary wins the

game with negligible probability, Because, this value is negligible. So that's the

end of this segment. I wanted you to see these two very cute deterministic

authenticated encryption schemes. The first one we called SIV, where I said you

would use randomized counter mode and you just arrived at randomness for randomized

counter mode from a PRF applied to the message. And the very cute idea here is

that during decryption you can simply recompute the IV from the, from the decrypted

message and verify that that IV is what's given to you in the cipher text. And that

simple check is actually enough to guarantee authenticated encryption or

rather deterministic authenticated encryption. So this mode is, is the way

you would encrypt an index in a database if the index was large. If the index

happens to be short, maybe say, its eight bytes if it's an 8-byte user ID, then you

can directly use a PRP and the way you would do is, is you would append zeros to

those eight bytes. And then those zeros will be used as an integrity check when

you decrypt and again if you append, are able to append enough zeros, then in fact

this also provides deterministic authenticated encryption. As an aside, I

showed you that there's a way to build the wide block PRP from a narrow PRP and that

particular mode of operation is called EME. So we're gonna refer EME

actually in the next segment.