0:00

Well, we're almost done with our discussion of symmetric encryption. There

are just a couple of odds and ends that I'd like to discuss before we move on to

the next topic. So the first thing I'd like to mention is how we derive many keys

from one key. And it, actually, this comes up all the time in practice, so I'd like

to make sure you know how to do this correctly. So what's the setting that

we're looking at? Well, imagine we have a certain source key that's generated by one

of, a number of methods. Imagine the source key is generated by a hardware

random number generator or perhaps is generated by a key exchange protocol

which we're going to discuss later. But anyhow, there are a number of ways in

which a source key might be generated between Alice and Bob, such that the

attacker doesn't know what the source key is. But now, as we said, in many cases, we

actually need many keys to secure a session, not just one single source key.

For example, if you remember, in TLS there were unidirectional keys and we

needed keys in each direction. And in fact, in each direction, we needed

multiple keys. We needed a MAC key, we needed an encryption key. We need an IV,

and so on. Similarly nonce based encryption, you remember, there were

multiple keys that were being used, and so on. And so, the question is, how do we use

the one source key that we just derived, either from a hardware process or

by key exchange, and generate a bunch of keys from it that we could then use to

secure our session. The way that's done, is using a mechanism called a key

derivation function, KDF. And I want to talk a little bit about how KDF's are

constructed. So first of all, suppose we have a secure PRF, that happens to have

key space K. And now, suppose that it so happens that our source key SK is uniform

in the key K. In this case, the source key is, in fact, a uniform random key for the

secure PRF F. And we can use it directly to generate keys, all the keys that we need

to secure the session. So in this case, the KDF is really simple. The key

derivation function would just work as follows. It would take as input the

source key. It would take an input, a parameter context, which I'm gonna

describe in just a minute. And then it's gonna take a length input as input as

well. And then what it will do is it will basically evaluate the PRF on zero. Then

it will evaluate the PRF on one. Then it will evaluate the PRF on two, up until L.

And I will talk about what this context is in just a second. And then, basically, you

would use as many bits of the output as you would need to generate all the keys

for the session. So, if you need unidirectional keys you would generate, you

know, one key in each direction where each key might include an encryption key and a MAC

key. And so, you would basically generate as many bits as you need and then finally

cut off the output at the time when you've generated enough keys to secure your

session. Okay so this is a fairly straight forward mechanism it's basically using the

secure PRF as a pseudo random generator. And the only question is what is its

context string. Well, I'll tell you that the context string is basically a unique

string that identifies the application. So in fact you might have multiple

applications on the same system that's trying to establish multiple secure keys.

Maybe you have SSH running as one process, you have a web server running as another process,

IPsec as a third process and all three need to have secret keys generated. And this

context variable basically tries to separate the three of them. So, let me ask you,

more precisely, what do you think the purpose of this context variable is?

3:19

So I guess I've given it away and this context variable is

supposed to basically separate applications, so that even if, for

example, the three services that we just talked about, SSH, web server, and IPsec,

if they all happen to obtain the same source key from the hardware random number

generator then the context since it's different for the three apps will make

sure that they still get three independent strings that they can then use to secure

the sessions. I just want you to remember that, even though this is actually fairly

straightforward, and we discussed this before, the context string is actually

important, and it does need to be specific to the application, so that each

application gets its own session keys, even if multiple applications happen to

sample the same SK. The next question is, what do we do if the source

key actually isn't uniform? Well, now we got a problem. If the source key is not a

uniform key for the pseudo random function then we can no longer assume that the

output of the pseudo random function is indistinguishable from random. In fact, if

we just use the KDF that we just described then the output might not look random to

the adversary and then he might be able to anticipate some of the session keys that

we'll be using and thereby break the session. So then we have a problem. Now

why would this source key not be uniform? Well there are many reasons why this

happened. For example if you use a key exchange protocol, it so happens typically

that key exchange protocols will generate a high entropy key. But the

high entropy key is gonna be distributed in some subspace of the key

space. So it's not going to be a uniform string. It will be uniform in some

subset of a larger set, And we'll see examples of that as soon as we talk about

key exchange protocols. And so KDFs have to kind of accommodate for the fact that

key exchange protocols actually don't generate uniform bit strings. The other

problem is, that, in fact, the hardware random number generator you're using might

actually produce biased outputs. We don't wanna rely on the non bias of the hardware

random number generator. And so all we want to assume is that it generates a high

entropy string, but one that might be biased. In which case, we have to somehow

clean this bias. And so this introduces this, this paradigm for building KDFs.

This is called the extract-then-expand paradigm, where the first step

of the KDF is to extract a pseudo random key from the actual source key. So in a

picture you can think about it like this. In some sense these are the different

possible values of the source key. This is the horizontal line and the vertical axis

is basically the probability of each one of these values, and you can see that this

is a kind of a bumpy function which would say that the source key is not uniformly

distributed in the key space. What we do in this case is we use what's called an

extractor. So an extractor is something that takes a bumpy distribution and makes

it into a uniform distribution over the key space. In our case we're actually just

gonna be using what are called computational extractors, namely

extractors that don't necessarily produce uniform distribution at the end but

they generated distribution that's indistinguishable from uniform.

6:22

Now extractors typically take as input something called a salt, and a salt just

like in a salad, it kind of adds flavor to things, what it does is basically kind of

jumbles things around, so that no matter what the input distribution is, the output

distribution is still going to be indistinguishable from random. So a salt

basically, what is it? It's a non-secret string, so it's publicly known. It doesn't

matter if the adversary knows what the salt is, and it's fixed forever. The only

point is that when you chose it, you chose one at random. And then the hope is that

the funny distribution that you're trying to extract from kinda doesn't inherently

depends on the salt that you chose and hence as a result using your salt, you

will actually get a distribution that looks indistinguishable from random. So

essentially the salt, you know, you can just bang it the keyboard a couple of

times when you generate it but it just needs to be something that's random

initially but then it's fixed forever, and it's fine if the adversary knows what

it is and nevertheless the extractor is able to extract the entropy and output a

uniformly random string K. In some sense the salt is only there to defend against

adversarially bad distributions that might mess up our extractor. Okay, so now that

we have extracted a pseudo random key. Now, we might as well just use it in a KDF

that we just saw using a secure pseudo random function to expand the key

into as many bits as we need to actually secure the session. Okay, so there are

these two steps. The first one is we extract a pseudo-random key, and then once

we have a pseudo-random key we already know how to extend it into as many keys as

we need using a pseudo-random function. So the standardized way of doing this is

called HKDF. This is a KDF, a key derivation function that's built from HMAC.

And here HMAC is used both as the PRF for expanding and an extractor for extracting

the initial pseudo-random key. So let me explain how this works. So in the extract

step, we're gonna use our salt which you remember is a public value just happened to

be generated at random at the beginning of time. And we use this salt as the HMAC

key. And then the source key we're gonna use as the HMAC data. So we're kind of

using a public value as a key. And nevertheless, one can argue that HMAC has

extraction properties, such that, when we apply HMAC, the resulting key is going to

look indistinguishable from random, assuming that the source key actually has

enough entropy to it. And now that we have the pseudo random key we're simply going

to use HMAC as a PRF to generate a session key of you know as many bits as we

need for the session keys. Okay. So that basically concludes our discussion of

HKDF. And I just want you to remember that, once you obtain a source key, either

from hardware or from a key exchange protocol, the way you convert it into

session keys is not by using that sample directly. You would never use a source key

directly as a session key in a protocol. What you would do is you will run the

source key through a KDF. And the KDF would give you all the keys and output

that you need, for, the randomness, for the random keys to be used in your

protocol. And a typical KDF to use is HKDF, which is actually getting quite a

bit of traction out there. Okay. The last topic I wanna talk about in this segment

is, how do you extract keys from passwords. These are called password based

KDFs or PBKDFs. The problem here is that passwords have relatively low

entropy. In fact, we're gonna talk about passwords later on in the course when we

talk about user authentication. And so, I'm not gonna say too much here. I'll just

say passwords generally have very little entropy estimated on the order of twenty

bits of entropy, say. And as a result, there is simply not enough entropy to

generate session keys out of a password. And yet we still need to do it very

frequently. We still need to derive encryption keys and MACs and so on out of

passwords, so the question is how to do it. The first thing is, you know, for this

kind of purpose, don't use HKDF. That's not what it's designed for. What will

happen is that the derived keys will actually be vulnerable to something called

a dictionary attack, which we're gonna talk about much later in the course when

we talk about user authentication. So, the way PBKDFs defend against this low entropy

problem that results in a dictionary attack is by two means. First of all, as

before they use a salt, a public, random value that's fixed forever. But in

addition, they also use what's called a slow hash function. And let me describe

kind of the standard approach to deriving keys from passwords. This is called PKCS#5,

and in particular, the version I'll describe is what's called PBKDF1. And I

should say that this mechanism is implemented in most crypto libraries so

you shouldn't have to implement this yourself. All you would do, you know, you

would call a function, you know, derived key from password. You would give the

password in as input, and you would get a key as output. But you should be aware of

course that this key is not going to have high entropy so in fact it will be

guessable. What these PBKDFs try to do is make the guessing problem as hard as

possible. Okay. So the way they work, first of all, is, as we said, they

basically hash, the concatenation of the password and the salt. And then the hash

itself is designed to be a very slow hash function. And the way we build a slow hash

function is by taking one particular hash function, say, SHA-256, and we

iterate it many, many times, C times. So you can imagine 1000 times, perhaps even a

million times. And what do I mean by iterating it? So, well, we take the

password and the salt. And we put them inside of one input to the hash function.

And then we apply the hash function, oops, let me write it like this. And then we

apply the hash function and we get an output, and then we apply the hash

function again and we get another output. And we do this again and again and again

maybe a thousand times or a million times depending on how fast your processors are

and then finally we get the final output that we actually output as, as the key

output of this key derivation function. Now what is the point here? Iterating a

function 10,000 times or even a million times is going to take very little time on

a modern CPU, and as a result, it doesn't really affect the user's experience. The

user types in his password, it gets hashed a million times and then it gets output.

And maybe that could even take, you know a tenth of a second and the user wouldn't

even notice it. However the attacker, all he can do is he can try all the passwords

in the dictionary, because we know people tend to pick passwords in dictionaries,

and so he could just try them one by one, remember the SALT is public, so he knows

what the SALT is. And so he can just try this hash one by one. However because the hash

function is slow, each attempt is gonna take him a tenth of second. So if he needs

to run through a dictionary, you know, with, with a 200 billion passwords in it,

because the hash function is slow, this is gonna take quite awhile. And by doing

that, we slow down the dictionary attack, and we make it harder for the attacker to

get our session keys. Not impossible, just harder. That's all this is trying to do.

Okay, so this is basically what I wanted to say about password based KDFs. As I

said, this is not something you would build yourself. All crypto libraries have

an implementation of a PKCS#5 mechanism. And you would just call the

appropriate function to convert a password into a key, and then use the resulting

key. Okay, in the next segment, we're gonna see how to use symmetric encryption

in a way that allows us to search on the cipher texts.