0:00

In the next three segments we will change gears a little bit and talk about the

definition of a PRG. This definition is a really good way to think of a PRG. And we

will see many applications for this definition. So consider a PRG with

keyspace K that outputs N bit strings. Our goal is to define: what does it mean for

the output of the generator to be indistinguishable from random? In other

words, we're gonna define a distribution that basically is defined by choosing a

random key in the keyspace. Remember that an arrow with an R above it means choosing

uniformly from the set script K. And then we output, basically, the output of the

generator. And what we'd like to say is that this distribution of pseudo-random strings is indistinguishable from a truly uniform distribution. In other words, if we just choose a truly uniform string in {0,1}^N and simply output this string, we'd like to say that these two distributions are

indistinguishable from one another. Now if you think about it, this sounds really

surprising because if we draw a circle here of all possible strings in {0,1}^N, then the uniform distribution basically can output any of these strings with equal

probability. That's the definition of the uniform distribution. However a

pseudo-random distribution generated by this generator G, because the seed space is so small, can only produce a tiny set of outputs inside of {0,1}^N. And this is really all that the generator can output. And yet,

what we're arguing is that an adversary who looks at the output of the generator

in this tiny set can't distinguish it from the output of the uniform distribution

over the entire set. I think that's the property that we're actually shooting for.
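To see just how tiny that set is: with s-bit seeds and n-bit outputs, the generator can reach at most 2^s of the 2^n possible strings. Here's a quick back-of-the-envelope sketch in Python (our own illustrative code, not from the lecture):

```python
def image_fraction(seed_bits: int, output_bits: int) -> float:
    """Upper bound on the fraction of {0,1}^n that a PRG with
    seed_bits-bit seeds can ever output: at most 2^s of 2^n strings."""
    return 2.0 ** (seed_bits - output_bits)

# A 128-bit seed expanded to 256-bit outputs covers at most 2^-128
# of the output space -- and yet no efficient test should be able
# to tell the two distributions apart.
frac = image_fraction(128, 256)
```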

So to understand how to define this concept of indistinguishability from

random, we need the concept of a statistical test. So, let me define what a

statistical test on {0,1}^N is. I'm gonna denote these statistical tests by the letter A. And a statistical test is basically an algorithm that takes as input an N-bit string, and simply outputs zero or one. Now I'll tell you

that zero, we're gonna think of it as though the statistical test said, the

input you gave me is not random. And one, we're going to think of it as saying that

the input you gave me actually is random. Okay, so all this statistical test does is

it basically takes the input x that was given to it, the n bit string that was

given to it, and decides whether it looks random or it doesn't look random. Let's

look at a couple of examples. So the first example basically will use the fact that

for a random string, the number of zeros is roughly equal to the number of ones in

that string. In other words, the statistical test is going to say one if and only if the number of zeros in the given string X and the number of ones in the given string X are not too far apart. In other words, the difference between the number of 0's and the number of 1's, let's just say, is less than ten times the square root of n. Okay? If the difference is less than ten times the square root of n, the statistical test will say, hey, the string X looks random. If the difference happens to be much bigger than ten times the square root of n, that starts to look suspicious, and the test will say, hey, the string you gave me does not look random. That's a statistical test. Let's look at another similar example. We'll say

here, the statistical test will say one if and only if, say, the number of times that we have two consecutive zeros inside of X is about what you'd expect. Let's think about this for a second. This test basically counts, in this string of n bits, the number of times that we see the pattern zero-zero: two consecutive zeros. Well, at any given position in a random string, we will expect to see 0,0 with probability one fourth. And therefore, in a random string, we'll expect about N over four occurrences of 0,0. And so, what the statistical test will do is it will say, well, if the

number of zero-zero patterns is roughly N over four, in other words, if the difference between that count and N over four is, say, less than ten times the square root of n, then we will say that X looks random. And if the gap is much bigger than ten times the square root of n, we'll say, hey, this string doesn't really look random. And then the

statistical test will output zero, okay? So here are two examples of statistical

tests that basically, for random strings, they will output one with very high

probability. But for strings that, you know, don't look random, for example,

think of the all zero string. So the all zero string, neither one of these tests

will output one. And in fact, the all zero string does not look random. Let's

look at one more example of a statistical test, just to kinda show you that, basically, statistical tests can pretty much do whatever they want. So here's the third example. Let's say the statistical test outputs one if and only if, say, the maximum run of zeros inside of the string x, that is, the longest sequence of consecutive zeros inside of the string x, is not too long. In a

random string you expect the longest sequence of zeros to be roughly of length

log N. So we'll say, if the longest sequence of zeros happens to be less than ten times log N, then this test will say that X is random. But if, all of a

sudden, we see a run of zeros that, say, is much bigger than ten log N, then the

statistical test will say, the string is not random, okay? So this is another crazy

thing that the statistical test will do. By the way, you notice that if you give this test the all-one string, so one, one, one, one, one, this test will also

output one. In other words this test will think that the all one string is random.

Even though it's not. Yeah, even though the all-one string is not particularly random.

Okay, so statistical tests don't have to get things right. They can do whatever

they like. They can decide to output random or not random, you know, zero or

one, however they like. And similarly, there are many, many, many, many other

statistical tests. There are literally hundreds of statistical tests that one can

think of. And I can tell you that in the old days, basically, the way you would define that something looks random is you would say, hey, here's a battery of statistical tests, and all of them said that this string looks random; therefore, we say that the generator that generated the string is a good generator. It turns out that this kind of definition, one that uses a fixed set of statistical tests, is actually not a good definition for security, or more generally, for crypto. But before we talk

about actually defining security, the next thing we talk about is how do we evaluate

whether a statistical test is good or not? So to do that, we define the concept of

advantage. And so let me define the advantage. So here we have a generator

that outputs N bit strings. And we have a statistical test on N bit strings. And we define the advantage, denoted by advantage sub PRG of A comma G, of the statistical test A relative to the generator G. I'll define

it as follows, basically the difference between two quantities. The first quantity

is basically, we ask how likely is this statistical test to output one when we give it a pseudo-random string. Here K is chosen uniformly from the key space, and we ask how likely the statistical test is to output one when we give it a pseudo-random output generated by the generator. The second quantity: we ask how likely the statistical test is to output one when we give it a truly random string, where r is chosen uniformly in {0,1}^n. Okay. We look at the

difference between these two quantities. Now you realize because these are

differences of probabilities, this advantage is always going to lie in the

interval zero, one. So let's think a little bit about what this advantage

actually means. So first of all, if the advantage happens to be close to one, well, what does that mean? That means that somehow, the statistical test A behaves

differently when we gave it pseudo-random inputs, when we gave it the output of the

generator, versus when we gave it truly random inputs, right? It somehow behaved

differently. In one case, it output one with a certain probability. And in the

other case, it output one with a very different probability, okay? So somehow,

it was able to behave differently. And what that really means is that the

statistical test could basically distinguish the output of the generator

from random. Okay, so in some sense we'll say that this statistical test broke the

generator G because it was able to distinguish the output from random.
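As an aside, for concrete (small) generators this advantage can be estimated empirically by sampling. Here's an illustrative Python sketch (the function names and trial counts are our own, not from the lecture):

```python
import secrets

def estimate_advantage(A, G, seed_bits, n, trials=5_000):
    """Monte Carlo estimate of Adv[A, G] = |Pr[A(G(k)) = 1] - Pr[A(r) = 1]|,
    with k a random seed_bits-bit seed and r a truly random n-bit string."""
    pseudo = sum(A(G(secrets.randbits(seed_bits))) for _ in range(trials)) / trials
    truly = sum(A(format(secrets.randbits(n), f"0{n}b")) for _ in range(trials)) / trials
    return abs(pseudo - truly)

# A degenerate "generator" that ignores its seed and always outputs
# all-ones is trivially broken by the test "is the first bit 1?":
msb_test = lambda x: int(x[0] == "1")
const_G = lambda k: "1" * 16
# estimate_advantage(msb_test, const_G, 8, 16) comes out near 1/2.
```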

However, if the advantage is close to zero, well, what does that mean? That means that

basically the statistical test behaves pretty much the same on pseudo-random inputs as it does on truly random inputs. And basically there we would say that A

could not distinguish the generator from random. Okay, so this gives you a

little bit of intuition about why this concept of advantage is important. It

basically tells us whether A was able to break the generator, namely distinguish it

from random, or not able to break it. So let's look, first of all, at a very silly

example. Suppose we have a statistical test A that simply ignores its inputs and

always outputs zero. Okay, always outputs zero. What do you think is the advantage of this statistical test relative to a generator G? So, I hope everybody said the advantage is zero. Let me just explain why that's the case. Well, if the statistical test always outputs zero, that means that on pseudo-random inputs it will never output one, so the probability that it outputs one is zero. Similarly, when we give it a truly random input, it still will never output one, and so the probability that it outputs one is zero. And zero minus zero is zero, so its advantage is zero. So, basically, a statistical test that ignores its input is not able to distinguish truly random inputs from pseudo-random inputs, obviously. Okay, now let's look at

a more interesting example. So suppose we have a generator G that satisfies a funny

property. It so happens that for two-thirds of the keys, the first bit

of the output of the generator happens to be one, okay? So if I choose a random key

with probability two-thirds, the generator will output one as its first bit, okay? So

that's the property of the generator that we're looking at. Now, let's look at the

following statistical test. The statistical test basically says, if the

most significant bit of the string you gave me is one, I'm gonna say one, meaning I think it's random. If the most significant bit of the string you gave me is zero, I'm gonna say zero. Okay, so now my question to you is, what is the

advantage of this statistical test on the generator G? Okay, so remember I just

wrote down the definition here again. And I'll let you think about this for a

second. So let me explain. Suppose we give the statistical tests pseudo random

inputs. By definition of G, we know that with probability two-thirds, the input will start with the bit one. But if it starts with the bit one, then

the statistical test will output one. In other words, the probability that this

statistical test outputs one is exactly two-thirds. Now let's look at the case of

a random string. If I give you a random string, how likely is it that the most

significant bit of the random string is one? Well, for a random string, that

happens exactly half the time, and so in this case the statistical test will output

one, with probability one-half. And so the overall advantage is one-sixth, and

one-sixth is actually a non-negligible number, that's actually a fairly large

number, which basically means that this A was able to distinguish the output. We'll say that A breaks the generator G with advantage 1/6. Okay, which basically

means that this generator is no good, is broken. Okay, so now that we understand

what statistical tests are, we can go ahead and define, what is a secure

pseudo-random generator. So, basically, we say that a generator G is secure if

essentially no efficient statistical test can distinguish its output from

random. More precisely, what we'll say is that, basically, for all efficient statistical tests A, it so happens that if I look at the advantage of the statistical test A relative to G, this advantage basically is

negligible. So, in other words, it's very close to zero, and as a result,

this, statistical test was not able to distinguish the output from random, and

that has to be true for all statistical tests. So, this is a very, very pretty and

elegant definition, that says that a generator is secure, not only if a

particular battery of statistical tests says that the output looks random, but, in

fact, all efficient statistical tests will say the output looks random. Okay? One

thing I'd like to point out is, that the restriction to efficient statistical tests

is actually necessary. If we ask that all statistical tests, regardless of whether

they're efficient or not, not be able to distinguish the output from random, then, in fact, that cannot be satisfied. So in other words, if we took out the

requirement that the test be efficient, then this definition would be

unsatisfiable. And I'll leave this as a simple puzzle for you to think about. But

basically the fact is that restricting this definition to only efficient

statistical tests is actually necessary for this to be satisfiable. So now that we

have a definition, the next question is can we actually construct a generator and

then prove that it is in fact a secure PRG. In other words, prove that no

efficient statistical test can distinguish its output from random. And it turns out

that the answer is we actually can't. In fact, it's not known if there are any provably secure PRGs. And I will just say very briefly, the reason is that if you

could prove that a particular generator is secure that would actually imply that P

is not equal to NP. And I don't want to dwell on this. Because I don't want to

assume that you guys know what P and NP are. But I'll tell you as a simple fact

that, in fact, if P is equal to NP, then it's very easy to show that there are no

secure PRGs. And so if you could prove to me that a particular PRG is secure, that

would imply that P is not equal to NP. Again, I will leave this to

you as a simple puzzle to think about. But, even though we can't actually

rigorously prove that a particular PRG is secure, we still have lots and lots and

lots of heuristic candidates, and we even saw some of those in the previous

segments. Okay now that we understand what is a secure PRG. I want to talk a little

bit about some applications and implications of this definition. And so

the first thing I want to show you is that in fact a secure PRG is necessarily

unpredictable. In a previous segment, we talked about what it means for a generator

to be unpredictable. And we said that, basically, what that means is that, given

a prefix of the output of the generator, it's impossible to predict the next bit of the

output. Okay, so we'd like to show that if a generator is secure, then necessarily,

it means it's unpredictable. And the way we're gonna do that is using the

contrapositive. That is, we're gonna say that if you give me a generator that is

predictable, then necessarily, it's insecure. In other words, necessarily, I

can distinguish it from random. And so let's see, this is actually a very simple

fact. And so let's see how we would do that. So suppose you give me a predictor.

In other words, suppose you give me an efficient algorithm, such that, in fact,

if I give this algorithm the output of the generator, but I give it only the first I bits of the output, it's able to predict the next bit of the output. In other words, given the first I bits, it's able to predict the I+1 bit. And

it does that with a certain probability, over the choice of a random k from the keyspace. Now, clearly, a dumb predictor would be able to predict the next bit with probability one-half: simply guess the bit, and you'll be right with probability one-half. However, this algorithm A is able to predict the next bit with probability one-half plus epsilon. So it's bounded away from one-half. And, in fact, we require that this be true for some non-negligible epsilon. So, for

example, epsilon =1/1000 would already be a dangerous predictor, because it can

predict the next bits, given a prefix, with non negligible advantage. Okay, so

suppose we have such an algorithm. Let's see how we can use this algorithm to

break our generator. In other words to show that a generator is distinguishable

from random and therefore, is insecure. So what we'll do is we'll define a

statistical test. So, let's define the statistical test B as follows. Basically,

B, given a string, x, what it will do, is it will simply run algorithm A on the

first I bits of the string x that it was given. And statistical test B is simply gonna ask, was A successful in predicting the I+1 bit of the string? If it

was successful, then it's gonna output one. And if it wasn't successful, then

it's gonna output zero. Okay, this is our statistical test. Let's put it in a box so

we can take it wherever we like. And we can run the statistical test on any N bit

string that's given to us as inputs. So now, let's look at what happens. Suppose

we give the statistical test, a truly random string. So a truly random string R.

And we ask, what is the probability that the statistical test outputs one? Well, the probability that the output of algorithm A is going to be equal to some random bit X I+1, a random independent bit X I+1, that probability is exactly 1/2. In other words, algorithm A simply has no

information about what the bit X I+1 is, and so necessarily, the probability that it's able to predict X I+1 is exactly one half. On the other hand, let's look at

what happens when we give our statistical tests a pseudo-random sequence, okay. So

now we're going to run the statistical test on the output of the generator, and

we ask how likely is it to output one. Well, by definition of A, we know that

when we give it the first I bits of the output of the generator, it's able to

predict the next bit with probability 1/2 + epsilon. So in this case our

statistical test B will output one with probability greater than 1/2 + epsilon

And basically what this means, is if we look at the advantage of our

statistical test over the generator G, it's basically the difference between this quantity and that quantity. And you can see that the difference is clearly at least epsilon. So what this means is that if algorithm A

is able to predict the next bits with advantage epsilon, then algorithm B is

able to distinguish the output of the generator with advantage epsilon. Okay? So

if A is a good predictor, B is a good statistical test that breaks the generator.

And as we said, the contrapositive of that is that if G is a secure generator,

then there are no good statistical tests. And as a result, there are no predictors.

Okay? Which means that the generator is, as we said, unpredictable. Okay, so, so

far, what we've seen is that if the generator is secure, necessarily, it's

impossible to predict the I+1 bit, given the first I bits.
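The reduction just described, turning a next-bit predictor A into a statistical test B, can be sketched in a few lines of Python (our own illustrative code; bits are characters of a string):

```python
def make_distinguisher(predictor, i):
    """The reduction from this segment: given a next-bit predictor A
    for position i, build statistical test B with B(x) = 1 iff A
    correctly predicts bit x[i] from the prefix x[:i]."""
    def B(x: str) -> int:
        return 1 if predictor(x[:i]) == x[i] else 0
    return B

# On a truly random string, x[i] is independent of x[:i], so ANY
# predictor is right with probability exactly 1/2. If A predicts G's
# next bit with probability 1/2 + eps, then B outputs 1 with
# probability 1/2 + eps on pseudo-random inputs: advantage eps.
```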

Now, there's a very elegant and remarkable theorem due to Yao, back in 1982, that shows that, in fact, the converse is also true. In other words, if I give you a generator that's unpredictable, so you cannot predict the I+1 bit from the first I bits, and that's true for all I, then that generator, in fact, is secure. Okay,

so let me state the theorem a little bit more precisely. So here we have our

generator that outputs n bit outputs. The theorem says the following: if, for all bit positions I, it's impossible to predict the I+1 bit of the output given the first I bits, in other words, if the generator is unpredictable at all bit positions, then that, in fact, implies that the generator is a secure PRG. I want to paraphrase this in English,

and so the way to kinda interpret this result is to say that it's basically these

next bit predictors. These predictors that try to predict the I+1 bit given the

first I bits. If they're not able to distinguish G from random, then, in fact,

no statistical test is going to distinguish G from random. So, in some sense, next-bit predictors are universal predictors, when it comes to

distinguishing things from random. This theorem, by the way, it's not too

difficult to prove, but there's a very elegant idea behind its proof. I'm not

gonna do the proof here, but I encourage you to think about this as a puzzle and try to prove this theorem yourself. Let me show you kind of one cute

implication of this theorem. So let me ask you the following question. Suppose I give

you a generator and I tell you that given the last n/2 bits of the output, it's easy to predict the first n/2 bits of the output, okay? So given the last n/2 bits, you can compute the first n/2 bits. That's kind of the opposite of predictability, right? Predictability means given the first bits, you can produce the next bit. Here, given the last bits, you can produce the first ones. And my question to you is, does

that mean that the generator is predictable? Can you somehow, from this

fact, still build a predictor for this generator? This is kind of a simple

application of Yao's theorem. Let me explain to you: the answer is actually yes. Let me explain why. How do we build this predictor? Well, actually, we're not going to build it; I'm going to show you that the predictor exists. Well, because given the last n/2 bits you can compute the first n/2 bits, let me write it this way: what it means is that G is not secure. Because, just as we did before, it's very easy to build a statistical test that will distinguish the output of G from uniform. So G is not secure. But if G is not secure, by Yao's theorem, that means that

G is predictable. So in other words, there exists some I for which, given the first I bits of the output, you can predict the I+1 bit of the output. Okay, so

even though I can't quite point to a predictor, we know that a predictor must exist. So that's one cute, simple application of Yao's theorem. Now, before we

end the segment I want to kind of, generalize a little bit of what we did.

And introduce a little bit of important notation that's going to be useful

actually throughout. So, we're gonna generalize the concept of

indistinguishability from uniform, to indistinguishability of two general

distributions. So, suppose I give you two distributions P1 and P2, and we ask, can these two

distributions be distinguished? And so we'll say that the distributions are

computationally indistinguishable, and we'll denote this by P1 ≈p P2. This means that, in polynomial time, P1 cannot be distinguished from P2. And we'll

say that they're indistinguishable, basically, just as before, if, for all efficient statistical tests A, it so happens that if I sample from the distribution P1 and give the sample to A, versus if I sample from the distribution P2 and give the sample to A, then basically A behaves the same in both cases. In other words, the difference between these two probabilities is negligible. And this has to be true for all efficient statistical tests. Okay? So if this is the case, then we say that,

well, A couldn't distinguish them: its advantage in distinguishing the two distributions is negligible. And if that's true for all efficient statistical tests, then we say that the distributions are basically computationally indistinguishable,

because an efficient algorithm cannot distinguish them. And just to show you how

useful this notation is, basically using this notation the definition of security

for a PRG just says that if I take the pseudo-random distribution, in other words, I choose K at random and output G of K, that distribution is

computationally indistinguishable from the uniform distribution. So you can see this,

this very simple notation captures the whole definition of pseudo-random

generators. Okay, so we're gonna make use of this notation. In the next segment,

when we define, what does it mean for a cipher to be secure.
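As a footnote to this segment, this notion of distinguishing two samplable distributions can be checked numerically, using the biased-generator example from earlier (a sketch of our own; all names here are illustrative, not from the lecture):

```python
import secrets

def distinguish_advantage(sample_p1, sample_p2, A, trials=20_000):
    """Estimate |Pr[x<-P1: A(x)=1] - Pr[x<-P2: A(x)=1]| by sampling.
    PRG security is the special case P1 = {G(k)} and P2 = uniform."""
    p1 = sum(A(sample_p1()) for _ in range(trials)) / trials
    p2 = sum(A(sample_p2()) for _ in range(trials)) / trials
    return abs(p1 - p2)

# The biased generator from earlier in the segment: the first output
# bit is 1 for two-thirds of the keys (modeled here on one bit).
biased = lambda: "1" if secrets.randbelow(3) < 2 else "0"
uniform = lambda: str(secrets.randbelow(2))
msb = lambda x: int(x[0] == "1")
# distinguish_advantage(biased, uniform, msb) comes out near 1/6,
# so these two distributions are NOT computationally indistinguishable.
```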