0:02

The next thing I want to do is show you the general attack on collision resistant

hash functions. If you remember when we talked about block cyphers. We saw a

general attack on block cyphers which we called exhaustive search. And that attack

forced the key size for a block cypher to be 128 bits or more. Similarly on

collision resistance there is a general attack called the birthday attack which

forces the output of collision resistant hash functions to be more than a certain

bound. So let me show you the attack and we will see what those bounds come out to

be. So here's the general attack that can work on arbitrary collision resistant hash

functions. So here we have our collision resistant hash functions, supposedly, but

lets suppose that it outputs N bit values. In other words, the output space is

roughly of size two to the N. Now, the message space is gonna be much, much

larger than N bits. Let's just say that the messages that are being hashed are

say, you know, a hundred times N bits. I wanna show you an algorithm that can find

the collision for this hash function H in time roughly two to the N over two. Okay,

so roughly the square root of the size of the outputs space. So here's how the

algorithms gonna work: what we'll do is we'll choose random two to the N over two

messages in our message space that's called an M-one to M-two to the N over

two. Now because the messages themselves are much bigger than N bits, they're

hundred times N bits, it's very likely that all these messages are distinct. So

they'll be distinct with high probability. But for each one of these messages we're

gonna apply the hash function and obtain the tag T sub I. So this is of course the

T sub I's are N-bit long strings. And now we're gonna look for a collision among the

T sub I's. In other words, we're gonna find an I and a J such that T sub I equals

to T sub J. And once we've done that we've basically found the collision because, as

we said, with very high probability, M-I is not equal to M-J. But the hash of M-I

is equal to the hash of M-J and therefore we found the collision on the function H.

Now if it so happens that we looked through all the two to the N over two T

sub I's and we don't find the collision, we go back to step one and try another set

of two to the N over two messages. So the question is how well will this work, in

other words how many times do we have to iterate this process until we actually

find the collision? And I wanna show you that in fact the number of iterations is

gonna be very, very small which means that this algorithm will find the collision in

time that's roughly proportional two to the N over two. So to analyze this type of

attack, I have to tell you a little bit about the birthday paradox. I imagine some

of you have already heard about the birthday paradox. Here stated as a

theorem, and I wanna prove it to you because everybody should see a proof of

the birthday paradox at least once in their lives. So here it is, so imagine we

have N random variables R-one to R-N in the interval one to B. And the only thing

I'm gonna assume about them is that they're actually independent of one

another. That's crucial that these N samples R-one to R-N in this interval are

independent of one another. And they also happen to be distributed identically. So

for example, they might all be uniform in the interval one to B, but again these

would be independently uniform variables. Now it so happens that if we set N to B

about the square root of B, in other words if we sample roughly the square root of B

samples from the interval one to B, And to be precise, it's one point two times the

square root of B. Then the probability that two of those samples will be the same

is at least a half. Okay, then it turns out in fact the uniform distribution is

the worst case for the birthday paradox. In other words, if the distribution from

which the R-I's are sampled from is non-uniform, that in fact fewer than one

point two times the square root of B samples are needed. The uniform

distribution is the worst case. So we're gonna prove this for the uniform

distribution and that basically also proves it for all other distributions. But

the proof that I wanna show you here will hold just for the uniform distribution.

Okay, so let's do the proof that's actually not difficult at all. So we're

asking what is the probability that there exists an I that is not equal to J such

that RI is equal to RJ. Well, let's negate that so it's basically one minus the

probability that for all I not equal to J we have that RI is not equal to RJ. This

basically means that no collision occurred among the N samples that we chose. Well

let's try to write this out more precisely. Well we're gonna write this as

one minus. And now when we choose R-one, basically it's the first one we choose so

it's not gonna collide with anything. But now let's look at what happens when we

choose R-two, when we choose R-two, lemme ask you, what is the probability that

R-two. Does not collide with R-one. Well, R-one takes one slot so there are B minus

one slots that if R-two falls into it's not gonna collide with R-one. So in other

words the probability that R-two does not collide with R-one is B minus one slot

divided by all B possible slots. Similarly, when we pick R-three What is

the probability that R-three does not collide with either R-one or R-two? Again,

R-one and R-two take up two slots. And so there are B minus two slots that remain

for R-three if it falls into either one of those B minus two slots it's not gonna

collide with either R-one or R-two. So I imagine you see the pattern now, so R-four

it's probability of not colliding with R-one, R-two, or R-three is B minus three

over B. And so on and so forth until we get to the very last R-N and the

probability that R-N does not collide with the previous R-I's, well, there are N

minus one slots taken up by the previous R-I's. So, if R-N falls into any of the

remaining B minus N plus one slots It's not gonna collide with any of the previous

R-one to R-N minus one. Now you notice that the reason I was able to multiply all

these probabilities is exactly because the R-I's are all independent of one another.

So it's crucial for the step That the R-I's are independent. So let me rewrite

this expression a little bit. Let me write it as one minus the product of I goes to N

minus one of one minus I over B. Okay. All I did is I just rewrote this as a big

product as opposed to writing the terms individually. So now I'm gonna use the

standard inequality that says that for any positive X, one minus X is less than E to

the minus X. And that's actually easy to see because E to the minus X, if you look

at it's Taylor expansion, is one minus X plus X squared over two minus and so on

and so forth. And so you can see that we're basically ignoring this latter part

of the Taylor expansion, which happens to be positive and as a result the left side

is gonna be smaller than the right-hand side. Okay, so let's plug this inequality

in, and what do we get? We get that this is greater than one minus the product of I

goes from one to N minus one of E to the minus I over B okay, simply plugged in X

equals I over B for each one of those terms. And the next thing about

exponentials is that we multiply them, the exponents add. As a result this is simply

equal to one minus E to the power of, here let me take the one over B out of the

parentheses, sum of I goes from one to N minus one of I. Okay. So, all I did was I

took the minus one over B out of the parentheses and we're left with simple sum

of one to N minus one. And so the sum of the integers from N to N minus one is

simply N times N minus one over two which it can bound by N squared over two. And so

really what I get at the end here is one minus E to the power of minus N squared

over two B. Okay, I literally downed it to sum here by N squared over two. Okay, very

good. So now what do we do about N squared over two B? Well, we can derive exactly

what N squared over two B is from the relationship here. So if you think about

it, let's look at N squared over two. Well, N squared over two is 1.2 squared

over two. 1.2 squared is 1.44 divided by two is .72 times the square root of B

squared which is B. Okay so N squared over two is .72B, and as a result, N squared

over 2B is just .72. So we get 1-E, which is a power of minus 0.72. Well so now you

just plug this in to your calculator and you get 0.53, which as far as i know, is

always bigger than F. So this proves the Birthday Paradox and you notice that it

was crucial to a string that the samples are independent of one another, we only

proved that for the uniform distribution. But as i said it is actually fairly easy

to argue now that any distribution that is away from the uniform distribution will

satisfy this with even a lower balance. So if you have. 1.2 squared of B, the theorem

will certainly hold for none uniform distributions. The reason it is called a

paradox is because it is very paradoxical that the square root function grows very

slowly. In particular if you try to apply this to birth dates, so lets assume that

we have a number of people in a room, and lets assume that their birth dates are

independent of one another and are uniform in their interval one through 365. Then

what the Birthday Paradox says is that we need roughly 1.2 times the square root of

365. Which i believe is something like 23, which says we need roughly 23 people in a

room, and then with probability one half, two of them will actually have the same

birth date. The reason it is called a paradox is because the number 23 seems

really small to people, and yet by this theorem we just proved, with probability

one half, there will be two people with the same birth date. By the way the

intuition for why this fact is true is because really what the collision is

counting is it is looking at all possible pairs of people. And if you have a square

root of B people, then there are square root of B squared pairs. Roughly, Which is

roughly B pairs total and so it's very likely that each pair collides probability

one over B and if you have B pairs, it's likely that one of the pairs will collide.

So hopefully this gives the intuition for where the square root comes from. Its

basically from the fact that if you have N people in the room, there are N squared

possible pairs. I should say by the way that even though the Birthday Paradox is

often applied to birth dates, birth dates are actually not uniform at all. If you

actually look at the birth dates of people, you see that there is a very clear

bias towards being born in September, and for me bazaar reason there is also a

biased towards being born on a Tuesday. And so when we apply the birthday paradox

to birthdays, in fact the actual bound is going to be smaller than one minus two

square root of B because we said that for non even form distributions you need even

fewer samples before you get a collision. So let me show you how to graph the

Birthday Paradox, its quite interesting how it behaves. So here set B to be a

million, in other words we are picking random uniform samples in the range one to

a million. And the X axis here, is the number of samples that we are picking, and

the Y axis, is the probability that we get a collision among those N samples. So we

see that the probability of one half happens around 1.2 square root of B.

Roughly twelve hundreds, 1.2 square root of B. And you see that if we look at

exactly the square root of B, the probability of a collisions is around .4

or .41. And you notice that the probability goes up to one extremely fast.

For example, already at roughly two square root of B, but the probability of a

collision is already 90%. Similarly, when we go bellow square root of B, the

probability of a collision very, very quickly drops to zero. Okay, so there is

kind of a threshold phenomena around square root of B, where the probability of

a collision goes to one very quickly, above square root of B drops to zero very

quickly below square root of B. So now we can analyze are attack algorithms, so let

me remind you, here we chose, two to the interval two random elements in the

message space, we hashed them. And so we started off with a random distribution in

the message base, after we hashed them, we end up with some distribution, this

distribution over tags might note be uniform, but we don't care, the point is

that all these tags T1 to T2 to the N over two, are independent of one another. If we

choose. Two to the N over two or 1.2 to the N over two, tags, the probability that

the collision will exist is roughly one half. So let me ask you then, in that

case, how many times do we have to iterate this algorithm before we actually find a

collision? Well then since each iteration is going to find a collision with

probability one half, we have to iterate about two times in expectation. And as a

result the running time of this algorithm is basically two to the N over two

evaluations of the hash function. Okay so notice also this algorithm takes a lot of

space but we're gonna ignore the space issue and we're just gonna focus on the

running time. Okay so this is kinda cool, this says that if your hash

function outputs N-bits outputs there will always be an attack algorithm

that runs in time two to the N over two. So for example if we output 128 big

outputs Then a collision could be found in time two to the sixty four, which is not considered

sufficiently secure. Okay, this is why collision resistant hash functions,

generally are not going to output 128 bits. So let me show you some examples.

The first three are actually federal standards, SHA-1, SHA-256, and SHA-512

and the fourth one is an example from the designers of AES, called Whirlpool,

and so you can see SHA-1 outputs 160 bits and as a result there is a generic

attack on it that runs on time two to the eighty. SHA-256 outputs 256

bits so the generic attack will run in time two to the 128 and so on and so

forth. I also listed the speed of these algorithms and you notice that the bigger

the block size actually the slower the algorithm is [and?] so there's a

performance penalty but nevertheless there's quite a bit of benefit in terms of

security. Now you notice the SHA function is actually greyed out. The

SHA function although nobody has found collisions for SHA-1 yet

It is still recommended that in a new project, and hopefully programing projects

in this class, you don't use SHA-1, instead you use SHA-256. In particular

there is actually a theoretical collision finder on SHA-1 that works in time two

to the 51. So it is probably just a matter of time until someone finds a collision

for SHA-1, and just kills altogether, but the current advice is that SHA-1 is

still a collision resistant hash function because nobody has found collisions for

it, but it is probably just a matter of time, just a few months, or few years,

until a collision is found. It is a result a new product and new projects SHA-1

should not be used and only use SHA-256 or if you wanna be even

more cautious then use SHA-512. So this is the end of this

segment, and now we are going to turn building collision resistant hash