0:00

Hi, in this video we will prove that the set of functions we suggested to you

Â with integers is really a universal family of hash functions.

Â This video will be heavy on math but it is optional.

Â We will use properties of prime numbers of modular arithmetics.

Â We will use the concept of one to one correspondence.

Â Properties of upper integral parts of real number and probabilities of course.

Â 0:24

So the theorem is that the set of functions was suggested polygraphic H with

Â index B is the set of all hash functions indexed by B,

Â A, and B where B is a prime number and A and B are parameters.

Â And those parameters generate hash functions.

Â So the formula for the hash function is on the side, and parameters a and

Â b are from one to p-1 and from zero to p-1, respectively.

Â So this is a family of hash functions which contains p multiplied by p-1,

Â different hash functions for different bayers ab.

Â So these functions is the universal family for

Â the universe of keys, which are integers from 0 to p- 1.

Â And of course, it will be also universal family for any sub-set of this universe.

Â 1:29

And fix some keys, x and y which are different, then, the intermediate values,

Â which would compute while computing the hash value of keys x and y.

Â So ax + b modular p and ay + b modular p,

Â those are the intermediate values before we take the value modular m and

Â get the actual value of the hash function from our family.

Â So those values are always different for different keys in the universe.

Â We will prove this Lemma by contradiction.

Â So suppose is equal to s or (ax + b) is equal to (ay +b) mod p.

Â We subtract the right part from the left part and

Â we see that a(x-y) is equal to 0 mod p.

Â And that means that b divides product of a and x minus y.

Â Now recall that b is a prime number and so

Â it means that either b divides a or that b divides x minus y.

Â But a is between 1 and p minus 1 so a is a positive integer less than p and

Â p is prime so p cannot divide a and thus p must divide x minus y.

Â 2:44

But x and y are between 0 and p minus 1.

Â So their difference is, by absolute value, less than p.

Â And p divides x minus y.

Â So, the only value for x minus y, for which it is possible is 0.

Â x minus y is 0.

Â And so x is equal to y.

Â But this contradicts our initial assumption that x and

Â y are different keys.

Â And so we prove by contradiction that r is not equal to s.

Â 3:19

which is without taking modular m, there are no collisions.

Â And an important corollary is that for

Â a hash function, which is before taking modulo m,

Â there are no collisions for the universe of integer keys from 0 to p minus 1.

Â Now we need another lemma.

Â That for any pair of different keys x and y,

Â there is a 1 to 1 correspondence between pairs of numbers a and

Â b which generally different hash functions in our family.

Â And pairs of numbers r and s, which are the intermediate values in

Â the computation of hash function of keys x and y.

Â 4:02

Let's prove that.

Â First know that different pairs a, b generate different pairs r, s.

Â And that is because if we have some pair r, s we can

Â uniquely solve for a and b module b with these formulas.

Â So if there is any other pair, a prime b prime which leads to the same r and

Â s, then actually a prime and b prime have to be equal to a and

Â b respectively because of these equations.

Â So now we know that different pairs a and b generate different pairs r, s.

Â 4:39

And we know that the total number of pairs a,b is p multiplied by p-1 because

Â there are p-1 values for a and independently p values for b.

Â So total number of pairs is p multiplied by p-1.

Â And also, it turns out that the total number of pairs r,s is also p multiplied

Â by p-1 Because the total number of pairs r,

Â s with values from 0 to p- 1 is p squared.

Â But r must be different from s so we must subtract p from that,

Â and that will be p squared- p which is the same as p multiplied by p- 1.

Â So the number of pairs a, b is the same as the number of pairs r, s.

Â And different pairs a,b led to different pairs (r,s) and that means that

Â there is a one to one correspondence between these pairs and these pairs.

Â And the corollary from that is that if we select some pair of different keys,

Â x and y and we select any hash function at random.

Â For all our family with equal probability, which is one of p multiplied

Â by p minus one, because they're all, p multiplied by p minus 1,

Â different hash functions in our family, then each pair of values are S.

Â Which are intermediate values, happen with equal probability,

Â again, one over P multiplied by p- 1 so why is that?

Â Well we know that there's a one to one correspondence between pairs a, b and

Â pairs r, s and selecting the hash function from our family is the same

Â selecting random pair a, b we know that the probability of any pair a,

Â b is one over p by p -1.

Â And so the probability of any pair r,s is the same as the probability of

Â the corresponding pair a,b.

Â So it is always 1/p(p- 1).

Â So we proved our corrollary.

Â Now let's prove the initial theorem.

Â The probability of collision for a fixed pair of keys, x and y, which are different

Â is the same as the probability that r module m is the same as s module m.

Â We know that r is different from s, but

Â they still can be equal module m because m can be less than Pm.

Â In most cases it is actually less than b.

Â 7:21

Know that for each fixed r from 0 to p minus 1, there are at most.

Â Upper integral part of p over m,

Â minus 1 such value of s that s is different from r and

Â r modular m is the same as s modular m.

Â So why is that?

Â Well, let's assume, without loss of generality, that r modular m is 0.

Â And look at the roll of numbers from 0-P-1

Â than the numbers which have the same remainder

Â module m as r are 0 M, 2M, 3M, 4M and so

Â on up to a paranted real part of P over M-1 multiplied by M.

Â We cannot multiply the next integer by m because that is already

Â bigger than p minus 1, so in total there are part of pure m,

Â different integers between 0 and

Â p minus 1 which have the same remainder modular m as r.

Â But, s also needs to be different from r.

Â So, the number of such s', less by 1.

Â 8:35

And now, the probability of collision is equal to probability r module m,

Â equal to s module m, and that is, at most, some of probabilities

Â of all such pairs are s for which this equality holds.

Â So we sum over all different values of r from 0 to p-1 and

Â 9:52

And then, when we make the common denominator, and remove

Â all the things that cancel out, we see that it is equal to 1 over m.

Â So we just proved that the probability of

Â r modular m being the same as s modular m is, at most, 1 over m.

Â And that proves that the probability of collision for

Â some fixed pair of different keys, x and y is, at most, 1 over m.

Â And that is the property of universal family that we needed to prove.

Â So we finally proved our theorem.

Â In conclusion, we've just proved that a suggested family is

Â really a universal family for integers, but now the important thing

Â that integers are not unbounded on the integers from 0 to p- 1.

Â So if you need to hash really, really big integers,

Â you will need to select a prime number that is bigger, even bigger than

Â the largest of those numbers and then you will get universal family for

Â that prime number and it will work for your integers.

Â We also proved in the previous video the upper bound for

Â expected chain length using universal family and constant amortized expected

Â running time of operations with hash table, also provided in the last lecture.

Â So now we have the whole picture.

Â We have a universal family for integers.

Â And we have proofs that if you use a universal family,

Â that your operations work fast.

Â So now we are confident that if you use the suggested family for

Â integers, then the operations with the hash tables storing

Â those integers will run fast in practice, at least on average.

Â