0:04

In section 1.2, we're going to talk about Hash Pointers and their application.

A hash pointer is a kind of data structure that turns out to be used a lot

in the systems that we're talking about.

And a hash pointer is basically a simple thing,

that we're going to take a pointer to where some information is stored.

And we're going to together with the pointer store a cryptographic hash of

the information.

So whereas a regular pointer gives you a way to retrieve the information.

A hash pointer is gonna let us ask to get the information back.

0:41

And we're gonna draw a hash pointer in diagrams like this.

That we're gonna have each of, and then an arrow that points to something.

So anything drawn like this, think of it as being a hash pointer to this thing.

It's a pointer to where it's stored and

it's also the hash of the value that this data had when we last saw it.

1:01

And we can take hash pointers and

we can use them to build all kinds of data structures.

So a key idea here, take any data structure or link lists or

binary search tree or something like that and

implement it with hash pointers instead of pointers as we normally would.

1:25

So just like a regular linked list where you have a series of blocks and

each block has data as well as a pointer to the previous block in the list,

here the previous block pointer will be replaced with a hash pointer.

So it says where it is and what the value of this entire previous block was.

And we're going to store, we're gonna remember the head of the list like this.

Just as a regular hash pointer.

And a use case for this for a block train like this is a tamper evident log,

that is if we wanna build a log data structure that stores a bunch of data.

So that we can add data onto the end of the log but if somebody goes later and

messes with data that is earlier in the log we're going to be detect it.

That's what temper evidence means.

So to understand why a block chain gives us this tamper evident property.

Let's ask what happens if an adversary wants to go back and

tamper with data later that's in the middle of the chain.

So let's assume that an adversary wants to tamper with this block down here.

He wants to change the data here.

And he wants to do it in such a way that we,

the holders of the hash pointer at the head here, won't be able to detect it.

2:34

So the adversary changed the contents of this block.

And therefore, the hash here which is a hash of this entire block

is not going to mash up because the hash function is collision free,

it must be the case that the hash of this block is now different.

And so we could detect the inconsistency between this data and

the hash pointer that we remembered before or

we could do that unless the advisory allows tampers with the hash pointer.

If he tampers with this hash pointer then he makes these two match up.

But now he's changed the content of this block.

And what that means is that when we come back later and

hash the contents of this block, it's not going to match the hash

that we remembered before because the contents of the block has changed.

And so we're going to detect the inconsistency between

the contents of this block and this hash,

unless the adversary also tampers with the block over here on the right.

But now, when he does that, the hash of this block is not going to match

the hash that we remembered up here and the hash that we're holding on to.

And this the adversary can't tamper with because this is the value that we

remembered as being the head of the list.

And so the upshot of this is that if the adversary wants to tamper with data

anywhere in this entire chain, in order to keep the story consistent he's going to

have to tamper with hash pointers all the way back to the beginning.

And he's ultimately going to run into a road block because he's

wont be able to tamper with the head of the list.

And so what this means is that just by remembering this hash pointer,

we've essentially remembered a kind of hash,

a tamper evident hash of the entire list all the way back to the beginning.

And so we can build a block chain like this containing as many blocks as we want

going back to some special block at the beginning of the list which

we might call the genesis block.

And that's a tamper evidence log built out of the block chamber.

4:22

Now, another useful data structure that we can build using hash pointers

is a binary tree.

We can build a binary tree with hash pointers and

this is called in the jargon, a Merkle tree after Ralph Merkle who invented it.

And the idea is this, suppose we have a bunch of data blocks which we'll draw

across the bottom down here.

We're going to take consecutive pairs of these data blocks and for

these two data blocks we're going to build a data structure here that has two hash

pointers, one to each of these blocks, and similarly all the way across.

We'll then go another level up and

this block here will contain a hash pointer of these two children down here.

And so on, all the way back up to the root of the tree.

And then just like before we're going to remember just the hash pointer

up here at the head of the tree.

And we can then,

if we want traverse down through the hash pointers to any point in the list.

And we can make sure that the data hasn't been tampered with.

Because just like I showed you with the block chain,

if an adversary tampers with some block down here at the bottom with the data

that will cause the hash pointer that's one level up to not match.

So he'll have to tamper with that.

And therefore, he'll have to tamper with the hash pointer one level up from there.

And eventually he'll get up to the top,

where he won't be able to tamper with the hash pointer that we've remembered.

So again, any attempt to tamper with any piece of data across the bottom

will be in short against, by just remembering the hash pointer at the top.

5:50

Now, another nice feature about Merkle trees,

is that unlike the block chain that we built before, that if someone wants

to prove to us that a particular data block is a member of this Merkle tree.

All they need to show us is this amount of data.

So if we remember just the root and someone wants to convince us that this

block is in the Merkle tree, they need to show us this block.

And we can verify that the hash matches up.

And then they need to show us this block and

we can verify that the hash of this matches that.

They can show us this block.

We verify that the hash of this block matches this hash pointer.

And then they show us the data.

And just by verifying the hashes up to the root, we can ensure,

we can verify that this data block was in the Merkle tree.

So that takes about log n items that we need to be shown, and

it takes about log n time for us to verify it.

And so at the very large number of data blocks in the Merkle tree,

we can still verify proven membership in a relatively short time.

6:54

So Merkle trees have various advantages.

One advantage of course, is the tree holds many items but

we just need to remember the one root hash which is only 256 bits.

We can verify membership in a Merkle tree in logarithmic time and logarithmic space.

That's nice.

And there's a variant which is a sorted Merkle tree.

That's just a Merkle tree where we take the blocks at the bottom and

we sort them into some order.

Say alphabetical, lexicographic or numeric order or some order that we agree on.

Having done that, once we've sorted the Merkle tree now,

it's possible to verify non-membership in a Merkle tree.

That is, we can prove that a particular block is not in the Merkle tree.

And the way we do that is simply by showing a path to the item that's just

before where that item would be and just after where it would be.

And then we can say look, both of these items are in the Merkle tree,

they're consecutive.

And therefore there is no space in between them.

There is nothing in between them and so

the thing that we are trying to prove non-membership of can't be there.

Merkle tree is binary search tree,

built with hash pointers, we can do logarithmic time membership proofs,

non-membership proofs if we sort the tree and it is very efficient.

8:05

More generally, it turns out that we can use has pointers in

any pointer-based data structure as long as the data structure doesn't have cycles.

If there are cycles in the data structure,

then we won't be able to make all hashes match up.

If you think about it in an acyclic data structure,

we can sort of start near the lees or near the things that

don't have any pointers coming out of them compute the hashes of those and

then work our way back, sort of towards the beginning.

But in a structure with cycles, there's no end that we can start with and

compute back from.

So for example, a directed acyclic graph, out of hash pointers and

we'll be able to verify membership in that day very efficiently.

And it'll be easy to compute.

So this is a general trick you'll see over and

over throughout the distributed data structures and throughout the algorithms

that we talk about later in this lecture and in subsequent lectures.