0:00

And this hybrid model between a random and a regular graph is called the Watts-Strogatz graph. It is parametrized by two numbers,

c and p. Remember, we were using p to parametrize random graphs and c to parametrize the regular ring graph: p is the probability of a link appearing, and c is the deterministic degree of every node. Now we're going to use both c and p.

We start constructing a Watts-Strogatz graph by constructing a regular ring graph, so each node has a lot of short-range neighbors, just like before.

0:57

And then, here is the randomization step. Randomization has been used in the past in our lectures and will be used a couple more times in later lectures. In today's lecture, the randomization happens as follows: take any short-range link existing in this regular graph, delete it, and in its place draw, with probability p, a link between any pair of nodes, including pairs far apart.

1:51

And in the analysis that we will be doing, we'll make a slight modification: the existing links will still be there. Okay. So that means we do not delete, we just add links. This will make the analysis in the advanced material part of the video a little easier. And therefore, if we have 100 short-range links, that gives us 100 chances to establish possible long-range links. So the question is, how much randomization do we need? Now, we know that if there is too much randomization, the graph is going to look like a random graph, and therefore it will not have a large clustering coefficient. But if there is too little randomization, the number of long-range links will be so small that the graph will be very close to a regular ring graph.
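The construction just described, a regular ring lattice followed by link-by-link chances to add random shortcuts, can be sketched as follows. This is a minimal illustration, not the lecture's own code, and the add-only variant from the analysis is assumed:

```python
import random

def watts_strogatz_add_only(n, c, p, seed=None):
    """Ring of n nodes, each linked to its c nearest neighbors
    (c/2 on each side); then every short-range link gives one
    chance, with probability p, to ADD a random long-range link
    (links are added, not deleted, per the add-only variant)."""
    rng = random.Random(seed)
    edges = set()
    # Regular ring graph: deterministic degree c for every node.
    for i in range(n):
        for j in range(1, c // 2 + 1):
            edges.add(frozenset((i, (i + j) % n)))
    # Randomization step: one shortcut chance per short-range link.
    for _ in range(len(edges)):
        if rng.random() < p:
            u, v = rng.sample(range(n), 2)  # any pair, even far apart
            edges.add(frozenset((u, v)))
    return edges
```

With p = 0 this is exactly the regular ring graph with nc/2 links; with p = 1 every short-range link spawns a shortcut attempt.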

3:04

So it turns out that a little bit of randomization, a small p, suffices. Okay? So here, I'm plotting p on a log scale, from 100% to ten percent to one percent and so on, down to 0.1%. As you can see, when p is sufficiently large, the clustering coefficient curve C, capturing triad closure, becomes too small. Whereas, on the other hand, if the randomization is too small, the median shortest path l will still be too high. What we want is for the C curve to stay at a high spot without bending over, and for the l curve to bend over. So we want the region where the high part of the C curve overlaps the low part of the l curve; the intersection is somewhere in between, and that space is a good range of p.

Okay? Now, this is a numerical example for n = 600, that is, 600 nodes, and c = 6, meaning that each node has degree six in the regular ring graph that starts the Watts-Strogatz construction: three neighbors on the left and three on the right. Okay? For this particular example, it turns out that p in the range of 0.01 to 0.1 suffices.

5:20

Now, is it a coincidence that, for this numerical example, this range of p, from 0.01 to 0.1, suffices to give us both of the desirable features? It turns out that's not the case.
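To see these curves concretely, here is a small self-contained simulation sketch in pure Python (using a smaller n = 200 than the lecture's 600 so it runs quickly; the add-only variant and the definitions of C and l above are assumed, not taken from the lecture's code):

```python
import random
from itertools import combinations
from statistics import median

def ws_add_only(n, c, p, rng):
    """Ring lattice of degree c, plus one chance per short-range
    link to add a random shortcut with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, c // 2 + 1):
            adj[i].add((i + j) % n); adj[(i + j) % n].add(i)
    for _ in range(n * c // 2):
        if rng.random() < p:
            u, v = rng.sample(range(n), 2)
            adj[u].add(v); adj[v].add(u)
    return adj

def clustering(adj):
    """Global clustering coefficient: closed triples / all triples."""
    triples = closed = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            triples += 1
            closed += b in adj[a]
    return closed / triples

def median_shortest_path(adj):
    """Median, over all node pairs, of the shortest-path distance."""
    dists = []
    for s in adj:                         # BFS from every node
        dist, frontier = {s: 0}, [s]
        while frontier:
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1; nxt.append(w)
            frontier = nxt
        dists += [d for v, d in dist.items() if v != s]
    return median(dists)

rng = random.Random(0)
for p in (0.0, 0.01, 0.1, 1.0):
    g = ws_add_only(200, 6, p, rng)
    print(p, round(clustering(g), 2), median_shortest_path(g))
```

At p = 0 the clustering coefficient is the ring-lattice value (0.6 for c = 6) and the median distance is large; as p grows, l collapses long before C does, which is the overlap region in the plot.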

We can actually write down, as we do in the advanced material part of this section, the following expressions for C and l. It turns out C basically behaves as follows for the Watts-Strogatz graph that we just talked about. Okay. It is a function of both c and p, and the impact of p on the clustering coefficient C is such that C goes roughly like 1/(1 + p) when p is small. On the other hand, l can be approximated in many ways. One of them says it is approximately log(ncp) / (c²p). So, as a function of n, it grows like log n, just as we desired.
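As a quick sanity check on the l approximation, we can plug in the lecture's numbers (a sketch only; the natural logarithm is assumed, since the base is not specified):

```python
from math import log

n, c = 600, 6
for p in (0.01, 0.05, 0.1):
    l_approx = log(n * c * p) / (c ** 2 * p)   # l ≈ log(ncp) / (c²p)
    print(p, round(l_approx, 2))
```

Across this range the approximation stays at a handful of hops, consistent with the claim that p between 0.01 and 0.1 already gives a small l.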

7:11

Note that l is an extremal quantity: it is the median of the shortest distances. I am not talking about all the paths between two nodes; I'm talking about the shortest path between two nodes, and looking at the median of that quantity.

In contrast, C, the clustering coefficient, is an average quantity. It is talking about the percentage of triad closure: three times the number of triangles divided by the number of connected triples in the entire graph, the factor of three being a normalization because each triangle contains three connected triples.
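As a concrete illustration of this average quantity, here is a tiny hand-checkable example (a hypothetical four-node graph, not one from the lecture):

```python
from itertools import combinations

def global_clustering(adj):
    """Global clustering coefficient: 3 * (number of triangles)
    divided by the number of connected triples (paths of length 2)."""
    triples = closed = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            triples += 1          # a - v - b is a connected triple
            if b in adj[a]:
                closed += 1       # the triple is closed
    # Each triangle is counted three times (once per center node),
    # so `closed` already equals 3 * (number of triangles).
    return closed / triples

# A triangle 0-1-2 plus a pendant edge 2-3: one triangle, five triples.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(global_clustering(adj))  # 3 * 1 / 5 = 0.6
```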

I am not talking about an extremal quantity; this is an average quantity, averaged across the entire graph. And therein lies the fundamental reason why the Watts-Strogatz graph idea, the idea that a little bit of randomization, a few random long-range links, suffices.

8:16

The reason is that a little randomization, leading to some long-range links, is not enough to destroy an average quantity, and therefore you still maintain the large clustering coefficient of a regular graph. And yet it suffices to shorten some path between any two nodes, and that's enough to reduce the shortest distance; and l is only about the shortest distance. Most of the paths are still long, but that does not matter.

We now have some shortcut long-range links going, for example, from one node to another far away. So, instead of hopping all the way locally, I can take one shortcut that puts me very close to the destination. So, I don't mind if the other long paths are still there, because l is only talking about the median of the shortest paths. And therefore, you may wonder: since the Watts-Strogatz model works to explain the structural small world, why not define L as the median of the average distances between node pairs, rather than the median of the shortest distances? Then it would be a fair definition, fair in the sense that L and C would stand on the same footing. Well, it turns out that we don't need to.

Why don't we need to? Because we don't need all the paths between two nodes, or the average over those paths, to be short. When we route, as long as a short path exists, a smart enough routing method is going to pick out that path. The functionality implicitly modeled here is routing, and all we need is some short path. But that means that beneath this definition of l is a very basic assumption: that you can actually have a routing algorithm that can find such a short path. Maybe not the shortest, but a very short path.

Now, we just talked about a greedy routing method on a social network. We called that greedy social search, based on some composite distance metric scores. Let's say we run this kind of greedy routing method on a social graph; what is the expected path length? Let's denote this by little l.

Big L is talking about the existence of a short path. Little l is talking about the ability to route according to a greedy method and end up with a short path. The former is an existence result. The latter is a computability, or searchability, navigability, result. It says that local information used in a greedy fashion can lead you to some short path. If this is true, if this little l is also small, maybe not exactly as small as big L, then we're okay, because it says: I don't need to define L as the median of average distances, because the short paths are going to be discovered; those paths would be found by a practical, greedy, distributed routing method. The question is, is that true?

Well, it turns out that it is true, as observed in many cases. Now, the question is, how can we reverse-engineer that searchability result? In other words, how can we have an explanatory model to generate an algorithmic small world?