0:00

Now that we understand [COUGH], we, we understood the problem of

Â the small world effect and what we, what is, what is it that is asked of us.

Â And second, that we formulated it cleanly as a graph theoretic problem where

Â the input is a graph, the output is a distribution of the realized distances.

Â We move to the third step, which is designing the algorithm.

Â In the, in algorithmic thinking again, the third step is designing the algorithm.

Â [COUGH] So now, the question is, how do we come up with an algorithm,

Â that will compute distances between every pair of nodes?

Â Okay?

Â And we will start with the first attempt, which I would argue is the,

Â probably the first attempt that anyone would take at such a problem.

Â Since we have defined what distance means in a graph we can just follow that

Â definition, and try to turn it into an algorithm.

Â Remember definitions said that the distance between two nodes, i and g,

Â is the smallest k for which there is a path of length k between these two nodes.

Â The smallest k.

Â Again, since we don't allow self loops,

Â we are usually looking at the smallest of k of 1.

Â If I have two nodes that are different, the smallest distance between them can be

Â 1, because they can be connected by an edge.

Â So, here's the simplest algorithm, that probably anyone can come up with.

Â Given the two nodes, i and g, that are different.

Â 1:27

If no, if the answer is no, then we have to increment that key, and

Â say is it a path of length two between i and k.

Â If yes, since there is no path of length one, and there's a path of length two, and

Â the distance is two.

Â We can return two and we are done.

Â If no, then we ask is there a path of length k, three.

Â If the answer is yes, we say the distance is three.

Â We are done.

Â If no, we try four.

Â 1:55

And so on.

Â And here's the algorithm.

Â So we, our first problem in this course, here's a very simple algorithm.

Â It, I would say that it is very easy to argue it is correct.

Â Because I'm just following the definition.

Â Someone defined the distance for

Â me as the smallest k, for which there is a path of length k.

Â I am doing, I'm, I'm going in a brute force manner.

Â And I'm trying to check every possible case starting from one, going to two,

Â to three, to four.

Â So an algorithm is very simple to try.

Â It is very simple to implement,

Â I would argue, and it's very clearly, it's clear, it's clearly correct.

Â Right?

Â So the question is [COUGH] am I done, can I start implementing this algorithm?

Â The problem with the algorithm we just described is,

Â 2:41

what is the maximum value of k that we should try?

Â So, I try a path of length one.

Â The answer is no.

Â Then two, the answer is no.

Â Then k equal 3, the answer is no.

Â And I keep incrementing, incrementing, incrementing.

Â Where do, am I going to eventually hear a yes?

Â 3:13

We, we would want to use algorithms that eventually terminate.

Â I'm not interested in any algorithm that might run forever,

Â in some instance of the problem.

Â So I need to guarantee, that my algorithm terminates on whatever input I give it.

Â So, again, I give the, the,

Â the algorithm as input, a graph and a pair of nodes and I say compute the distance.

Â It goes through this repetitive iteration of checking if there is

Â a path of length one, two, three, four, and so on.

Â And again, the first question I asked, is there always,

Â is there always a path of a finite length, between any pair of nodes?

Â And the answer is no.

Â 3:49

Because here's an example of a graph.

Â [COUGH] If we follow the algorithm

Â we just described on this graph, and the two nodes zero and three for example,

Â the algorithm is going to say, is there a path of length one between zero and three?

Â The answer is no.

Â Then we said,

Â the algorithm says, is there a path of length two between zero and three?

Â The answer is no.

Â Is there a path of length three?

Â The answer is no.

Â Four? No.

Â Five? No.

Â 4:20

And in fact, there is no path of any finite length, between zero and three.

Â So the question is, where do we stop?

Â When do I stop and say, no, there is no algo, there is no path of a finite length?

Â Or that the distance between, zero and three is infinity?

Â 4:38

If we use mathematical scales here.

Â We can reason about it and get to the conclusion that, if I have four nodes,

Â if I have four nodes, then if there is a path between zero and

Â three that path, the length, I can always find a path whose length is at most three.

Â Okay, so there is, again a path between 0 and 3, it's length, has to be,

Â there has to be a path whose is at most 3.

Â Or more generally if I have a graph within nodes, and I take any

Â two nodes from the graph, there is, if there is a path between these two nodes.

Â I can always find the path whose length is at most in minus one.

Â I encourage you to think about why that is the case, and this is our first

Â encounter of one math and math skills, are very important when we design algorithms.

Â We cannot really decouple the two,

Â we cannot say we want to only study algorithms, we don't care about math.

Â We have to use mathematical tool, to be able to design a good algorithm.

Â So in this case, there will be cases, input, for

Â which there is no paths between two, a pair of nodes; which means that if

Â I continue with that algorithm I will never terminate.

Â I have to put the bond,

Â at which I can be certain that there is no path at that point and I can say.

Â No, there is no path, and the distance is infinity.

Â Using some mathematical reasoning, I can get to a point where if the graph is,

Â has N nodes, I can say try n minus 1 at most, a path of length n minus 1.

Â If you fail with one, two, three, four, all the way to n minus 1,

Â you can terminate safely and say there is no path.

Â Of length, of any finite length and the, the distance is infinity.

Â 7:12

Now, notice that we go back to the definition, how do we define distance?

Â How did we say a path of length two?

Â We said, if, if we are looking at path of length two.

Â It, basically, if you go back to the finish, and

Â it says, is there a node that I can put between these two nodes, 0 and 3?

Â If I take 0 and 3, can I put some node here?

Â Such that there's an edge from 0 to this node, and an edge from this.

Â Because if there exists such a node, then this path between 0 and 3 exists.

Â What node can this be?

Â I don't know, we can try each one of them.

Â I can put 0, 1, 2, 3.

Â Is there a path from 0 to 0, then 0 to 3?

Â No. 0 to 1, 1 to 3?

Â No. 0 to 2, 2 to 3?

Â No. 0 to 3, 3 to 3?

Â No.

Â Therefore we say that there is no path.

Â So.

Â How do we, again without thinking too much,

Â how do we check if there is a path of length k between 0 and 3?

Â The idea is very simple, going back to the definition it tells me check,

Â if there is a subset of k minus 1 of nodes, such that there is

Â an arrangement of them between these two nodes that will make a path appear.

Â Okay? So mathematically,

Â I can say if you have a node, a pair of nodes, and you want to check if there is

Â a path of length k between them, try every possible sub-set of k minus 1 nodes.

Â 8:32

And try every arrangement, or permutation of these k minus 1 nodes.

Â And put them in between.

Â So try every sub-set of k minus 1, and for each one of

Â these sub-sets try every possible arrangement or permeatation of them.

Â If you get to a point where you could find,

Â these k minus 1 nodes and arrange them, in such a way that there is an edge

Â between every pair of consecutive one, you say yes there is a path of length k.

Â If you tried every subset of size k minus 1, and

Â if you tried every permutation of every sub-set, you still

Â couldn't find a situation where you have and edge between every consecutive peer.

Â Then at that point you say there is no, there is no path of length k.

Â And as we said before, you wrap this,

Â this procedure by another layer of looping over k from 1, to n minus 1.

Â You first write k equal 1, then 2, then 3 and you can go all the way to n minus 1.

Â If you find any such thing on the way, then you say, yes,

Â here is the path and here's k.

Â If you don't, if after trying k equal 1 and 2 and 3, all the way to n minus 1.

Â And you failed, then at that point you can terminate.

Â The algorithim can terminate and say there is such past, or

Â that the distance is infinity.

Â The pseudocode, or

Â the description of this algorithm is given in its entirety on the slide.

Â 10:00

This algorithm can be then to every pair of nodes.

Â And will compute the distance, between every pair of nodes.

Â The nice thing about this algorithm is that, it is very easy to describe.

Â It is very easy to, to implement.

Â It is, it belongs to a category of algorithms that we call brute force.

Â It's an algorithm,

Â that literally follows the definition of the problem to try to solve it.

Â It is, I would, again as I say, it is often the first technique that we

Â try when we, when we are, we encounter a problem for the first time.

Â The, the brute force algorithms are, as I said, easy to describe, easy to implement.

Â And it's very easy to argue that the algorithm I described is correct.

Â Because it's just following definition.

Â It's trying every possible key, and every possible subset of side skill.

Â And every possible arrangement of that subset.

Â So I'm not missing anything.

Â The, and, and So it's easier to,

Â to establish the correctness and implement it.

Â What is the problem with that algorithm, or brute force algorithms in generals?

Â The, the Achilles heel of, of brute force algorithm is usually efficiency.

Â If you think about this, or

Â if you implement this algorithm, and you start running it.

Â On, on Facebook graphs.

Â If you run it on a Facebook graph that has five nodes, it's going to run,

Â it's going to give you results.

Â However, as like the size of the graph starts growing and you start running it on

Â graphs or on networks of size ten, ten individuals, 20, 50, 100.

Â Before you reach 100, before you reach 50, you will notice that this algorithm will

Â take years and years to run on that data, so the good

Â thing about brute force algorithms is, easy to implement, easy to describe.

Â The major problem with them is efficiency.

Â And this brings me to our next topic, which is algorithm efficiency.

Â How do I analyze the running time of this algorithm, and

Â show that it is actually not efficient and it's not worth implementing?

Â