0:00

Okay, so now we have analyzed the running time of BFS and

Â we got to a function that is quadratic in the number of nodes.

Â Again, if we look at this algorithm, we use this calculation here,

Â we got to this function O of n squared, where n is the number of nodes.

Â If you look at this, at this function carefully you will notice that m,

Â the number of edges, is not playing any part in this function.

Â Okay. It's not playing any part.

Â Now let's, let's think about some some, some extreme cases.

Â I'm going to raise some extreme case.

Â Suppose I have a graph, suppose I have a graph that has million nodes.

Â It has million nodes.

Â And, again, I'm going to focus on an extreme case.

Â Where I will say that this graph has one edge, okay?

Â It has one edge.

Â 0:52

How long do you think BFS is going to take on this, on this on this graph?

Â Is it going to take million squared?

Â Is it going to do million squared operations?

Â Or fewer than that?

Â If you think about the algorithm carefully, suppose, let's just assume for

Â simplicity, that again for this graph I have node 0, and

Â all the way to the million nodes, and I have node 1.

Â It is connected, but then nothing else is connected.

Â 1:27

So then we have, all the way to node 99,999.

Â So these are all the nodes in the graph.

Â It has only one edge, and I'm going to assume that it is 0 to 1.

Â How long is this algorithm going to take?

Â [SOUND] Is going to come and then do the initialization.

Â It's going to assign, assign infinity to each one of them.

Â So it's going to take on the order of million operations just to assign,

Â to initialize distances.

Â Okay? So it's doing million operations to

Â assign these distances.

Â Then, it's going to come and

Â initialize the [SOUND] distance to 0, to node 0, to be 0.

Â It's going to put node 0 on the queue and it's going to start working.

Â So the first time it works it's going to remove node 0 from the queue and

Â look at all it's neighbors.

Â How many neighbors does it have?

Â It has only node one.

Â Okay? It's going to look at node 1.

Â It's distance is not,

Â it's infinity it's going to update it's distance now d1 becomes 1.

Â 2:55

So how many operations has the algorithm done?

Â Of course, it did the million operations to initialize distances to all,

Â to all nodes.

Â But then after that the number of operations it did was very, very small.

Â In fact, it was proportional to the, to the number 1, which is one edge.

Â Okay?

Â So it is in this case saying that BFS is going to take million squared.

Â Million squared is, is is a very big number right?

Â I mean we have 12 zeroes, and again even with with fast computers this is

Â basically going to say that this algorithm is going to take a very long time.

Â All right?

Â However, again for this case I have million nodes, and

Â the algorithm is going to be very fast.

Â 3:39

Okay?

Â It's going to do million operations again for initialization and just, one or two or

Â three operations here to deal with node 1 and it's, it's done, okay.

Â So.

Â In other words, the n squared, the o of n squared running time is not very tight.

Â It is not very tight when we have scenarios like this.

Â Okay? When we have scenarios like this.

Â 4:03

What is this scenario?

Â This is a scenario where the graph is extremely sparse, extremely sparse.

Â It has just one edge.

Â Okay?

Â But, we don't have to invoke this extreme scenario for n square to be also not tied.

Â I can have million edges and

Â still the algorithm is going to do much fewer operations than millions squared.

Â Okay?

Â So, it's not that I am invoking some extreme case with million nodes and

Â one edge and I am saying our analysis is not tied.

Â What actually I am saying is that if you have a reasonable number of edges, you can

Â still do much fewer operations, many fewer operations than million squared.

Â Okay?

Â So, this leads us to thinking about the algorithm differently and

Â start thinking about how does the number of edges affects the running time.

Â Or how can we analyze the running time with keeping the number of

Â edges in mind, okay?

Â 5:00

Now, notice that even in this extreme case I could not do anything about this.

Â I could not do anything about this.

Â I had still to initialize all nodes to infinity.

Â And.

Â There's, this is not something that has anything to do with the number of edges.

Â So now I will start thinking, reasoning a bit more carefully about these,

Â and try to see, you know, how does the number of edge, number of edges m,

Â factors into the running time of this algorithm.

Â As I said, again, all the way to here.

Â This is still going to be O of n.

Â So, in that extreme case of million nodes, we are going to still do on the order of

Â million operations, because going to be dictated by this, by this one.

Â Now, we start thinking about these group.

Â Instead, instead of thinking about you know, order of n,

Â order of n, and let's multiply them.

Â Let's think carefully about each edge in the graph,

Â and think about this algorithm, what it's doing.

Â It looks at a node then it looks at all its neighbors.

Â Okay. Looks at a node, looks at it,

Â at all its neighbors.

Â Neighbors, the concept of neighbor is determined by edges.

Â Node i has neighbor j if there is an edge between i and j.

Â So now the edges start coming into the picture.

Â Now I can start asking, if I think about a pair of

Â nodes that are connected by an edge, i and j are connected by an edge.

Â 6:25

How many times is that edge going to be looked at in this algorithm?

Â Okay?

Â Now if we look at, at let's look at again this even this extreme case here.

Â How many times is this edge going to be looked at?

Â Okay?

Â Notice that when we, when I was, when j was 0 and h is on.

Â We looked at this edge here, right?

Â Because, how did we know that one is a neighbor of 0?

Â Because of this edge.

Â So this edge was looked at once already when j was 0 and h was 1.

Â 7:00

Is this edge looked at again any, any more?

Â The answer is yes.

Â When j is 1, next time we come to the queue and

Â remove j from the queue, when j is 1, one of its neighbors is 0.

Â So h is going to j is 1, h is 0, we are going to look again at this edge.

Â Or in other words, for every pair of nodes that are connected by an edge,

Â how many times do we look at them?

Â We look at them for a, for a pair of nodes i j.

Â That are neighbors.

Â We look at them twice, once when j was I.

Â 7:33

Sorry once when the first note was I and the second note was j.

Â The second time when the first note was j and the second was I, okay.

Â So every edge we'll look at it twice.

Â Are we going to look at it anymore than that?

Â The answer is no, because once you have dealt with this node,

Â compute the distance to this node and you are looking at its neighbors.

Â This has been removed from the queue already.

Â Once you look at the neighbors of this and you are done,

Â this is removed from the queue.

Â You will never again reach this node.

Â 8:03

So what does that tell us?

Â That if I have n edges, this loop here, this loop.

Â All of them the,

Â the two nested loops, are going to be doing on the order of m, 2m operations.

Â Okay?

Â So, I don't need now to look at n nodes times n nodes and so on.

Â No, now I need to think about it that if I have ten edges,

Â the number of operations here is going to be on the order of 20, 2 times 10.

Â If I have m edges, this is now going to be done on the order of m.

Â So now, using this reasoning by thinking about every edge how many times it is,

Â it is used or looked at now this is, we can say that this is o of m.

Â Okay?

Â Because there was, there was actually when we looked at this operation here, it

Â is true that this, we are going to do this n times, and there is no escape of that.

Â We are going to do this n times.

Â But the problem here is that n, we multiplied it by m, and

Â this is not accurate because not every node is going to have n neighbors,

Â n neighbors, n neighbors, or n minus 1.

Â Note, it total all nodes are going to have m neighbors.

Â Okay?

Â Therefore, we can think about this as taken m.

Â And using this analysis, now we say that the running time of BFS is o of m plus n.

Â O of m plus n, okay.

Â 9:30

So using this type of analysis, more careful analysis, we got to a running

Â time that is linear in the number of nodes and linear in the number of edges, okay.

Â Now let's contrast the two.

Â We have the first attempt we got to O of n squared,

Â the second one we got to O of m plus n, okay.

Â Again, using the extreme example I showed with million nodes and one edge,

Â the running time is really going to be million in the order of million plus 1.

Â On the order of million.

Â It's not going to be on the order of million squared.

Â If, when, when would these two running times actually be very similar?

Â Let's think about another extreme example, that the graph is complete.

Â Every pair of nodes has an edge between them.

Â Then, it doesn't matter that we did this analysis because m is going to

Â be actually on the order of n squared.

Â Okay?

Â Now, so keep, keep this in mind that, you know,

Â looking at the graph, if the graph is very sparse,

Â this is is the more accurate representation of the running time.

Â If the graph gets to, close to, becomes very dense, gets close to

Â a complete graph, this, the n squared becomes really the dominant term.

Â Okay?

Â When we look at real life applications,

Â most interesting graphs that we are going to face.

Â They are going to have actually more edges than n okay,

Â because if you think about it, if you have the number of edges is smaller than n,

Â this is a very sparse graph.

Â Usually the graphs we deal with are not that sparse.

Â If we have m is greater than n,

Â then this running time here, we can actually simplify it and say it is O of m.

Â So sometimes you might see in textbooks or, or

Â somewhere else that the running time of BFS is just O of m.

Â This you can think about it as when we assume that the number of

Â edges is greater than n, then we can simplify this through all of them.

Â Okay.

Â So there is no contradiction between two, these two running times.

Â Okay.

Â Because the both of them are correct asymptotically.

Â This is tighter than this.

Â When we get to a very dense graph almost complete, it's going to be closer to this.

Â