
So in this lecture we're going to drill down into our first specific search strategy for graphs and also explore some applications: namely, breadth-first search. So let me remind you of the intuition behind, and the applications of, breadth-first search.

The plan is to systematically explore the nodes of the graph in layers, beginning with a given starting vertex. So let's think about the following example graph, where S is the starting point for our breadth-first search. The start vertex S by itself constitutes the zeroth layer, which we'll call L0. The neighbors of S are going to be the first layer, L1: those are the vertices that we explore just after S. Now the second layer, L2, is going to be the vertices that neighbor vertices of L1 but are not themselves in L1, or for that matter in L0. So that's going to be C and D; that's the second layer. Now you'll notice, for example, that S is itself a neighbor of these nodes in layer one, but we've already counted it in a previous layer, so we don't count S toward L2. And then finally, the only neighbor of L2 that hasn't already been put in some layer is E; that will be layer three. Again, notice that C and D are neighbors of each other, but they've already been classified in layer two, so that's where they belong, not in layer three. So that's the high-level picture of breadth-first search you should have. We'll talk about how to actually precisely implement it on the next slide.
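To make the layer picture concrete, here is a minimal Python sketch that computes BFS layers on a hypothetical adjacency-list encoding of this example graph (the vertex names and edges are read off the figure; the dictionary encoding itself is my own, not from the lecture):

```python
def bfs_layers(graph, start):
    """Return BFS layers: layer 0 is [start]; layer i+1 holds the
    not-yet-explored neighbors of layer i."""
    explored = {start}
    current, layers = [start], [[start]]
    while current:
        # Collect neighbors of the current layer that are still unexplored.
        nxt = sorted({w for v in current for w in graph[v] if w not in explored})
        explored.update(nxt)
        if nxt:
            layers.append(nxt)
        current = nxt
    return layers

# Hypothetical adjacency-list encoding of the lecture's example graph.
graph = {
    's': ['a', 'b'],
    'a': ['s', 'c'],
    'b': ['s', 'c', 'd'],
    'c': ['a', 'b', 'd', 'e'],
    'd': ['b', 'c', 'e'],
    'e': ['c', 'd'],
}

print(bfs_layers(graph, 's'))
# → [['s'], ['a', 'b'], ['c', 'd'], ['e']]
```

On this graph the sketch reproduces exactly the layers described above: L0 = {S}, L1 = {A, B}, L2 = {C, D}, L3 = {E}.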

A couple of other things you can do with breadth-first search, which we'll explore in this video: first, computing shortest paths. It turns out that shortest-path distances correspond precisely to these layers. So, for example, if S were the Kevin Bacon node in the movie graph, then Jon Hamm would pop up in the second layer of a breadth-first search from Kevin Bacon. I'm also going to show you how to compute the connected components of an undirected graph, that is, its pieces. We'll do that in linear time, and indeed, for this entire sequence of videos on graph primitives, we will be satisfied with nothing less than the holy grail of linear time. Remember that in a graph you have two different size parameters: the number of edges M and the number of nodes N. For these videos I'm not going to assume any relationship between M and N.
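Since shortest-path distances coincide with layer numbers, the distance computation is a one-line tweak to BFS: record dist[w] = dist[v] + 1 the first time w is seen. A sketch, again using a hypothetical encoding of the lecture's example graph:

```python
from collections import deque

def bfs_distances(graph, s):
    """Shortest-path distances from s; dist[v] is exactly v's layer number."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:           # first time w is seen
                dist[w] = dist[v] + 1   # w is one layer deeper than v
                queue.append(w)
    return dist

# Hypothetical encoding of the example graph from this lecture.
graph = {
    's': ['a', 'b'], 'a': ['s', 'c'], 'b': ['s', 'c', 'd'],
    'c': ['a', 'b', 'd', 'e'], 'd': ['b', 'c', 'e'], 'e': ['c', 'd'],
}

print(bfs_distances(graph, 's'))
# → {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3}
```

Note that E ends up at distance 3, its layer number, exactly as the layer picture predicts.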

Either one could be bigger. So linear time is going to mean O(M + N). So let's talk about how you'd actually implement breadth-first search in linear time. The subroutine is given as input both the graph G and a start vertex S. I'm going to explain this as if the graph is undirected, but the entire procedure works in exactly the same way for a directed graph. Obviously, in an undirected graph you can traverse an edge in either direction; for a directed graph, you have to be careful to traverse arcs only in the intended direction, from the tail to the head, that is, to traverse them forward.

As we discussed when we talked about generic strategies for graph search, we don't want to explore anything twice; that would certainly be inefficient. So we're going to keep a boolean at each node, marking whether we've already explored it or not. By default, nodes are assumed unexplored; they're only explored if we explicitly mark them as such. We initialize the search with the starting vertex S: we mark S as explored and then put it in what I was previously calling the conquered territory, the nodes we have already started to explore. To get linear time, we're going to have to manage those nodes in a slightly non-naive but pretty straightforward way, namely via a queue, which is a first-in, first-out data structure that I assume you have seen. If you have never seen a queue before, please look it up in a programming textbook or on the web. Basically, a queue is just something where you can add stuff to the back in constant time and take stuff from the front in constant time. You can implement one, for example, using a doubly linked list.

Now recall that in the general systematic approach to graph search, the trick was, in each iteration of some while loop, to add one new vertex to the conquered territory, that is, to identify one unexplored node that now gets explored. That while loop is going to translate into one in which we just check whether the queue is non-empty. (We're assuming the queue data structure supports that query in constant time, which is easy to implement.) If the queue is not empty, we remove a node from it, and because it's a queue, removing a node from the front is something you can do in constant time. Call the node that you get out of the queue V. Now we look at V's neighbors, the vertices with which it shares edges, and we see whether any of them have not already been explored.
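Here is a minimal Python sketch of the routine described here: mark S, put it in a FIFO queue, then repeatedly dequeue a node and scan its incident edges, marking and enqueueing each unexplored neighbor. Python's `collections.deque` stands in for the queue; the adjacency-list encoding of the graph is a hypothetical stand-in for the lecture's example:

```python
from collections import deque

def bfs(graph, s):
    """Generic BFS: returns the nodes in the order they are explored."""
    explored = {s}               # the boolean "explored" marks, kept as a set
    queue = deque([s])
    order = [s]
    while queue:                 # emptiness check: constant time
        v = queue.popleft()      # remove from the front: constant time
        for w in graph[v]:       # scan v's incident edges
            if w not in explored:
                explored.add(w)
                queue.append(w)  # add to the back: constant time
                order.append(w)
    return order

# Hypothetical encoding of the lecture's example graph.
graph = {
    's': ['a', 'b'], 'a': ['s', 'c'], 'b': ['s', 'c', 'd'],
    'c': ['a', 'b', 'd', 'e'], 'd': ['b', 'c', 'e'], 'e': ['c', 'd'],
}

print(bfs(graph, 's'))
# → ['s', 'a', 'b', 'c', 'd', 'e']
```

With the neighbors listed in this order, the exploration order matches the numbering 1 through 6 traced in the example execution that follows.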

So, if W is something we haven't seen before, that is, unexplored, it's in the unconquered territory, which is great: we have a new victim. We mark W as explored and put it in our queue; we've advanced the frontier and now have one more explored node than we did previously. And again, a queue by construction supports constant-time additions at the end, so that's where we put W.

So let's see how this code actually executes on the same graph we were looking at on the previous slide. What I'm going to do is number the nodes in the order in which they are explored. Obviously, the first node to get explored is S; that's where the queue starts. So now, when we follow the code, what happens? Well, in the first iteration of the while loop we ask: is the queue empty? No, it's not, because S is in it. So we remove, in this case, the only node in the queue; it's S. Then we iterate over the edges incident to S, and there are two of them: the edge between S and A and the edge between S and B. Again, this is still a little underspecified, in the sense that the algorithm doesn't tell us which of those two edges to look at first. It turns out it doesn't matter; either choice yields a valid execution of breadth-first search. But for concreteness, let's suppose that of the two possible edges, we look at the edge (S, A). We then ask: has A already been explored? No, it hasn't; this is the first time we've seen it. So we say, oh goody, this is new grist for the mill: we add A to the end of the queue and mark A as explored. So A is going to be the second vertex that we mark.

After marking A as explored and adding it to the queue, we go back to the for loop and move on to the second edge incident to S, the edge between S and B. We ask: have we already explored B? Nope, this is the first time we've seen it, so the same thing happens with B: B gets marked as explored and gets added to the end of the queue. The queue at this juncture has first a record for A, because that was the first one we put in after taking S out, and then B follows A. Again, depending on the execution this could go either way, but for concreteness I've done it so that A got added before B. So at this point, that's what the queue looks like.

So now we go back up to the while loop and ask: is the queue empty? Certainly not; there are actually two elements. We remove the first node from the queue, in this case the node A, the one we put in before the node B. Now we look at all the edges incident to A, and A has two incident edges: one it shares with S and one it shares with C. If we look at the edge between A and S, then in the if statement we'd be asking: has S already been explored? Yes it has; that's where we started. So there's no reason to do anything with S; it's already been taken out of the queue. So in this for loop for A, there are two iterations: one involves the edge with S, which we completely ignore, but the other is the edge A shares with C, and C we haven't seen yet. At that point in the for loop we say, aha, C is a new node we can mark as explored and put in the queue. So C is going to be our number four.

So now, how has the queue changed? Well, we got rid of A, so B is now at the front, and we added C at the end. And now the same thing happens: we go back to the while loop, the queue is not empty, and we take off the first vertex, in this case B. B has three incident edges: it has one incident to S, but that's irrelevant, since we've already seen S. It has one incident to C; that's also irrelevant, because we've already seen C.

True, we just saw C very recently, but we've already seen it. But the edge between B and D is new, so we take the node D, mark it as explored, and add it to the queue; D is going to be the fifth one that we see. And now the queue has the element C followed by D.

So now we go back to the while loop and take C off the queue. C has four edges. The one with A is irrelevant; we've already seen A. The one with B is irrelevant; we've already seen B. The one with D is irrelevant; we've already seen D. But we haven't seen E yet. So when we get to the part of the for loop with the edge between C and E, we say, aha, E is new. So E will be the sixth and final vertex to be marked as explored, and it gets added at the end of the queue.

Then, in the final two iterations of the while loop, D is removed and we iterate through its three edges; none of them is relevant, because we've already seen all three endpoints. Then we go back to the while loop and get rid of E; E yields nothing new either, since for both of its edges we've already seen the other endpoint. Now we go back to the while loop.

The queue is empty, and we stop. That is breadth-first search. To see how this simulates the notion of layers that we were discussing on the previous slide, notice that the nodes are numbered according to the layer they're in. S was layer zero. The two nodes that S caused to get added to the queue, A and B, are numbers two and three, and the nodes of layer two are precisely the ones that got added to the queue while we were processing the nodes of layer one; that is, C and D are precisely the nodes that got added to the queue while we were processing A and B.

So, this is layer zero, layer one, and layer two. E is the only node that got added to the queue while we were processing layer two, the vertices C and D, so E will be the third layer. In that sense, by using a first-in, first-out data structure, the queue, we do wind up processing the nodes according to the layers that we discussed earlier.

So I claim that breadth-first search is a good way to explore a graph, in the sense that it meets the two high-level goals that we delineated in the previous video. First of all, it finds everything findable, and obviously nothing else; second of all, it does so without redundancy, without exploring anything twice, which is the key to its linear-time implementation.

A little more formally, claim number one: at the end of the algorithm, the vertices that we've explored are precisely the vertices V such that there is a path from S to V. This claim is equally valid whether you're running BFS in an undirected graph or a directed graph. Of course, in an undirected graph it means an undirected path from S to V, whereas in a directed graph it means a directed path from S to V, that is, a path in which every arc gets traversed in the forward direction. So why is this true? Well, we basically proved this more generally for any graph search strategy of a certain form, of which breadth-first search is one. If it's hard for you to see the right way to interpret breadth-first search as a special case of our generic search algorithm, you can also just take our proof for the generic search algorithm and copy it down for breadth-first search. The forward direction of the claim is clear: if something's marked as explored, it's only because you found a sequence of edges that led you there, so the only way a vertex V gets marked as explored is if there's a path from S to V. Conversely, to prove that any V with a path from S to V will be found, you can proceed by contradiction: look at the part of the path from S to V that BFS does successfully explore, and then ask, why didn't it go one more hop? It never would have terminated before reaching all the way to V. So you can just copy the same proof that we had for the generic search strategy in the previous video.

So, again, the upshot: breadth-first search finds everything you'd want to find. It only traverses paths, so you're not going to find anything to which there isn't a path, but it never misses out: anything to which there's a path, BFS is guaranteed to find.

Claim number two is that the running time is exactly what we want, and I'm going to state it in a form that will be useful later when we talk about connected components. The running time of the main while loop, ignoring any kind of preprocessing or initialization, is proportional to what I'm going to call N_S plus M_S, the number of nodes that can be reached from S plus the number of edges that can be reached from S. The reason for this claim becomes clear if you inspect the code, which we'll do in a second. So let's return to the code and just tally up all the work that gets done.

I'm going to ignore the initialization and just focus on the main while loop. We can summarize the total work done in this while loop as follows. First, think about the vertices: in this search we only ever deal with the vertices that are findable from S, and there are N_S of those. And what do we do for a given node? Well, we insert it into the queue and we delete it from the queue, each in constant time, and we never deal with a single node more than once. So that's constant overhead per vertex that we ever see; that's the N_S part. Now, a given edge we might look at twice: an edge (V, W) might be considered once when we process the vertex V and again when we process the vertex W. Each time we look at an edge we do constant work, so we only do constant work per edge. So: we look at each vertex at most once, we look at each edge findable from S at most twice, and we do constant work each time we look at something. The overall running time is therefore proportional to the number of vertices findable from S plus the number of edges findable from S.

So that's really cool. We have a linear-time implementation of a really nice graph search strategy. Moreover, we need only very basic data structures, a queue, to make it run fast with small constants. But it gets even better: we can use breadth-first search as a workhorse for some interesting applications. That's what we'll talk about in the rest of this video.
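As a preview of the connected-components application, here is a hedged sketch of the outer-loop idea: scan all the nodes, and launch a fresh BFS from each one that hasn't been explored yet. Each BFS discovers exactly one piece, and since every node and edge is touched a constant number of times across all the searches, the total work stays O(M + N). (The two-piece example graph below is my own illustration, not from the lecture.)

```python
from collections import deque

def connected_components(graph):
    """Connected components of an undirected graph via repeated BFS."""
    explored = set()
    components = []
    for s in graph:
        if s in explored:
            continue                 # s's piece was already discovered
        # A fresh BFS from s finds exactly s's component.
        component = {s}
        explored.add(s)
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in explored:
                    explored.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

# A hypothetical two-piece graph: a triangle plus one isolated edge.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5], 5: [4]}

print(connected_components(graph))
# → [{1, 2, 3}, {4, 5}]
```

The outer for loop is what the inner BFS alone can't do: BFS from S only finds S's piece, so the scan over all nodes guarantees that every piece gets a starting point.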
