Okay. First, we're gonna look at the search algorithm for, digraphs and this is the finding paths, what are all the vertices that we can get to from a given vertex along a directed path. and again, this is, little more complex for a digraph it would seem, than for a graph. so in this case, those are the, that's the set of vertices that you can get to, from the given vertex x, s. Notice that, this set is characterized by every edge crossing the boundary, it goes in. if there were an edge that went out, that would give, another member of the set.. well actually, looks more complicated to a human, but to the computer, it looks exactly, precisely the same. in fact. the method that we looked at for undirected graphs is actually a digraph processing algorithm. it treats, every connection between two vertices as two directed edges, one in each direction. So, DFS, that we looked at last time is actually a digraph algorithm. And we used precisely the same code. So to visit a vertex V, we mark the vertex as visited, and recursively visit all unmarked vertices W, that you can get to from V. let's look at the demo, just to, . Reinforce that. So, here's a sample digraph with the edges over at the right. let's look at that first search on that digraph. so, we're going to look at the vertices that we can get to from vertex zero in this digraph. Again, we have two vertex index to raise. one called marked, which says whether we can get there from V, and the other called edge two, which gives us the vertex that took us there. with that we can recover the paths from vertex zero to each vertex that can be reached from vertex zero. So we start off by visiting vertex zero and now check the edges that are adjacent to it with directed edges going out. so there's five and then there's gonna be one. but five is unmarked so we have to re-cursedly visit five. So we mark five, and we say we got there from zero. so the path from, to five is zero to five. and so now we're gonna recursively visit all the unmarked vertices pointed to from five . In this case it's just four. My four is unmarked so we're gonna recursively visit four and say we got there from five. and now recursively we have to check all the unmarked vertices pointing from four. there's three. and two, first we do three and that's unmarked. so we gotta visit three. and say that we got there from four. and now to visit three. We've looked at all the vertices pointing from three. we can check five. We've already been to five that's marked so we don't have to do anything. and then we check two. two is unmarked so we continue with the depth first search and visit two. so now to visit we mark two, and say we've got there from three. and now we check the vertices that we can get two from two. In this case it's zero, which we've already been to. And three, which we've already been to. so. now we're done with vertex two. and we can return and continue the search from three well actually that was the last one from three, so we're done with three as well. so now we're at four. We still have checked the edge from four to two. so now we do that. And of course we've been to two, so we don't have any further processing. and we're done with four. the, , four was the only edge we get to from five. So we're going to be done with five as well. and then, what about zero, well we have to check one. 1's not visited, so we visit one, mark it. and we turn and then we're done with zero, and that gives, the set of all vertices that are reachable from zero, and not only that the edge to array. Gives the information that we need to reconstruct the path from any of those, from zero to any of those vertices using precisely the same method that we used before. We get the four from five, we get the five from zero, so zero, five, four is the path to four. And we can do that for any vertex in that cell. Okay. So what about the code? the code is exactly the same as for , undirected graphs. That's the code for undirected graphs. that we looked at last time to get the code for digraphs, we just changed the name, its the s ame code, otherwise. the recursive, the constructor, builds the array of marked vertices, and also builds edge too, just to, avoid clutter, left that one off this slide, and then it makes the call to DFS. then the recursive DFS does the work. It marks the vertex and for every adjacent vertex. If its not marked, it does the DFS. and then the client can ask whether any ver-, any given vertex is reachable from s after the constructor has done its work. that's depth-first search in directed graphs, actually, we already did it. now here's just a couple of applications where this kind of code is used. one is, a so called program control flow analysis. actually every program can be viewed as a digraph. where, the vertices are basic blocks of instructions that are just executed one after the other with no conditionals. and then edges represent, a, jump. If there's an if statement, vertex left, two edges going out of it, or, or a loop, which involves a conditional. So, , analyzing a program, people write systems in analyse program, to look at their structure by studying their diagrams, for example. one thing that happens often is there's unreachable code. another, another thing you might want to do is determine whether you can get to this exit or not, by doing this digraph processing. so that's actually a widely used technique in, in developing software, software development. to try to improve code by doing this kind of digraph-processing. And ofcourse these digraphs can be huge. another classic use of depth for search in digraphs is garbage collection that is used in systems like Java where data structures or digraphs. we build objects and then we create references to other objects. and so the data that any programs use is really set as a digraph. So there's the idea of roots, so your program has e-. Some. live objects, that it can access through, whatever state, it's in, but, a language like Java, there's, automatic garbage collection, which means the programmer, when it's done with an object, maybe it overwrites one of these pointers or something. there's gonna be some, blocks, that, are not directly accessible by the program. and so, what's interesting is, the set of reachable objects that, can be indirectly access. By the program starting and following a chain of pointers. so those are the ones that can't be collected or reclaimed by the system for reusing the memory. But all the other ones, the gray ones that can't be reached by the program there's no reason to . Keep them live, you may as well collect them and return them for use, re-use. So there's a so-called marked and sweep out rhythm that actually dates back to 1960, where. We run DFS to mark all ritual objects. and then go through and sweep through all possible objects. And if it's object is unmarked it's garbage so add it to the list of free memory. and that's a classic method that's still widely used. it uses an extra bit per object'cause you have to have to have that for the mark. But still, it's effective and useful digraph solution. So DFS with reachability that we've just showed and path finding is similar. And there's a couple of other simple digraph problems that we'll consider. These are so far examples. But it's also interesting that DFS is the basis for solving digraph problems that are not so simple or immediate to solve. And this was pointed out 40 years ago by Bob Tarjan in a seminal paper that showed that. First search, can allow us to solve problems that seem pretty complicated actually, in linear time, and we're gonna look at an example of that later on. so that's depth for search. what about breadth for search. Well in the same way that we saw for depth for search every undirected graph is actually a digraph that has edges in both direction. So bfs is really a directed graph algorithm and we can use exactly the same code to find shortest paths from a source to any given vertex. So we use a que. We put the source on a que and mark it as visited and. And as long as the queue is non-empty, we remove the least recently added vertex and add to the que ue and mark as visited all the unmarked vertices that you can get to from that vertex. And the same proof shows that BFS computes shortest paths, the ones with the fewest number of edges from S to each other vertex in the digraph in time proportional to in linear time. So you want the GPS in your car, you BFS when you're driving around lower Manhattan. So, let's look at the demo again just to see, the distinction between, breadth first search, in digraphs and see how it works. So this is a, a small graphing, a smaller digraphic and with six vertices and eight edges. we take a Q and we take a source vertex and put on the Q to get started. then, Q is not empty, so remove zero and we check all, all vertices that are adjacent that we get to. So. we're gonna, in zero was zero distance, from zero, so first we will check two. and that one is not marked, so we mark it and put it on the queue. and then we'll check one. and that one's not marked, so we mark it and put it on the queue. then we're done with zero. now queue's not empty so we pull the least recently added off, that's two. and now we're going to check the vertices, you can get from two. I noticed both one and two are distance one from zero. And now, since we're going from two, everything that we encounter will be distance two from the source. So we find four, it's distance two from the source, and we get there from vertex two. Unmarked, so we fill in those data structures and put it on the queue. and then we're done with two, so we go back to the queue, and 1's on the queue. So we pull one off and it's distance one from zero. Remember the first showed that everything in the queue is one of two distance, either k or k plus one. In this case, we've got one at distance one, four at distance two. So now we're going to pull one off the queue. M- and look at the edges you can get to, places you can get to from one. Now we check two but that's already marked so we ignore it. and then we're done with one. Now four is left on the Q so we pull it off and check adjacent vertices. In this case three, it's unmarked so we put it on the Q. Then we're done with four. Then from three we check five and that's unmarked and it's one more distance from the source so we put it on the Q. And then finally. Oh we check two which we already visited so we don't have to, to do anything. And then finally we pull five off the Q. check. Or you get two from five and it's zero, which is marked, so we're done. and so that's breadth-first search whig, which gives us this directed tree from the source. Which gives the shortest path to all the vertices that you can get to from that source. you can use a version of this to solve a more general problem known as the multiple-source shortest paths problem. In this problem you're given a digraph and a set of source vertices, and you want to find the shortest path from any vertix in the set to each other vertix. So for example, in this case if the set is one, seven, and ten, what's the shortest path to four? From one of those vertices. Well, it turns out in this case to be seven, six to four. Shortest path to five is seven, six, zero, five. Shortest path to twelve is ten to twelve. That's a more general problem but it's actually easy to solve. How do we implement this? We just use a different constructor. We just use BFS but initialize by, put all the source vertices on the queue to get started. So that is every vertex is, so you put those on the queue and they're zero from the desired source. And then any vertex you can get to from any one of those is going to be. One and so forth, so the results still gives away. The edge to array will still give a way to get from any vertex, the shortest way to get from any vertex to each of the sour-, source vertices. here's an application of depth-first search. Let's say you want to crawl the whole web, well, all the web that you can access from some starting web page, say like Princeton's starting webpage. Again, the digraph model, each vertex is a webpage, each edge is a link on that webpage to some other webpage. and so all we want to do is get to all the other vertices on the web. And, so solution is, well, we don't actually build the digraph we just use an implicit digraph, because for every web page we can find the links to other web pages on it and we'll just build those as we encounter them. So we're gonna start with a source, which is the root web page. We're gonna have a queue of the sites that we still need to explore. what we're going to do is also have a set of discovered websites, this corresponds to our marked array but since we don't know how many vertices there are on the web all we're going to do is keep track of the ones that we've been to. so this is, don't use the vertex indexed array we don't even bother with because we'll just use the vertex names and do the look up as indicated we could do. So all we do is In the, is but for a search the same method is if the queue's not empty. Take a website off of the cue. And just add to the queue all the websites to which it links. but all of those websites, you check whether they're in the set of the ones that you've seen already. now, you might run out of space, for this set before you get to the whole web. but anyway, conceptually, this is a way that you could go. one thing to think about is why not use DFS for this. well One reason is you, you're gonna go far away in your search for the web. Maybe, maybe that's what you want, but the real problem, in point of fact is there's some web pages that would trap such a search by creating new web pages and make links to'em the first time that you visit'em. So, they, trap searches like that by, cuz DFS would always go to a new web page like that and it'd keep creating new ones and you wouldn't get very far. With breadt-first search you're taking a wide search of the pages that are close. and that's often maybe what you want for such a search. and, just to how simple the idea is, this is complete code for a, it's kind of a bare bones web crawler but it'll get to a lot of websites. so let's look at, do this example because again it really indicates the power of using appropriate abstractions to implement the algorithms that we're interested in. so this one we're gonna use a cue. Queue of strings so that's the websites that we have to still yet to go to. and then a set of strings is gonna be the ones that we've already been to that's equivalent to the marked array. we'll start with Princeton's website. add it to the queue of places we have to go and also mark it. Discovered that ad just means mark it. so now, while the queue's not empty. What we're going to do is read the raw HTML from the next website in the queue. So this is code using our input library that gets that job done. And then, this is a little fooling around with regular expressions. We'll talk about algorithms like this later on, that essentially tries to find all URLs within the text of that website. And for all of those URL's. And we'll. Look at workers behind this code later on in this course. For all those URL's it's gonna check. If it's marked that's discovered doc contains and if it's not marked it'll say it will mark it and put it on the queue. a very s- simple program with a very powerful effect that illustrates breadth-first search.