0:02

Now, we're going to look at depth-first search, which is a classical graph

processing algorithm. It's actually maybe one of the oldest

algorithms that we study, surprisingly. One way to think about depth first search,

is in terms of mazes. It's a pretty familiar way to look at,

look at it. And, so, if you have a maze like the one

drawn on the left, you can model it with a graph.

By creating a vertex for every intersection.

In the maze. And an edge for every passage connecting

two intersections. And.

So. If you're at the entrance to this maze and

you want to find a pot of gold somewhere. What you're gonna need to do is.

Explore every intersection. Or explore, explore every edge.

In the maze. So.

We're gonna talk about the. Explore every intersection.

Option. So that's our goal.

To have an algorithm for doing that. By the way, this is a famous graph that

some of you might recognize. That's the graph for the Pac-man game.

Okay, so one method classic method that predates computers for exploring a maze is

called the Tr maux maze exploration algorithm.

The idea is to think about having a ball of string.

And what you do is when you walk down a passage, you unroll the string behind you.

And you also, mark, every place that you've been.

So actually, I have a ball of string and some chalk, maybe.

So in this case, maybe we walk down this passage here.

And now we have some choices about where we might go.

So say we go down here. So we unroll our ball of string and mark

it. And so now, the next time, At this

intersection, we have no choice but to go up here.

We go up here and we say, oh, we've already been there.

So we're not gonna go there. And we come back.

And, we have our ball of string. So we can unroll it to figure out where we

were. And we go back until we have some other

choice. Which is this, this place, now.

And we mark that we've been in these other places, And so now, we take another option

and say, go down this way. And here, we take another option, go that

way. And then finally again we go up this a way

and we see that we've been there so we back up and take the last option and then

that gets us to the last vertex in the graph.

So mark each visited intersection and each visited package, passage, and retrace our

steps when there's no unvisited option. Again this is a classical algorithm that

was studied centuries ago and in fact some argued the first youth was when Theseus

entered the Labyrinth and was trying to find the Minotaur and, And Rodney didn't

want ''em to get lost in the maze. So she instructed Theseus to use a ball of

string to find his way back out. That's the basic algorithm that we're

gonna use. And has been studied by many, many

scientists in the time since Theses. And in fact, Claude Shannon, founder of

Information Theory, did experiments on mazes with mice to see if they might

understand maze exploration, this might help.

Okay, so here's what it look like in its typical maze.

3:57

Now one of the things to remember is in a computer representation normally we're

just looking at the vertices and the set of associated edges.

We don't see anything other than that. Though, it's sometimes frustrating

watching me you know that it turned the wrong way and it's gonna get trapped here.

But, the computer doesn't really know that, so it has to back up along here now

and it continues to back up to find another option untill it gets free again.

And finds a some place to go. Sometimes its very frustrating.

Its seems to be quite close to the goal like appear and it turns a wrong way.

So we an see its gonna take a long way but no way the program could really know that.

Again, all the programs we're working with is vertex instead of edges associated with

that vertex and there it finally get to the goal.

Here's a bigger one going faster. The king thing is not so much, its getting

lost and the key thing is not going anywhere twice.

And that's the whole thing. We have to have the string to know to go

back where we came from. And we have to be able to mark where we

have been. And with those two things we are,

algorithm is, able to avoid going the same place twice.

If you weren't marking, if you tried to do this randomly or some other way it might

take you a while to get to the goal. So it doesn't seem like much of

accomplishment maybe for a maze but actually to be able to get there with

going, without going any place thrice, twice is sort of a, profound idea and

leads to an efficient algorithm. Okay.

So, our idea is, given in this, medicode, to do, depth research, that is, to, visit,

all the places you can get to from a vertex, V.

What we're gonna do is this simple re, recursive algorithm.

Mark the vertex as visited and then recursively visit all unmarked vertices, W

that are adjacent to V. That's a very simple description, and it

leads to very simple codes. It's so simple actually, it really belies

the profound idea underneath this algorithm.

So, again, There's lots of applications. And, for example, this is one way to find

whether there exists a path between two vertices.

Or to find all the vertices connected to a given source vertex.

And we'll consider some less abstract applications, once we've looked at the

code. So, so how to implement.

Well here's what we're gonna do for our design pattern for graph processing.

It's our first example. So what we did, when we defined an API for

graph was to decouple the graph data type from graph processing.

The idea is we're gonna create a graph object using that API which we know allows

us to represent a big graph within the computer.

And gives us the basic operations that we're gonna need for graph processing.

And then we use that API within a graph processing routine.

And the basic idea is that, that graph, graph processing routine will go through

the graph and collect some information. And then a client of that routine will

query the it's API to get information about the graph.

So in the case of, depth first search. Here's a potential possible API.

So the idea is that what this, what we're gonna implement is a program that can find

paths in a graph from a given source. So we give a graph and a vertex.

And that the constructor is gonna do what it needs in order to be able to answer,

these two queries. First one is, give a vertex,

Client will give a vertex. Is there a path in the graph, from the

source to that vertex? You wanna be able to, answer that

efficiently. And then,

The other thing is to just give the path. What's the path from, has to be giving all

the vertices on the path, in time proportional to its length.

So here's a client of, this, API. So, it's gonna take a source, a source

vertex S. And it's gonna build a pathfinder, or a

path object. And that object is gonna do the processing

it needs to be able to efficiently implement hasPathTo. And then what this

does is for every vertex in the graph. If there's a path from s to that vertex.

It'll print it out. So it prints out all the vertices

connected to x. And that's just one client, of this, data

type. You could, print out the pass or whatever

else you might. So that's our design pattern that we're

gonna use over and over again, for, A graph processing routine.

And it's important to understand why we use a design pattern like this.

We're decoupling the graph representation from the processing of it.

As I mentioned, there's hundreds of routines for, or algorithms that have been

developed for processing graphs. An alternative might be to put all of

those algorithms in one big data type. That's a so called fat interface.

And that would be a, a bad plan, cuz these things maybe are not so well related to

each other. And actually all of them really are just

iterating through the graph, and doing different types of processing.

With this way we're able to separate out. The, and I articulate what the graph

processing clients are doing. And then the real applications can be

clients, of these graph processing routines.

And everybody's taken advantage of an efficient representation, that we already

took care of. Okay.

So now let's look at a demo of how depth-first search is gonna work and then

we'll take a look at the implementation. Okay.

So here's a demo of depth-first search in operation on our sample graph.

Again, to visit a vertex we're gonna mark it, and then recursively visit all

unmarked verti-, vertices that are adjacent.

So this is a sample graph. And so the first thing we do is realize

that we're gonna need a vertex index array to keep track of which vertices are more.

So that will just be an array of bullions and we'll initialize that with all false.

We're also gonna keep another data structure.

A, a vertex indexed array of ints. That for every vertex gives us the vertex

that took us there. So, let's get started and you'll see how

it works. So this is depth-first search staring at

vertex zero. So now to visit vertex zero, we wanna mark

it so that's, our mark zero is true. That's the starting points we know

anything with Edge two. And now what we're gonna do is.

We're going to need to check all the vertices that are adjacent to zero.

So that's six, two, one, and five. The order in which they're checked depends

on the representations in the bag. We don'tr really, necessarily care about

that. Most of the algorithms are going to check

them all. And it doesn't matter that much about the

order. Although, in some cases it's wise to be

mindful. And maybe use a bag that takes them out in

random order. Okay this is zero.

Now we have to check, six, two, one, and five so, lets go ahead and do that.

So in this case six, six is the first thing to get checked.

And so now, we mark, six is visited so now we are going to recursively do a search

starting from six. The other difference when we visit six

from zero. We're going to put a zero in this edge to

entry to say that when we first got the six the way we got there, was from zero.

And that's going to be the data structure that'll help us, implement the client

query and give us the path back to zero from any path.

From any vertex. So okay what do we have to do to visit

six. Well six has two adjacent vertices zero

and four. So we're gonna have to check them.

So first we check zero and that's already marked.

So we don't really have to do anything. We're only suppose to recursively visit

unmarked vertices. And then we check four.

And four is unmarked, so we're going to have to recursively visit is.

The next thing we do is visit four. Mark four as having been visited.

Where the true and the marked array, Fourth entry is a marked array.

And we fill an edge two saying we got to four from six.

And so now to visit four, we have to recursively check five, six and three, and

again, that order is where they happen to be in our bag.

So first we check five. Five is not marked so we're going to visit

five. We're gonna mark it.

Say we got there from four, and then go ahead and visit three, four and zero, in

that order. From first we visit three.

That one also is not yet marked, so we're gonna recursively visit it.

So it's marked three. Say we got there from five and then go

ahead and to visit three recursively, we have to check five and four.

Check five. Well we just came here it's mark, so we

don't have to do anything. Check four, that's also, been marked so we

don't have to do anything. So now finally, this is the first time and

that requires a call that we're ready to return, we're done with that first search

from three. So now we're done with three.

And we can unwinding the recursion, we can now continue our search from five.

And the next thing we have to do from five, we had already checked three, so now

we're gonna check four. And we've already visited four, so we

don't have to do anything. That's already marked.

And we checked zero, and that one's already marked.

So now we're done with five, and we can back one more level up in the recursion.

So now, for four, we have to go through, and look at six and three.

Six is mar, marked, so we don't have to do anything.

Three is marked, so we don't have to do anything, and so we're gonna be done with

four. So that after finishing four we're done

with six. And so now we're in the recursion back at

zero. And we've already checked six.

So now we gotta check two next. We checked two, and so we rehearse and go

there. Mark two, and then say we got there from

zero, and now to visit two, all we check is zero and that's a marks, so we don't

have to do anything, and we're done with two.

Then check one, visit one, that's the last vertex we're visiting.

Check zero, it's already marked so we don't do anything.

We return. Now, we're at the last, step is to, from

zero, five is on it's list, we have to check if we've been there.

We can see that it's marked and we have been there so we're one with zero.

So that's a depth-first search from vertex zero, and we have visited all the vertices

that are reachable from zero. Number one and number two for each one of

those vertexes we kept track of how we got there from zero.

So if we now want to know for any one of those vertices how to get back to zero we

have the information that we need. For example say we want to find the path

from five back to zero. We know we got the five from four, we know

we got the four from six, we know we got the six from zero so we can go back

through using that edge to array to find. The path, so the depth for search

calculation built in data structures, and now clients, whose data structures built

in a constructor serve as the basis for, being able to.

Officially answer client queries. That's a depth-first search demo.

So, this is just a summary of the thing I talked about, during that demo.

Our goal is to find all the vertices connected to different vertex at.

And also a path, in order to be able to answer client query.

On the algorithm we're going to use is based on like maze exploration where we

use excursion, mark each vertex, keep track of the edge we took to visit it and

return when there's no unvisited options. We're using, two data structures, to

implement this. Both vertex indexed arrays.

One named Mark that will tell us which vertices we've been to.

And another one, edge two that maintains that tree of paths.

Where edge 2W = V means that VW was taken, the first time that we went to W.

So now, let's look at the code, that, given all of this background.

The code for implementing depth first search is remarkably compact.

So here's our private instance variables. The marked and edgedTo vertex and mix

arrays, and the source s. And the constructor just goes through and,

creates, the arrays and initializes them. And we won't repeat that code.

And so here's the, the last thing the constructor does after it creates the

arrays, is does a DFS on the graph, from the given source.

And it's a remarkably, compact implementation to do depth-first search,

from a vertex V. What we do is mark V, let's say mark it

true. Then for everybody adjacent to V.

We check if it's marked. If it's not marked, then we do a recursive

call. And we set edge to w equals v.

Again remarkably compact code that gets the job done.

19:57

So now let's look at some of the properties of depth-first search.

So first thing is we wanna be sure that convince ourselves that it marks all the

vertices connected to S in time proportional to some of their degrees,

well, depth-first graph is going to be small.

So s-, so first thing is, convince yourself that if you mark the vertex, then

there has to be a way to get to that vertex from S.

So well that's easy to see, because the only way to mark vertex is get there

through a sequence of recursive calls, and every recursive calls corresponds to an

edge on a path from SVW. But you also have to be able to show that

you get to, every vertex that's connected to S.

And that's a little more intricate. And this diagram is, supposed to help you

out in understanding that. If you had, some unmarked vertex, Then,

maybe there's, a bunch of unmarked vertices.

And so... And it's connected to S and it's not

marked, so that means there has to be an edge on a path from S to W, that goes from

a marked vertex to an unmarked one. But the design of the algorithm says that

there's no such edge if you're on a marked vertex then you're gonna go through and

look at all the adjacent ones and if it's not marked, you're gonna mark it.

So that's an outline of the proof that DFS marks all the vertices.

And in the running time, it only visits each marked vertex once or each vertex

connected as once. And so, for each one of them, it goes

through all the adjacent vertices. So that's the basic properties of

depth-first search. So now, the other thing that is important

is that a client who has, uses this algorithm after the depth-first search,

after the constructor has done the depth-first search and built these data

structures, Client can find the vertices connected to

the source in constant time. And can find a path, 2S if one exists in

time proportional to its length. Well, the marked array provides the first

part. And the second part is Just the property

of the edge to array. It's a, what's called a parent link

representation of a tree rooted at S. So if a vertex is connected to S then its

edge two is parent in a tree. So this code here.

Is going to for a given, well has path too so that's just return mark, that's the

first part. And then to actually get the path to a

given vertex so, here's the code for doing that.

We actually use a stack to keep track of the path'cause we get it in reverse order.

If there's no path, we return null. Otherwise we keep a variable X and we just

follow up through the edge to array Pushing the vertex on to the stack and

then moving up the tree in the ray, then finally push, push as itself on to the

path and then we have a stack which is edible which will give us our path.

So that's in time, time proportional to the length of the path and forth while to

check your understanding of how stacks in real works, irreversible to take a look at

this code to see that it does the job. So that's depth-first search.

Now it's not. Of the optimal graph searching method for

all applications. And here's an, an amusing representation

of how depth first search can maybe create problems sometimes.

So, I'm getting ready for a date, what situations do I prepare for?

Well, medical emergency, dancing, food too expensive.

Okay, what kind of medical emergencies could happen?

Well, I could be snake bite or a lightning strike or a fall from a chair.

Well, what about snakes, I have to worry about corn snakes or garder.

Say for copperhead. And then well, I better make a straight.

I better study snakes. And then the date says, I'm here to pick

you up. You're not dressed.

And well, so I really need to stop using depth-first search.

So we're gonna look at other graph searching algorithms.

But if you always try to expand the next thing that you come to, that's depth-first

search. And there's a lot of natural, situations

where that naturally comes to mind. Here's another example.

I took this photo of the Taj Mahal a couple of years ago and I didn't like the

color of the sky. So I used Photoshop's magic wand to make

it more blue. And the implementation, now this is a huge

graph. This picture's got millions of pixels.

And the way that the flood filled the magic wand works, is to build, from a

photo, what's called a grid graph, where every vertex is a pixel and every edge

connects two pixels that are the same color, approximately the same color.

And it builds a blob of all the pixels that have the same color as the given

pixel. So when I click on one, it does the

depth-first search to find all. All the connected pixels and therefore

change them to the new color that's a fine example of depth per search on a huge

graph that people use everyday. So that's our first nontrivial graph

processing algorithm depth-first search.