0:03

Next we're going to look at a slightly different graph processing application.

Â But it's a, a, a graph processing algorithm that's useful in many

Â applications we'll see in a minute. And slightly different than, depth and

Â breadth first search. It's called computing connected

Â components. Now I mentioned this a little bit when we

Â talked about basic definitions. So the idea is that if there's a path

Â between two vertices we say they're connected.

Â And what we want to do is reprocess the graph that is, build a data type that can

Â answer queries of the form, is V connected to W in constant time.

Â Now, we want to be able to do that for a huge, sparse graph of the type that

Â appears in practice. So we can't use the, if we could use the

Â adjacency matrix data structure, maybe we could do that but we can't.

Â So we're going to build a class that uses our standard representation, that will

Â enable clients to find connective components.

Â It's really interesting to think about this one.

Â We're getting the job done that we could get done if we get a huge, sparse matrix.

Â But if we have billions of vertices, there's no way we can have billions

Â squared in the matrix. So we have to find another way to do it.

Â So here's the data type that we want to implement.

Â So it's called and it's going to the constructor is going to build the data

Â structure that finds the connected components in the given graph to be able

Â to efficiently answer these connectivity queries.

Â It'll also be able to count the number of connective components.

Â And it also assigns an identifier from zero to count minus one, that identifies

Â the connective component that every vertex is in.

Â 2:08

Now if you maybe, this sounds a little bit similar to what we did for the union sign

Â problem. So when you union find problem we're,

Â we're taking edges and we wanted to have queries that, that, well, well union is

Â like adding edge to the graph and then find is like are these things connected

Â yet. Now with union find, we found that we

Â couldn't quite answer the thing in constant time.

Â Are members are very slow growing on questions constant practical terms, but

Â not, not really. So it's less efficient than the algorithm

Â that we're going to talk about cause it doesn't quite get constant time.

Â On the other hand in another way it's better than the algorithm we're going to

Â talk about because you can either mix the unions and fines.

Â And in this case, because we're working with the graph it's like we're taking all

Â the unions and then we are handling fine requests.

Â So anyway what we're going to use to implement this is a depth first search

Â approach. So we'll do the depth first search, we'll

Â just keep different data then we did, than when we were findings paths.

Â So the algorithm is based on the notion that connection is an equivalence

Â relation. So recall an equivalence relation has

Â these three properties. Every vertex is connected to itself.

Â If v is connected to w then w is connected to v.

Â And if v is connected to w and w to x then v is connected to x.

Â And those basic operations underlie the algorithm that we're going to talk about.

Â So the equivalence relation is a, a general mathematical concept that implies,

Â in graph theory in this case. And as I already mentioned, in the case of

Â graph, it implies that. The vertices divide up into connected

Â components which are maximal sets of connected vertices.

Â So our sample graph has three connected components.

Â And what we'll do is assign identifiers to each one of the components in that will

Â for every vertex. Give us an identifier number for that

Â vertex. And that's the data structure then our

Â depth for search is going to built, and that immediately gives the constant time

Â implementation of the connectivity queries.

Â Given two vertices you look up their ID and if they're equal, they're in the same

Â connected component, if they're different they're not.

Â So that's the data structure that we're going to build it's like a unifying tree,

Â where there's just, all the trees are flat.

Â 8:43

Okay so, now, we have to visit three. To visit three we have to check five and

Â four. Five was marked and nothing to do.

Â Four was marked, nothing to do. So we're done with three.

Â Done with three, we can continue the depth per search from five.

Â We have to check four and zero. Four was marked.

Â Zero was marked. So we're done.

Â And now we could continue with the depth per search of four.

Â We have to check six- three. Six was marked.

Â Three was marked and we're done. And we can,

Â Now six is done. And now we can continue with zero and we

Â have to check two. Check two it's not marked.

Â So we mark it and give it a connected component number of zero.

Â To visit two, all we do is check zero which is marked, so we're done.

Â And then we do the same thing with one, Unmarked.

Â So we visit it, I guess assign it zero. And then visit one.

Â And to do that, we check zero, which is marked.

Â So we're done with one. And then, to finish zero we have to check

Â five. And that's, marked so we don't have to do

Â anything. And we're done with zero.

Â So, now we're done with the first connected component.

Â But, we're not done with the whole graph. So what we want to do is, Go look for, so

Â that's a connected component, corresponding to zero.

Â Now we want, what we want to do is go look for an unmarked vertex.

Â So, we started at zero and we check one, two, three, four, five, and six.

Â And they're all marked. And so the next unmarked vertex that we

Â find in the graph is seven. So once we return from the depth first

Â search for zero, we Increment, counter. Which is our, how many connected

Â components have we seen. And now, we're going to assign that number

Â to everything that's connected to seven on the depth-first search on seven.

Â So what do we do to depth per research and sum we check eight.

Â 8s on the mark. And so we go visit it.

Â So we assign it, connect the component of number of one same as seven.

Â And mark it. And then go ahead and recourse to visit

Â eight. We check seven which is marked.

Â So there is nothing to do. We're done with eight.

Â And then we're done with seven. So now we're done with all the vertices

Â that were connected to seven. We increment our counter of number of

Â components to two, and look for another vertex.

Â So now we check eight, which we already know is marked, connected to seven.

Â So now nine is unmarked so we're going to do a DFS from nine.

Â So everybody connected to nine is going to get assigned a connected component number

Â of two. So at nine, we have to check eleven and

Â that was unmarked. So we visit it.

Â Give it a two. To visit eleven, we have to check nine,

Â which is marked. So nothing to do.

Â And twelve, which is unmarked. So we visit it and, give it a number of

Â two. To visit twelve, we have to check eleven,

Â which is marked and nine which is also marked.

Â And then we're done with twelve. And then we're done with eleven.

Â And then, to finish doing nine, we have to check ten and twelve.

Â Ten is unmarked. So we mark it and give it a number of two.

Â To visit ten, we check nine which is marked, so we're done.

Â And then finally, to finish, the DFS. We check, twelve from nine, and that's

Â marked. So we're done with nine.

Â And now we keep looking. And we find that, ten, eleven, twelve, are

Â all marked. So we've completed the.

Â Computation. And for every vertex we have a connected

Â component number. And for any given query we can test

Â whether their in the same connected component simply by looking up that number

Â and seeing if it's equal. That's a demo of connected components

Â computation. Okay, so here's the code for finding

Â connected components with DFS. Which is, another straightforward DFS

Â implementation. Just like the other one.

Â And it just keeps, slightly, different, data structure.

Â So, The We keep the marked data structure, which is the vertices that we visited.

Â And then we keep this vertex index array ID which gives the identifier, the

Â component containing V, I think we call it on the demo.

Â And then a count of the number of components that we've seen.

Â So the constructor creates, the, marked array and it creates this idea array.

Â But now the constructor does more work than a single call on DFS.

Â What it does is it goes through, this is the constructor.

Â Goes through every vertex in the array in the graph.

Â And if it's not marked, it does the DFS. And that DFS will mark a lot of other

Â vertices. But when it's done that's all of those are

Â going to get assigned a value of count. And we're going to increment count, then

Â go and look for another unmarked vertex. Anything that wasn't marked by that first

Â DFX, DFS, we'll do a DFS from that one, and mark all its vertices with the next

Â value for the ID. So now let's look at the implementation of

Â DFS. It's recursive array just like the one

Â that we did. For, for path finding.

Â Except all that we do, when we mark a vertex.

Â We also simply set its ID to the current component name.

Â So all the vertices that are discovered in the same call of DFS have the same ID.

Â And to visit a vertex you go through all its adjacent vertices.

Â And any that are not marked you give a recursive DFS call.

Â Again, this code is amazingly compact and elegant.

Â When we're going through the demo step by step maybe you, you can see that the

Â underlying computation is actually kind of complex.

Â But, but, recursion and the graph-processing API that we set up

Â provides a compact and easy to understand implementation.

Â So that's using DFS to find connector components and then to return the idea of

Â a given vertex, just look at up in the array and to return then our components

Â just you can count and then you can build up the needed connectivity API from those.

Â So that's depth first search to find connected components.

Â I will just talk briefly about two applications from scientific applications.

Â So here's an application of sexually transmitted diseases at a high school.

Â And, simply the vertices are people, blue are men and pink are women.

Â And you I have an edge between if there was a contact.

Â And so it's obvious that you're going to be interested in connective components of

Â this graph. To be able to properly study sexually

Â transmitted diseases. These individuals had no contact with

Â these. Then, the, whichever one has the disease,

Â maybe it won't spread. Or if you add a new edge, then maybe you

Â have a problem. That's just one example of studying the

Â spread of disease. Here's another example that we use, for,

Â for that's similar to the flood fill example.

Â And this is processing data from a scientific experiment.

Â And in, in this case this image is comes from a photograph.

Â And the white things are particles that are moving.

Â And all we get is a image where it's a grey scale image.

Â And so, what we'll do to do this processing is, we want to identify the

Â movement of these particles over time. And the way we do it is build a grid

Â graph, like the one for the flood fill application and do an edge connecting two

Â vertices. If they're different,

Â He said their gray scale values is greater than less than some threshold.

Â And so then if you do that, and then find the connective components, then you can

Â identify blobs which correspond to real particles in this simulation.

Â And they do that every frame in a movie, then you can track moving particles over

Â time. So these are maybe fairly high resolution

Â images. These are graphs with lots and lots of

Â edges, and you need to be able to do this computation quickly in order to do this

Â scientific experiment. And we'd use this as an example.

Â Example in our first year programming course it is based on computing connected

Â components using depth-first search. So that's our third example of a graph

Â processing algorithm.

Â