0:03
Next we're going to look at a slightly different graph processing application.
But it's a, a, a graph processing algorithm that's useful in many
applications we'll see in a minute. And slightly different than, depth and
breadth first search. It's called computing connected
components. Now I mentioned this a little bit when we
talked about basic definitions. So the idea is that if there's a path
between two vertices we say they're connected.
And what we want to do is reprocess the graph that is, build a data type that can
answer queries of the form, is V connected to W in constant time.
Now, we want to be able to do that for a huge, sparse graph of the type that
appears in practice. So we can't use the, if we could use the
adjacency matrix data structure, maybe we could do that but we can't.
So we're going to build a class that uses our standard representation, that will
enable clients to find connective components.
It's really interesting to think about this one.
We're getting the job done that we could get done if we get a huge, sparse matrix.
But if we have billions of vertices, there's no way we can have billions
squared in the matrix. So we have to find another way to do it.
So here's the data type that we want to implement.
So it's called and it's going to the constructor is going to build the data
structure that finds the connected components in the given graph to be able
to efficiently answer these connectivity queries.
It'll also be able to count the number of connective components.
And it also assigns an identifier from zero to count minus one, that identifies
the connective component that every vertex is in.
2:08
Now if you maybe, this sounds a little bit similar to what we did for the union sign
problem. So when you union find problem we're,
we're taking edges and we wanted to have queries that, that, well, well union is
like adding edge to the graph and then find is like are these things connected
yet. Now with union find, we found that we
couldn't quite answer the thing in constant time.
Are members are very slow growing on questions constant practical terms, but
not, not really. So it's less efficient than the algorithm
that we're going to talk about cause it doesn't quite get constant time.
On the other hand in another way it's better than the algorithm we're going to
talk about because you can either mix the unions and fines.
And in this case, because we're working with the graph it's like we're taking all
the unions and then we are handling fine requests.
So anyway what we're going to use to implement this is a depth first search
approach. So we'll do the depth first search, we'll
just keep different data then we did, than when we were findings paths.
So the algorithm is based on the notion that connection is an equivalence
relation. So recall an equivalence relation has
these three properties. Every vertex is connected to itself.
If v is connected to w then w is connected to v.
And if v is connected to w and w to x then v is connected to x.
And those basic operations underlie the algorithm that we're going to talk about.
So the equivalence relation is a, a general mathematical concept that implies,
in graph theory in this case. And as I already mentioned, in the case of
graph, it implies that. The vertices divide up into connected
components which are maximal sets of connected vertices.
So our sample graph has three connected components.
And what we'll do is assign identifiers to each one of the components in that will
for every vertex. Give us an identifier number for that
vertex. And that's the data structure then our
depth for search is going to built, and that immediately gives the constant time
implementation of the connectivity queries.
Given two vertices you look up their ID and if they're equal, they're in the same
connected component, if they're different they're not.
So that's the data structure that we're going to build it's like a unifying tree,
where there's just, all the trees are flat.
8:43
Okay so, now, we have to visit three. To visit three we have to check five and
four. Five was marked and nothing to do.
Four was marked, nothing to do. So we're done with three.
Done with three, we can continue the depth per search from five.
We have to check four and zero. Four was marked.
Zero was marked. So we're done.
And now we could continue with the depth per search of four.
We have to check six- three. Six was marked.
Three was marked and we're done. And we can,
Now six is done. And now we can continue with zero and we
have to check two. Check two it's not marked.
So we mark it and give it a connected component number of zero.
To visit two, all we do is check zero which is marked, so we're done.
And then we do the same thing with one, Unmarked.
So we visit it, I guess assign it zero. And then visit one.
And to do that, we check zero, which is marked.
So we're done with one. And then, to finish zero we have to check
five. And that's, marked so we don't have to do
anything. And we're done with zero.
So, now we're done with the first connected component.
But, we're not done with the whole graph. So what we want to do is, Go look for, so
that's a connected component, corresponding to zero.
Now we want, what we want to do is go look for an unmarked vertex.
So, we started at zero and we check one, two, three, four, five, and six.
And they're all marked. And so the next unmarked vertex that we
find in the graph is seven. So once we return from the depth first
search for zero, we Increment, counter. Which is our, how many connected
components have we seen. And now, we're going to assign that number
to everything that's connected to seven on the depth-first search on seven.
So what do we do to depth per research and sum we check eight.
8s on the mark. And so we go visit it.
So we assign it, connect the component of number of one same as seven.
And mark it. And then go ahead and recourse to visit
eight. We check seven which is marked.
So there is nothing to do. We're done with eight.
And then we're done with seven. So now we're done with all the vertices
that were connected to seven. We increment our counter of number of
components to two, and look for another vertex.
So now we check eight, which we already know is marked, connected to seven.
So now nine is unmarked so we're going to do a DFS from nine.
So everybody connected to nine is going to get assigned a connected component number
of two. So at nine, we have to check eleven and
that was unmarked. So we visit it.
Give it a two. To visit eleven, we have to check nine,
which is marked. So nothing to do.
And twelve, which is unmarked. So we visit it and, give it a number of
two. To visit twelve, we have to check eleven,
which is marked and nine which is also marked.
And then we're done with twelve. And then we're done with eleven.
And then, to finish doing nine, we have to check ten and twelve.
Ten is unmarked. So we mark it and give it a number of two.
To visit ten, we check nine which is marked, so we're done.
And then finally, to finish, the DFS. We check, twelve from nine, and that's
marked. So we're done with nine.
And now we keep looking. And we find that, ten, eleven, twelve, are
all marked. So we've completed the.
Computation. And for every vertex we have a connected
component number. And for any given query we can test
whether their in the same connected component simply by looking up that number
and seeing if it's equal. That's a demo of connected components
computation. Okay, so here's the code for finding
connected components with DFS. Which is, another straightforward DFS
implementation. Just like the other one.
And it just keeps, slightly, different, data structure.
So, The We keep the marked data structure, which is the vertices that we visited.
And then we keep this vertex index array ID which gives the identifier, the
component containing V, I think we call it on the demo.
And then a count of the number of components that we've seen.
So the constructor creates, the, marked array and it creates this idea array.
But now the constructor does more work than a single call on DFS.
What it does is it goes through, this is the constructor.
Goes through every vertex in the array in the graph.
And if it's not marked, it does the DFS. And that DFS will mark a lot of other
vertices. But when it's done that's all of those are
going to get assigned a value of count. And we're going to increment count, then
go and look for another unmarked vertex. Anything that wasn't marked by that first
DFX, DFS, we'll do a DFS from that one, and mark all its vertices with the next
value for the ID. So now let's look at the implementation of
DFS. It's recursive array just like the one
that we did. For, for path finding.
Except all that we do, when we mark a vertex.
We also simply set its ID to the current component name.
So all the vertices that are discovered in the same call of DFS have the same ID.
And to visit a vertex you go through all its adjacent vertices.
And any that are not marked you give a recursive DFS call.
Again, this code is amazingly compact and elegant.
When we're going through the demo step by step maybe you, you can see that the
underlying computation is actually kind of complex.
But, but, recursion and the graph-processing API that we set up
provides a compact and easy to understand implementation.
So that's using DFS to find connector components and then to return the idea of
a given vertex, just look at up in the array and to return then our components
just you can count and then you can build up the needed connectivity API from those.
So that's depth first search to find connected components.
I will just talk briefly about two applications from scientific applications.
So here's an application of sexually transmitted diseases at a high school.
And, simply the vertices are people, blue are men and pink are women.
And you I have an edge between if there was a contact.
And so it's obvious that you're going to be interested in connective components of
this graph. To be able to properly study sexually
transmitted diseases. These individuals had no contact with
these. Then, the, whichever one has the disease,
maybe it won't spread. Or if you add a new edge, then maybe you
have a problem. That's just one example of studying the
spread of disease. Here's another example that we use, for,
for that's similar to the flood fill example.
And this is processing data from a scientific experiment.
And in, in this case this image is comes from a photograph.
And the white things are particles that are moving.
And all we get is a image where it's a grey scale image.
And so, what we'll do to do this processing is, we want to identify the
movement of these particles over time. And the way we do it is build a grid
graph, like the one for the flood fill application and do an edge connecting two
vertices. If they're different,
He said their gray scale values is greater than less than some threshold.
And so then if you do that, and then find the connective components, then you can
identify blobs which correspond to real particles in this simulation.
And they do that every frame in a movie, then you can track moving particles over
time. So these are maybe fairly high resolution
images. These are graphs with lots and lots of
edges, and you need to be able to do this computation quickly in order to do this
scientific experiment. And we'd use this as an example.
Example in our first year programming course it is based on computing connected
components using depth-first search. So that's our third example of a graph
processing algorithm.