0:06

Now we will look at Kruskal's algorithm for computing the MST. This is an old algorithm

that dates back over 50 years, but it is still an effective way to solve the problem.

The idea is very simple. We are going to take the edges and we are

going to sort them by weight, and we are going to consider them in order

of ascending weight. The algorithm is then simply: add the next

edge to the tree, unless that edge would create a cycle.

If the edge creates a cycle, ignore it and go on to the next one.

Let's look at a demo of Kruskal's algorithm.

Okay. So we have the edges sorted by weight, and

we're going to take them in ascending order of weight.

The smallest edge is in the MST, and so we add it to the tree; it doesn't

create a cycle. The next smallest edge is 2-3.

That doesn't create a cycle. The next one is 1-7; that's fine.

The next one is 0-2. The next one, 5-7.

Now when we get to 1-3, that one creates a cycle, and the algorithm says:

if it creates a cycle, just ignore it. The next one is 1-5; that one also

creates a cycle, so again, just ignore it.

2-7 creates a cycle, so it's not in the MST. 4-5, the next smallest edge,

does not create a cycle, so we add it to the MST.

1-2 creates a cycle. 4-7 creates a cycle.

0-4 creates a cycle. And finally we get to 6-2, which does

not create a cycle, so we add that to the tree.

At this point we have an MST: we could check that we have V - 1 edges and stop,

or we can just keep going and see that each of the other edges in the graph creates a cycle.

When we're done considering all the edges, then we have computed the minimum

spanning tree. That's Kruskal's algorithm:

Consider the edges in ascending order of weight, and add the next edge to T, unless

doing so would create a cycle. Now let's look at what Kruskal's algorithm

does on a very large graph, so you can get a good feeling for what it's

doing. It's taking the small edges, and they

coalesce together into little clusters, and eventually the edges get longer and longer

and they connect the clusters together. Initially the graph is broken up into very

small edges, and after a while, if you look at the larger clusters, eventually an edge

will come along that connects them together. It's not going to be so easy to see how

the algorithm ends, though there's some long edge there.

That's the longest edge in the MST, the one that'll finally connect the graph.

3:08

This simulation really gives a good feeling for Kruskal's algorithm.

In fact, when we first had personal computers, I was fortunate to be at Xerox

PARC. And I remember very well, in the late 70s,

this was one of the first algorithms that I wanted to see visualized on a personal

computer. I wrote a program at that time that

produced pretty much this image. It's a very interesting way to look at MST

algorithms. Alright, so it's easy to implement, but we

have to prove that it computes the MST, of course, and we've done a lot of work

setting that up.

3:54

We do so just by proving that it's a special case of the greedy MST algorithm.

So what does that mean? Well,

suppose that Kruskal's algorithm colors a given edge black;

say it's v-w. We'll define a cut

that is the set of vertices that are connected to v.

It might be just v itself, but if there are any black edges connecting v to other vertices,

we put all of those in the cut. For that cut, there's no black crossing

edge, because it's a connected component.

And the other thing is that there's no crossing edge with lower weight than v-w.

Why is that? Well, we haven't examined any of those

edges yet, and they all have higher weight, because we are considering the edges in increasing

order of their weight. So this new edge that connects v to somewhere

else is a crossing edge of minimal weight. Therefore the algorithm always

finds a cut and colors a crossing edge of minimal weight for that cut black.

And it's an instance of the Greedy Algorithm.

Now we're still a little way from having an implementation of Kruskal's algorithm.

So the question is, how do we know whether adding a new edge to the tree will cause a

cycle? You might think about how difficult

that is. I've got a tree, represented

as a set of edges or however, and I have a new edge, and I want to know

whether it creates a cycle or not. How difficult is that going to be?

Well, it turns out that way back in the first lecture of Algorithms, Part I we had a

way. One thing we could do is just run DFS

to check if we can get to w from v. That'd take time proportional to V.
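As a sketch of that DFS option (this is an illustration, not the course code; the adjacency-list representation and method names here are my own assumptions): searching the current tree edges from v tells us whether w is already reachable, and in a tree with at most V - 1 edges that search takes time proportional to V.

```java
import java.util.*;

public class DfsCycleCheck {
    // Depth-first search over the tree edges built so far:
    // returns true if w is reachable from v.
    static boolean reachable(List<List<Integer>> adj, boolean[] marked, int v, int w) {
        if (v == w) return true;
        marked[v] = true;
        for (int x : adj.get(v))
            if (!marked[x] && reachable(adj, marked, x, w)) return true;
        return false;
    }

    public static void main(String[] args) {
        int V = 4;
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < V; i++) adj.add(new ArrayList<>());
        int[][] treeEdges = {{0, 1}, {1, 2}};   // the tree built so far
        for (int[] e : treeEdges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }

        // edge 0-2 would create a cycle; edge 2-3 would not
        System.out.println(reachable(adj, new boolean[V], 0, 2));
        System.out.println(reachable(adj, new boolean[V], 2, 3));
    }
}
```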

But the union-find data structure that we developed in the first lecture absolutely does

the job. It's just testing whether this edge

connects anybody in the equivalence class corresponding to v with anybody in

the equivalence class corresponding to w, and if it does, the edge creates a

cycle. So union-find is exactly what we need.
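Here is a minimal sketch of that cycle test with a bare-bones quick-union union-find (without the weighting and path-compression refinements from Part I; the names are illustrative): an edge v-w creates a cycle exactly when v and w are already in the same component.

```java
public class CycleCheck {
    static int[] parent;

    // Follow parent links up to the root of x's component.
    static int find(int x) {
        while (x != parent[x]) x = parent[x];
        return x;
    }

    // Edge v-w would create a cycle iff v and w share a root.
    static boolean connected(int v, int w) { return find(v) == find(w); }

    // Merge the components containing v and w.
    static void union(int v, int w) { parent[find(v)] = find(w); }

    public static void main(String[] args) {
        parent = new int[]{0, 1, 2, 3};       // four singleton components

        union(0, 1);                          // add tree edge 0-1
        union(1, 2);                          // add tree edge 1-2
        System.out.println(connected(0, 2));  // 0-2 would create a cycle
        System.out.println(connected(0, 3));  // 0-3 is safe to add
    }
}
```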

6:36

So what we're going to do is maintain a set for each of the

connected components in the spanning tree that we've built up so far. If the new

edge connects vertices that are in the same set, then that means it would create

a cycle. Otherwise, we're going to add the

edge to the tree; then we just do the union operation and

merge the set containing v with the set containing w.

Those are the two things that we need to do, and those are the two things that union-find

does for us. So this leads to a very compact

implementation of Kruskal's algorithm. Now, to consider the edges in order, we'll

use a priority queue, a data structure from Part I.

So let's take a look at every line of this implementation of Kruskal's

algorithm. We are going to have a queue of edges.

That's how we are going to represent the

MST. It could be a stack or a bag, but we'll use a queue.

And then the KruskalMST algorithm takes a graph,

and it's going to compute the MST in this mst queue.

To do that, it's going to use a priority queue, a min priority queue:

a minimum-oriented priority queue of edges.

So we'll build that minimum priority queue.

Now, edges are comparable. We have a compareTo() method, so our

8:16

generic priority queue code is going to work fine.

And so, what we'll do is put all the edges in the graph into the priority

queue. So that's building a priority queue

containing all the edges in the graph. Alternatively, we could put the edges in

an array and sort the array, but priority queues are

a more elegant way to express this, and a way to get to more efficient algorithms

as well. Okay, so that's the priority queue, that's

the first data structure from part one that we're going to make use of.
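The array-sort alternative might look like this; the Edge class below is a stripped-down stand-in for the course's Edge type, keeping only the compareTo() method that orders edges by weight.

```java
import java.util.*;

public class SortEdges {
    // Minimal Edge: two endpoints and a weight, comparable by weight.
    static class Edge implements Comparable<Edge> {
        final int v, w;
        final double weight;
        Edge(int v, int w, double weight) { this.v = v; this.w = w; this.weight = weight; }
        public int compareTo(Edge that) { return Double.compare(this.weight, that.weight); }
    }

    public static void main(String[] args) {
        Edge[] edges = { new Edge(0, 2, 0.26), new Edge(2, 3, 0.17), new Edge(4, 5, 0.35) };
        Arrays.sort(edges);   // compareTo() gives ascending order of weight
        for (Edge e : edges) System.out.println(e.v + "-" + e.w + " " + e.weight);
    }
}
```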

And then the second one is the union-find data structure.

So we're going to build a union-find data structure for the vertices, because the

spanning forest divides the vertices into equivalence classes.

So then we go into the main loop, and the main loop stops on one of two conditions.

One is that we run out of edges, in which case we'd have a minimum spanning forest.

The other condition is that we get to V - 1 edges in the MST.

So as long as neither one of those is true, we've got an edge in the

priority queue, and we want to take the smallest one.

We want to get its vertices v and w, using either() and other().

And then we want to check, using the union-find data structure, whether they're connected.

If they're connected, we don't want to do anything; if they're not connected, then we

want to merge them and put that edge onto the MST.

That's it. That's the full

implementation, and then we have the edges() method for the client to return the MST.

It's quite a compact implementation of a classic algorithm,

using the basic data structures that we built up in Algorithms, Part I.

The priority queue

and union-find give a really fine implementation of this MST algorithm.
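Putting the pieces together, here is a compact sketch along the lines of the implementation just described, with java.util.PriorityQueue standing in for the course's MinPQ, a plain edge list standing in for EdgeWeightedGraph, and simplified Edge and union-find classes:

```java
import java.util.*;

public class KruskalMST {
    // Minimal Edge with the either()/other() accessors used in the loop.
    static class Edge implements Comparable<Edge> {
        final int v, w;
        final double weight;
        Edge(int v, int w, double weight) { this.v = v; this.w = w; this.weight = weight; }
        int either() { return v; }
        int other(int vertex) { return vertex == v ? w : v; }
        public int compareTo(Edge that) { return Double.compare(this.weight, that.weight); }
    }

    // Bare-bones quick-union union-find.
    static class UF {
        final int[] parent;
        UF(int n) { parent = new int[n]; for (int i = 0; i < n; i++) parent[i] = i; }
        int find(int x) { while (x != parent[x]) x = parent[x]; return x; }
        boolean connected(int v, int w) { return find(v) == find(w); }
        void union(int v, int w) { parent[find(v)] = find(w); }
    }

    final Queue<Edge> mst = new ArrayDeque<>();   // the MST, as a queue of edges

    KruskalMST(int V, List<Edge> edges) {
        // Min-oriented priority queue holding all the edges.
        PriorityQueue<Edge> pq = new PriorityQueue<>(edges);
        UF uf = new UF(V);
        while (!pq.isEmpty() && mst.size() < V - 1) {
            Edge e = pq.poll();                   // next-smallest edge
            int v = e.either(), w = e.other(v);
            if (uf.connected(v, w)) continue;     // would create a cycle: ignore it
            uf.union(v, w);                       // merge the two components
            mst.add(e);                           // and add the edge to the MST
        }
    }

    Iterable<Edge> edges() { return mst; }        // for the client

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
            new Edge(0, 1, 4), new Edge(1, 2, 1),
            new Edge(0, 2, 3), new Edge(2, 3, 2));
        KruskalMST k = new KruskalMST(4, edges);
        double total = 0;
        for (Edge e : k.edges()) total += e.weight;
        System.out.println(k.mst.size() + " edges, weight " + total);
    }
}
```

Note that the loop stops as soon as the MST has V - 1 edges, the early-exit condition discussed below.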

So what about the running time of this algorithm?

Well, it's not hard to see that it's going to compute the MST in time proportional to

E log E. It's linearithmic in the number of edges,

which means that we can use it for huge graphs.

This table just summarizes the costs.

We first build the priority queue,

and we can do that in linear time using the bottom-up construction method.

Then every edge comes off the priority queue,

so there are E delete-min operations, at log E per operation.

And then for union operations, every vertex is involved in one, and for connected operations,

every edge is involved in one. So the total time is dominated by the E

log E for the priority queue operations. One thing to note is that if the edges

come in sorted order for some reason, it's almost linear:

the order of growth is E log* E. And actually, we don't always have to

sort them. In real-life situations, we can

stop when we get V - 1 edges in the MST. When we have those V - 1 edges, we can

stop, and in practical situations that's usually way before we've seen all

the edges in the graph. So that's Kruskal's algorithm for

computing the MST.