In addition to the graph of self the input includes a cost, for each of the

edges, we're going to use the notation C sebies of note the cost of a edge, E.

And in another contrast, to are discussion of shortest path problems,

we're actually not going to care if the edge cost are positive or negative, they

can be any number whatsoever. So no prizes for guessing what the

outputs supposed to be, it's right there in the problem definition, the output is

supposed to be a minimum cost spanning tree of the graph, but let's drill down

and explain exactly what we mean by that. So first of all what do we mean by the

cost of a tree or generally the cost of a sub graph, as a subset of the edges.

Well we're just going to be looking at summing up the edges in the tree that we

output. Now the other question is what do I mean

by a tree that spans all vertices? So let me tell you exactly what this

means, the sub graph T should have two properties, first of all there can not be

any cycles, there can not be any loops in this tree.

And by spanning all vertices, what I mean is that this sub graph is what's called

connected. That is, there's a path, using the edges

and t, from any vertex of the graph to any other vertex.

That's what it means to span all of the vertices.

So for example, consider the following graph with four vertices and five edges.

I've labeled each of the five edges with a cost, which in this case, is just an

integer between one and five. So, let's look at some example subgraphs,

let's start with the three edges, A, B, B, D and CD.

This sub-graph satisfies properties one and two.

That is, it has no cycles, there's no loops and it spans all of the vertices.

If you start at any one of these four vertices, you can get to any of the other

four vertices by using only red edges. So in that sense, this red sub-graph is a

spanning tree. However, it is not the minimum cost

spanning tree. There is another spanning tree which is

even cheaper, has a smaller sum of edge costs, namely the edges AC, AB, and BD.

This also has no cycles and it's also connected but the sum of the edge cost is

only seven, smaller than the eight of the previous spanning tree.

In fact, this pixograph is the unique minimum spanning tree of this graph.

There is a sub graph that has three edges which has an even smaller sum, of edge

costs, namely the triangle AB, BD and AD. But this light blue sub graph, this

triangle, is not a spanning tree. In fact, it fails on both counts.

It does obviously have a cycle. It has a loop.

That's, what it is by definition. It's also not connected, so there's no

way to get from C, the vertex, to any of the other three vertices by following

only light blue edges. It's disconnected, and so it fails

property one as well. So the MST problem in general is you're

given it under a graph, like, for example, this four note, five edge graph,

or presumably. something much larger and an interesting

problem and your suppose to quickly identify the minimum spanding tree like

in this example the pink subgraph. So what I want to do next is something

you're probably quite accustomed to me doing by this point, is I want to make a

couple of mild simplifying assumptions just among friends.

So these assumptions are not important in the sense that all of the conclusions of

these lectures will remain true, will remain valid even if these assumptions

are violated but it'll make the lectures a little bit easier.

It'll allow us to focus on the main points and not get distracted by less

relevant details so here are the two assumptions that we're going to make

throughout all of the lectures on minimum spanning trees.

The first assumption we're going to make is that the input graph G is itself

connected. That is G contains a path from any vertex

to any other vertex. So why am I making this assumption?

Well if this assumptions violated then the problem isn't even well defined.

If the graph isn't connected then certainly none of it's subgraphs are

connected so it has no spanning trees and it's not clear what we're trying to do.

So, those of you who still remember the stuff we covered in part one in

particular, graph search. Should recognize that this condition's

easy to check in a pre-processing step. Just run something like breadth first

search or depth first search. Remember, we know how to implement those

in linear time. And those will, in particular, tell you

whether or not the input graph is connected.

Now, another thing you might be wondering is, suppose it was disconnected.

Then what? Should be really just sort of throw up

our hands and give up? You can define a version of the minimum

spanning tree problem. A more general one called minimum

spanning forest. Where, basically you want the minimum

cost sub graph that spans as much stuff as possible.

Essentially, it's responsible for computing a spanning tree within each of

the connected components of the original graph.

And using the algorithms I'll show you here, Prim's algorithm, Kruskal's

algorithm, they're easily modified to solve the more general problem with

disconnected input graphs as well. But again, for simplicity among friends,

let's just focus on the connected graph case that contains all of the main ideas.

Our second standing assumption throughout all of the minimum of spanning tree

lectures will be that in the input graph the edge costs are distinct.

So you're already use to this sort of no ties kind of assumption from our foray

into scheduling algorithms, and we're going to do something similar here.

Now again this assumption is not important in the sense that the

algorithms that we cover prims algorithm crustgrals algorithm.

They remain correct even if the input has equal cost edges, irrespective of how

ties are broken. So the algorithms are correct as widely

as you would want. That's it.

I'm not going to actually prove for you that they are correct with ties.

Remember we had our scheduling, application it was a little bit easier to

get a proof of correctness without ties, I gave you that, and then optionally

there was a slightly more complicated argument that handled ties.

You can do the same thing here, but I'm just not going to give it to you.

I'll leave that for the keen viewer to work out for themselves.