0:03

So we're going to complete our study of MST algorithms by considering some context

both as an unsolved problem in theoretical computer science and as a practical

problem. So the unsolved problem in theoretical

computer science that has defied researchers for many decades is, it is

possible to find a linear-time MST algorithm?

Is there an algorithm that only examines each edge at most once on the average?

Now, this doesn't have. That much practical comp.

Consequence since the versions of Primm's algorithm that we've talked about can get

the running time down quite low for the sparse graphs, the huge sparse graphs that

we encounter in practice. But it's still, its...

Like union find, it's a tantalizing problem.

Unlike union find, it hasn't been resolved yet.

Union find remember, we couldn't get a linear algorithm but at least Tarjan

proved that no such algorithm exists. For MST, we're not even there yet.

And a lot of people have worked on it. So let's go with the simple model where

what you get to do is compare edges. Compare weights on edges.

In 1975 Yao proved that there exists an algorithm that's worst case running time

is e log, log v. In 1976, Cheriton and Tarjan came up with

another such algorithm. And then Fredman and Tarjan found the e

plus e log v algorithm or e log star v For MST's in'84.

2:40

That now. In 2002, Optimal, well, let's, talk about

that. They, they, they showed an, an algorithm

that it, better not to talk about the theory of that.

[laugh]. And that the, still, the open question is,

is there, an algorithm whose worst case running time is guaranteed to be

proportional to e? Or could, someone prove that no such

algorithm exists? It's one of the most, tantalizing open

questions in computer science. As we get into, graph algorithms in more

detail. We'll see some other examples of problems

for which we know pretty good algorithms but would like to know whether there are

better algorithms or not. And MST is a fine example of that.

That's the orange means Princeton. There is a randomized linear time

algorithm that was discovered in 1995, but that's not the same as solving it worst

case in linear time. So that's one context.

Mxt is an important problem that's still been studied by theoretical computer

scientist and we still don't know the best algorithm.

Here's another one, So-called Euclidean MST.

And this one is what's appropriate in some practical situations.

So now you have points in the plane and the graph is an implicit dense graph.

That is, we take as an edge in the graph, the distance between this point and every

other point. So if there's n points there's n squared

edges because every point's connected to every other point.

And what we want is, in that graph, we want to find the subset of edges that

connects all the points, that's minimal. That's actually, in a lot of practical

problems, that's what you want. So as it stands, the algorithms that we've

talked about are not useful for this beacause they're going to take quadratic

time, because e's quadratic. That's how many edges there are in the

graph. So you know, you could just go ahead and

build the graph with the N squared over two distances and run Prim's algorithm.

But that's not very satisfying for a huge number of points.

It's actually not Too difficult to exploit the geometry of

the situation. And get it done in time proportional to N

log N. What is typically done is to build a sub

graph, where each point is connected to a certain number of points that are close to

it. There's a particular graph called the

Voronoi diagram, or the Delaunay triangulation, that does that.

And it's been proved, number one that, that graph has linear number of edges not,

quadratic, and it's also the MST is a sub graph of the d-linear triangulation.

So you can get it done in linear arrhythmic time for Euclidean MST.

Separate problem related but still a very interesting in many practical

applications. Here's another application in se, several

scientific studies, there's the idea of clustering.

And so what we wanna do is, we have a set of objects and they're related by a

distance function that specifies how close they are and what we wanna do is divide

the objects into a given number k of coherent groups so that objects in

different clusters are far apart. So wanna see how things cluster together.

And here's a really old example of a application of this where there was an

outbook, outbreak of cholera deaths in. London in the 1850s and, if you plot where

all of the deaths happened, scientific study could find that they were clustered

around certain places. And, actually they were able to identify

well pumps that were leading to the, the cholera just by doing clustering study.

And, that is a very special case. There are many, many other applications

where clustering is an important process, an important thing to be able to compute.

So like mobile networks for web search there's an idea of the distance between

doc, documents and you wanna categorize them in clusters.

There's the, all the objects that have been discovered in the sky, you wanna

cluster them together in a reasonable way. And all kinds of. Of processing having to

do with huge data bases, trying to get information that seems close together to

be close together into a relatively small number of clusters.

So there's, a, a approach called single link clustering.

Where you talk about the single length, the distance between two clusters equaling

the distance between the two closest objects, one in each cluster.

And so, so-called single length clustering is given at integer K.

Find the K clustering that maximizes the distance between the two closest clusters.

So that's a well defined computational problem.

And there's a very well-known algorithm, in the science literature for this

problem, signal, signal-link clustering. Form of e-clusters, find the closest pair

of objects such that each object's in a different cluster, and merge the two

clusters. And repeat until they're exactly

k-clusters. You'll find this algorithm in the

scientific literature. What's amazing is that, this is

Crussical's algorithm, just stop when you found the k-connected components, so that,

the, Or another thing you could do is just run Prim's algorithm and then after you've

run Prim's algorithm get rid of the largest edges until you're left with

k-clusters. So out of all the efficient Algorithms

that we've talked about are gonna apply for single-link clustering.

And actually scientists who also know some computer science now are able to handle

huge problems that would not be possible without efficient algorithms.

This is just one, one example where a, a cancer study where experiments are

connecting genes with the way they're expressed in different parts of the body,

and trying to cluster together tumors in similar tissues.

And again, such experimental results can amount, result in huge amounts of data,

and MST algorithms are playing a role in scientific research of this type.

That's our context for minimal spanning trees.