0:00

So, in this set of lectures we'll be discussing the minimum cut problem and

graphs and we'll be discussing the randomized contraction algorithm.

Randomize algorithm which is so simple and elegant and it's almost impossible to

believe that it can possibly work but that's exactly at what we'll be approving.

So one way you can think about these set of lectures, as a segue of sorts,

between our discussion of randomization and our discussion of graphs.

So we just finished talking about randomization in the context of sorting

and searching.

We'll pick it up again toward the end of the class when we discuss hashing.

But while we're in the middle of randomization probability review,

I'm going to give you another application of randomization

in a totally different domain.

In particular to the domain of graphs, rather than to sorting and searching.

So that's one high level goal of these lectures.

The second one, is we'll get our feet wet talking about graphs, and

a lot of the next couple weeks,

that's what we're going to be talking about, fundamental graph primitives.

So this will give us an excuse to start warming up with the vocabulary,

some of the basic concepts of the graphs and what a graph algorithm looks like.

0:55

Another perk, although it's not one of the main goals, but I want to do,

I do want to point this fact, compared to most of this stuff that we're discussing

in this class, this is a relatively recent algorithm, the contraction algorithm.

By relatively recent I mean, it's 20 years old.

1:14

But at least that means most of us, I know not all of us, but

most of us at least were born at the time that this algorithm was invented.

And so just one quick digression.

In an intro course like this, most of what we're going to cover are oldies but

goodies, stuff from as much as 50 years ago.

And while it's kind of amazing, given how much the world and

how much technology has changed over the past 50 years,

that ideas in computer science from that long ago are still useful, they are.

It's just sort of an amazing thing about the stuff that

the first generation of computer scientists figured out.

It's still relevant to this day.

That said, algorithms is still a vibrant field with lots of open questions.

And when I have an opportunity, I'll try and give you glimpses of that fact.

So I do want to point out here that this is a somewhat more recent algorithm

than most of the other ones we'll see, which dates back to the 90s.

2:49

And there's two flavors of graphs and both are really important.

Both come up all the time in applications, so you should be aware of both kinds.

So there's undirected graphs and directed graphs and

that just depends on whether the edges themselves are undirected or directed.

So edges can be undirected by which I mean this pair is unordered.

There are just two vertices of an edge the two endpoints, say U and V,

and you don't distinguish one as the first and one as the second.

3:39

Now I think all of this is much clearer if I just draw some pictures.

Indeed one use to call graphs, dots and lines.

The dots would refer to the vertices, so here's four dots, or four vertices.

And the edges would be lines,

so the way you denote one of these edges is you just draw a line between the two

end points of that edge, the two vertices that it corresponds to.

So this is undirected graph with four vertices and five edges.

We can equally we'll have a directed version of this graph.

So let's still have four vertices and five edges, but

to indicate that this is directed graph and then each edge was first vertex and

the second vertex, were going to add arrows to the line.

So the arrow points to the second vertex, or to the head of the edge.

So, the first vertex is often called the tail of the edge.

So, graphs are completely fundamental,

they show up not just in computer science but in all kinds of different disciplines,

social sciences and biology being two prominent ones.

So, let me just mention a couple of reasons you might use them just off

the top of my head but literally there's hundreds or thousands of others, so

a very literal example would be road networks.

So imagine you type in asking for your driving directions from point A to point B

in some web application or software, or whatever, it computes a route for you.

What it's doing, is it's manipulating some representation of a road network,

which inevitably is going to be stored as a graph, where the vertices corresponds to

intersections and the edges correspond to individual roads.

The Web is often fruitfully thought of as a directed graph, so

here the vertices are the individual web pages, and edges correspond to hyperlinks.

So the first vertex in an edge detail is going to be the page that contains

the hyperlink.

The second vertex, or the head of the edge,

is going to be what the hyperlink points to.

So that's the Web as a directed graph.

Social networks are quite naturally represented as graphs.

So here the vertices correspond to the individuals in the social network.

And the edges correspond to relationships.

They have friendship links.

I encourage you to think about among the popular social networks these days,

which ones are undirected graphs and which ones are directed graphs,

we have some interesting examples of each of those..

6:06

So to say what I mean, you might think about,

let's say you're a freshman in college and you're looking at your majors,

you're a science major and you want to know what courses to take in what order.

And you can think about the following graph where each of the courses in your

major corresponds to a vertex And you draw a directed edge from course A to course B,

if course A is a prerequisite for course B.

That is, it has to be completed before you begin course B.

Okay, so that's a way to represent dependencies, sort of a temporal ordering,

between pairs of objects using a directed graph.

6:51

So, the definition of a cut of a graph is very simple, it's just a grouping,

a partition of the vertices of the graph into two groups, A and B, and

both of those two groups should be non-empty.

So, to describe this in pictures,

let me give you a cartoon of the cut in both the undirected and directed cases.

So for an undirected graph, you can imagine drawing your two sets, A and B.

And once you've defined the two sets A and

B, the edges then fall into one of three categories.

You've got edges with both of the endpoints in A.

8:12

Usually when we talk about cuts,

we're going to be concerned with how many edges cross the given cuts.

And by that I mean the following, the crossing edges of a cut (A,B)

are those that satisfy the following property.

So in the undirected case,

it's exactly what you think it would be, one of the endpoints is an A,

the other endpoint is in B, that's what it means to cross the cut.

8:34

Now in the directed case, there's a number of reasonable definitions you could

propose, about which edges crossed the cut.

Typically and in this course, we're going to focus on the case

where we only think about edges that cross the cut from the left to the right, and

we ignore edges which cross from the right to the left.

So that is the edges that cross the cut are those with tail in A and head in B.

So referring to our two pictures, our two corrections of cuts for the underrated

one all three of these blue edges would be the edges crossing the cut AB.

because they're the ones that have one end point on the left side and

one end point on the right side.

Now for the directed one, we only have two crossing edges.

9:36

Recall what is the definition of a cut, it's just a way to group

the vertices into two sets A and B, both should also be not empty.

So we have N vertices and essentially we have one binary degree of freedom for

each, for each vertex, we can decide whether or not it goes in set A or

it goes in set B, so two choices for each of the N vertices, that gives us a two

to the N possible choices, two to the N possible cuts overall.

Now that's slightly incorrect because we call that a cut.

You can't have a non empty set A or a non empty set B, so

those are two of the two to the N options which are disallowed.

So strictly speaking the number is two to the N minus two, but

two to the N is certainly the closest answer of the four provided.

10:17

Now, the minimum cut problem is exactly what you'd think it would be.

I give you as input a graph and among these exponentially, many cuts,

I want you to identify one for me with the fewest number of crossing edges.

So a few quick comments, so first of all the name for this cut is a min cut.

A min cut is one with the fewest number of crossing edges.

Secondly, to clarify, I am going to allow in the input what's called parallel edges.

There will be lots of applications where parallel edges are sort of pointless, but

for minimum cut actually it's natural to allow parallel edges.

And that means you have two edges that correspond to exactly the same

pair of vertices.

11:00

Finally, the more seasoned programmers among you are probably wondering what I

mean by, you're given the graph as input.

You might be wondering about how exactly that's represented, so

the next video's going to discuss exactly that,

the popular ways of representing graphs and how you're usually going to do it in

this course, specifically via what's called an adjacency list.

12:07

And I'll leave it for

you to check that there's no other edge that has as few as two edges.

Now in this case, we got a very balanced split when we took the minimum cut.

In general, that need not be true.

Sometimes even a single vertex can define the minimum cut of a graph, and

I encourage you to think about a concrete example that proves that.

So why should you care about computing the minimum cut?

Well, this is one problem among a genre called graph partitioning,

where you're given a graph and you want to break it into two or more pieces.

And these kinds of graph partitioning problem comes up all the time,

in a surprisingly diverse array of applications.

So let me just mention a couple at a high level.

So one very obvious one when your graph is representing its physical network,

when identifying something like a min cut allows you to do,

is identify weaknesses in your network.

Perhaps it's your own network, and

you want to understand where you soup of the infrastructure because it's,

in some sense, a hot spot of your network or a weak point.

Or, maybe there's someone else's network and

you want to know where the weak spot in their network.

In fact, there are some declassified documents about 15 years ago or so.

Which showed that the United States and Soviet Union militaries,

back during the Cold War, were actually quite interested in computing

minimum cuts, because they were looking for things like, for example, what's

the most efficient way to disrupt the other country's transportation network?

13:37

So the question is among the huge graph,

say the graph of everybody who is on Facebook or something like that.

How can you identify small pockets of people that seem tightly knit,

that seem closely related,

from which you like to infer that there are community of some sort?

Maybe they all go to the same school, maybe they all have the same interest,

maybe they're part of the same biological family whatever.

Now, it's to some degree still an open question how to best define communities

and social networks.

But as a quick and dirty sort of first order heuristic,

you can imagine looking for small regions, which on the one hand, are highly

interconnected among themselves, but quite weakly connected to the rest of the graph.

14:43

And there's a graph, which is very natural to define, given a 2D array of pixels.

Namely, you have an edge between two pixels if they are neighboring.

So for two pixels that are immediately next to each other left and

right or top to bottom, you put an edge there.

So that gives you what's called a grid graph.

And now unlike the basic minimum cut problem that we're talking about here,

in image segmentation it's most natural to use edge weights.

Where the weight of an edge is basically how likely you expect those two

pixels to be coming from a common object.

15:14

Why might you're expect to enabling pixels to come from the same object,

well perhaps their color maps were almost exactly the same and

you just expected that they're part of the same thing.

So once you've defined the screen graph which suitable edge ways now you run

a graph partitioning or maybe cut type separate team, and the hope is that

the cut that it identifies rips off one of the contiguous objects in the picture.

And then you do that a few times and

you get the major objects in the given picture.