[SOUND] In this session,

we were introduce Internal Measures for

Clustering Validation.

Remember, we just discussed several kinds of external measures.

The lucky thing for external measure is we do have [INAUDIBLE].

We have expert knowledge, we have a priors.

Unfortunately, in many cases, we do not have [INAUDIBLE].

That's the reason we have to rely on internal measures.

The internal measures based on the concept of clustering.

It means we want the points within the same cluster.

That means intra-cluster, they are very compact.

They are close to each other and

we wanted points in different clusters that means inter-cluster distance.

We want them as far apart as possible.

That means we want them to be far separated.

That's the reason we want to maximizing the intra-cluster compactness and

also maximizing the inter-cluster separation, but

we want a tradeoff between these things.

So let's look at how we define this.

Suppose we give a clustering C with k

clusters, C sub 1to C sub k.

For each cluster C sub i, it contains n sub i number of points.

Then we introduce the function W of S, R,

which is the sum of the weights of all the edges

with one verdicts in S, the other in R.

For example, you'd get a two points.

You say, this point is in this cluster.

The other point is in the other cluster.

You want to look at their weight.

The weight usually can be defined like a euclidean distance or

whatever you are using for clustering.

Then based on this, we can say, the sum of all the intra-cluster

weights over all clusters should be defined this way.

The reason is you can see this is a C sub i, this is a C sub i.

That means both vertices are in the same cluster, so that's why it's inter-cluster.

But at one point A and the other point B, you actually use it will calculate twice,

because A to B, B to A, you're actually calculating twice.

That's why you have to divide this by two.

Okay.

For all the clusters sum up together is i from one to

the k's clusters, then we define the sum of all the inter

cluster weights should be calculated this way.

There, which is very obvious, because one point,

one vertex is in the cluster C sub i and the other one is not in the C sub i.

Simply say, in the other clusters, that's why it's intro cluster.

But since the two points, you calculate it twice,

that's why you have the divide by 2.

Then we sum up all these essentially, it's i from one to k, all the clusters.

We get a Wout, okay?

That's the inter-cluster weights sum them together.

Now we also can see the number of distinct intra-cluster edges,

because for i's cluster if n sub i number of points

there intra-cluster link, you have end points choose two.

That's why you get this formula.

Then get all the clusters one to k, you sum up, that's why you get n sub m.

Then for the inter-cluster N sub out is notated by one

point is in cluster i, the other point is cluster j.

They never are in the same cluster.

So you want to calculate all the edges, among different clusters.

So that's the number of distinct inter-cluster edges.

Then for BetaCV measure is defined as Wi divided by N and

you can think this is the mean intra-cluster distance and

you get Wout divided by number of out,

you can think is a mean inter-cluster distance.

So the BetaCV essentially the ratio of mean inter-cluster

distance versus the mean intra-cluster distance.

Of course, for this measure is the smaller, the better clustering.

Because a smaller means, this is pretty small is quite a compact.

And this is the bigger, it means it's more separated.

That's a BetaCV measure defined, then we look at some other definition.

One popular use one called normalized cut.

Normalized cut is defined as follows.

The normalized cut of this ratio is defined by sum up of all the k clusters.

For every cluster, you calculate in this way or

more explicitly, we can interpret in this way.

This is ways of the inter-cluster.

This is a weights to sum up of, of the intra and inter-cluster weights.

And for this ratio, you actually sum from i from one to k,

you'll get these normalized cut ratio.

Of course, based on this, you want to the inter-cluster

to be far separated, comparing to the intro one.

That's why the bigger normalized cut value, the better the clustering.

Then there's another definition called modularity.

This is especially popular for graft clustering.

It is defined in this way.

You proceed, this part is the same as normalized cut, cut formula.

Okay.

Like this, but this part is the square of this.

Okay.

So that simply says, this one is the observed

the fraction of the weights within clusters.

This is the expected one.

So modularity, actually measured the difference between the observed one and

the expected one.

So the smaller the value, the better the clustering,

because the inter-cluster distance are lower than the expected one.

So this is the modularity definition

used for graph clustering.

[MUSIC]