Then if we want to merge two clusters, C sub j1 and

C sub j2, then the change in quality of the overall clustering

is one criterion.

So you probably can see this is the merged cluster, your generating new cluster,

and this originally two cluster disappear from the original set of clusters.

So that's the new clustering, that's the original clustering, because if you merge

this, essentially, the sum of the square arrows will be increasing.

So that's the reason you're trying to minimize that increasing.

That's the probability of P(C sub i).

Now, you get the probability of these two merged together, okay.

The you have to minus the original probability, i from 1 to m.

Then the distance between clusters C1 and C2 is, this is the probability of C1 and

C2 together as one cluster, these are the independent clusters.

If this distance is less than zero, we should merge these two clusters.

So that's the idea, or the essence,

of the probabilistic hierarchical clustering algorithm.

[MUSIC]