Hello and welcome. In this video, we'll be covering more details about hierarchical clustering. Let's get started. Let's look at agglomerative algorithm for hierarchical clustering. Remember that agglomerative clustering is a bottom up approach. Let's say our data set has n data points. First, we want to create n clusters, one for each data point. Then, each point is assigned as a cluster. Next, we want to compute the distance proximity matrix which will be an n by n table. After that, we want to iteratively run the following steps until the specified cluster number is reached, or until there is only one cluster left. First, merge the two nearest clusters. Distances are computed already in the proximity matrix. Second, update the proximity matrix with the new values. We stop after we've reached the specified number of clusters, or there is only one cluster remaining with the result stored in a dendogram. So in the proximity matrix, we have to measure the distances between clusters and also merge the clusters that are nearest. So, the key operation is the computation of the proximity between the clusters with one point and also clusters with multiple data points. At this point, there are a number of key questions that need to be answered. For instance, how do we measure the distances between these clusters and how do we define the nearest among clusters? We also can ask, which points do we use? First, let's see how to calculate the distance between two clusters with one point each. Let's assume that we have a data set of patients and we want to cluster them using hierarchy clustering. So our data points are patients with a featured set of three dimensions. For example, age, body mass index, or BMI and blood pressure. We can use different distance measurements to calculate the proximity matrix. For instance, Euclidean distance. So, if we have a data set of n patience, we can build an n by n dissimilarity distance matrix. It will give us the distance of clusters with one data point. However, as mentioned, we merge clusters in agglomerative clustering. Now the question is, how can we calculate the distance between clusters when there are multiple patients in each cluster? We can use different criteria to find the closest clusters and merge them. In general, it completely depends on the data type, dimensionality of data and most importantly, the domain knowledge of the data set. In fact, different approaches to defining the distance between clusters distinguish the different algorithms. As you might imagine, there are multiple ways we can do this. The first one is called single linkage clustering. Single linkage is defined as the shortest distance between two points in each cluster, such as point a and b. Next up is complete linkage clustering. This time, we are finding the longest distance between the points in each cluster, such as the distance between point a and b. The third type of linkage is average linkage clustering or the mean distance. This means we're looking at the average distance of each point from one cluster to every point in another cluster. The final linkage type to be reviewed is centroid linkage clustering. Centroid is the average of the feature sets of points in a cluster. This linkage takes into account the centroid of each cluster when determining the minimum distance. There are three main advantages to using hierarchical clustering. First, we do not need to specify the number of clusters required for the algorithm. Second, hierarchical clustering is easy to implement. And third, the dendrogram produced is very useful in understanding the data. There are some disadvantages as well. First, the algorithm can never undo any previous steps. So for example, the algorithm clusters two points and later on, we see that the connection was not a good one. The program can not undo that step. Second, the time complexity for the clustering can result in very long computation times in comparison with efficient algorithms such as K-means. Finally, if we have a large data set, it can become difficult to determine the correct number of clusters by the dendrogram. Now, lets compare hierarchical clustering with K-means. K-means is more efficient for large data sets. In contrast to K-means, hierarchical clustering does not require the number of cluster to be specified. Hierarchical clustering gives more than one partitioning depending on the resolution or as K-means gives only one partitioning of the data. Hierarchical clustering always generates the same clusters, in contrast with K-means, that returns different clusters each time it is run, due to random initialization of centroids. Thanks for watching.