So far this week, we have talked about dimension reduction in unsupervised learning. We looked at PCA as a linear method and then showed how it can be generalized into a nonlinear method using an autoencoder neural network. Now, I would like to talk about another large class of unsupervised learning methods, namely clustering methods. So, what is clustering, and why do you need to know about it? Generally put, clustering is a method of aggregating your data points into relatively homogeneous groups, such that items in the same group are more similar to each other than to items belonging to different groups. In finance, your data points can be companies, stocks, bonds, credit card holders, and so on.
Clustering methods can be used in many different applications. We can use them for visualization of data and for conceptualizing the resulting clusters. We can also use clusters for a compact representation of data, similar to dimension reduction, but taken to the extreme of reducing the dimension to just one, which happens for some types of clustering. We can say that clustering, or segmentation, is one of the most important basic tasks not only in machine learning, but also in human learning. Indeed, one of the first cognitive functions that a newborn learns is the ability to distinguish between "me" and "not me", that is, the outside world. This can be viewed as a binary clustering, where "me" and "not me" correspond to two categories of the data that the newborn's brain perceives.
So the output of a clustering method should be a discrete label for each point in the data that shows the cluster to which this point belongs. Now, how this is done depends on the clustering method used. There are several dimensions along which we can classify clustering algorithms. First, there are two categories of clustering: flat clustering and hierarchical clustering. Let me explain what these terms mean.
Flat clustering is the simplest form of clustering, where a set of data points is partitioned into relatively homogeneous groups. The word "flat" here means that within each cluster, all points are equal in terms of their informational content. All points in such a cluster get the same label, which is the label that identifies this cluster. This is an example of dimension reduction to just one number. On the other hand, with hierarchical clustering, there is some structure within each cluster, with sub-clusters, possibly sub-sub-clusters, and so on. As we will discuss later, many complex systems have a hierarchical structure, so hierarchical clustering methods are quite useful in practice. A set of labels for a cluster, sub-cluster, and so on provides another example of dimension reduction, this time implemented by hierarchical clustering methods.
Another difference between types of clustering is in how they treat cluster labels: either as fixed numbers, so that each point can be in only one cluster, or as probabilities, so that each point can in principle belong to different clusters with some probabilities. The first type is called hard clustering, and the second type is called soft clustering or probabilistic clustering. Now, these two ways of classifying clustering methods, into flat and hierarchical and into hard and soft, are not mutually exclusive. Therefore, a clustering method can be simultaneously flat and hard, or hierarchical and soft. In fact, the simplest and best-known clustering method, K-means clustering, is exactly a flat and hard clustering method. Let's talk about K-means clustering next.
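To make the distinction between hard and soft labels concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy points, the two fixed cluster centers, and the helper names `hard_label` and `soft_labels` are hypothetical, and the softmax over negative squared distances is just one simple way to produce probabilities, not the method of any particular clustering algorithm.

```python
import math

# Hypothetical toy data: five points on a line and two assumed cluster centers.
points = [0.0, 0.5, 2.0, 3.5, 4.0]
centers = [0.0, 4.0]


def squared_distances(x):
    """Squared distance from point x to each assumed center."""
    return [(x - c) ** 2 for c in centers]


def hard_label(x):
    """Hard clustering: the index of the single nearest center."""
    d = squared_distances(x)
    return d.index(min(d))


def soft_labels(x):
    """Soft clustering: one probability per cluster (illustrative softmax)."""
    w = [math.exp(-d) for d in squared_distances(x)]
    total = sum(w)
    return [wi / total for wi in w]


hard = [hard_label(x) for x in points]
soft = [soft_labels(x) for x in points]

for x, h, s in zip(points, hard, soft):
    print(x, h, [round(p, 3) for p in s])
```

Note the point at 2.0, which sits exactly halfway between the two centers. The hard assignment still has to pick a single cluster for it (here, the first one), while the soft assignment gives it a probability of 0.5 for each cluster.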