In some pattern recognition problems, the training data consists of a set of input vectors x, without any corresponding target values. Unsupervised clustering attempts to group similar data points into clusters to determine how the data is distributed in the space, known as density estimation. This is the core of unsupervised learning.
Unsupervised Clustering DefinedUnsupervised clustering is an unsupervised learning process in which data points are put into clusters to determine how the data is distributed in space. This density estimation allows the algorithm to label and classify data, which is what powers unsupervised learning algorithms. There are four common unsupervised clustering algorithms: k-means clustering, fuzzy k-means clustering, hierarchical clustering and mixture of gaussians.
In mathematical terms, for n-sampled space x1 to xn, true class labels are not provided for each sample. That’s why it’s known as learning without a teacher. As a result, unsupervised clustering enables AI to learn without human supervision.
What Is Unsupervised Clustering?Clustering can be considered the most important unsupervised learning problem. It deals with finding a structure in a collection of unlabeled data. A loose definition of clustering could be the process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects that are similar between them and are dissimilar to the objects belonging to other clusters.
Distance-Based ClusteringGiven a set of points, with a notion of distance between points, grouping the points into some number of clusters, such that:
More on AIWhat Is Artificial Intelligence (AI)?
How Unsupervised Clustering WorksThe goal of clustering is to determine the internal grouping in a set of unlabeled data. But how do you decide what constitutes good clustering? There is no absolute “best” criterion that would be independent of the final aim of the clustering. Consequently, it is the user who should supply this criterion, in such a way that the result of the clustering will suit their needs.
To find a particular clustering solution, we need to define the similarity measures for the clusters.
Proximity MeasuresFor clustering, we need to define a proximity measure for two data points. Proximity here means how similar/dissimilar the samples are with respect to each other.
There are various similarity measures that can be used.
A good proximity measure is application dependent. The clusters should be invariant under the transformations natural to the problem. Also, while clustering, it is not advised to normalize data that are drawn from multiple distributions.
Types of Unsupervised Clustering AlgorithmsUnsupervised clustering algorithms are classified into four different types:
In exclusive clustering, data are grouped in an exclusive way, so that if a certain data point belongs to a definite cluster, then it could not be included in another cluster. A simple example is a graph where the separation of data points is achieved by a straight line on a bi-dimensional plane.
Overlapping clustering uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership. In this case, data will be associated with an appropriate membership value.
A hierarchical clustering algorithm is based on the union between the two nearest clusters. The beginning condition is realized by setting every data point as a cluster. After a few iterations it reaches the final clusters wanted.
Finally, the last kind of clustering uses a completely probabilistic approach.
We will talk about four of the most used clustering algorithms:
Each of these algorithms belongs to one of the clustering types listed above. K-means is an exclusive clustering algorithm, fuzzy K-means is an overlapping clustering algorithm, hierarchical clustering is obvious and lastly, mixture of gaussians is a probabilistic clustering algorithm.
1. K-Means ClusteringK-means is one of the simplest unsupervised learning algorithms that solves the well known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters, assume k clusters, fixed a priori. The main idea is to define k centers, one for each cluster. These centroids should be placed in a smart way, as different locations will create different results. So, the best choice is to place them as far away from each other as possible.
The next step is to take each point belonging to a given data set and associate it to the nearest centroid. When no point is pending, the first step is completed and an early groupage is done. At this point we need to recalculate k new centroids as barycenters of the clusters resulting from the previous step. After we have these k new centroids, a new binding has to be done between the same data set points and the nearest new centroid. A loop has been generated. As a result of this loop, we may notice that the k centroids change their location step by step until no more changes are done. In other words centroids do not move any more.
Finally, this algorithm aims at minimizing an objective function, in this case a squared error function. The objective function
Objective function equation. | Image: Sanatan MishraWhere:
Objective function equation defined. | Image: Sanatan MishraIs a chosen distance measure between a data point xi and the cluster center cj, is an indicator of the distance of the n data points from their respective cluster centers.
The algorithm is composed of the following steps:
Where, ‘ci’ represents the number of data points in ith cluster.
Although it can be proved that the procedure will always terminate, the k-means algorithm does not necessarily find the most optimal configuration, corresponding to the global objective function minimum. The algorithm is also significantly sensitive to the initial randomly selected cluster centers. The k-means algorithm can be run multiple times to reduce this effect.
K-means is a simple algorithm that has been adapted to many problem domains. As we are going to see, it is a good candidate for extension to work with fuzzy feature vectors.
The k-means procedure can be viewed as a greedy algorithm for partitioning the n samples into k clusters so as to minimize the sum of the squared distances to the cluster centers. It does have some weaknesses:
This last problem is particularly troublesome, since we often have no way of knowing how many clusters exist. In the example shown above, the same algorithm applied to the same data produces the following three-means clustering. Is it better or worse than the two-means clustering?
Unfortunately, there is no general theoretical solution to find the optimal number of clusters for any given data set. A simple approach is to compare the results of multiple runs with different k classes and choose the best one according to a given criterion, but we need to be careful because increasing k results in smaller error function values by definition, but also increases the risk of overfitting.
2. Fuzzy K-Means ClusteringIn fuzzy clustering, each point has a probability of belonging to each cluster, rather than completely belonging to just one cluster as is the case in the traditional k-means. Fuzzy k-means specifically tries to deal with the problem where points are somewhat in between centers, or otherwise ambiguous, by replacing distance with probability. This could be some function of distance, such as having probability relative to the inverse of the distance.
Fuzzy k-means uses a weighted centroid based on those probabilities. Processes of initialization, iteration, and termination are the same as the ones used in k-means. The resulting clusters are best analyzed as probabilistic distributions rather than a hard assignment of labels. K-means is a special case of fuzzy k-means when the probability function used is simply one, if the data point is closest to a centroid, and zero otherwise.
The fuzzy k-means algorithm is the following:
In the fuzzy k-means approach, instead, a given data point does not belong exclusively to a well defined cluster, but it can be placed in the middle. In this case, the membership function follows a smoother line to indicate that every data point may belong to several clusters with different extent of membership.
3. Hierarchical Clustering AlgorithmsGiven a set of N items to be clustered, and an N*N distance (or similarity) matrix, the basic process of hierarchical clustering is this:
There’s another way to deal with clustering problems: a model-based approach, which consists in using certain models for clusters and attempting to optimize the fit between the data and the model.
In practice, each cluster can be mathematically represented by a parametric distribution, like a Gaussian. The entire data set is modeled by a mixture of these distributions. A mixture model with high likelihood tends to have the following traits:
Main advantages of model-based clustering:
The most widely used clustering method of this kind is based on learning a mixture of Gaussians:
A mixture model is a mixture of k component distributions that collectively make a mixture distribution f(x):
Mixture of gaussians equation. | Image: Sanatan MishraThe αk represents the contribution of the kth component in constructing f(x). In practice, parametric distribution (e.g. gaussians), are often used since a lot work has been done to understand their behavior. If you substitute each fk(x) for a gaussian you get what is known as a gaussian mixture models (GMM).
The Expectation-Maximization AlgorithmExpectation-maximization assumes that your data is composed of multiple multivariate normal distributions. This is a very strong assumption, in particular, when you fix the number of clusters. EM is an algorithm for maximizing a likelihood function when some of the variables in your model are unobserved, i.e., when you have latent variables.
You might ask if we’re just trying to maximize a function, why don’t we just use the existing machinery for maximizing a function. Well, if you try to maximize this by taking derivatives and setting them to zero, you find that in many cases the first-order conditions don’t have a solution. There’s a chicken-and-egg problem in that to solve for your model parameters you need to know the distribution of your unobserved data, but the distribution of your unobserved data is a function of your model parameters.
Expectation-maximization tries to get around this by iteratively guessing a distribution for the unobserved data, then estimating the model parameters by maximizing something that is a lower bound on the actual likelihood function, and repeating until convergence:
The expectation-maximization algorithm:
There are a number of problems with clustering. Among them:
More on Machine LearningC-Means Clustering Explained
Unsupervised Clustering ApplicationsClustering algorithms can be applied in many fields, for instance:
Clustering is an unsupervised learning process. It attempts to group similar data points into clusters to determine how the data is distributed in the space. Those data groups become labeled, which is the core of unsupervised learning.
Is k-means clustering unsupervised?K-means clustering is an unsupervised clustering algorithm. The algorithm establishes k centroids for each data cluster. Each point is associated with the nearest centroid. When no point is pending, the first step is completed and an early grouping is done. At this point, new k centroids are established and the process continues. K-means is a good candidate for extension to work with fuzzy feature vectors.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4