Clustering

Application

1. Gaussian Mixture

2. K-Means

Steps

  1. Place the centroids randomly
  2. Assign each instance to its nearest centroid
  3. Update each centroid to the mean of the instances assigned to it
  4. Repeat steps 2 and 3 until the centroids stop moving
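
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation. The function name kmeans and its parameters (k, max_iter, seed) are made up for this example, and "placing the centroids randomly" is assumed here to mean picking k distinct instances from the data as starting points.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-Means sketch; returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    def nearest(centroids):
        # Distance from every instance to every centroid, shape (n, k)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        return distances.argmin(axis=1)

    # Step 1 (assumption): use k distinct random instances as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(max_iter):
        # Step 2: label each instance with the index of its nearest centroid
        labels = nearest(centroids)

        # Step 3: move each centroid to the mean of its assigned instances
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid where it is
                new_centroids[j] = members.mean(axis=0)

        # Step 4: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final labels, consistent with the returned centroids
    return centroids, nearest(centroids)
```

Calling kmeans(X, k=2) on an array of shape (n_samples, n_features) returns the centroid coordinates and one integer label per instance. The result depends on the random starting points, so in practice the algorithm is usually run several times with different seeds and the best run is kept.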

Cons

* Prefers clusters of similar sizes
* Performs poorly when the clusters have varying sizes, different densities, or nonspherical shapes

Finding the Optimal Number of Clusters

Mini-Batch K-Means

Semi-Supervised Learning

3. DBSCAN

4. Agglomerative Clustering

5. BIRCH

6. Mean-Shift

Affinity Propagation

7. Spectral Clustering

8. Ordering Points To Identify the Clustering Structure (OPTICS)

Clustering performance evaluation

1. Adjusted Rand index

2. Mutual Information based scores

3. Homogeneity, completeness and V-measure

4. Fowlkes-Mallows scores

5. Silhouette Coefficient

6. Calinski-Harabasz Index

7. Davies-Bouldin Index

8. Contingency Matrix

Reference