Produces a set of nested clusters organized as a hierarchical tree
Can be visualized as a dendrogram
Do not have to assume any particular number of clusters: any desired number can be obtained by cutting the dendrogram at the proper level
The clusters may correspond to meaningful taxonomies (e.g., biological species)
Two main types of hierarchical clustering:
Agglomerative:
- Start with the points as individual clusters
- At each step, merge the closest pair of clusters until only one cluster (or k clusters) is left
Divisive:
- Start with one, all-inclusive cluster
- At each step, split a cluster until each cluster contains an individual point (or there are k clusters)
Traditional hierarchical algorithms use a similarity or distance matrix
Agglomerative clustering is the most popular hierarchical clustering technique.
The basic algorithm is straightforward: compute the proximity matrix, let each data point be a cluster, then repeatedly merge the two closest clusters and update the proximity matrix until only a single cluster (or the desired number of clusters) remains.
The key operation is the computation of the proximity of two clusters.
Different approaches to defining the distance between clusters distinguish the different algorithms.
After some merging, we reach an intermediate situation: we want to merge the two closest clusters (say, C2 and C5) and then update the proximity matrix to reflect the merge.
How to update the proximity matrix? The answer depends on how inter-cluster proximity is defined, and the main choices are:
- MIN (single link)
- MAX (complete link)
- Group Average
- Distance Between Centroids
- Other methods driven by an objective function: e.g., Ward's Method uses squared error
Each of these is examined below; a sketch of the overall merge loop follows this list.
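The following is a minimal sketch of that merge loop in Python, assuming Euclidean distance and using MIN (single link) as the proximity rule; the function agglomerative and all other names here are illustrative, not from any particular library.

```python
import numpy as np

def agglomerative(points, k=1):
    """Merge the closest pair of clusters until only k clusters are left."""
    # Start with each point as its own cluster (lists of point indices).
    clusters = [[i] for i in range(len(points))]
    # Proximity matrix: pairwise Euclidean distances between points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    def cluster_distance(a, b):
        # MIN (single link): distance between the two closest points.
        return min(dist[i, j] for i in a for j in b)

    while len(clusters) > k:
        # Key operation: find the closest pair of clusters.
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: cluster_distance(clusters[p[0]],
                                                         clusters[p[1]]))
        # Merge them; the proximity "update" happens implicitly because
        # cluster_distance is recomputed from the merged membership.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

pts = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]], dtype=float)
print(agglomerative(pts, k=2))  # e.g., [[0, 1, 2, 3], [4]]
```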
MIN (Single Link):
Proximity of two clusters is based on the two closest points in the different clusters. It is determined by one pair of points, i.e., by one link in the proximity graph.
Strength: can handle non-elliptical shapes.
Limitation: sensitive to noise and outliers.
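A quick way to see that strength is to cluster two interleaving half-moons; this sketch uses scikit-learn's make_moons for the data and SciPy's single-link clustering (the sample size and noise level are our own choices).

```python
from sklearn.datasets import make_moons
from scipy.cluster.hierarchy import linkage, fcluster

# Two interleaving half-moons: non-elliptical clusters.
X, y = make_moons(n_samples=200, noise=0.04, random_state=0)

Z = linkage(X, method="single")                  # single-link merge history
labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters

# With low noise, single link typically follows each moon's chain of
# close neighbors, so the recovered clusters match the two moons.
print("cluster sizes:", sorted((labels == c).sum() for c in set(labels)))
```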
MAX (Complete Link):
Proximity of two clusters is based on the two most distant points in the different clusters. It is determined by the single farthest pair of points, although all pairs must be examined to find it.
Strength: less susceptible to noise and outliers.
Limitations: tends to break large clusters; biased towards globular clusters.
Group Average:
Proximity of two clusters is the average of the pairwise proximities between points in the two clusters:

    proximity(C_i, C_j) = ( Σ_{p ∈ C_i, q ∈ C_j} proximity(p, q) ) / ( |C_i| · |C_j| )

Need to use average connectivity for scalability, since total proximity favors large clusters.
This is a compromise between single and complete link.
Strengths: less susceptible to noise and outliers.
Limitations: biased towards globular clusters.
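As a concrete reading of the formula, here is a short NumPy snippet that computes the group-average proximity between two small made-up clusters.

```python
import numpy as np

# Two small example clusters (made-up data).
C1 = np.array([[0.0, 0.0], [0.0, 1.0]])
C2 = np.array([[4.0, 0.0], [4.0, 1.0], [5.0, 0.0]])

# All |C1| x |C2| pairwise Euclidean distances between the clusters.
pairwise = np.linalg.norm(C1[:, None, :] - C2[None, :, :], axis=-1)

# Group average: total pairwise proximity divided by |C1| * |C2|.
group_avg = pairwise.sum() / (len(C1) * len(C2))
print(group_avg)  # average of the 6 pairwise distances
```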
Ward's Method:
Similarity of two clusters is based on the increase in squared error when the two clusters are merged.
Less susceptible to noise and outliers
Biased towards globular clusters
Hierarchical analog of K-means; can be used to initialize K-means.
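For a side-by-side look at the four criteria above, here is a sketch using SciPy's linkage and fcluster on toy data (the two-blobs-plus-outlier data set and the cut at three clusters are our own choices).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy data: two tight blobs plus one distant outlier.
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(20, 2)),
    [[10.0, 0.0]],
])

for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)                    # full merge history
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
    sizes = sorted(np.bincount(labels)[1:], reverse=True)
    print(f"{method:>8}: cluster sizes {sizes}")
```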
O(N^2) space, since the algorithm stores the proximity matrix, where N is the number of points.
O(N^3) time in many cases: there are N merge steps, and at each step the proximity matrix (of size N^2) must be searched and updated.
Once a decision is made to combine two clusters, it cannot be undone.
No global objective function is directly minimized
Different schemes have problems with one or more of the following:
- sensitivity to noise and outliers
- difficulty handling clusters of different sizes and non-globular shapes
- breaking large clusters