Sometimes when looking at a set of data, the number of clusters isn’t distinct. This is where agglomerativehierarchical clustering is useful.
Imagine that you just launched a new product and you don’t know how the customers are using it. You want to study this, but because it is a new product, you don’t have much knowledge on customer behavior.
Instead of being told how many clusters there should be, hierarchical clustering seeks the appropriate number of clusters first. By starting with a collection of unclustered data or observations, the points are then merged one by one until all observations are in the same cluster.
This is done by taking the nearest clusters in each step and grouping them together.
For example, imagine a hundred people spread out on a football field. At the beginning, they each stand by themselves, essentially meaning there are a hundred clusters (each with one person). The two people closest to each other are then asked to move together, which reduces the number of clusters to ninety-nine (one cluster has two people).
Eventually, if you repeat this instruction enough times, there will be single cluster with a hundred people. The same approach is used for customers when studying purchasing habits.
Each merging is then studied to determine the right number of clusters for the issue at hand. This is done by examining how far away the clusters were when they merged. On the football field, the first people clustered together were close already, but after a few more times merging with groups, the distance between the different clusters will grow.
You may have a large cluster at one end of the field, and another at the far end.
This system lets the data scientist decide how many clusters to use after establishing how far is too far for clusters to merge with each other. In marketing terms, this means it creates groups of different types of customers, but focuses on deciding how similar they have to be to target in the same way.
For example, you might target an ad for a barbeque at young families living in rural areas and older couples in a suburb because they share the attribute of having outdoor space for cooking. However, you would not target the ad to young families or older couples living in an apartment building.
Agglomerative hierarchical clustering does not require initial centroids or prior knowledge of the number of clusters, which reduces the amount of analysis required on the data before using this approach.
Although we won’t get into them all here, there are different techniques to choose from, depending on what is appropriate for the data set. This makes it more flexible and provides an opportunity to explore and find the appropriate algorithm. On the downside, the algorithm cannot handle large data sets, which makes it less useful for some tasks.
If you want to read all the related articles on the topic of AI algorithms, here is the list of all blog posts in this article series: