Last modified on 01 Oct 2021.
What?
- Mean-Shift assigns data points to clusters iteratively by shifting each point towards the mode (in the context of Mean-Shift, the mode is the region with the highest density of data points).
- Non-Parametric Density Estimation.
- The data points are sampled from an underlying probability density function (PDF)[ref]; the density of the data points reflects that PDF.
- Mean-Shift is built on the idea of Kernel Density Estimation (KDE).
- Mean-Shift exploits this KDE idea by imagining what the points would do if they all climbed uphill to the nearest peak on the KDE surface: it iteratively shifts each point uphill until it reaches a peak[ref]. In short, points climb to the nearest hill (see the sketch after this list).
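A minimal from-scratch sketch of this hill-climbing idea (not the scikit-learn implementation; the function name, the Gaussian kernel, the bandwidth value, and the toy 2-D blobs are all illustrative assumptions):

import numpy as np

def mean_shift(points, bandwidth=1.0, n_iter=50):
    """Shift every point uphill on the KDE surface until it settles near a mode."""
    shifted = points.astype(float).copy()
    for _ in range(n_iter):
        for i, p in enumerate(shifted):
            # Gaussian kernel weights: nearby points pull harder than distant ones
            weights = np.exp(-np.sum((points - p) ** 2, axis=1) / (2 * bandwidth ** 2))
            # one uphill step: move to the weighted mean of the neighbourhood
            shifted[i] = (weights[:, None] * points).sum(axis=0) / weights.sum()
    return shifted  # points that end up at the same location share a mode/cluster

# two 2-D blobs; every point should climb to its own blob's density peak
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(20, 2)), rng.normal(5, 0.5, size=(20, 2))])
modes = mean_shift(X, bandwidth=1.0)
print(np.round(modes[:2], 2), np.round(modes[-2:], 2))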
When?
- Image processing and computer vision.
- Image segmentation application[ref]; a minimal sketch follows this list.
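A minimal sketch of the segmentation idea, under the assumption that each pixel's colour is used as a 3-D feature vector (the synthetic two-region image and the bandwidth value are made up for illustration):

import numpy as np
from sklearn.cluster import MeanShift

# synthetic 20x20 RGB image: left half reddish, right half bluish, plus noise
rng = np.random.default_rng(0)
img = np.zeros((20, 20, 3))
img[:, :10] = [0.9, 0.1, 0.1]
img[:, 10:] = [0.1, 0.1, 0.9]
img += rng.normal(0, 0.05, img.shape)

# every pixel becomes one 3-D (R, G, B) sample
pixels = img.reshape(-1, 3)

# bandwidth of 0.3 is an illustrative value for this toy image
labels = MeanShift(bandwidth=0.3).fit_predict(pixels)

# reshaping the labels back to the image grid gives the segmentation mask
segmentation = labels.reshape(img.shape[:2])
print(np.unique(labels))  # one label per colour region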
Pros & Cons
- Pros: Non-Parametric Density Estimation.
- Cons: it is computationally expensive, O(n²)[ref]; see the note after this list on reducing the practical cost in scikit-learn.
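As a hedged note on that cost: scikit-learn's estimate_bandwidth helper and the bin_seeding / n_jobs options can cut the practical running time by picking a data-driven bandwidth, seeding the search from a coarse grid instead of every point, and parallelising the per-seed climbs (the placeholder data and quantile value below are illustrative):

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

X = np.random.default_rng(0).normal(size=(1000, 2))  # placeholder data

# estimate a bandwidth from the data instead of guessing it
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500)

# bin_seeding starts the hill-climbing from a coarse grid of binned points
# (far fewer seeds than n points); n_jobs=-1 parallelises the per-seed searches
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True, n_jobs=-1)
labels = ms.fit_predict(X)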
Code?
import numpy as np
from sklearn.cluster import MeanShift

# toy 2-D data so the snippet runs as-is
X = np.array([[1, 1], [2, 1], [1, 0], [4, 7], [3, 5], [3, 6]])

clustering = MeanShift(bandwidth=2)
clustering.fit(X)
clustering.predict(X)
# or fit and predict in one call
clustering.fit_predict(X)
Components:
clustering.labels_
: the cluster label assigned to each data point.
clustering.cluster_centers_
: coordinates of the cluster centers (the modes).
Usage example
- Used to determine windows of time in time-series data; a sketch follows below.
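A hedged sketch of that usage, assuming the raw timestamps themselves are clustered so that events occurring close together in time fall into one window (the timestamps and the bandwidth value are made up):

import numpy as np
from sklearn.cluster import MeanShift

# event timestamps in seconds: two bursts of activity separated by a long gap
timestamps = np.array([1.0, 1.5, 2.1, 2.4, 3.0, 40.2, 40.9, 41.5, 42.0])

# MeanShift expects 2-D input, so reshape the 1-D series into a column
labels = MeanShift(bandwidth=5.0).fit_predict(timestamps.reshape(-1, 1))

for lab in np.unique(labels):
    window = timestamps[labels == lab]
    print(f"window {lab}: {window.min():.1f} .. {window.max():.1f}")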
References
- Saravanan Thirumuruganathan – Introduction To Mean Shift Algorithm.
- GeeksforGeeks – Mean-Shift clustering.
- R. Collins – Mean-Shift tracking.
- Matt Nedrich – Mean Shift Clustering.