YgorClustering

About

YgorClustering is a header-only C++ implementation of the DBSCAN clustering algorithm. The implementation is based on the 1996 article "A Density-Based Algorithm for Discovering Clusters" by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. DBSCAN is generally regarded as more robust than techniques such as k-means, which, for example, cannot separate concave (non-convex) clusters.
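For readers unfamiliar with DBSCAN, the following is a minimal brute-force sketch of the algorithm as described in the article. It is not YgorClustering's code or API: the names `Point2`, `region_query`, `dbscan`, `kNoise`, and `kUnclassified` are hypothetical, and the neighbour search is a plain O(n²) scan rather than a spatial index.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2D point type used only for this illustration.
struct Point2 { double x; double y; };

constexpr int kUnclassified = -2;  // not yet visited
constexpr int kNoise = -1;         // not density-reachable from any core point

// Indices of all points within eps of pts[i], including i itself (brute force).
std::vector<std::size_t> region_query(const std::vector<Point2>& pts,
                                      std::size_t i, double eps) {
    std::vector<std::size_t> out;
    const double eps2 = eps * eps;
    for (std::size_t j = 0; j < pts.size(); ++j) {
        const double dx = pts[j].x - pts[i].x;
        const double dy = pts[j].y - pts[i].y;
        if (dx * dx + dy * dy <= eps2) out.push_back(j);
    }
    return out;
}

// Returns one label per point: 0, 1, 2, ... for clusters, kNoise for noise.
std::vector<int> dbscan(const std::vector<Point2>& pts, double eps,
                        std::size_t min_pts) {
    std::vector<int> label(pts.size(), kUnclassified);
    int cluster = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (label[i] != kUnclassified) continue;
        std::vector<std::size_t> seeds = region_query(pts, i, eps);
        if (seeds.size() < min_pts) {  // not a core point (may become border later)
            label[i] = kNoise;
            continue;
        }
        label[i] = cluster;
        // Expand the cluster; seeds grows while we iterate over it by index.
        for (std::size_t k = 0; k < seeds.size(); ++k) {
            const std::size_t q = seeds[k];
            if (label[q] == kNoise) label[q] = cluster;  // border point
            if (label[q] != kUnclassified) continue;     // already assigned
            label[q] = cluster;
            const std::vector<std::size_t> nbrs = region_query(pts, q, eps);
            if (nbrs.size() >= min_pts) {                // q is a core point too
                seeds.insert(seeds.end(), nbrs.begin(), nbrs.end());
            }
        }
        ++cluster;
    }
    return label;
}
```

Calling `dbscan(points, eps, min_pts)` yields a label vector parallel to the input. Because clusters grow by chaining core points, their shape is not constrained to be convex, which is the property contrasted with k-means above.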

At the moment only (vanilla) DBSCAN is implemented; implementations of other clustering techniques are planned. It uses Boost.Geometry R*-trees for fast indexing. It can cluster 20 million 2D data points in around an hour, or 20 thousand data points in seconds.

Download

The source is available here and is released under a GPLv3 or later license. Please send questions or comments to . Or, even better, send a pull request ☺.

Examples

The included test programs perform clustering on various types of data. The fourth example uses Boost.Filesystem and Boost.DateTime to cluster a collection of photos based on the modification time. It can detect clusters of photos including vacations, rapid-fire photos, or multi-year photo-taking behaviour depending on the choice of tuning parameters.
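To give a feel for how the tuning parameters select between rapid-fire bursts and vacations, here is a small sketch that groups timestamps by the gap between neighbours. It is a gap-based simplification rather than DBSCAN proper, it does not use the library or Boost, and the names `group_by_time`, `eps_seconds`, and `min_group` are hypothetical. Timestamps are assumed to be plain integer seconds.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Groups timestamps (in seconds) after sorting: a new group starts whenever the
// gap to the previous timestamp exceeds eps_seconds. Groups with fewer than
// min_group entries are discarded as noise.
std::vector<std::vector<std::int64_t>> group_by_time(
        std::vector<std::int64_t> times,
        std::int64_t eps_seconds,
        std::size_t min_group) {
    std::sort(times.begin(), times.end());
    std::vector<std::vector<std::int64_t>> groups;
    std::vector<std::int64_t> current;
    for (const std::int64_t t : times) {
        if (!current.empty() && t - current.back() > eps_seconds) {
            // Gap too large: close the current group and start a new one.
            if (current.size() >= min_group) groups.push_back(current);
            current.clear();
        }
        current.push_back(t);
    }
    if (current.size() >= min_group) groups.push_back(current);
    return groups;
}
```

With an epsilon of a few seconds this picks out rapid-fire bursts; with an epsilon of hours or days the same timestamps merge into larger groups of the kind a vacation would produce.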

The video below shows a simple 2D clustering example using random data. Different clusters are detected as the tuning parameters are tweaked; the video transitions between epsilon values of 4.5 and 6.0 (arbitrary spatial units).
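The effect of epsilon can be illustrated with a simplified special case. If the minimum-points parameter is 1, every point is a core point, and DBSCAN clusters coincide with the connected components of the graph linking points no more than epsilon apart. The sketch below counts those components with a union-find structure; `Pt`, `find_root`, and `count_components` are hypothetical names, and this is not how the library itself is implemented.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical 2D point type used only for this illustration.
struct Pt { double x; double y; };

// Find the representative of i, shortening the path as we go (path halving).
std::size_t find_root(std::vector<std::size_t>& parent, std::size_t i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Number of connected components when every pair of points at distance <= eps
// is linked. Brute force over all pairs, O(n^2).
std::size_t count_components(const std::vector<Pt>& pts, double eps) {
    std::vector<std::size_t> parent(pts.size());
    std::iota(parent.begin(), parent.end(), std::size_t{0});
    std::size_t components = pts.size();
    const double eps2 = eps * eps;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            const double dx = pts[i].x - pts[j].x;
            const double dy = pts[i].y - pts[j].y;
            if (dx * dx + dy * dy > eps2) continue;
            const std::size_t a = find_root(parent, i);
            const std::size_t b = find_root(parent, j);
            if (a != b) {  // merge two previously separate components
                parent[a] = b;
                --components;
            }
        }
    }
    return components;
}
```

For two pairs of points whose nearest members are 5 units apart, an epsilon of 4.5 keeps the pairs separate while an epsilon of 6.0 links them into one component, which is the kind of transition the video shows.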