YgorClustering: C++ header-only DBSCAN

Hal Clark

About

YgorClustering is a header-only C++ library that implements the DBSCAN clustering algorithm. The implementation is based on the 1996 article “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise” by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu.

DBSCAN is generally regarded as a more reliable clustering technique than alternatives such as k-means, which, for example, cannot correctly separate concave clusters.

At the moment only (vanilla) DBSCAN is implemented; implementations of other clustering techniques are planned. The library uses Boost.Geometry R*-trees for fast spatial indexing. On a single CPU core it can cluster 20 million 2D data points in around an hour, or 20 thousand data points in seconds.
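
For readers unfamiliar with the algorithm, the following is a minimal sketch of vanilla DBSCAN as described in the 1996 article. It is not YgorClustering's API: the names Point2D, region_query, and dbscan_sketch are hypothetical, and the neighbour search is a brute-force scan standing in for the R*-tree range query the library relies on.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types and names for illustration; not part of YgorClustering.
struct Point2D { double x, y; };

constexpr int kUnclassified = -2;
constexpr int kNoise = -1;

// Indices of all points within distance eps of pts[i], including i itself.
// Brute force, O(n) per query; an R*-tree would answer this far faster.
std::vector<std::size_t> region_query(const std::vector<Point2D>& pts,
                                      std::size_t i, double eps) {
    std::vector<std::size_t> out;
    for (std::size_t j = 0; j < pts.size(); ++j) {
        const double dx = pts[i].x - pts[j].x;
        const double dy = pts[i].y - pts[j].y;
        if (dx * dx + dy * dy <= eps * eps) out.push_back(j);
    }
    return out;
}

// Returns one label per point: 0, 1, 2, ... for clusters, kNoise for noise.
std::vector<int> dbscan_sketch(const std::vector<Point2D>& pts,
                               double eps, std::size_t min_pts) {
    std::vector<int> label(pts.size(), kUnclassified);
    int cluster = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (label[i] != kUnclassified) continue;
        std::vector<std::size_t> seeds = region_query(pts, i, eps);
        if (seeds.size() < min_pts) {
            label[i] = kNoise;  // may later be claimed as a border point
            continue;
        }
        label[i] = cluster;  // i is a core point: start a new cluster
        for (std::size_t k = 0; k < seeds.size(); ++k) {
            const std::size_t q = seeds[k];
            if (label[q] == kNoise) {
                label[q] = cluster;  // border point: join, but do not expand
                continue;
            }
            if (label[q] != kUnclassified) continue;
            label[q] = cluster;
            const std::vector<std::size_t> nbrs = region_query(pts, q, eps);
            if (nbrs.size() >= min_pts) {
                // q is also a core point, so its neighbours are reachable too.
                seeds.insert(seeds.end(), nbrs.begin(), nbrs.end());
            }
        }
        ++cluster;
    }
    return label;
}
```

With brute-force queries the whole run costs O(n²) distance evaluations, which is why a spatial index matters at the data sizes quoted above.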

YgorClustering is actively maintained as part of the DICOMautomaton suite.

Download

The source is available here and is released under a GPLv3 or later license.

Examples

The included test programs perform clustering on various types of data. The fourth example uses Boost.Filesystem and Boost.DateTime to cluster a collection of photos by file modification time. Depending on the choice of tuning parameters, it can detect clusters corresponding to vacations, rapid-fire bursts, or multi-year photo-taking behaviour.
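
To illustrate how the tuning parameters change what counts as a cluster of photos, here is a rough, self-contained sketch of one-dimensional DBSCAN over sorted timestamps. It is not the fourth example's code: cluster_timestamps is a hypothetical name, the timestamps are assumed to be plain integer seconds already sorted in ascending order, and no Boost facilities are used.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kNoiseLabel = -1;

// Hypothetical helper, not part of YgorClustering.
// t must be sorted ascending; eps is in the same units as t (here: seconds).
std::vector<int> cluster_timestamps(const std::vector<std::int64_t>& t,
                                    std::int64_t eps, std::size_t min_pts) {
    const std::size_t n = t.size();

    // A point is a core point if at least min_pts points (itself included)
    // lie within eps of it. Sorted input lets binary search count them.
    std::vector<bool> is_core(n, false);
    for (std::size_t i = 0; i < n; ++i) {
        const auto lo = std::lower_bound(t.begin(), t.end(), t[i] - eps);
        const auto hi = std::upper_bound(t.begin(), t.end(), t[i] + eps);
        is_core[i] = static_cast<std::size_t>(hi - lo) >= min_pts;
    }

    // Forward pass: consecutive core points no more than eps apart share a
    // cluster; non-core points within eps of the previous core point join it.
    std::vector<int> label(n, kNoiseLabel);
    int cluster = -1;
    std::size_t last_core = n;  // n means "no core point seen yet"
    for (std::size_t i = 0; i < n; ++i) {
        if (is_core[i]) {
            if (last_core == n || t[i] - t[last_core] > eps) ++cluster;
            label[i] = cluster;
            last_core = i;
        } else if (last_core != n && t[i] - t[last_core] <= eps) {
            label[i] = label[last_core];
        }
    }

    // Backward pass: remaining non-core points within eps of the next core
    // point become border points of that cluster.
    std::size_t next_core = n;
    for (std::size_t i = n; i-- > 0;) {
        if (is_core[i]) {
            next_core = i;
        } else if (label[i] == kNoiseLabel && next_core != n &&
                   t[next_core] - t[i] <= eps) {
            label[i] = label[next_core];
        }
    }
    return label;
}
```

Under these assumptions, an eps of a few seconds isolates rapid-fire bursts, while an eps of hours or days merges photos into outing-sized or vacation-sized groups; photos that fit no group are labelled as noise.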

The video below shows a simple 2D clustering example using random data. Different clusters are detected as the tuning parameters are adjusted; the video transitions between epsilon values of 4.5 and 6.0 (arbitrary spatial units).

Feedback

Please send questions, comments, or pull requests here.