Publica
Here you will find scientific publications from the Fraunhofer Institutes.
Contraction clustering (raster): A big data algorithm for density-based clustering in constant memory and linear time
 Nicosia, G.: Machine learning, optimization, and big data. Third International Conference, MOD 2017: Volterra, Italy, September 14-17, 2017; Revised selected papers. Cham: Springer International Publishing, 2018 (Lecture Notes in Computer Science 10710). ISBN: 9783319729251 (Print), ISBN: 9783319729268 (Online), pp. 63-75
 International Workshop on Machine Learning, Optimization, and Big Data (MOD) <3, 2017, Volterra> 

 English
 Conference paper
 Fraunhofer FCC
Abstract
Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering methods are infeasible due to their memory requirements or runtime complexity. Contraction clustering (Raster) is a linear-time algorithm for identifying density-based clusters. Its coefficient is negligible as it depends neither on input size nor the number of clusters. Its memory requirements are constant. Consequently, Raster is suitable for big data applications where the size of the data may be huge. It consists of two steps: (1) a contraction step which projects objects onto tiles and (2) an agglomeration step which groups tiles into clusters. Our algorithm is extremely fast. In single-threaded execution on a contemporary workstation, it clusters ten million points in less than 20 s, even when using a slow interpreted programming language like Python. Furthermore, Raster is easily parallelizable.
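The two steps named in the abstract might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the parameters `precision`, `threshold`, and `min_size`, the restriction to 2-D points, and the 8-neighbourhood adjacency rule are all assumptions that this record does not state.

```python
import math
from collections import Counter, deque


def contract(points, precision=1, threshold=2):
    """Contraction step (assumed form): project each 2-D point onto a grid
    tile by truncating its coordinates to `precision` decimal places, then
    keep only tiles that received at least `threshold` points."""
    scale = 10 ** precision
    counts = Counter(
        (math.floor(x * scale), math.floor(y * scale)) for x, y in points
    )
    return {tile for tile, n in counts.items() if n >= threshold}


def agglomerate(tiles, min_size=1):
    """Agglomeration step (assumed form): group tiles that touch each other,
    including diagonally, into clusters of at least `min_size` tiles."""
    remaining = set(tiles)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:
            tx, ty = queue.popleft()
            # Visit the 8 neighbouring tiles (the tile itself is already removed).
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (tx + dx, ty + dy)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        cluster.add(neighbour)
                        queue.append(neighbour)
        if len(cluster) >= min_size:
            clusters.append(cluster)
    return clusters
```

In this sketch, `contract` makes a single pass over the points and stores only one counter per occupied tile, so its memory use grows with the number of distinct tiles rather than with the number of points. Calling `agglomerate(contract(points))` returns a list of clusters, each a set of tile coordinates.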