Scalable Hierarchical Clustering: Twister Tries with a Posteriori Trie Elimination

Exact method for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms show quadratic behavior and it is generally accepted that this cannot be improved without using specifics of certain metric spaces. Twister tries is an algorithm which produces a dendrogram (i.e, outcome of a hierarchical clustering) which resembles the one produced by AHC, while only needing linear space and time. However, twister tries are sensitive to rare, but possible hash evaluations. These might have a disastrous effect on the final outcome. We propose the use of a metaheuristic to overcome this sensitivity and show how approximate computations of dendrogram quality can help to evaluate the heuristic within reasonable time.

Michael Cochez (De Montfort University), Ferrante Neriy (University of Jyväskylä): Scalable Hierarchical Clustering: Twister Tries with a Posteriori Trie Elimination.

Presented at the 6th IEEE Symposium on Computational Intelligence and Data Mining, 7-10, December, Cape Town

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7376688

Share on LinkedInGoogle+Tweet about this on TwitterShare on FacebookEmail to someone