Manchester Metropolitan University's Research Repository

    Cluster Analysis of Twitter Data: A Review of Algorithms

    Crockett, KA, Mclean, D, Latham, A and Alnajran, N (2017) Cluster Analysis of Twitter Data: A Review of Algorithms. In: 9th International Conference on Agents and Artificial Intelligence (ICAART), 24 February 2017 - 26 February 2017, Portugal.


    Available under License: See the attached licence file.

    Official URL: http://www.icaart.org/


    Twitter, a microblogging online social network (OSN), has quickly gained prominence as it provides people with the opportunity to communicate and share posts and topics. Tremendous value lies in the automated analysis of, and reasoning about, such data in order to derive meaningful insights, which carries potential opportunities for businesses, users, and consumers. However, the sheer volume, noise, and dynamism of Twitter impose challenges that hinder the efficacy of observing clusters with high intra-cluster similarity (i.e. minimum variance) and low inter-cluster similarity. This review focuses on research that has used various clustering algorithms to analyse Twitter data streams and identify hidden patterns in tweets, where text is highly unstructured. The paper performs a comparative analysis of unsupervised learning approaches in order to determine whether empirical findings support the enhancement of decision support and pattern recognition applications. A review of the literature identified 13 studies that implemented different clustering methods. A comparison was conducted covering clustering methods, algorithms, number of clusters, dataset size(s), distance measures, clustering features, evaluation methods, and results. The conclusion reports that the use of unsupervised learning in mining social media data has several weaknesses. Success criteria and future directions for research and practice are discussed for the research community.
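
    The clustering goal named in the abstract (high intra-cluster and low inter-cluster similarity) can be illustrated with a minimal sketch. It is not taken from the paper: the toy tweets, the bag-of-words features, the cosine measure, and all function names below are assumptions chosen only to make the two quantities concrete.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split tweet."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def intra_cluster_similarity(cluster):
    """Average pairwise similarity among tweets inside one cluster."""
    return mean(cosine(a, b) for a, b in combinations(cluster, 2))


def inter_cluster_similarity(c1, c2):
    """Average similarity between tweets drawn from two different clusters."""
    return mean(cosine(a, b) for a in c1 for b in c2)


# Hypothetical toy data: two hand-assigned clusters of invented tweets.
football = [vectorize(t) for t in ("great goal in the match tonight",
                                   "what a goal what a match")]
weather = [vectorize(t) for t in ("heavy rain and wind tonight",
                                  "rain and wind all day")]

intra = mean([intra_cluster_similarity(football), intra_cluster_similarity(weather)])
inter = inter_cluster_similarity(football, weather)
print(f"mean intra-cluster similarity: {intra:.3f}")
print(f"inter-cluster similarity:      {inter:.3f}")
```

    In a well-separated clustering the first figure should exceed the second; short, noisy tweets with few shared terms tend to narrow that gap, which is one of the difficulties the abstract points to.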
