Manchester Metropolitan University's Research Repository

    An Empirical Performance Evaluation of Semantic-Based Similarity Measures in Microblogging Social Media

    Alnajran, Noufa, Crockett, Keeley, McLean, David and Latham, Annabel (2018) An Empirical Performance Evaluation of Semantic-Based Similarity Measures in Microblogging Social Media. In: Fifth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT 2018), 17 December 2018 - 20 December 2018, Zurich.

    Accepted Version


    Measuring textual semantic similarity has been a subject of intense discussion in NLP and AI for many years. A new area of research has emerged that applies semantic similarity measures within Twitter. However, developing these measures for the semantic analysis of tweets poses fundamental challenges. The sparsity, ambiguity, and informality present in social media hamper the performance of traditional textual similarity measures, because tweets have special syntactic and semantic characteristics. This paper reviews and evaluates the performance of topological, statistical, and hybrid similarity measures in the context of Twitter analysis. Furthermore, the performance of each measure is compared against a naïve keyword-based similarity computation method to assess the significance of semantic computation in capturing the meaning of tweets. An experiment is designed and conducted to evaluate the different measures by examining various metrics, including correlation, error rates, and statistical tests, on a benchmark dataset. The potential weaknesses of semantic similarity measures in relation to Twitter applications of textual similarity assessment, and the research contributions, are discussed. This research highlights challenges and potential improvement areas for the semantic similarity of tweets, providing a resource for researchers and practitioners.
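
The evaluation idea in the abstract (a keyword-based baseline scored against human judgements using correlation and error rate) can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of Jaccard overlap as the "naïve keyword-based" method, the tokenizer, and all function names are assumptions made for illustration only.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def keyword_similarity(a, b):
    """Jaccard overlap of word sets: a stand-in for a naive keyword-based baseline."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def pearson(xs, ys):
    """Pearson correlation between predicted and human similarity scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def rmse(xs, ys):
    """Root-mean-square error between predicted and human similarity scores."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

A keyword baseline of this kind assigns zero similarity to paraphrases that share no words, such as "huge traffic jam" and "massive road congestion", which is the sort of gap a semantic measure is meant to close.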
