e-space
Manchester Metropolitan University's Research Repository

An Empirical Performance Evaluation of Semantic-Based Similarity Measures in Microblogging Social Media

Alnajran, Noufa and Crockett, Keeley and McLean, David and Latham, Annabel (2018) An Empirical Performance Evaluation of Semantic-Based Similarity Measures in Microblogging Social Media. In: Fifth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT 2018), 17 December 2018 - 20 December 2018, Zurich.

Abstract

Measuring textual semantic similarity has been a subject of intense discussion in NLP and AI for many years. A new area of research has emerged that applies semantic similarity measures within Twitter. However, developing these measures for the semantic analysis of tweets poses fundamental challenges. The sparsity, ambiguity, and informality of social media hamper the performance of traditional textual similarity measures, because tweets have special syntactic and semantic characteristics. This paper reviews and evaluates the performance of topological, statistical, and hybrid similarity measures in the context of Twitter analysis. Furthermore, the performance of each measure is compared against a naïve keyword-based similarity computation method to assess the significance of semantic computation in capturing the meaning of tweets. An experiment is designed and conducted to evaluate the different measures by examining various metrics, including correlation, error rates, and statistical tests on a benchmark dataset. The potential weaknesses of semantic similarity measures in Twitter applications of textual similarity assessment are discussed, along with the research contributions. This research highlights challenges and potential areas of improvement for the semantic similarity of tweets, providing a resource for researchers and practitioners.
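
To make the evaluation setup described in the abstract more concrete, the following is a minimal sketch of a keyword-based baseline and two of the named metric types (correlation and error rate). It is not the authors' implementation: the abstract does not say which keyword measure or which correlation coefficient was used, so Jaccard token overlap, Pearson's r, and RMSE are assumptions chosen for illustration, and the function names and tweet pairs are hypothetical.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens.

    Hashtags and mentions keep their leading '#' or '@' (an assumption;
    the paper's preprocessing is not described in the abstract).
    """
    return set(re.findall(r"[#@]?\w+", text.lower()))


def keyword_similarity(a, b):
    """Jaccard overlap of the two token sets, in [0, 1].

    Assumed stand-in for the 'naive keyword-based' baseline.
    """
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either list is constant
    return cov / (sx * sy)


def rmse(xs, ys):
    """Root-mean-square error between two equal-length score lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


if __name__ == "__main__":
    # Invented tweet pairs and invented human ratings in [0, 1],
    # used only to show the call pattern; not data from the paper.
    pairs = [
        ("Traffic is terrible on the M60 today", "Huge delays on the M60 this morning"),
        ("Loving the new #NLP course", "Rain again in Manchester"),
        ("Big win for United tonight", "United won tonight, great match"),
    ]
    human = [0.8, 0.0, 0.9]
    predicted = [keyword_similarity(a, b) for a, b in pairs]
    print("predicted:", predicted)
    print("pearson:", pearson(predicted, human))
    print("rmse:", rmse(predicted, human))
```

A semantic measure would be evaluated the same way by swapping `keyword_similarity` for another scoring function and comparing the resulting correlation and error against this baseline.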
