Shardlow, Matthew ORCID: https://orcid.org/0000-0003-1129-2750, Gerber, Luciano ORCID: https://orcid.org/0000-0002-8423-4642 and Nawaz, Raheel ORCID: https://orcid.org/0000-0001-9588-0052 (2022) One emoji, many meanings: A corpus for the prediction and disambiguation of emoji sense. Expert Systems with Applications, 198. ISSN 0957-4174
Published Version
Available under License Creative Commons Attribution.
Abstract
In this work, we uncover a hidden linguistic property of emoji, namely that they are polysemous and can be used to form a semantic network of emoji meanings. Our key contributions to this direction of study are as follows: (1) We have developed a new corpus to support the task of emoji sense prediction. This corpus contains tweets with single emojis, where each emoji has been labelled with an appropriate sense identifier from WordNet. (2) We present experiments demonstrating that the sense of an emoji can be predicted from our corpus to a reasonable level of accuracy. We report an average path-similarity score of 0.4146 for our best emoji sense prediction algorithm. (3) We further show that emoji sense is a useful feature in the emoji prediction task, where we report an accuracy of 58.8816 and a macro-F1 score of 46.6640, beating reasonable baselines in this task. Our work demonstrates the importance of considering the meaning behind emoji, rather than ignoring them or simply treating them as extra word forms.
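The path-similarity score mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common WordNet-style definition, 1 / (1 + length of the shortest path between two senses in the hypernym graph); the abstract does not spell out the exact evaluation procedure, so this is an assumption. The taxonomy `TOY_HYPERNYMS` and all function names are hypothetical and are not taken from WordNet or from the corpus.

```python
from collections import deque

# Hypothetical toy hypernym links (child -> parent), for illustration only.
TOY_HYPERNYMS = {
    "joy": "emotion",
    "sadness": "emotion",
    "emotion": "feeling",
    "feeling": "state",
    "state": "entity",
    "fire": "combustion",
    "combustion": "process",
    "process": "entity",
}


def build_graph(hypernyms):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, sense_a, sense_b):
    """Assumed definition: 1 / (1 + shortest path length)."""
    dist = shortest_path_length(graph, sense_a, sense_b)
    return None if dist is None else 1.0 / (1.0 + dist)


def mean_path_similarity(graph, gold_predicted_pairs):
    """Average similarity over (gold, predicted) pairs; unreachable pairs count as 0."""
    scores = []
    for gold, predicted in gold_predicted_pairs:
        score = path_similarity(graph, gold, predicted)
        scores.append(0.0 if score is None else score)
    return sum(scores) / len(scores)


GRAPH = build_graph(TOY_HYPERNYMS)
```

Under this definition an exact sense match scores 1.0 and the score decays as the predicted sense moves further from the gold sense in the taxonomy, so an average well below 1.0 can still reflect predictions that are semantically close to the gold label.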