Kumar, Akshi ORCID: https://orcid.org/0000-0003-4263-7168, Sangwan, Saurabh Raj, Singh, Adarsh Kumar and Wadhwa, Gandharv (2023) Hybrid deep learning model for sarcasm detection in Indian indigenous language using word-emoji embeddings. ACM Transactions on Asian and Low-Resource Language Information Processing, 22 (5). pp. 1-20. ISSN 2375-4699
Accepted Version
Available under License In Copyright.
Abstract
Automated sarcasm detection is considered a complex natural language processing task, and extending it to Hindi, a morphologically rich, free-word-order indigenous Indian language, is a further challenge. The scarcity of resources and tools such as annotated corpora, lexicons, dependency parsers, Part-of-Speech taggers and benchmark datasets compounds the linguistic challenges of sarcasm detection in low-resource languages like Hindi. Furthermore, as context incongruity is central to detecting sarcasm, various linguistic, aural and visual cues can be used to predict whether a target utterance is sarcastic. While pre-trained word embeddings capture meanings, semantic relationships and different types of context in the form of word representations, emojis can also provide useful contextual information, analogous to human facial expressions, for gauging sarcasm. Thus, the goal of this research is to demonstrate the use of a hybrid deep learning model trained using two embeddings, namely word and emoji embeddings, to detect sarcasm. The model is validated on a Hindi tweets dataset, Sarc-H, manually annotated with sarcastic and non-sarcastic labels. The preliminary results clearly show the importance of using emojis for sarcasm detection, with our model attaining an accuracy of 97.35% and an F-score of 0.9708. The research validates that automated feature engineering facilitates an efficient and repeatable predictive model for detecting sarcasm in indigenous, low-resource languages.