Hybrid deep learning model for sarcasm detection in Indian indigenous language using word-emoji embeddings

Kumar, Akshi ORCID: https://orcid.org/0000-0003-4263-7168, Sangwan, Saurabh Raj, Singh, Adarsh Kumar and Wadhwa, Gandharv (2023) Hybrid deep learning model for sarcasm detection in Indian indigenous language using word-emoji embeddings. ACM Transactions on Asian and Low-Resource Language Information Processing, 22 (5). pp. 1-20. ISSN 2375-4699

Accepted Version
Available under License In Copyright.

Official URL: https://dl.acm.org/doi/10.1145/3519299

Abstract

Automated sarcasm detection is considered a complex natural language processing task, and extending it to Hindi, a morphologically rich indigenous Indian language with predominantly free word order, poses a further challenge. The scarcity of resources and tools such as annotated corpora, lexicons, dependency parsers, Part-of-Speech taggers and benchmark datasets compounds the linguistic challenges of sarcasm detection in low-resource languages like Hindi. Furthermore, because context incongruity is central to detecting sarcasm, various linguistic, aural and visual cues can be used to predict whether a target utterance is sarcastic. While pre-trained word embeddings capture meanings, semantic relationships and different types of context in the form of word representations, emojis can also provide useful contextual information, analogous to human facial expressions, for gauging sarcasm. Thus, the goal of this research is to demonstrate the use of a hybrid deep learning model trained using two embeddings, namely word and emoji embeddings, to detect sarcasm. The model is validated on a Hindi tweets dataset, Sarc-H, manually annotated with sarcastic and non-sarcastic labels. The preliminary results show the importance of using emojis for sarcasm detection, with our model attaining an accuracy of 97.35% and an F-score of 0.9708. The research validates that automated feature engineering facilitates an efficient and repeatable predictive model for detecting sarcasm in indigenous, low-resource languages.
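
The idea of feeding a model two embeddings can be illustrated with a minimal sketch. This record does not describe the paper's architecture, so everything below is an assumption made for illustration: the two lookup tables `WORD_VECS` and `EMOJI_VECS` hold made-up 2-dimensional vectors rather than pre-trained embeddings, and the helper names `mean_vector` and `tweet_features` are hypothetical. The sketch only shows one simple way a tweet's word vectors and emoji vectors could be averaged separately and concatenated into a single feature vector for a downstream classifier.

```python
# Illustrative sketch only: toy vectors, not the paper's embeddings or model.
from typing import Dict, List

# Hypothetical 2-dimensional tables standing in for pre-trained embeddings.
WORD_VECS: Dict[str, List[float]] = {
    "वाह": [0.9, 0.1],
    "बात": [0.8, 0.2],
}
EMOJI_VECS: Dict[str, List[float]] = {
    "🙄": [-0.7, 0.6],
    "😊": [0.5, 0.5],
}
DIM = 2  # dimensionality of the toy vectors above


def mean_vector(tokens: List[str], table: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of tokens present in `table`; zeros if none match."""
    hits = [table[t] for t in tokens if t in table]
    if not hits:
        return [0.0] * DIM
    return [sum(column) / len(hits) for column in zip(*hits)]


def tweet_features(tokens: List[str]) -> List[float]:
    """Concatenate the averaged word vector with the averaged emoji vector."""
    return mean_vector(tokens, WORD_VECS) + mean_vector(tokens, EMOJI_VECS)


# Tokens missing from both tables are simply skipped in this sketch.
features = tweet_features(["वाह", "क्या", "बात", "है", "🙄"])
```

Here `features` has four components: the first two summarise the words and the last two summarise the emoji, so a classifier trained on such vectors can weigh textual and emoji cues separately.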

Item Type:	Article
Peer-reviewed:	Yes
Date Deposited:	22 Aug 2022 08:41
Publisher:	Association for Computing Machinery (ACM)
Additional Information:	© ACM 2022. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Asian and Low-Resource Language Information Processing, http://dx.doi.org/10.1145/3519299.
Divisions:	Faculties > Science and Engineering
URI:	https://e-space.mmu.ac.uk/id/eprint/630278
DOI:	https://doi.org/10.1145/3519299
ISSN:	2375-4699
e-ISSN:	2375-4702
