Manchester Metropolitan University's Research Repository

    Hybrid deep learning model for sarcasm detection in Indian indigenous language using word-emoji embeddings

    Kumar, Akshi (ORCID: https://orcid.org/0000-0003-4263-7168), Sangwan, Saurabh Raj, Singh, Adarsh Kumar and Wadhwa, Gandharv (2023) Hybrid deep learning model for sarcasm detection in Indian indigenous language using word-emoji embeddings. ACM Transactions on Asian and Low-Resource Language Information Processing, 22 (5). pp. 1-20. ISSN 2375-4699

    Accepted Version


    Automated sarcasm detection is considered a complex natural language processing task, and extending it to Hindi, a morphologically rich, free-word-order indigenous Indian language, poses a further challenge. The scarcity of resources and tools such as annotated corpora, lexicons, dependency parsers, part-of-speech taggers and benchmark datasets compounds the linguistic challenges of sarcasm detection in low-resource languages like Hindi. Furthermore, as context incongruity is central to detecting sarcasm, various linguistic, aural and visual cues can be used to predict whether a target utterance is sarcastic. While pre-trained word embeddings capture meanings, semantic relationships and different types of context in the form of word representations, emojis can also provide useful contextual information, analogous to human facial expressions, for gauging sarcasm. Thus, the goal of this research is to demonstrate the use of a hybrid deep learning model trained using two embeddings, namely word and emoji embeddings, to detect sarcasm. The model is validated on a Hindi tweets dataset, Sarc-H, manually annotated with sarcastic and non-sarcastic labels. The preliminary results show the importance of using emojis for sarcasm detection, with our model attaining an accuracy of 97.35% and an F-score of 0.9708. The research validates that automated feature engineering facilitates an efficient and repeatable predictive model for detecting sarcasm in indigenous, low-resource languages.
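
The idea of feeding a model two embeddings can be illustrated with a minimal sketch. It is not the authors' architecture, which the abstract does not specify. It only shows one plausible way to turn a tweet into a combined word-plus-emoji feature vector: split the text into word tokens and emoji, average each group's vectors, and concatenate the two averages. The tiny tables `WORD_VECS` and `EMOJI_VECS`, the helper names, the two-dimensional vectors and the Unicode ranges used to spot emoji are all assumptions made for illustration; a real system would load pre-trained embeddings and pass the features to a trained classifier.

```python
from typing import Dict, List, Tuple

# Hypothetical toy embedding tables (assumption); real ones would be pre-trained.
WORD_VECS: Dict[str, List[float]] = {
    "वाह": [0.9, 0.1],
    "क्या": [0.2, 0.3],
    "सेवा": [0.4, 0.6],
}
EMOJI_VECS: Dict[str, List[float]] = {
    "🙄": [-0.8, 0.5],
    "😂": [0.7, 0.7],
}


def is_emoji(ch: str) -> bool:
    """Rough emoji check over two common Unicode ranges (a simplification)."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def split_tokens(text: str) -> Tuple[List[str], List[str]]:
    """Separate whitespace-delimited words from emoji, even when they touch."""
    words: List[str] = []
    emojis: List[str] = []
    for chunk in text.split():
        buf = ""
        for ch in chunk:
            if is_emoji(ch):
                if buf:
                    words.append(buf)
                    buf = ""
                emojis.append(ch)
            else:
                buf += ch
        if buf:
            words.append(buf)
    return words, emojis


def mean_vec(tokens: List[str], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    hits = [table[t] for t in tokens if t in table]
    if not hits:
        return [0.0] * dim
    return [sum(v[i] for v in hits) / len(hits) for i in range(dim)]


def tweet_features(text: str, dim: int = 2) -> List[float]:
    """Concatenate the mean word vector and the mean emoji vector."""
    words, emojis = split_tokens(text)
    return mean_vec(words, WORD_VECS, dim) + mean_vec(emojis, EMOJI_VECS, dim)


features = tweet_features("वाह क्या सेवा 🙄")
print(len(features))  # word half followed by emoji half
```

Keeping the emoji half as a separate block of the feature vector means a tweet with no emoji still yields a fixed-length input (zeros in that half), which is one simple way to let a downstream model weigh the emoji signal independently of the words.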
