e-space
Manchester Metropolitan University's Research Repository

    AGI-P: A Gender Identification Framework for Authorship Analysis Using Customized Fine-Tuning of Multilingual Language Model

    Sarwar, Raheem (ORCID: https://orcid.org/0000-0002-0640-807X), An Ha, Le, Teh, Pin Shen (ORCID: https://orcid.org/0000-0002-0607-2617), Sabah, Fahad, Nawaz, Raheel (ORCID: https://orcid.org/0000-0001-9588-0052), Hameed, Ibrahim A and Hassan, Muhammad Umair (ORCID: https://orcid.org/0000-0001-7607-5154) (2024) AGI-P: A Gender Identification Framework for Authorship Analysis Using Customized Fine-Tuning of Multilingual Language Model. IEEE Access, 12. pp. 15399-15409.

    Published Version: available under a Creative Commons Attribution License. Download (1MB).

    Abstract

    In this investigation, we propose AGI-P, a solution to the author gender identification task. This task has several real-world applications across different fields, such as marketing and advertising, forensic linguistics, sociology, recommendation systems, language processing, historical analysis, education, and language learning. We created a new dataset to evaluate the proposed method. The dataset is gender-balanced via random sampling and consists of 1944 samples in total. Using accuracy as the evaluation measure, we compare AGI-P against state-of-the-art machine learning classifiers and fine-tuned pre-trained multilingual language models such as DistilBERT, mBERT, XLM-RoBERTa, and multilingual DeBERTa. We also propose a customized fine-tuning strategy that improves the accuracy of these pre-trained language models on the author gender identification task. Our extensive experiments reveal that AGI-P outperforms the well-known machine learning classifiers and the fine-tuned pre-trained multilingual language models, reaching an accuracy of 92.03%. Moreover, the multilingual language models fine-tuned with the proposed customized strategy outperform the same models fine-tuned with an out-of-the-box strategy. The codebase and corpus are available on our GitHub page: https://github.com/mumairhassan/AGI-P
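    For orientation, the sketch below shows what the "out-of-the-box" fine-tuning baseline described in the abstract might look like using Hugging Face Transformers: a pre-trained multilingual encoder with a two-way classification head, trained on labeled writing samples and scored by accuracy. This is not the paper's customized AGI-P strategy (see the linked GitHub repository for that); the checkpoint (xlm-roberta-base), the label encoding, and all hyperparameters are illustrative assumptions.

    # Minimal sketch of an out-of-the-box fine-tuning baseline for author
    # gender identification; NOT the paper's customized AGI-P strategy.
    import numpy as np
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    # Hypothetical gender-balanced corpus; the paper's dataset has 1944 samples.
    texts = ["writing sample attributed to a female author ...",
             "writing sample attributed to a male author ..."]
    labels = [0, 1]  # assumed encoding: 0 = female, 1 = male

    checkpoint = "xlm-roberta-base"  # one of the multilingual models evaluated
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2)

    def tokenize(batch):
        # Truncation length is an illustrative assumption.
        return tokenizer(batch["text"], truncation=True, max_length=256)

    ds = Dataset.from_dict({"text": texts, "label": labels}).map(
        tokenize, batched=True)

    def compute_metrics(pred):
        # Accuracy, the evaluation measure used in the paper.
        preds = pred.predictions.argmax(axis=-1)
        return {"accuracy": float((preds == pred.label_ids).mean())}

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gender-id-baseline",
                               num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=ds,
        eval_dataset=ds,  # placeholder; use a held-out split in practice
        tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
        compute_metrics=compute_metrics,
    )
    trainer.train()
    print(trainer.evaluate())

    The customized strategy reported in the paper modifies this vanilla recipe; per the abstract, that modification is what lifts the fine-tuned multilingual models above the baseline shown here.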

    Impact and Reach

    Statistics (6-month trend): 155 downloads, 90 hits. Additional statistics for this record are available via IRStats2.
