Translator attribution for Arabic using machine learning

Mohamed, Emad, Sarwar, Raheem ORCID: https://orcid.org/0000-0002-0640-807X and Mostafa, Sayed (2023) Translator attribution for Arabic using machine learning. Digital Scholarship in the Humanities, 38 (2). pp. 658-666. ISSN 2055-7671

Accepted Version
Available under License In Copyright.

Official URL: https://academic.oup.com/dsh/advance-article/doi/1...

Abstract

Given a set of target-language documents and their translators, the translator attribution task aims to identify which translator translated which documents. Attributing translations and identifying a translator’s style could contribute to fields including translation studies, digital humanities, and forensic linguistics. To conduct this investigation, we first develop a new corpus containing translations of world-famous books into Arabic. We then pre-process the books in our corpus, which mainly involves removing irrelevant material, morphological segmentation of words, and devocalization. After pre-processing, we propose using the 100 most frequent words and/or morphologically segmented function words as markers of the translators’ writing style (i.e. stylometric features) to differentiate between the translations of different translators. After feature extraction, we apply several supervised and unsupervised machine-learning algorithms, along with our novel cluster-to-author index, to perform this task. We found that translators are not invisible, and that morphological analysis may be no more useful than simply using the 100 most frequent words as features. A support vector machine with a linear kernel achieved 99% classification accuracy. The unsupervised machine-learning methods, namely K-means clustering and hierarchical clustering, yielded similar findings.
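
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenization, and the toy documents are assumptions made for illustration only, and the cleaning, morphological segmentation, devocalization, and classification steps are omitted.

```python
from collections import Counter


def most_frequent_words(documents, n=100):
    """Return the n most frequent whitespace-delimited tokens across all documents."""
    counts = Counter()
    for text in documents:
        counts.update(text.split())
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def relative_frequencies(text, vocabulary):
    """Map one document to a vector of relative frequencies over the vocabulary."""
    tokens = text.split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty documents
    return [counts[word] / total for word in vocabulary]


# Hypothetical toy documents standing in for pre-processed translations.
docs = ["a b a c", "b b a d"]
vocab = most_frequent_words(docs, n=2)
vectors = [relative_frequencies(d, vocab) for d in docs]
```

Each resulting vector could then be passed to a classifier or clustering algorithm of the kind the abstract mentions; with a real corpus, `n` would be set to 100.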

Item Type:	Article (Article)
Peer-reviewed:	Yes
Date Deposited:	19 Oct 2022 08:10
Publisher:	Oxford University Press (OUP)
Additional Information:	This is a pre-copyedited, author-produced version of an article accepted for publication in Digital Scholarship in the Humanities following peer review. The version of record, Emad Mohamed, Raheem Sarwar, Sayed Mostafa, Translator attribution for Arabic using machine learning, Digital Scholarship in the Humanities, 2022, fqac054, is available online at: https://academic.oup.com/dsh/advance-article/doi/10.1093/llc/fqac054/6760698, https://doi.org/10.1093/llc/fqac054
Divisions:	Organisation > Business and Law
URI:	https://e-space.mmu.ac.uk/id/eprint/630544
DOI:	https://doi.org/10.1093/llc/fqac054
ISSN:	2055-7671
e-ISSN:	2055-768X
