e-space
Manchester Metropolitan University's Research Repository

    Crossing linguistic barriers: authorship attribution in Sinhala texts

    Sarwar, Raheem (ORCID: https://orcid.org/0000-0002-0640-807X), Perera, Maneesha (ORCID: https://orcid.org/0009-0000-0684-726X), Teh, Pin Shen (ORCID: https://orcid.org/0000-0002-0607-2617), Nawaz, Raheel (ORCID: https://orcid.org/0000-0001-9588-0052) and Hassan, Muhammad Umair (ORCID: https://orcid.org/0000-0001-7607-5154) (2024) Crossing linguistic barriers: authorship attribution in Sinhala texts. ACM Transactions on Asian and Low-Resource Language Information Processing, 23 (5). pp. 1-14. ISSN 2375-4699

    Accepted Version
    Available under License In Copyright.

    Abstract

    Authorship attribution involves determining the original author of an anonymous text from a pool of potential authors. The authorship attribution task has applications in several domains, such as plagiarism detection, digital text forensics, and information retrieval. Although these applications are not tied to any single language, existing research has predominantly focused on English, which makes it difficult to apply existing methods to languages such as Sinhala because of linguistic differences and a lack of language processing tools. We present the first comprehensive study on cross-topic authorship attribution for Sinhala texts and propose a solution that can effectively perform the authorship attribution task even when the topics of the test and training samples differ. Our solution consists of three main parts: (i) extraction of topic-independent stylometric features, (ii) generation of a small candidate author set with the help of similarity search, and (iii) identification of the true author. Several experimental studies demonstrate that the proposed solution can effectively handle real-world scenarios involving a large number of candidate authors and a limited number of text samples for each candidate author.
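    The three-part pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method: the feature set (mean word length, mean sentence length, type-token ratio, punctuation rates), the cosine-similarity shortlist, the nearest-sample Euclidean decision rule, and all names (style_features, fit_scaler, cosine, attribute, PUNCT) are hypothetical stand-ins chosen only to show how stages (i) to (iii) fit together.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical punctuation set; a real Sinhala system might add script-specific marks.
PUNCT = ",.;:!?\"'()-"


def style_features(text: str) -> list[float]:
    """Stage (i) stand-in: hypothetical features that ignore which words are used."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = max(len(text), 1)
    feats = [
        mean(len(w) for w in words) if words else 0.0,  # mean word length (characters)
        len(words) / max(len(sentences), 1),            # mean sentence length (words)
        len(set(words)) / max(len(words), 1),           # type-token ratio
    ]
    feats += [text.count(p) / n_chars for p in PUNCT]   # punctuation rate per character
    return feats


def fit_scaler(vectors):
    """Return a function that z-scores vectors using training-set statistics."""
    cols = list(zip(*vectors))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant features keep a scale of 1
    return lambda v: [(x - m) / s for x, m, s in zip(v, mus, sds)]


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def attribute(query: str, corpus: dict[str, list[str]], k: int = 3) -> str:
    """corpus maps each candidate author to a list of their sample texts."""
    raw = {a: [style_features(t) for t in texts] for a, texts in corpus.items()}
    scale = fit_scaler([v for vs in raw.values() for v in vs])
    samples = {a: [scale(v) for v in vs] for a, vs in raw.items()}
    q = scale(style_features(query))

    # Stage (ii) stand-in: shortlist the k authors whose mean profile is most similar.
    profiles = {a: [mean(c) for c in zip(*vs)] for a, vs in samples.items()}
    shortlist = sorted(profiles, key=lambda a: cosine(q, profiles[a]), reverse=True)[:k]

    # Stage (iii) stand-in: choose the shortlisted author whose samples are closest on average.
    return min(shortlist, key=lambda a: mean(math.dist(q, v) for v in samples[a]))
```

    Calling attribute(query, corpus, k=2) returns a single author name from the corpus keys. The shortlist step only matters when the candidate pool is large; in this sketch it is a linear scan, and an approximate nearest-neighbour index would be a natural substitute at scale.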