e-space
Manchester Metropolitan University's Research Repository

    Bias in, Bias out: Annotation Bias in Multilingual Large Language Models

    Cui, Xia (ORCID: https://orcid.org/0000-0002-1726-3814), Huang, Ziyi and Adel, Naomi (2025) Bias in, Bias out: Annotation Bias in Multilingual Large Language Models. In: Proceedings of the First Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models (OMMM 2025), pp. 1-16. Presented at The 15th International Conference on Recent Advances in Natural Language Processing 2025, 11-13 September 2025, Varna, Bulgaria.


    Abstract

    Annotation bias in NLP datasets remains a major challenge for developing multilingual Large Language Models (LLMs), particularly in culturally diverse settings. Bias from task framing, annotator subjectivity, and cultural mismatches can distort model outputs and exacerbate social harms. We propose a comprehensive framework for understanding annotation bias, distinguishing among instruction bias, annotator bias, and contextual and cultural bias. We review detection methods (including inter-annotator agreement, model disagreement, and metadata analysis) and highlight emerging techniques such as multilingual model divergence and cultural inference. We further outline proactive and reactive mitigation strategies, including diverse annotator recruitment, iterative guideline refinement, and post-hoc model adjustments. Our contributions include: (1) a structured typology of annotation bias, (2) a comparative synthesis of detection metrics, (3) an ensemble-based bias mitigation approach adapted for multilingual settings, and (4) an ethical analysis of annotation processes. Together, these contributions aim to inform the design of more equitable annotation pipelines for LLMs.
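    Of the detection methods mentioned in the abstract, inter-annotator agreement is the most established. Purely as an illustrative sketch (not the authors' implementation), the following Python snippet computes Cohen's kappa for two annotators labelling the same items; the toy labels and function name are hypothetical.

    from collections import Counter

    def cohen_kappa(labels_a, labels_b):
        # Cohen's kappa: chance-corrected agreement between two annotators
        # who labelled the same set of items.
        assert len(labels_a) == len(labels_b)
        n = len(labels_a)

        # Observed agreement: fraction of items given the same label by both annotators.
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

        # Expected chance agreement, from each annotator's marginal label distribution.
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(labels_a) | set(labels_b))

        return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

    # Hypothetical toy data: two annotators labelling five sentences for offensiveness.
    ann_1 = ["offensive", "neutral", "neutral", "offensive", "neutral"]
    ann_2 = ["offensive", "neutral", "offensive", "offensive", "neutral"]
    print(f"Cohen's kappa: {cohen_kappa(ann_1, ann_2):.2f}")  # ~0.62 for this toy data

    Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance, which is one signal of annotator bias or underspecified guidelines.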
