e-space
Manchester Metropolitan University's Research Repository

    Linguistic patterns in pandemic-related content: a comparative analysis of COVID-19, Constraint, and Monkeypox datasets

    Sikosana, Mkululi (ORCID: https://orcid.org/0009-0008-3244-5764), Maudsley-Barton, Sean (ORCID: https://orcid.org/0000-0003-0289-0783) and Ajao, Oluwaseun (ORCID: https://orcid.org/0000-0002-6606-6569) (2025) Linguistic patterns in pandemic-related content: a comparative analysis of COVID-19, Constraint, and Monkeypox datasets. Frontiers in Artificial Intelligence, 8. 1627522.

    Published Version
    Available under License Creative Commons Attribution.

    Abstract

    Introduction: This study investigates how linguistic features distinguish health misinformation from factual communication in pandemic-related online discourse. Understanding these differences is essential for improving detection of misinformation and informing effective public health messaging during crises.

    Methods: We conducted a computational linguistic analysis across three corpora: COVID-19 false narratives (n = 7,588), general COVID-19 content (n = 10,700), and Monkeypox-related posts (n = 5,787). We examined readability, rhetorical markers, and persuasive language, focusing on differences between misinformation and factual communication.

    Results: COVID-19 misinformation exhibited markedly lower readability scores and contained more than twice the frequency of fear-related and persuasive terms compared to the other datasets. It showed minimal use of exclamation marks, contrasting with the more emotive style of Monkeypox content. These findings suggest that misinformation employs a deliberately complex rhetorical style combined with emotional cues, which may enhance perceived credibility.

    Discussion: Our findings contribute to the growing body of research on digital health misinformation by identifying linguistic indicators that can aid in detection. They also inform theoretical models of crisis communication and public health messaging strategies in networked media environments. However, the study has limitations, including reliance on traditional readability indices, a narrow persuasive lexicon, and static aggregate analysis. Future work should adopt longitudinal designs, incorporate broader emotion lexicons, and employ platform-sensitive approaches to improve robustness. The data and code supporting this study are openly available at: https://doi.org/10.5281/zenodo.17024569.
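
    The three feature families named in the abstract (readability, rhetorical markers such as exclamation marks, and fear-related or persuasive term frequency) can be illustrated with a minimal sketch. This is not the authors' code, which is available at the Zenodo link above. It assumes the Flesch Reading Ease index as one example of a traditional readability index, a rough vowel-group syllable heuristic, and a tiny hypothetical lexicon (FEAR_TERMS) invented purely for illustration.

```python
import re

# Hypothetical mini-lexicon for illustration only; NOT the lexicon used in the study.
FEAR_TERMS = {"danger", "deadly", "fear", "panic", "threat"}

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups, with a floor of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def linguistic_features(text: str) -> dict:
    """Compute one readability score and two simple rhetorical markers."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    n_words = len(words) or 1  # avoid division by zero on empty input
    fear_hits = sum(w in FEAR_TERMS for w in words)
    return {
        "flesch_reading_ease": flesch_reading_ease(text),
        "exclamation_marks": text.count("!"),
        "fear_terms_per_100_words": 100 * fear_hits / n_words,
    }


if __name__ == "__main__":
    print(linguistic_features("Deadly threat! Stay calm."))
```

    Averaging these per-post values within each corpus would give the kind of aggregate comparison the abstract describes. A real analysis would likely replace the syllable heuristic and toy lexicon with validated resources.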
