Linguistic patterns in pandemic-related content: a comparative analysis of COVID-19, Constraint, and Monkeypox datasets

Sikosana, Mkululi ORCID: https://orcid.org/0009-0008-3244-5764, Maudsley-Barton, Sean ORCID: https://orcid.org/0000-0003-0289-0783 and Ajao, Oluwaseun ORCID: https://orcid.org/0000-0002-6606-6569 (2025) Linguistic patterns in pandemic-related content: a comparative analysis of COVID-19, Constraint, and Monkeypox datasets. Frontiers in Artificial Intelligence, 8. 1627522.

Published Version
Available under License Creative Commons Attribution.
Official URL: https://doi.org/10.3389/frai.2025.1627522

Abstract

Introduction: This study investigates how linguistic features distinguish health misinformation from factual communication in pandemic-related online discourse. Understanding these differences is essential for improving detection of misinformation and informing effective public health messaging during crises.

Methods: We conducted a computational linguistic analysis across three corpora: COVID-19 false narratives (n = 7,588), general COVID-19 content (n = 10,700), and Monkeypox-related posts (n = 5,787). We examined readability, rhetorical markers, and persuasive language, focusing on differences between misinformation and factual communication.

Results: COVID-19 misinformation exhibited markedly lower readability scores and contained more than twice the frequency of fear-related and persuasive terms compared to the other datasets. It showed minimal use of exclamation marks, contrasting with the more emotive style of Monkeypox content. These findings suggest that misinformation employs a deliberately complex rhetorical style combined with emotional cues, which may enhance perceived credibility.

Discussion: Our findings contribute to the growing body of research on digital health misinformation by identifying linguistic indicators that can aid in detection. They also inform theoretical models of crisis communication and public health messaging strategies in networked media environments. However, the study has limitations, including reliance on traditional readability indices, a narrow persuasive lexicon, and static aggregate analysis. Future work should adopt longitudinal designs, incorporate broader emotion lexicons, and employ platform-sensitive approaches to improve robustness. The data and code supporting this study are openly available at: https://doi.org/10.5281/zenodo.17024569.
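
The three feature families named in the Methods (readability, rhetorical markers, persuasive language) can be illustrated with a minimal sketch. This is not the authors' code, which is available at the Zenodo link above. It assumes Flesch Reading Ease as the readability index (the abstract says only "traditional readability indices"), uses a rough vowel-group syllable heuristic, and relies on a hypothetical five-word fear lexicon, `FEAR_TERMS`, invented purely for illustration.

```python
import re

# Hypothetical lexicon for illustration only; the study's actual
# fear-related and persuasive word lists are not given in this record.
FEAR_TERMS = {"deadly", "danger", "panic", "threat", "fear"}


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of consecutive vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def linguistic_features(text: str) -> dict:
    """Compute one readability, one rhetorical, and one lexical feature."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words) or 1  # avoid division by zero on empty input
    fear_hits = sum(w in FEAR_TERMS for w in words)
    return {
        "flesch_reading_ease": flesch_reading_ease(text),
        "exclamation_marks": text.count("!"),
        "fear_terms_per_100_words": 100 * fear_hits / n_words,
    }
```

Calling `linguistic_features(post)` on each post and averaging per corpus would give the kind of aggregate comparison the abstract describes. A real analysis would likely need a dictionary-based syllable counter and a validated emotion lexicon.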

Item Type:	Article (Article)
Peer-reviewed:	Yes
Date Deposited:	03 Nov 2025 14:40
Publisher:	Frontiers Media
Additional Information:	This is an open access article published in Frontiers in Artificial Intelligence, by Frontiers Media.
Divisions:	Organisation > Science and Engineering Organisation > Science and Engineering > Department of Computing and Maths
Subject terms:	4007 Control engineering, mechatronics and robotics, 4602 Artificial intelligence, 4611 Machine learning
Data Access Statement:	All code and reproducibility materials are openly available on Zenodo (Sikosana, 2025). The following datasets were used in this study: Patwa et al. (2021), Saenz et al. (2021), and Crone (2022). All other relevant data supporting the findings of this study are provided within the article.
URI:	https://e-space.mmu.ac.uk/id/eprint/642013
DOI:	https://doi.org/10.3389/frai.2025.1627522
e-ISSN:	2624-8212
