Sikosana, Mkululi (ORCID: https://orcid.org/0009-0008-3244-5764), Maudsley-Barton, Sean (ORCID: https://orcid.org/0000-0003-0289-0783) and Ajao, Oluwaseun (ORCID: https://orcid.org/0000-0002-6606-6569) (2025). Linguistic patterns in pandemic-related content: a comparative analysis of COVID-19, Constraint, and Monkeypox datasets. Frontiers in Artificial Intelligence, 8, 1627522.
Published Version
Available under License Creative Commons Attribution.
Abstract
Introduction: This study investigates how linguistic features distinguish health misinformation from factual communication in pandemic-related online discourse. Understanding these differences is essential for improving detection of misinformation and informing effective public health messaging during crises.

Methods: We conducted a computational linguistic analysis across three corpora: COVID-19 false narratives (n = 7,588), general COVID-19 content (n = 10,700), and Monkeypox-related posts (n = 5,787). We examined readability, rhetorical markers, and persuasive language, focusing on differences between misinformation and factual communication.

Results: COVID-19 misinformation exhibited markedly lower readability scores and contained more than twice the frequency of fear-related and persuasive terms compared to the other datasets. It showed minimal use of exclamation marks, contrasting with the more emotive style of Monkeypox content. These findings suggest that misinformation employs a deliberately complex rhetorical style combined with emotional cues, which may enhance perceived credibility.

Discussion: Our findings contribute to the growing body of research on digital health misinformation by identifying linguistic indicators that can aid in detection. They also inform theoretical models of crisis communication and public health messaging strategies in networked media environments. However, the study has limitations, including reliance on traditional readability indices, a narrow persuasive lexicon, and static aggregate analysis. Future work should adopt longitudinal designs, incorporate broader emotion lexicons, and employ platform-sensitive approaches to improve robustness. The data and code supporting this study are openly available at: https://doi.org/10.5281/zenodo.17024569.
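
The three feature families named in the Methods (readability, rhetorical markers such as exclamation marks, and fear-related or persuasive terms) can be illustrated with a minimal sketch. This is not the authors' code, which is available at the Zenodo link above. It assumes the Flesch Reading Ease formula as one example of a traditional readability index, a rough vowel-group syllable heuristic, and a tiny hypothetical lexicon (FEAR_PERSUASION_TERMS) invented purely for illustration.

```python
import re

# Hypothetical mini-lexicon for illustration only; NOT the study's lexicon.
FEAR_PERSUASION_TERMS = {
    "danger", "deadly", "fear", "urgent", "warning", "must", "proven",
}


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per vowel group, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def linguistic_features(text: str) -> dict:
    """Compute simple per-text features: readability, exclamations, lexicon rate."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sent = max(1, len(sentences))   # avoid division by zero on empty input
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)

    # Flesch Reading Ease: higher scores mean easier text.
    flesch = (
        206.835
        - 1.015 * (n_words / n_sent)
        - 84.6 * (syllables / n_words)
    )
    lexicon_hits = sum(1 for w in words if w.lower() in FEAR_PERSUASION_TERMS)

    return {
        "flesch_reading_ease": flesch,
        "exclamations_per_100_words": 100 * text.count("!") / n_words,
        "lexicon_terms_per_100_words": 100 * lexicon_hits / n_words,
    }


if __name__ == "__main__":
    print(linguistic_features("Warning! This is deadly."))
```

Averaging these per-text features within each corpus would give the kind of aggregate comparison the abstract describes. A real analysis would need a validated lexicon and a more careful syllable counter.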

