Towards a corpus for credibility assessment in software practitioner blog articles

Williams, Ashley ORCID: https://orcid.org/0000-0002-6888-0521, Shardlow, Matthew ORCID: https://orcid.org/0000-0003-1129-2750 and Rainer, Austen (2021) Towards a corpus for credibility assessment in software practitioner blog articles. In: EASE 2021: Evaluation and Assessment in Software Engineering, pp. 100-108. Presented at EASE 2021: Evaluation and Assessment in Software Engineering, 21 June 2021 - 23 June 2021, Trondheim, Norway.

Official URL: https://dl.acm.org/doi/10.1145/3463274.3463330

Abstract

Background: Blogs are a source of grey literature that software practitioners widely use to disseminate opinion and experience. Analysing such articles can provide useful insights into the state-of-practice for software engineering research. However, it is challenging to identify higher-quality content among the large quantity of articles available. Credibility assessment can help in identifying quality content, but existing corpora are lacking. Credibility is typically measured through a series of conceptual criteria, with 'argumentation' and 'evidence' being two important criteria. Objective: We create a corpus labelled for argumentation and evidence that can aid the credibility community. The corpus consists of articles from the blog of a single software practitioner and is publicly available. Method: Three annotators label the corpus with a series of conceptual credibility criteria, reaching an agreement of 0.82 (Fleiss' Kappa). We present a preliminary analysis of the corpus by using it to investigate the identification of claim sentences (one of our ten labels). Results: We train four systems (BERT, KNN, Decision Tree and SVM) using three feature sets (Bag of Words, Topic Modelling and InferSent), achieving an F1 score of 0.64 using InferSent and a Linear SVM. Conclusions: Our preliminary results are promising, indicating that the corpus can help future studies in detecting the credibility of grey literature. Future research will investigate the degree to which the sentence-level annotations can be used to infer the credibility of the overall document.
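For readers unfamiliar with the agreement statistic quoted in the abstract, the following is a minimal sketch of how Fleiss' Kappa can be computed for a fixed number of annotators. It is not the authors' code: the function name `fleiss_kappa`, the count-matrix input layout, and the toy data are assumptions made purely for illustration, and the toy data is not taken from the corpus.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' Kappa for a count matrix.

    counts[i][j] is the number of annotators who assigned item i
    (e.g. a sentence) to category j (e.g. a label). Every row must
    sum to the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two annotators")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement per item: fraction of annotator pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        # All ratings in a single category: kappa is undefined.
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example (hypothetical data): 4 sentences, 3 annotators, 2 labels.
toy_counts = [
    [3, 0],  # all three annotators chose label 0
    [0, 3],  # all three chose label 1
    [2, 1],  # two chose label 0, one chose label 1
    [1, 2],
]
print(fleiss_kappa(toy_counts))
```

A value of 1 indicates perfect agreement, while values near 0 indicate agreement no better than chance.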

Item Type:	Conference or Workshop Item (Paper)
Published Proceedings:	EASE 2021: Evaluation and Assessment in Software Engineering
Peer-reviewed:	Yes
Date Deposited:	22 Sep 2022 13:01
Publisher:	Association for Computing Machinery (ACM)
Additional Information:	© ACM 2021. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in EASE 2021: Evaluation and Assessment in Software Engineering, http://dx.doi.org/10.1145/3463274.3463330
Divisions:	Organisation > Science and Engineering
Subject terms:	Science & Technology, Technology, Computer Science, Software Engineering, credibility assessment, argumentation mining, experience mining, text mining, grey literature, cs.SE
URI:	https://e-space.mmu.ac.uk/id/eprint/629935
DOI:	https://doi.org/10.1145/3463274.3463330
