Leveraging semantic text analysis to improve the performance of transformer-based relation extraction

Evans, Marie-Therese Charlotte, Latifi, Majid ORCID: https://orcid.org/0000-0002-2671-0516, Ahsan, Mominul ORCID: https://orcid.org/0000-0002-7300-506X and Haider, Julfikar ORCID: https://orcid.org/0000-0001-7010-8285 (2024) Leveraging semantic text analysis to improve the performance of transformer-based relation extraction. Information, 15 (2). 91. ISSN 2078-2489

Published Version
Available under License Creative Commons Attribution.

Official URL: http://dx.doi.org/10.3390/info15020091

Abstract

Keyword extraction from Knowledge Bases underpins the definition of relevancy in Digital Library search systems. However, it is the pertinent task of Joint Relation Extraction that populates the Knowledge Bases from which results are retrieved. Recent work focuses on fine-tuned, Pre-trained Transformers. Yet, F1 scores for scientific literature achieve just 53.2, versus 69 in the general domain. The research demonstrates the failure of existing work to evidence the rationale for optimisations to fine-tuned classifiers. In contrast, emerging research subjectively adopts the common belief that Natural Language Processing techniques fail to derive context and shared knowledge. In fact, global context and shared knowledge account for just 10.4% and 11.2% of total relation misclassifications, respectively. In this work, the novel employment of semantic text analysis presents objective challenges for the Transformer-based classification of Joint Relation Extraction. This is the first known work to quantify that pipelined error propagation accounts for 45.3% of total relation misclassifications, the most poignant challenge in this domain. More specifically, Part-of-Speech tagging highlights the misclassification of complex noun phrases, accounting for 25.47% of relation misclassifications. Furthermore, this study identifies two limitations in the purported bidirectionality of the Bidirectional Encoder Representations from Transformers (BERT) Pre-trained Language Model. Firstly, there is a notable imbalance in the misclassification of right-to-left relations, which occurs at a rate double that of left-to-right relations. Additionally, a failure to recognise local context through determiners and prepositions contributes to 16.04% of misclassifications. Furthermore, it is highlighted that the annotation scheme of the singular dataset utilised in existing research, Scientific Entities, Relations and Coreferences (SciERC), is marred by ambiguity. Notably, two asymmetric relations within this dataset achieve recall rates of only 10% and 29%.

Item Type:	Article (Article)
Peer-reviewed:	Yes
Date Deposited:	12 Feb 2024 09:41
Publisher:	MDPI AG
Additional Information:	This is an open access article which first appeared in Information, published by MDPI
Divisions:	Organisation > Science and Engineering
Subject terms:	08 Information and Computing Sciences
Data Access Statement:	Publicly available datasets were analysed in this study. This data can be found here: https://nlp.cs.washington.edu/sciIE/ (accessed on 1 February 2024).
URI:	https://e-space.mmu.ac.uk/id/eprint/633877
DOI:	https://doi.org/10.3390/info15020091
ISSN:	2078-2489
e-ISSN:	2078-2489
