Extracting algorithmic complexity in scientific literature for advance searching

Bakar, Abu, Sarwar, Raheem (ORCID: https://orcid.org/0000-0002-0640-807X), Hassan, Saeed-Ul and Nawaz, Raheel (2023) Extracting algorithmic complexity in scientific literature for advance searching. Journal of Computational and Applied Linguistics, 1. pp. 39-65. ISSN 2815-4967

Published Version
Available under License In Copyright.

Official URL: https://ojs.nbu.bg/index.php/JCAL/article/view/959

Abstract

Non-textual document elements such as charts, diagrams, algorithms and tables play an important role in presenting key information in scientific documents. Recent advances in information retrieval (IR) systems tap this information to answer more complex user queries by mining the text pertaining to non-textual document elements from the full text. Algorithms are critically important in computer science. Researchers work on improving existing algorithms for critical applications and are developing new algorithms for unsolved and newly emerging problems. These enhanced and new algorithms are mostly published in scholarly documents, where the authors also discuss their complexity. The complexity of an algorithm is also an important factor for IR systems. In this paper, we mine the relevant complexities of algorithms from full-text documents by comparing the metadata of each algorithm, such as its caption and function name, with the context of the paragraph in which the authors discuss complexity. Using a dataset of 256 documents downloaded from the CiteSeerX repository, we manually annotate 417 links between algorithms and their complexities. We then apply our novel rule-based approach, which identifies the desired links with 81% precision, 75% recall, 78% F1-score and 65% accuracy. Overall, our method of identifying these links has the potential to improve IR systems that exploit full text and, more specifically, non-textual document elements.
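The abstract describes the linking step only at a high level. The Python sketch below is an illustrative approximation under stated assumptions, not the authors' rule set: it assumes a regular expression for Big-O-style expressions, a simple token-overlap score between a paragraph and an algorithm's caption and function name, and an arbitrary threshold `min_overlap=2`. The names `Algorithm`, `link_complexities`, `evaluate`, `f1_score`, `COMPLEXITY_RE`, `STOPWORDS` and the sample data are all hypothetical. The sketch also computes precision, recall and F1 over sets of (algorithm, complexity) links, and shows that the reported 81% precision and 75% recall are consistent with the reported 78% F1-score. Accuracy is not computed, because the abstract does not say how it is defined for this task.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for asymptotic bounds such as O(n log n), Θ(n^2) or O(|E|).
# It allows one level of nested parentheses, e.g. O(n log(n)).
COMPLEXITY_RE = re.compile(r"(?<![A-Za-z])[OΘΩ]\([^()]*(?:\([^()]*\)[^()]*)*\)")

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "for", "in", "and", "to", "is",
             "algorithm", "with", "its", "our", "takes", "time", "runs"}


@dataclass
class Algorithm:
    label: str          # e.g. "Algorithm 1"
    caption: str        # caption text of the algorithm block
    function_name: str  # name in the pseudocode header, may be empty


def tokens(text: str) -> set:
    """Lower-cased alphanumeric tokens, minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def link_complexities(algorithms, paragraphs, min_overlap=2):
    """Link each complexity expression in a paragraph to the algorithm whose
    caption/function-name tokens overlap that paragraph the most."""
    links = set()
    for para in paragraphs:
        complexities = COMPLEXITY_RE.findall(para)
        if not complexities:
            continue  # no complexity discussion in this paragraph
        para_tokens = tokens(para)
        best, best_score = None, 0
        for alg in algorithms:
            meta = tokens(alg.caption) | tokens(alg.function_name)
            score = len(para_tokens & meta)
            # Assumed bonus when the function name is mentioned verbatim.
            if alg.function_name and alg.function_name.lower() in para.lower():
                score += 1
            if score > best_score:
                best, best_score = alg, score
        if best is not None and best_score >= min_overlap:
            links.update((best.label, c) for c in complexities)
    return links


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(predicted, gold):
    """Precision, recall and F1 of predicted links against gold links."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    algorithms = [
        Algorithm("Algorithm 1", "Merge-based external sort", "ExternalMergeSort"),
        Algorithm("Algorithm 2", "Greedy shortest path search", "GreedyPath"),
    ]
    paragraphs = [
        "ExternalMergeSort performs a merge sort over disk blocks and runs in O(n log n) time.",
        "The greedy path search visits every edge once, giving O(|E|) shortest path cost.",
        "Related work reports O(n^2) bounds for naive methods.",  # matches no algorithm
    ]
    predicted = link_complexities(algorithms, paragraphs)
    gold = predicted | {("Algorithm 2", "O(V + E)")}  # one link the heuristic misses
    print(sorted(predicted))  # [('Algorithm 1', 'O(n log n)'), ('Algorithm 2', 'O(|E|)')]
    print(tuple(round(x, 3) for x in evaluate(predicted, gold)))  # (1.0, 0.667, 0.8)
    print(round(f1_score(0.81, 0.75), 2))  # 0.78
```

Scoring against both the caption and the function name mirrors the two metadata fields the abstract mentions; the verbatim function-name bonus, the stopword list and the threshold are added assumptions that a real rule set would tune on annotated data.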

Item Type:	Article (Article)
Peer-reviewed:	Yes
Date Deposited:	19 Jul 2024 13:40
Publisher:	New Bulgarian University
Additional Information:	This article first appeared in Journal of Computational and Applied Linguistics
Divisions:	Organisation > Business and Law
URI:	https://e-space.mmu.ac.uk/id/eprint/632632
DOI:	https://doi.org/10.33919/JCAL.23.1.2
ISSN	2815-4967
