Bakar, Abu, Sarwar, Raheem ORCID: https://orcid.org/0000-0002-0640-807X, Hassan, Saeed-Ul and Nawaz, Raheel (2023) Extracting algorithmic complexity in scientific literature for advance searching. Journal of Computational and Applied Linguistics, 1. pp. 39-65. ISSN 2815-4967
Published Version
Available under License In Copyright.
Abstract
Non-textual document elements such as charts, diagrams, algorithms and tables play an important role in presenting key information in scientific documents. Recent information retrieval systems exploit this information to answer more complex user queries by mining, from the full text, the passages that pertain to non-textual document elements. Algorithms are critically important in computer science. Researchers continue to improve existing algorithms for critical applications and to develop new algorithms for unsolved and newly encountered problems. These enhanced and new algorithms are mostly published in scholarly documents, and their authors usually discuss the complexity of each algorithm in the same document. The complexity of an algorithm is therefore an important factor for information retrieval (IR) systems. In this paper, we mine the relevant complexities of algorithms from full-text documents by comparing the metadata of each algorithm, such as its caption and function name, with the context of the paragraph in which the authors discuss complexity. Using a dataset of 256 documents downloaded from the CiteSeerX repository, we manually annotate 417 links between algorithms and their complexities. We then apply our novel rule-based approach, which identifies the desired links with 81% precision, 75% recall, 78% F1-score and 65% accuracy. Overall, our method of identifying these links has the potential to improve information retrieval systems that exploit full text and, more specifically, non-textual document elements.
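The linking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' rule set, which the abstract does not spell out. It assumes a simple Big-O regular expression, a small stopword list, and keyword overlap between algorithm metadata and a paragraph as the only rule. The names `COMPLEXITY_RE`, `STOPWORDS`, `keywords`, `link_complexities` and `f1_score` are hypothetical. The `f1_score` helper only checks that the reported F1-score is consistent with the reported precision and recall.

```python
import re

# Assumption: complexities appear as simple Big-O expressions such as O(n log n).
COMPLEXITY_RE = re.compile(r"\bO\([^()]+\)")

# Assumption: a tiny illustrative stopword list, not taken from the paper.
STOPWORDS = {"algorithm", "the", "and", "for", "with", "that", "this"}


def keywords(text):
    """Lowercased alphabetic tokens longer than two letters, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS}


def link_complexities(algorithms, paragraphs):
    """Link each complexity mention to the algorithm whose metadata
    (caption and function name) shares the most keywords with the paragraph.

    algorithms: dict mapping an algorithm id to its metadata string.
    Returns a list of (algorithm_id, complexity) pairs.
    """
    links = []
    for paragraph in paragraphs:
        found = COMPLEXITY_RE.findall(paragraph)
        if not found:
            continue  # no complexity discussed in this paragraph
        context = keywords(paragraph)
        best_id, best_overlap = None, 0
        for alg_id, metadata in algorithms.items():
            overlap = len(keywords(metadata) & context)
            if overlap > best_overlap:
                best_id, best_overlap = alg_id, overlap
        if best_id is not None:
            links.extend((best_id, c) for c in found)
    return links


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    algorithms = {
        "alg1": "Algorithm 1: Merge sort procedure MergeSort",
        "alg2": "Algorithm 2: Breadth-first graph traversal BFS",
    }
    paragraphs = [
        "The merge sort procedure runs in O(n log n) time.",
        "We evaluate on three datasets.",
        "The graph traversal visits each vertex and edge once, giving O(V + E).",
    ]
    print(link_complexities(algorithms, paragraphs))
    print(round(f1_score(0.81, 0.75), 2))  # 0.78, matching the reported F1-score
```

Keyword overlap is the simplest possible matching rule. A real system would likely need additional cues, such as explicit references like "Algorithm 1" in the paragraph, to reach the reported precision.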