e-space
Manchester Metropolitan University's Research Repository

    Extracting algorithmic complexity in scientific literature for advance searching

    Bakar, Abu, Sarwar, Raheem (ORCID: https://orcid.org/0000-0002-0640-807X), Hassan, Saeed-Ul and Nawaz, Raheel (2023) Extracting algorithmic complexity in scientific literature for advance searching. Journal of Computational and Applied Linguistics, 1. pp. 39-65. ISSN 2815-4967

    Published Version
    Available under License In Copyright.


    Abstract

    Non-textual document elements such as charts, diagrams, algorithms and tables play an important role in presenting key information in scientific documents. Recent information retrieval systems tap this information to answer more complex user queries by mining the text pertaining to non-textual document elements from the full text. Algorithms are critically important in computer science. Researchers work on existing algorithms to improve them for critical applications, and new algorithms for unsolved and newly encountered problems are under development. These enhanced and new algorithms are mostly published in scholarly documents, and the authors also discuss the complexity of these algorithms in the same document. The complexity of an algorithm is also an important factor for information retrieval (IR) systems. In this paper, we mine the relevant complexities of algorithms from full-text documents by comparing the metadata of each algorithm, such as its caption and function name, with the context of the paragraph in which the authors discuss complexity. Using a dataset of 256 documents downloaded from the CiteSeerX repository, we manually annotate 417 links between algorithms and their complexities. We then apply our novel rule-based approach, which identifies the desired links with 81% precision, 75% recall, 78% F1-score and 65% accuracy. Overall, our method of identifying these links has the potential to improve information retrieval systems that exploit full text and, more specifically, non-textual document elements.
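The abstract describes linking an algorithm's metadata (caption and function name) to paragraphs that discuss complexity. The Python sketch below illustrates one way such a rule could look. It is not the authors' method: the regular expression, the stop-word list, the token-overlap rule, the `min_overlap` threshold, and all names (`Algorithm`, `link_complexities`, `f1_score`) are assumptions made for illustration. The final helper only checks that the reported F1-score follows from the reported precision and recall.

```python
import re
from dataclasses import dataclass

# Assumed pattern for asymptotic notation such as O(n log n) or Θ(n^2);
# allows one level of nested parentheses, e.g. O(n log(n)).
COMPLEXITY_RE = re.compile(r"(?<![A-Za-z])[OΘΩ]\((?:[^()]|\([^()]*\))+\)")

# Assumed stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "of", "a", "an", "for", "and", "in", "to", "is", "algorithm"}


@dataclass
class Algorithm:
    caption: str        # e.g. "Algorithm 1: Merge sort"
    function_name: str  # e.g. "MergeSort"


def tokens(text):
    """Lower-cased word tokens, minus stop words and bare numbers."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS and not w.isdigit()}


def link_complexities(algorithms, paragraphs, min_overlap=1):
    """Return (function_name, complexity) pairs.

    A paragraph is considered only if it contains a complexity expression.
    It is linked to the algorithm whose metadata shares the most tokens
    with the paragraph; on a tie, the earlier algorithm wins.
    """
    links = []
    for para in paragraphs:
        found = COMPLEXITY_RE.findall(para)
        if not found:
            continue
        para_tokens = tokens(para)
        best, best_score = None, 0
        for algo in algorithms:
            meta_tokens = tokens(algo.caption + " " + algo.function_name)
            score = len(meta_tokens & para_tokens)
            if score > best_score:
                best, best_score = algo, score
        if best is not None and best_score >= min_overlap:
            links.extend((best.function_name, c) for c in found)
    return links


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


algorithms = [
    Algorithm("Algorithm 1: Merge sort", "MergeSort"),
    Algorithm("Algorithm 2: Breadth-first search", "BFS"),
]
paragraphs = [
    "The running time of merge sort is O(n log n) in the worst case.",
    "We next describe the dataset.",
    "BFS visits every vertex and edge once, giving O(V + E).",
]
links = link_complexities(algorithms, paragraphs)
reported_f1 = round(f1_score(0.81, 0.75), 2)
```

On this toy input, the first paragraph shares "merge" and "sort" with the first caption and the third shares "bfs" with the second function name, so each complexity expression is attached to one algorithm. The second paragraph contains no complexity expression and is skipped. With the reported 81% precision and 75% recall, `f1_score` gives roughly 0.78, which matches the stated F1-score.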
