Hassan, Saeed-Ul, Imran, Mubashir, Iqbal, Sehrish, Aljohani, Naif Radi and Nawaz, Raheel (2018) Deep context of citations using machine‑learning models in scholarly full‑text articles. Scientometrics, 117 (3). ISSN 0138-9130
Accepted Version
Available under License In Copyright.
Abstract
Information retrieval systems for scholarly literature rely heavily not only on text matching but on semantic- and context-based features. Readers nowadays are deeply interested in how important an article is, its purpose and how influential it is in follow-up research work. Numerous techniques to tap the power of machine learning and artificial intelligence have been developed to enhance retrieval of the most influential scientific literature. In this paper, we compare and improve on four existing state-of-the-art techniques designed to identify influential citations. We consider 450 citations from the Association for Computational Linguistics corpus, classified by experts as either important or unimportant, and further extract 64 features based on the methodology of four state-of-the-art techniques. We apply the Extra-Trees classifier to select 29 best features and apply the Random Forest and Support Vector Machine classifiers to all selected techniques. Using the Random Forest classifier, our supervised model improves on the state-of-the-art method by 11.25%, with 89% Precision-Recall area under the curve. Finally, we present our deep-learning model, the Long Short-Term Memory network, that uses all 64 features to distinguish important and unimportant citations with 92.57% accuracy.
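The supervised part of the pipeline described in the abstract (rank 64 features with Extra-Trees, keep the top 29, then compare Random Forest and Support Vector Machine classifiers by Precision-Recall area under the curve) can be sketched roughly as follows. This is a minimal illustration using scikit-learn and is not the authors' code: the data are synthetic stand-ins generated with `make_classification`, not the ACL citation corpus, and the hyperparameters, the 80/20 split, the feature scaling for the SVM, and the use of `average_precision_score` as the PR-AUC estimate are all assumptions. The LSTM model is not covered here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Sizes taken from the abstract: 450 citations, 64 features, 29 selected.
N_CITATIONS, N_FEATURES, N_SELECTED = 450, 64, 29

# ASSUMPTION: synthetic data standing in for the expert-labelled citations
# (1 = important, 0 = unimportant). The real features are not reproduced here.
X, y = make_classification(
    n_samples=N_CITATIONS,
    n_features=N_FEATURES,
    n_informative=10,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Rank features by Extra-Trees importance, fitted on the training split only
# so that the held-out citations do not influence the selection.
ranker = ExtraTreesClassifier(n_estimators=200, random_state=0)
ranker.fit(X_train, y_train)
top = np.argsort(ranker.feature_importances_)[::-1][:N_SELECTED]

# Each entry: (estimator, name of the method that yields a ranking score).
# ASSUMPTION: default-ish hyperparameters; the SVM is given scaled inputs.
models = {
    "random_forest": (
        RandomForestClassifier(n_estimators=200, random_state=0),
        "predict_proba",
    ),
    "svm": (make_pipeline(StandardScaler(), SVC(kernel="rbf")), "decision_function"),
}

pr_auc = {}
for name, (model, score_method) in models.items():
    model.fit(X_train[:, top], y_train)
    scores = getattr(model, score_method)(X_test[:, top])
    if scores.ndim == 2:
        # predict_proba returns one column per class; keep the positive class.
        scores = scores[:, 1]
    # Average precision is one common estimate of the area under the PR curve.
    pr_auc[name] = average_precision_score(y_test, scores)

for name, value in pr_auc.items():
    print(f"{name}: PR-AUC estimate = {value:.3f}")
```

Because the data here are random stand-ins, the printed scores say nothing about the figures reported in the abstract; the sketch only shows how the selection and evaluation steps fit together.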