Solving feature sparseness in text classification using core-periphery decomposition

Cui, Xia ORCID: https://orcid.org/0000-0002-1726-3814, Kojaku, Sadamori, Masuda, Naoki and Bollegala, Danushka (2018) Solving feature sparseness in text classification using core-periphery decomposition. In: Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pp. 255-264. Presented at The Seventh Joint Conference on Lexical and Computational Semantics, 05 June 2018 - 06 June 2018, New Orleans, Louisiana, USA.

Published Version
Available under License Creative Commons Attribution.

Official URL: http://dx.doi.org/10.18653/v1/s18-2030

Abstract

Feature sparseness is a problem common to cross-domain and short-text classification tasks. To overcome this problem, we propose a novel method based on graph decomposition to find candidate features for expanding feature vectors. Specifically, we first create a feature-relatedness graph, which we subsequently decompose into core-periphery (CP) pairs, and use the peripheries as the expansion candidates of the cores. We expand both training and test instances using the computed related features and use them to train a text classifier. We observe that prioritising features that are common to both training and test instances as cores during the CP decomposition further improves the accuracy of text classification. We evaluate the proposed CP-decomposition-based feature expansion method on benchmark datasets for cross-domain sentiment classification and short-text classification. Our experimental results show that the proposed method consistently outperforms all baselines on short-text classification tasks, and performs competitively with pivot-based cross-domain sentiment classification methods.
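
The pipeline described in the abstract (build a feature-relatedness graph, split it into core-periphery pairs, expand each instance with the peripheries of its cores) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the co-occurrence-based graph, the greedy star assignment in `toy_core_periphery`, and all function and parameter names are assumptions made for illustration only. The `preferred_cores` argument loosely mirrors the idea of prioritising features shared by training and test instances as cores.

```python
from collections import defaultdict
from itertools import combinations


def build_relatedness_graph(docs):
    """Assumed relatedness measure: edge weight = number of documents
    in which two distinct features co-occur."""
    graph = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def toy_core_periphery(graph, preferred_cores=frozenset()):
    """Greedy stand-in for a CP decomposition (NOT the paper's method).

    Features are visited with preferred cores first, then by decreasing
    total edge weight. Each unassigned feature becomes a core and claims
    its still-unassigned neighbours as its periphery.
    """
    def strength(feature):
        return sum(graph[feature].values())

    order = sorted(
        graph,
        key=lambda f: (f not in preferred_cores, -strength(f), f),
    )
    assigned = set()
    pairs = {}
    for feature in order:
        if feature in assigned:
            continue
        periphery = [n for n in graph[feature] if n not in assigned]
        assigned.add(feature)
        assigned.update(periphery)
        pairs[feature] = sorted(periphery)
    return pairs


def expand(doc, pairs):
    """Append the periphery features of every core that occurs in doc."""
    expanded = list(doc)
    seen = set(doc)
    for feature in doc:
        for candidate in pairs.get(feature, []):
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


# Hypothetical toy corpus: each document is a list of features (tokens).
docs = [["cheap", "battery"], ["battery", "charger"], ["cheap", "price"]]
graph = build_relatedness_graph(docs)
pairs = toy_core_periphery(graph)
expanded = expand(["battery"], pairs)
```

In this sketch the same `pairs` mapping would be applied to both training and test instances before training a classifier, so that a short or out-of-domain instance gains features it shares with other instances.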

Item Type:	Conference or Workshop Item (Paper)
Published Proceedings:	Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics
Peer-reviewed:	Yes
Date Deposited:	24 Mar 2023 13:49
Publisher:	Association for Computational Linguistics
Divisions:	Organisation > Science and Engineering
URI:	https://e-space.mmu.ac.uk/id/eprint/631633
DOI:	https://doi.org/10.18653/v1/s18-2030
