Effect of data imbalance on Unsupervised Domain Adaptation of Part-of-Speech tagging and pivot selection strategies

Cui, Xia ORCID: https://orcid.org/0000-0002-1726-3814, Coenen, Frans and Bollegala, Danushka (2017) Effect of data imbalance on Unsupervised Domain Adaptation of Part-of-Speech tagging and pivot selection strategies. In: Proceedings of Machine Learning Research, pp. 103-115. Presented at First International Workshop on Learning with Imbalanced Domains: Theory and Applications (ECML-PKDD 2017), 22 September 2017, Skopje, Macedonia.

Published Version
Available under License In Copyright.

Official URL: https://proceedings.mlr.press/v74/cui17a

Abstract

Domain adaptation is the task of adapting a model trained on data from a source domain to a different target domain. In Unsupervised Domain Adaptation (UDA), we do not assume any labelled training data from the target domain. In this paper, we consider the problem of UDA in the context of Part-of-Speech (POS) tagging. Specifically, we study the effect of data imbalance on UDA of POS tagging, and compare different pivot selection strategies for accurately adapting a POS tagger trained on source domain data to a target domain. We propose the use of F-score to select pivots using the available labelled data in the source domain. Our experimental results on a benchmark dataset for cross-domain POS tagging show that using frequency combined with F-scores for selecting pivots in the source labelled data produces the best results.

Item Type:	Conference or Workshop Item (Paper)
Published Proceedings:	Proceedings of Machine Learning Research
Volume:	74
Peer-reviewed:	No
Date Deposited:	24 Mar 2023 14:26
Publisher:	ML Research Press
Divisions:	Organisation > Science and Engineering
URI:	https://e-space.mmu.ac.uk/id/eprint/631634
ISSN:	2640-3498
