e-space
Manchester Metropolitan University's Research Repository

    How do control tokens affect natural language generation tasks like text simplification

    Li, Zhao (ORCID: https://orcid.org/0009-0009-1071-5708) and Shardlow, Matthew (ORCID: https://orcid.org/0000-0003-1129-2750) (2024) How do control tokens affect natural language generation tasks like text simplification. Natural Language Engineering. pp. 1-28. ISSN 1351-3249

    Published Version
    Available under License Creative Commons Attribution.

    Abstract

    Recent work on text simplification has focused on the use of control tokens to further the state of the art. However, it is not easy to improve further without an in-depth comprehension of the mechanisms underlying control tokens. One previously unexplored factor is the tokenization strategy, which we also explore. In this paper, we (1) reimplemented AudienCe-CEntric Sentence Simplification, (2) explored the effects and interactions of varying control tokens, (3) tested the influences of different tokenization strategies, (4) demonstrated how separate control tokens affect performance and (5) proposed new methods to predict the value of control tokens. We show how performance varies with each of the four control tokens separately. We also uncover how the design of control tokens could influence performance and give some suggestions for designing control tokens. We show that the newly proposed method achieves higher performance in both SARI (a common scoring metric in text simplification) and BERTScore (a score derived from the BERT language model) and has potential in real applications.
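
    For readers unfamiliar with the idea, the following is a minimal sketch of how control tokens are commonly attached to a source sentence before it is passed to a sequence-to-sequence model. It is not taken from the paper: the token name `NbChars`, the `<Name_value>` format, the 0.05 bucket size and all function names are illustrative assumptions.

```python
# Illustrative sketch only: token names, format and bucket size are assumptions,
# not the configuration used in the paper.

def length_ratio(source: str, target: str) -> float:
    """Character-length ratio of the target sentence to the source sentence."""
    return len(target) / len(source)


def bucket(value: float, step: float = 0.05) -> float:
    """Round a ratio to the nearest multiple of `step` so values form a small vocabulary."""
    return round(round(value / step) * step, 2)


def add_control_tokens(source: str, controls: dict) -> str:
    """Prepend one `<Name_value>` token per control to the source sentence."""
    prefix = " ".join(f"<{name}_{value}>" for name, value in controls.items())
    return f"{prefix} {source}"


if __name__ == "__main__":
    src = "The feline reposed upon the rug."
    tgt = "The cat sat on the rug."
    controls = {"NbChars": bucket(length_ratio(src, tgt))}
    print(add_control_tokens(src, controls))
```

    At training time the control value would be computed from each source and target pair as above; at inference time it has to be chosen or predicted, which is the problem the abstract's point (5) refers to.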
