Li, Zihao (2024) Controllable Text Simplification. Masters by Research thesis (MPhil), Manchester Metropolitan University.
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Text simplification is a tool that enhances the accessibility of text at both the lexical and syntactic levels. It helps readers understand complex texts more easily, particularly children, students, and people with reading difficulties. This thesis investigates the effects, limitations, and potential enhancements of the current state-of-the-art method in text simplification. It comprises three main experiments, each addressing a challenge in text simplification. In the first experiment, we examined the diverse needs that text simplification must serve, explored the impact of the control mechanism, re-implemented the state-of-the-art system with less computational power, and redesigned the tokenization and quantization for the control mechanism, improving performance by up to 0.5 points on the evaluation metrics. In the second experiment, we investigated the impact of text style on text simplification across domains, constructed a genre-specific test scenario focused on the coronavirus, verified the effect of genre on text simplification, and compared these models with general-purpose large language models such as ChatGPT. In the final experiment, we addressed the system's lack of adaptability by fine-tuning models to predict the values of four control tokens and integrating these predictors into the existing system, thereby making controllable text simplification systems more practical and more widely usable. Overall, we explored the mechanism of control tokens, verified the effectiveness of controllable text simplification on a genre-specific corpus, and improved the overall performance and adaptability of the controllable text simplification system.
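The abstract refers to tokenizing and quantizing control values without giving details. As a rough illustration of the general idea only, the sketch below assumes ACCESS-style controls, where each attribute is a target-to-source ratio that is rounded to a discrete bin and prepended to the input sentence as a special token. The token names, the 0.05 step and the [0.2, 1.5] clamping range are hypothetical choices for this example, not the settings used in the thesis.

```python
from typing import Mapping

# Hypothetical control-token names, loosely modelled on ACCESS-style controls;
# the thesis's exact token set and formatting are not given in this record.
CONTROL_NAMES = ("NbChars", "LevSim", "WordRank", "DepTreeDepth")


def quantize(value: float, step: float = 0.05, lo: float = 0.2, hi: float = 1.5) -> float:
    """Clamp a control ratio to [lo, hi] and round it to the nearest multiple of step."""
    clamped = min(max(value, lo), hi)
    # The outer round() removes floating-point noise such as 0.8500000000000001.
    return round(round(clamped / step) * step, 2)


def add_control_tokens(sentence: str, ratios: Mapping[str, float]) -> str:
    """Prepend one discrete control token per attribute to the source sentence."""
    tokens = [f"<{name}_{quantize(ratios[name]):.2f}>" for name in CONTROL_NAMES]
    return " ".join(tokens + [sentence])


if __name__ == "__main__":
    ratios = {"NbChars": 0.83, "LevSim": 0.61, "WordRank": 0.9, "DepTreeDepth": 2.0}
    print(add_control_tokens("The cat sat.", ratios))
    # prints: <NbChars_0.85> <LevSim_0.60> <WordRank_0.90> <DepTreeDepth_1.50> The cat sat.
```

The step size trades granularity for vocabulary size: a coarser step yields fewer distinct control tokens for the model to learn, while a finer one allows more precise control over the output.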