Vásquez-Rodríguez, Laura, Shardlow, Matthew ORCID: https://orcid.org/0000-0003-1129-2750, Przybyła, Piotr and Ananiadou, Sophia (2021) The role of Text Simplification operations in evaluation. In: First Workshop on Current Trends in Text Simplification (CTTS 2021) co-located with the 37th Conference of the Spanish Society for Natural Language Processing (SEPLN2021), 21 September 2021, Online (initially located in Málaga, Spain).
Published Version
Available under License Creative Commons Attribution.
Abstract
Research in Text Simplification (TS) has relied mostly on Wikipedia-based datasets and the SARI evaluation metric as the preferred means of creating and evaluating new simplification methods. Previous studies have pointed out flaws in these evaluation resources, including incorrect alignment of simple/complex sentence pairs, sentences with no simplifications, and a lack of variety in simplification operations. However, the impact of the original data distribution, in terms of the type of simplification operations performed, has not been analysed further. In this paper, we set up a systematic benchmark of the most common TS datasets, basing our evaluation on different protocols for split selection (e.g., random selection or Monte Carlo selection). We perform an operation-based investigation, demonstrating in detail the limitations of existing simplification datasets. Further, we make recommendations for future standardised practices in the design, creation and evaluation of TS resources. © 2021
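The idea of checking how split selection affects the mix of simplification operations can be illustrated with a minimal sketch. It is not the authors' method, and several details are assumptions. Operations are labelled with a crude token-set heuristic (copy, reorder, deletion, addition, rewrite). "Monte Carlo" selection is read here as repeated random sub-sampling of train/test splits. All function names (`label_operation`, `operation_distribution`, `random_split`, `monte_carlo_splits`) are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[str, str]  # (complex sentence, simple sentence)


def label_operation(complex_sent: str, simple_sent: str) -> str:
    """Assign a coarse, hypothetical operation label to one sentence pair."""
    c_tokens = complex_sent.lower().split()
    s_tokens = simple_sent.lower().split()
    if c_tokens == s_tokens:
        return "copy"  # identical up to case: no simplification performed
    c_set, s_set = set(c_tokens), set(s_tokens)
    if c_set == s_set:
        return "reorder"  # same vocabulary, different order or repetition
    if s_set <= c_set:
        return "deletion"  # simple side only keeps complex-side tokens
    if c_set <= s_set:
        return "addition"  # simple side only adds tokens
    return "rewrite"  # tokens both dropped and introduced


def operation_distribution(pairs: List[Pair]) -> Dict[str, float]:
    """Proportion of each operation label in a collection of pairs."""
    counts = Counter(label_operation(c, s) for c, s in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}


def random_split(
    pairs: List[Pair], test_fraction: float, seed: int
) -> Tuple[List[Pair], List[Pair]]:
    """One shuffled train/test split (at least one test pair if data exist)."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def monte_carlo_splits(
    pairs: List[Pair], test_fraction: float, n_repeats: int, seed: int = 0
) -> Iterator[Tuple[List[Pair], List[Pair]]]:
    """Repeated random sub-sampling: n_repeats independently seeded splits."""
    for i in range(n_repeats):
        yield random_split(pairs, test_fraction, seed + i)


if __name__ == "__main__":
    # Toy pairs; a real TS corpus would be loaded from disk instead.
    toy_pairs = [
        ("the committee approved the proposal", "the committee approved the proposal"),
        ("the enormous dog barked loudly", "the dog barked"),
        ("he left", "he left the house"),
        ("she purchased a vehicle", "she bought a car"),
    ] * 5
    for train, test in monte_carlo_splits(toy_pairs, test_fraction=0.25, n_repeats=3):
        share = operation_distribution(test).get("copy", 0.0)
        print(f"train={len(train)} test={len(test)} copy share={share:.2f}")
```

Comparing the share of `copy` pairs (sentences with no simplification) across the Monte Carlo test splits gives a rough sense of how much a single random split can over- or under-represent particular operations.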