Hsieh, WC, Hsu, GS (ORCID: https://orcid.org/0000-0003-2631-0448), Chen, JY, Yap, MH (ORCID: https://orcid.org/0000-0001-7681-4287) and Chao, ZC (2025)
Contour-Guided Context Learning for Scene Text Recognition.
In: Pattern Recognition: 27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024, Proceedings, Part XX, pp. 103-117. Presented at 27th International Conference, ICPR 2024, 1 December 2024 – 5 December 2024, Kolkata, India.
Accepted Version
File will be available on: 5 December 2025. Available under License In Copyright. Download (6MB)
Abstract
We propose contour-guided context learning (CCL) for bilingual scene text recognition (STR). The CCL framework consists of three parts: a Contour-Guided Transformer (CGT), a Contextual Learning Transformer (CLT), and a Multimodal Transformer (MMT) for fusion. The CGT embeds a CLIP image encoder, leveraging CLIP's pre-trained representations to capture contour features from input images, while the CLT embeds a CLIP text encoder to correct contextual errors. The fusion network incorporates attention features extracted by the Transformer to enhance text recognition performance. Unlike most STR methods, which target only English, the proposed CCL is designed to handle both English and Chinese, as well as irregularly shaped scene text. We conduct a comprehensive evaluation on Chinese and English benchmark datasets to validate the performance of our approach against state-of-the-art methods.
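The fusion step described in the abstract — combining attention features from a visual (contour) branch and a textual (context) branch — can be illustrated with a cross-attention sketch. This is not the authors' implementation: the shapes, function names, and the residual combination are illustrative assumptions; the paper's MMT details are in the full text.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(contour_feats, context_feats):
    """Fuse contour (visual) and context (textual) feature sequences
    with scaled dot-product cross-attention.

    contour_feats: (n_visual_tokens, d) -- stand-in for CGT output
    context_feats: (n_text_tokens, d)   -- stand-in for CLT output
    """
    d = contour_feats.shape[-1]
    # visual tokens query the contextual tokens
    scores = contour_feats @ context_feats.T / np.sqrt(d)  # (n_v, n_t)
    attn = softmax(scores, axis=-1)
    attended = attn @ context_feats                        # (n_v, d)
    # residual combination of the two modalities (assumed design choice)
    return contour_feats + attended

rng = np.random.default_rng(0)
vis = rng.standard_normal((16, 64))   # e.g. 16 visual tokens
txt = rng.standard_normal((25, 64))   # e.g. 25 text tokens
fused = cross_attention_fusion(vis, txt)
print(fused.shape)  # (16, 64)
```

In practice the two branches would be real CLIP encoders and the fused sequence would feed a recognition head; this sketch only shows the attention mechanics of the fusion.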