e-space
Manchester Metropolitan University's Research Repository

    Contour-Guided Context Learning for Scene Text Recognition

    Hsieh, WC, Hsu, GS (ORCID: https://orcid.org/0000-0003-2631-0448), Chen, JY, Yap, MH (ORCID: https://orcid.org/0000-0001-7681-4287) and Chao, ZC (2025) Contour-Guided Context Learning for Scene Text Recognition. In: Pattern Recognition: 27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024, Proceedings, Part XX, pp. 103-117. Presented at 27th International Conference, ICPR 2024, 1 December 2024 – 5 December 2024, Kolkata, India.

    Accepted Version
    File will be available on: 5 December 2025.
    Available under License In Copyright.


    Abstract

    We propose contour-guided context learning (CCL) for bilingual scene text recognition (STR). The CCL framework consists of three parts: a Contour-Guided Transformer (CGT), a Contextual Learning Transformer (CLT), and a Multimodal Transformer (MMT) for fusion. The CGT embeds a CLIP image encoder, exploiting CLIP's pre-training to capture contour features from input images, while the CLT embeds a CLIP text encoder to correct contextual errors. The fusion network incorporates attention features extracted by the Transformer to enhance text recognition performance. Unlike most STR methods, which target English only, the proposed CCL is designed to handle both English and Chinese, including irregularly shaped scene text. We conduct a comprehensive evaluation on Chinese and English benchmark datasets to validate the performance of our approach against state-of-the-art methods.
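    The abstract describes an attention-based fusion of visual contour features (from the CGT branch) and textual context features (from the CLT branch). Since the accepted manuscript is under embargo, the paper's exact fusion network is not reproduced here; the following is a minimal NumPy sketch of one plausible mechanism, cross-attention with a residual connection, in which all tensor shapes, the random features, and the fusion rule are illustrative assumptions rather than the authors' implementation.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax over the given axis.
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def cross_attention(query, key, value):
        # Scaled dot-product attention: each query token attends
        # over the key/value tokens of the other modality.
        d = query.shape[-1]
        scores = query @ key.T / np.sqrt(d)
        return softmax(scores, axis=-1) @ value

    # Hypothetical stand-ins for the two CCL branches (shapes are illustrative):
    rng = np.random.default_rng(0)
    contour_feats = rng.standard_normal((25, 64))   # CGT: visual/contour tokens
    context_feats = rng.standard_normal((10, 64))   # CLT: textual context tokens

    # MMT-style fusion sketch: contour tokens attend to the contextual tokens,
    # and the attended context is added back as a residual.
    fused = contour_feats + cross_attention(contour_feats, context_feats, context_feats)
    print(fused.shape)  # (25, 64)
    ```

    In a trained model the query/key/value streams would pass through learned projections and multiple heads; the sketch above only illustrates how a multimodal transformer can let one modality condition the other.
    
    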
