Spatial attention contrastive network for scene text recognition
Fan Wang, Dong Yin
Abstract

Most current scene text recognition methods achieve good performance by training on large amounts of synthetic data. However, such large training sets require substantial storage and computation, and a domain gap remains between synthetic and real data. To address these problems, we use a small amount of real data to train a novel model, the spatial attention contrastive network (SAC-Net). SAC-Net consists of a background suppression network (BSNet), a feature encoder, an attention decoder (ADEer), and a feature contrastive network (FCNet). The U-Net-based BSNet reduces interference from the image background. Because connectionist temporal classification yields relatively low prediction accuracy, we design the ADEer, which improves performance through a convolutional attention mechanism. Building on data augmentation, we design the FCNet, a contrastive learning module. Trained on a small amount of real data, our SAC-Net achieves word accuracy nearly equivalent to the state-of-the-art model on six benchmark test datasets.
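For readers who want a concrete picture of the pipeline the abstract describes, below is a minimal, hypothetical PyTorch sketch. The module names (BSNet, ADEer, FCNet) follow the paper, but all internals, layer sizes, and the contrastive objective (an NT-Xent-style loss standing in for FCNet) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BSNet(nn.Module):
    """Minimal U-Net-style background suppression (assumed internals):
    predict a soft foreground mask and reweight the input image."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU())
        self.up = nn.Sequential(nn.ConvTranspose2d(16, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, x):
        mask = self.up(self.down(x))  # (B, 3, H, W) soft mask in [0, 1]
        return x * mask               # attenuate background pixels

class ConvAttentionDecoder(nn.Module):
    """Sketch of a convolutional attention decoder ('ADEer'): one attention
    map per output step, pooled into glimpses, then character logits."""
    def __init__(self, channels=64, num_classes=37, max_len=25):
        super().__init__()
        self.attn = nn.Conv2d(channels, max_len, 3, padding=1)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, feats):
        # feats: (B, C, H, W) from the feature encoder
        attn = self.attn(feats).flatten(2).softmax(-1)       # (B, T, H*W)
        glimpses = attn @ feats.flatten(2).transpose(1, 2)   # (B, T, C)
        return self.classifier(glimpses)                     # (B, T, classes)

def contrastive_loss(z1, z2, tau=0.1):
    """NT-Xent-style loss between features of two augmented views of the
    same images, standing in for the FCNet objective (assumed form)."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau           # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))   # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

# Toy forward pass on a batch of 32x128 text-line crops.
imgs = torch.randn(4, 3, 32, 128)
clean = BSNet()(imgs)                            # background-suppressed images
feats = nn.Conv2d(3, 64, 3, padding=1)(clean)    # placeholder feature encoder
logits = ConvAttentionDecoder()(feats)           # (4, 25, 37) character logits
```

The placeholder encoder here is a single convolution only to make the sketch self-contained; the paper's encoder is a deeper network, and the contrastive branch would be driven by two augmented views of each training image.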

© 2022 SPIE and IS&T
Fan Wang and Dong Yin "Spatial attention contrastive network for scene text recognition," Journal of Electronic Imaging 31(4), 043026 (3 August 2022). https://doi.org/10.1117/1.JEI.31.4.043026
Received: 25 February 2022; Accepted: 7 July 2022; Published: 3 August 2022
KEYWORDS: Data modeling, Performance modeling, Convolution, Computer programming, Feature extraction, Detection and tracking algorithms, Statistical modeling