Robust keyword retrieval method for OCRed text
24 January 2011
Yusaku Fujii, Hiroaki Takebe, Hiroshi Tanaka, Yoshinobu Hotta
Proceedings Volume 7874, Document Recognition and Retrieval XVIII; 787411 (2011) https://doi.org/10.1117/12.876470
Event: IS&T/SPIE Electronic Imaging, 2011, San Francisco Airport, California, United States
Abstract
Document management systems have become important with the growing popularity of electronic document filing and of scanning books, magazines, manuals, and similar material with a scanner or digital camera for storage or for reading on a PC or an electronic book. Text obtained by optical character recognition (OCR) is usually attached to these electronic documents to support document retrieval. Because OCR-generated text generally contains character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation errors and character recognition errors. In the proposed method, allowing noise characters to be inserted and characters to be dropped during keyword retrieval provides robustness against character segmentation errors, while allowing each keyword character to be substituted, either by one of the OCR recognition candidates for the corresponding character or by any other character, provides robustness against character recognition errors. The recall rate of the proposed method was 15% higher than that of the conventional method; however, its precision rate was 64% lower.
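The three tolerated operations named in the abstract (inserted noise characters, dropped characters, substituted characters) correspond to the edit operations of classical approximate substring matching. The sketch below illustrates that general idea with a standard edit-distance dynamic program. It is not the authors' method: the function name `find_keyword` and the parameter `max_errors` are hypothetical, every edit is assumed to cost 1, and the use of OCR recognition candidates for substitutions is not modeled.

```python
def find_keyword(text, keyword, max_errors=1):
    """Approximate keyword search in OCR text (illustrative sketch only).

    Returns a list of (end, errors) pairs: `end` is the exclusive end index
    in `text` of a substring that matches `keyword` with `errors` edits,
    where errors <= max_errors.
    """
    m = len(keyword)
    # prev[i] = fewest edits needed to match keyword[:i] against some
    # substring of the text ending at the previous position.
    prev = list(range(m + 1))
    hits = []
    for end, ch in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if keyword[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # exact match or substituted character
                prev[i] + 1,         # extra (noise) character in the text
                curr[i - 1] + 1,     # keyword character dropped from the text
            )
        if curr[m] <= max_errors:
            hits.append((end, curr[m]))
        prev = curr
    return hits


if __name__ == "__main__":
    # "retreval" is missing the "i" of "retrieval" (a dropped character).
    print(find_keyword("keyword retreval method", "retrieval", max_errors=1))
```

Raising `max_errors` admits more matches, including more false ones, which is the same recall versus precision trade-off the abstract reports.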
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yusaku Fujii, Hiroaki Takebe, Hiroshi Tanaka, and Yoshinobu Hotta, "Robust keyword retrieval method for OCRed text," Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 787411 (24 January 2011); https://doi.org/10.1117/12.876470
KEYWORDS
Optical character recognition
Error analysis
Copper
Chlorine
Detection and tracking algorithms
Image segmentation
Document management