Paper
30 March 1995 Counting OCR errors in typeset text
Jonathan S. Sandberg
Author Affiliations +
Proceedings Volume 2422, Document Recognition II; (1995) https://doi.org/10.1117/12.205821
Event: IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology, 1995, San Jose, CA, United States
Abstract
Frequently object recognition accuracy is a key component in the performance analysis of pattern matching systems. In the past three years, the results of numerous excellent and rigorous studies of OCR system typeset-character accuracy (henceforth OCR accuracy) have been published, encouraging performance comparisons between a variety of OCR products and technologies. These published figures are important; OCR vendor advertisements in the popular trade magazines lead readers to believe that published OCR accuracy figures effect market share in the lucrative OCR market. Curiously, a detailed review of many of these OCR error occurrence counting results reveals that they are not reproducible as published and they are not strictly comparable due to larger variances in the counts than would be expected by the sampling variance. Naturally, since OCR accuracy is based on a ratio of the number of OCR errors over the size of the text searched for errors, imprecise OCR error accounting leads to similar imprecision in OCR accuracy. Some published papers use informal, non-automatic, or intuitively correct OCR error accounting. Still other published results present OCR error accounting methods based on string matching algorithms such as dynamic programming using Levenshtein (edit) distance but omit critical implementation details (such as the existence of suspect markers in the OCR generated output or the weights used in the dynamic programming minimization procedure). The problem with not specifically revealing the accounting method is that the number of errors found by different methods are significantly different. This paper identifies the basic accounting methods used to measure OCR errors in typeset text and offers an evaluation and comparison of the various accounting methods.
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jonathan S. Sandberg "Counting OCR errors in typeset text", Proc. SPIE 2422, Document Recognition II, (30 March 1995); https://doi.org/10.1117/12.205821
Lens.org Logo
CITATIONS
Cited by 1 scholarly publication.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Optical character recognition

Error analysis

Computer programming

Databases

Scanners

Radon

Statistical modeling

Back to Top