1 April 1998
Methodologies for using UW databases for OCR and image-understanding systems
Ihsin T. Phillips
Proceedings Volume 3305, Document Recognition V; (1998) https://doi.org/10.1117/12.304624
Event: Photonics West '98 Electronic Imaging, 1998, San Jose, CA, United States
Abstract
This paper discusses methodologies for automatically selecting document pages and zones with the desired page/zone attributes from the UW databases. The selected pages can then be randomly partitioned into subsets for training and testing purposes. This paper also discusses three degradation methodologies that allow the developers of OCR and document recognition systems to create an unlimited number of 'real-life' degraded images, with geometric distortions, coffee stains, and water marks. Since the degraded images are created from the images in the UW databases, the nearly perfect original groundtruth files in the UW databases can be reused. The process of creating the additional document images and the associated groundtruth and attribute files requires only a fraction of the original cost and time.
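The select-then-partition workflow described in the abstract might look roughly like the Python sketch below. It is an illustration only, not the paper's method: the page records, the attribute names (`language`, `columns`), and the helpers `select_pages` and `partition` are all hypothetical, and the real UW attribute files use their own format.

```python
import random

# Hypothetical page records. The actual UW attribute files have their own
# format and field names; these are invented for illustration.
PAGES = [
    {"id": "P001", "language": "English", "columns": 2},
    {"id": "P002", "language": "English", "columns": 1},
    {"id": "P003", "language": "English", "columns": 2},
    {"id": "P004", "language": "Japanese", "columns": 2},
    {"id": "P005", "language": "English", "columns": 2},
    {"id": "P006", "language": "English", "columns": 2},
]


def select_pages(pages, **wanted):
    """Return the pages whose attributes equal every key/value in `wanted`."""
    return [p for p in pages if all(p.get(k) == v for k, v in wanted.items())]


def partition(pages, train_fraction=0.5, seed=0):
    """Shuffle a copy of `pages` and split it into (train, test) lists."""
    shuffled = list(pages)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Select two-column English pages, then split them evenly.
selected = select_pages(PAGES, language="English", columns=2)
train, test = partition(selected, train_fraction=0.5, seed=42)
```

Fixing the seed keeps the training and testing subsets reproducible across runs, which matters when results from different systems are compared on the same split.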
© (1998) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ihsin T. Phillips "Methodologies for using UW databases for OCR and image-understanding systems", Proc. SPIE 3305, Document Recognition V, (1 April 1998); https://doi.org/10.1117/12.304624
KEYWORDS
Databases
Optical character recognition
Binary data
Image processing
Image understanding
Mathematics
Nanoimprint lithography