Methodologies for using UW databases for OCR and image-understanding systems

Ihsin T. Phillips

doi:10.1117/12.304624

1 April 1998

Proceedings Volume 3305, Document Recognition V; (1998) https://doi.org/10.1117/12.304624
Event: Photonics West '98 Electronic Imaging, 1998, San Jose, CA, United States

Abstract

This paper discusses methodologies for automatically selecting, from the UW databases, document pages and zones that have the desired page/zone attributes. The selected pages can then be randomly partitioned into subsets for training and testing purposes. This paper also discusses three degradation methodologies that allow the developers of OCR and document recognition systems to create an unlimited number of 'real-life' degraded images, with geometric distortions, coffee stains and water marks. Since the degraded images are created from the images in the UW databases, the nearly perfect original groundtruth files in the UW databases can be reused. The process of creating the additional document images and the associated groundtruth and attribute files requires only a fraction of the original cost and time.
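
The selection and partitioning steps mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation and does not read the actual UW attribute file format: the attribute names ("language", "columns"), the page records, and the helper functions select_pages and partition are all hypothetical, chosen only to show attribute-based filtering followed by a seeded random split.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical page record: a flat mapping of attribute name to value.
# The real UW attribute files use their own format, not shown here.
Page = Dict[str, str]


def select_pages(pages: List[Page], wanted: Dict[str, str]) -> List[Page]:
    """Keep pages whose attributes match every key/value pair in `wanted`."""
    return [p for p in pages if all(p.get(k) == v for k, v in wanted.items())]


def partition(
    pages: List[Page], train_fraction: float, seed: int
) -> Tuple[List[Page], List[Page]]:
    """Shuffle a copy of `pages` and split it into (training, testing)."""
    shuffled = list(pages)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up records for illustration only.
pages = [
    {"id": "p001", "language": "English", "columns": "2"},
    {"id": "p002", "language": "English", "columns": "1"},
    {"id": "p003", "language": "English", "columns": "2"},
    {"id": "p004", "language": "Japanese", "columns": "2"},
    {"id": "p005", "language": "English", "columns": "2"},
    {"id": "p006", "language": "English", "columns": "2"},
]

selected = select_pages(pages, {"language": "English", "columns": "2"})
train, test = partition(selected, train_fraction=0.75, seed=0)
```

Fixing the seed makes the training/testing split reproducible across runs, which is useful when different systems are to be compared on the same subsets.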

Citation

Ihsin T. Phillips "Methodologies for using UW databases for OCR and image-understanding systems", Proc. SPIE 3305, Document Recognition V, (1 April 1998); https://doi.org/10.1117/12.304624
