Acquiring foreign language from degraded hardcopy documents is of interest to military and border control applications. Bi-tonal image scans are desirable because file size is small. However, the nature of hardcopy degradations and the scanner or image enhancement software capabilities used directly affect the quality of the captured image and the extent of language acquisition. We applied a collection of manual treatments to hardcopy Arabic documents to develop a corpus of bi-tonal images. We then used this corpus in an exploratory study to derive conclusions about how bi-tonal images could be enhanced. This paper discusses the manually degraded Arabic document corpus, the image enhancement study, and the significant optical character recognition (OCR) improvements obtained with simple scanner driver adjustments.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
INSTITUTIONAL Select your institution to access the SPIE Digital Library.
PERSONAL Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.