Multilingual information identification and extraction from imaged documents using optical correlator
27 November 2002
Bruce W. Stalcup, James Brower, Lou Vaughn, Mike Vertuno
Abstract
Most organizations maintain large archives of paper documents. These archives typically contain valuable information and data, and they are often imaged to provide electronic access. However, once a document has been printed or imaged, organizations have had no efficient method of retrieving information from it. The only available methods have been to read the documents manually or to convert them to ASCII text using optical character recognition (OCR). For archives with large numbers of documents, both methods are problematic: manual searches are not feasible, and OCR can be CPU intensive and prone to error. In addition, OCR engines do not exist for many foreign languages. By contrast, our system takes an innovative approach to retrieving information from imaged document archives, built on a client/server architecture. Since this work began in 1999, we have made significant advances in developing a system that employs optical correlation (OC) technology, in either software or hardware, to access the textual and graphic information in imaged paper documents directly, thereby eliminating the OCR process. The system provides a fast, accurate means of accessing this information directly from multilingual documents. It can also rapidly and accurately detect duplicate documents within an archive using optical correlation techniques. In this paper, we describe the present system and give selected examples of its capabilities. We also present performance results (accuracy, speed, etc.) against test document sets.
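As a rough illustration of the matching operation that correlation-based retrieval relies on, the following Python sketch slides a small query glyph over a binarized page image and reports the positions where the normalized cross-correlation peaks. It is not the authors' implementation: the function names (`correlation_map`, `find_matches`), the toy `PAGE` and `TEMPLATE` bitmaps, and the 0.99 threshold are all assumptions made for this example, and a direct spatial-domain loop stands in for the Fourier-domain correlation that an optical correlator would typically perform.

```python
import math

# Hypothetical 3x3 query glyph (a crude "T"); 1 = ink, 0 = background.
TEMPLATE = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]

# Hypothetical binarized page fragment containing the glyph twice.
PAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def correlation_map(page, template):
    """Return normalized cross-correlation scores for every template offset."""
    page_h, page_w = len(page), len(page[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    tmpl_energy = sum(v * v for row in template for v in row)

    scores = []
    for r in range(page_h - tmpl_h + 1):
        score_row = []
        for c in range(page_w - tmpl_w + 1):
            dot = 0
            patch_energy = 0
            for i in range(tmpl_h):
                for j in range(tmpl_w):
                    p = page[r + i][c + j]
                    dot += p * template[i][j]
                    patch_energy += p * p
            denom = math.sqrt(patch_energy * tmpl_energy)
            # An all-background patch has zero energy; score it as no match.
            score_row.append(dot / denom if denom else 0.0)
        scores.append(score_row)
    return scores


def find_matches(page, template, threshold=0.99):
    """Return (row, col) offsets whose score meets the assumed threshold."""
    return [
        (r, c)
        for r, score_row in enumerate(correlation_map(page, template))
        for c, score in enumerate(score_row)
        if score >= threshold
    ]


matches = find_matches(PAGE, TEMPLATE)
```

Here `matches` is `[(1, 1), (1, 5)]`, the top-left corners of the two glyph occurrences. Because the comparison is made on pixels rather than on recognized characters, the same kind of matching can in principle be applied to any script, and comparing whole page images in this way is one plausible route to flagging duplicates.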
© (2002) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Bruce W. Stalcup, James Brower, Lou Vaughn, and Mike Vertuno "Multilingual information identification and extraction from imaged documents using optical correlator", Proc. SPIE 4789, Algorithms and Systems for Optical Information Processing VI, (27 November 2002); https://doi.org/10.1117/12.453848
KEYWORDS
Optical character recognition
Optical correlators
Image processing
Java
Visualization
Databases
Digital signal processing
