KEYWORDS: Receivers, Feature extraction, Machine learning, Document management, Rule based systems, Data modeling, Computer networks, Computing systems, Data archive systems, Chemical elements
Current systems for automatic extraction of index terms from business documents take either a rule-based
or a training-based approach. Because each approach has its own advantages and disadvantages, it seems natural to
combine them to get the best of both worlds. We present a combination method consisting of three steps, selection,
normalization, and combination, based on comparable scores produced during extraction. Furthermore, novel
evaluation metrics are developed to support the assessment of each step in an existing extraction system. Our
methods were evaluated on an example extraction system with three individual extractors and a corpus of 12,000
scanned business documents.
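The three steps named above (selection, normalization, combination) can be illustrated with a minimal Python sketch. The abstract does not specify how each step is carried out, so everything concrete here is an assumption made for illustration: the extractor names and their score ranges in `SCORE_RANGES`, top-k selection per extractor, linear rescaling to [0, 1], and combination by summing the normalized scores of agreeing values. It is not the authors' actual method.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical extractors and their raw score ranges (assumed, not from the paper).
SCORE_RANGES = {"rule": (0.0, 100.0), "learned": (0.0, 10.0), "template": (0.0, 1.0)}


@dataclass(frozen=True)
class Candidate:
    extractor: str  # name of the extractor that produced the candidate
    field: str      # index field, e.g. "document_number"
    value: str      # extracted index term
    score: float    # raw score, only comparable within one extractor


def select(candidates, k=1):
    """Selection (assumed form): keep the k best candidates per extractor and field."""
    groups = defaultdict(list)
    for c in candidates:
        groups[(c.extractor, c.field)].append(c)
    kept = []
    for group in groups.values():
        kept.extend(sorted(group, key=lambda c: c.score, reverse=True)[:k])
    return kept


def normalize(candidates):
    """Normalization (assumed form): rescale each raw score linearly to [0, 1]."""
    result = []
    for c in candidates:
        lo, hi = SCORE_RANGES[c.extractor]
        result.append(Candidate(c.extractor, c.field, c.value, (c.score - lo) / (hi - lo)))
    return result


def combine(candidates):
    """Combination (assumed form): per field, sum the scores of agreeing values
    and return the value with the highest total."""
    totals = defaultdict(lambda: defaultdict(float))
    for c in candidates:
        totals[c.field][c.value] += c.score
    return {field: max(values, key=values.get) for field, values in totals.items()}


if __name__ == "__main__":
    candidates = [
        Candidate("rule", "document_number", "4711", 80.0),
        Candidate("rule", "document_number", "0000", 20.0),
        Candidate("learned", "document_number", "4711", 6.0),
        Candidate("template", "document_number", "4712", 0.9),
    ]
    print(combine(normalize(select(candidates))))
```

In this toy input, two extractors agree on one value, so their normalized scores add up and outweigh the single higher-scoring candidate from the third extractor. Other combination rules (for example, taking the maximum normalized score) would fit the same three-step structure.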
Archiving official written documents such as invoices, reminders, and account statements is becoming increasingly
important in both business and private settings. Creating appropriate index entries for document archives, such as the
sender's name, creation date, or document number, is tedious manual work. We present a novel approach to automatic
indexing of documents based on generic positional extraction of index terms. For this purpose, we apply the
knowledge of document templates stored in a common full-text search index to find index positions that were
successfully extracted in the past.
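The idea of reusing positions from previously indexed documents can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Word` and `ArchivedDocument` types, the Jaccard word overlap used as a stand-in for a full-text search score, and the `max_distance` tolerance are all hypothetical and are not taken from the abstract.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal position on the page (arbitrary units)
    y: float  # vertical position on the page


@dataclass
class ArchivedDocument:
    words: list            # list of Word objects from a previously indexed document
    field_positions: dict  # field name -> (x, y) where the index term was found


def similarity(words_a, words_b):
    """Jaccard overlap of word sets; a simple stand-in for a full-text search score."""
    set_a = {w.text.lower() for w in words_a}
    set_b = {w.text.lower() for w in words_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def extract_by_position(new_words, archive, max_distance=10.0):
    """Find the most similar archived document and read the words of the new
    document at the positions where index terms were found before."""
    if not archive or not new_words:
        return {}
    template = max(archive, key=lambda doc: similarity(new_words, doc.words))
    result = {}
    for field, (x, y) in template.field_positions.items():
        nearest = min(new_words, key=lambda w: math.hypot(w.x - x, w.y - y))
        if math.hypot(nearest.x - x, nearest.y - y) <= max_distance:
            result[field] = nearest.text
    return result


if __name__ == "__main__":
    archive = [
        ArchivedDocument(
            words=[Word("Invoice", 10, 20), Word("No.", 50, 20), Word("4711", 100, 20)],
            field_positions={"document_number": (100, 20)},
        ),
        ArchivedDocument(
            words=[Word("Account", 10, 50), Word("Statement", 50, 50)],
            field_positions={"creation_date": (50, 80)},
        ),
    ]
    new_doc = [Word("Invoice", 10, 20), Word("No.", 50, 20), Word("9001", 101, 21)]
    print(extract_by_position(new_doc, archive))
```

A real system would presumably query a search index rather than scan the archive linearly, and would handle multi-word terms and layout shifts; the sketch only shows how a matching template can supply positions for a new document.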