This paper presents an algorithm for distinguishing handwritten from printed text in document images. The algorithm relies on stroke thickness, and a method for computing this feature is designed. The proposed method, which is clearly defined and easy to implement, calculates the stroke thickness feature by counting edge pixels in a neighborhood. Document images are generally divided into text lines or characters. However, neither unit is well suited to distinguishing handwritten from printed text: the line is too coarse and the character is too small. Using the stroke thickness feature, combined with layout analysis, each text line in the document image is further divided into areas of uniform thickness. Such an area is finer than a text line and larger than a single character, so more stable features can be extracted from it. Finally, the features of these regions are classified using an SVM. The proposed algorithm obtained better performance on a document image database containing handwritten and printed texts.
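The abstract does not give the exact formula for the stroke thickness feature, so the following is only a minimal sketch of one plausible reading of "counting edge pixels in a neighborhood". It assumes a binarized image stored as a list of rows of 0/1 values, treats a foreground pixel as an edge pixel when it touches background in its 4-neighborhood, and estimates thickness as twice the foreground count divided by the edge count inside a square window. The function names, the `radius` parameter, and the ratio itself are illustrative assumptions, not the authors' definition.

```python
def edge_mask(img):
    """Mark foreground pixels that touch background (or the image border)
    in their 4-neighborhood. `img` is a list of rows of 0/1 values."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or not img[ny][nx]:
                    edges[y][x] = 1
                    break
    return edges


def stroke_thickness(img, cy, cx, radius=3):
    """Hypothetical estimate: 2 * (foreground pixels) / (edge pixels)
    inside the (2*radius+1)-sized window centered at (cy, cx).
    Returns 0.0 when the window contains no edge pixels."""
    h, w = len(img), len(img[0])
    edges = edge_mask(img)
    fg = ed = 0
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            fg += img[y][x]
            ed += edges[y][x]
    return 2.0 * fg / ed if ed else 0.0
```

The intuition behind the ratio is that a straight stroke of width w and length L covers about wL pixels and has about 2L edge pixels. For strokes only one or two pixels wide every foreground pixel is an edge pixel, so this estimate cannot go below 2; it is meant to compare regions, not to measure width exactly.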