Paper
23 September 1999
Geometrical approach to skew detection for documents containing the Latin/Cyrillic characters
Oleg G. Okun
Abstract
Document skew is a distortion, mainly affecting the orientation of text lines, that occurs when paper documents are digitized. Its visual effect is that text lines, which for scripts such as Latin or Cyrillic are normally horizontal, become inclined with respect to the X-axis. However, many available document recognition systems require properly aligned text lines for accurate text segmentation and recognition. This means that any skew present should be estimated and compensated for before further processing. The Hough transform is one of the popular techniques for skew detection. To lower its computational cost, it is usually applied to a small number of representative points of each character or of its bounding box. A problem with this approach, however, is that different characters have different heights. As a result, the representative points of characters belonging to the same line often do not fit a straight line well, which frequently leads to errors when the skew is detected with the Hough transform. In this paper, we propose a new algorithm to overcome this problem. It uses only the bounding boxes of the connected components of characters and a number of simple tests to estimate the skew angle.
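The baseline approach described above (bounding boxes of connected components, one representative point per box, then a Hough transform) can be sketched as follows. This is not the author's proposed algorithm, only an illustration of the conventional Hough-based method the abstract argues against. Assumptions: the image is a list of lists with foreground pixels equal to 1, components are 8-connected, the representative point is the bottom-centre of each box, the Y-axis points downward, and the skew angle is searched in ±15° with a 0.1° step and 1-pixel rho bins. All function names and parameters (`component_boxes`, `representative_points`, `hough_skew`) are hypothetical. Because the bottom of a box is used, letters with descenders (p, q, y) fall below the baseline, which is exactly the misfit the abstract describes.

```python
import math
from collections import deque
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), y grows downward


def component_boxes(image: List[List[int]]) -> List[Box]:
    """Bounding boxes of 8-connected foreground (value 1) components, via BFS."""
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for r in range(h):
        for c in range(w):
            if image[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                x0 = x1 = c
                y0 = y1 = r
                while queue:
                    y, x = queue.popleft()
                    x0, x1 = min(x0, x), max(x1, x)
                    y0, y1 = min(y0, y), max(y1, y)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and image[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes


def representative_points(boxes: List[Box]) -> List[Tuple[float, float]]:
    """Assumption: one point per box, the bottom-centre."""
    return [((x0 + x1) / 2.0, float(y1)) for x0, _y0, x1, y1 in boxes]


def hough_skew(points: List[Tuple[float, float]], max_angle_deg: float = 15.0,
               step_deg: float = 0.1, rho_res: float = 1.0) -> float:
    """Return the angle (degrees) whose Hough accumulator has the highest peak."""
    if not points:
        return 0.0
    best_angle, best_votes = 0.0, -1
    n_steps = int(round(2 * max_angle_deg / step_deg))
    for i in range(n_steps + 1):
        angle = -max_angle_deg + i * step_deg
        t = math.radians(angle)
        s, c = math.sin(t), math.cos(t)
        votes = {}
        for x, y in points:
            # Points on the line y = y0 + x*tan(t) share rho = y*cos(t) - x*sin(t).
            rho_bin = math.floor((y * c - x * s) / rho_res)
            votes[rho_bin] = votes.get(rho_bin, 0) + 1
        peak = max(votes.values())
        if peak > best_votes:  # strict: the first angle reaching the peak wins
            best_angle, best_votes = angle, peak
    return round(best_angle, 6)
```

For example, twenty equal-height boxes whose bottoms lie on a line inclined by 2° give an estimate within about 0.1° of 2°. Using only the single highest accumulator cell keeps the sketch short; with several text lines it simply locks onto the line that collects the most votes.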
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Oleg G. Okun "Geometrical approach to skew detection for documents containing the Latin/Cyrillic characters", Proc. SPIE 3811, Vision Geometry VIII, (23 September 1999); https://doi.org/10.1117/12.364111
CITATIONS
Cited by 2 scholarly publications.
KEYWORDS
Hough transforms

Algorithm development

Image processing

Visualization

Binary data

Chromium

Distortion
