
An Improved Method for Document Image Binarization


Affiliations
1 Department of Computer Science and Engineering, RCC Institute of Information Technology, Kolkata - 700015, West Bengal, India
 

Abstract

Handwriting analysis of a document image has four stages: preprocessing, segmentation, feature extraction, and classification. Image preprocessing improves the quality of the image so that later stages can process it easily and efficiently. The principal stage of preprocessing is binarization, in which each pixel is classified as either text or background. It is a crucial stage because its output affects every later stage, including final character recognition. Otsu's method has already been used to binarize handwritten documents. To tolerate badly degraded document images, this paper proposes a binarization technique built on the Otsu algorithm that can segment the foreground from the background even when the document suffers from uneven illumination, contrast variation, bleed-through, or smear. The proposed method was tested on text images from H-DIBCO 2012 and DIBCO 2009. Experimental results show that it achieves high precision and gives better results than the Otsu algorithm.
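For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal sketch of standard global Otsu thresholding plus a pixel-level precision score, not the improved method proposed in the paper, whose details the abstract does not give. The function names (`otsu_threshold`, `binarize`, `precision`) and the convention that dark pixels are text are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    `pixels` is a flat iterable of integer gray levels in [0, levels).
    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image, t):
    """Label pixels <= t as text (1) and the rest as background (0).

    Assumes dark text on a light background (an illustrative convention).
    """
    return [[1 if p <= t else 0 for p in row] for row in image]


def precision(predicted, truth):
    """Fraction of predicted text pixels that are text in the ground truth."""
    tp = fp = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, g in zip(pred_row, true_row):
            if p == 1:
                if g == 1:
                    tp += 1
                else:
                    fp += 1
    return tp / (tp + fp) if (tp + fp) else 0.0


# Tiny synthetic example: three dark "text" pixels on a bright background.
image = [[200, 30, 210],
         [20, 205, 25]]
t = otsu_threshold([p for row in image for p in row])
mask = binarize(image, t)
```

Because a single global threshold is applied to the whole page, this baseline tends to struggle when illumination or contrast varies across the image, which is the kind of degradation the abstract says the proposed technique is meant to handle.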

Keywords

Binarization, Gray Scale Image, Line Segment, Otsu, Threshold.

  • Su B, Lu S, Tan CL. Robust document image binarization technique for degraded document images. IEEE Transactions on Image Processing. 2013; 22(4):1408–17.
  • Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Systems, Man, and Cybernetics. 1979; 9(1):62–6.
  • Kittler J, Illingworth J. On threshold selection using clustering criteria. IEEE Trans Systems, Man, and Cybernetics. 1985; 15:652–5.
  • Lee SU, Chung SY. A comparative performance study of several global thresholding techniques for segmentation. Computer Vision, Graphics, and Image Processing. 1990; 52:171–90.
  • Su B, Lu S, Tan CL. Combination of document image binarization techniques. 2011 IEEE International Conference on Document Analysis and Recognition (ICDAR); Beijing. 2011 Sep 18-21. p. 22–6.
  • Lu S, Su B, Tan C. Document image binarization using background estimation and stroke edges. International Journal on Document Analysis and Recognition. 2010 Dec; 13:303–14.
  • DIBCO 2009 (Document Image Binarization Contest) image dataset.
  • H-DIBCO 2012 (Handwritten Document Image Binarization Contest) image dataset.
  • Gill TK. Document image binarization techniques: a review. International Journal of Computer Applications. 2014 Jul; 98(12).
  • Shaikh SH, Maiti A, Chaki N. Image binarization using iterative partitioning: A global thresholding approach. International Conference on IEEE Recent Trends in Information Systems (ReTIS); Kolkata. 2011 Dec 21-23. p. 281–6.
  • Gupta MR, Jacobson NP, Garcia EK. OCR binarization and image pre-processing for searching historical documents. The Journal of the Pattern Recognition Society. 2007; 40(2):389–97.
  • Bal A, Saha R. An efficient method for skew normalization of handwriting image. 6th IEEE International Conference on Communication Systems and Network Technologies; Chandigarh. 2016. p. 222–8. ISBN: 978-1-4673-9950-0.
  • Bal A, Saha R. An improved method for text segmentation and skew normalization of handwriting image. 4th Springer International Conference on Advanced Computing, Networking, and Informatics (ICACNI-2016); India: National Institute of Technology Rourkela. 2016 Sep 22-24. ISSN: 1876-1100.
  • Niblack W. An introduction to digital image processing. Englewood Cliffs: Prentice Hall; 1986.
  • Sauvola J, Seppanen T, Haapakoski S, Pietikainen M. Adaptive document binarization. 4th Int Conf on Document Analysis and Recognition; Ulm, Germany. 1997. p. 147–52.


Authors

Nilima Paul
Department of Computer Science and Engineering, RCC Institute of Information Technology, Kolkata - 700015, West Bengal, India
Harinandan Tunga
Department of Computer Science and Engineering, RCC Institute of Information Technology, Kolkata - 700015, West Bengal, India


References