
Automatic Separation of foreground Text from Complex Background in Color Document Images


Affiliations
1 Department of Studies in Computer Science, University of Mysore, Mysore - 570006, India
     



Foreground text is difficult to read in documents with multi-colored, complex backgrounds. Automatically separating the foreground text in such document images is essential for reading the document contents smoothly. In this paper we propose a hybrid approach that combines connected component analysis with unsupervised thresholding to separate text from the complex background. The proposed approach identifies candidate text regions by edge detection followed by connected component analysis. Because of the background complexity, a non-text region may also be identified as a text region; this problem is overcome by analyzing the texture features of the connected components. Finally, the threshold value for each detected text region is derived automatically from the data of the corresponding image region to perform foreground separation. The proposed approach can handle document images with varying backgrounds of multiple colors, as well as foreground text of any color, font, and size. Experimental results show that the proposed algorithm detects, on average, 97.8% of the text regions in the source document. The readability of the extracted foreground text is illustrated by passing it through OCR.
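
The pipeline described above (edge detection, connected component analysis, then a threshold derived per region) can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the input is a grayscale image given as a list of rows rather than a color image, edges come from a simple forward-difference gradient with a hypothetical `edge_thresh` parameter, Otsu's method stands in for the unsupervised thresholding (the abstract does not name the technique), and the smaller pixel class in each region is assumed to be the text. The texture-based filtering of non-text regions is omitted. All function names are hypothetical.

```python
from collections import deque

def otsu_threshold(values, levels=256):
    """Otsu's method: the gray level that maximizes between-class variance."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]                    # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0                  # pixels with value > t
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2   # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def edge_map(img, edge_thresh=50):
    """Mark pixels whose forward-difference gradient exceeds edge_thresh (assumed detector)."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(img[y][x + 1] - img[y][x]) if x + 1 < w else 0
            dy = abs(img[y + 1][x] - img[y][x]) if y + 1 < h else 0
            edges[y][x] = dx + dy > edge_thresh
    return edges

def component_boxes(edges):
    """Bounding boxes (top, left, bottom, right) of 8-connected edge components."""
    h, w = len(edges), len(edges[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not edges[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            top, left, bottom, right = y0, x0, y0, x0
            while queue:                 # breadth-first flood fill
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(y - 1, y + 2):
                    for nx in range(x - 1, x + 2):
                        if (0 <= ny < h and 0 <= nx < w
                                and edges[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def separate_text(img):
    """Return the set of (row, col) pixels classified as foreground text."""
    text = set()
    for top, left, bottom, right in component_boxes(edge_map(img)):
        coords = [(y, x) for y in range(top, bottom + 1)
                  for x in range(left, right + 1)]
        # Threshold is derived only from this region's own pixels.
        t = otsu_threshold([img[y][x] for y, x in coords])
        dark = [p for p in coords if img[p[0]][p[1]] <= t]
        light = [p for p in coords if img[p[0]][p[1]] > t]
        # Assumption: text occupies fewer pixels than its local background.
        text.update(dark if len(dark) <= len(light) else light)
    return text

# Toy example: a dark stroke (30) on a light background (200).
img = [[200] * 8 for _ in range(6)]
for y, x in [(2, 2), (2, 3), (2, 4), (3, 3)]:
    img[y][x] = 30
print(sorted(separate_text(img)))
```

Because the threshold is computed separately inside each candidate box, a region with dark text on a light patch and another with light text on a dark patch each get their own cut-off, which is the property the abstract attributes to deriving the threshold from the corresponding image region.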

Keywords

Color Document Image, Complex Background, Connected Component Analysis, Text Separation, Feature Extraction, Unsupervised Thresholding, OCR.

Authors

N. Shivananda
Department of Studies in Computer Science, University of Mysore, Mysore - 570006, India
P. Nagabhushan
Department of Studies in Computer Science, University of Mysore, Mysore - 570006, India
