Retriving the Exact Text Line from Handwritten Document Based on an Energy Minimization Framework for Indian Script Languages


Affiliations
1 Periyar Maniammai University, Thanjavur, India
     


In this project, we present an algorithm for extracting text-lines from handwritten document images. Our algorithm builds on a novel approach to content-aware image resizing. We adopt the signed distance transform to generate the energy map, in which extreme points indicate the layout of the text-lines. Dynamic programming is then used to compute the minimum-energy left-to-right paths, which pass along the "middle" of the text-lines. Each path intersects a set of components, which determine the extracted text-line and provide an estimate of its height. The estimated height defines the text-line's region, which guides the splitting of touching components between consecutive lines. Unassigned components that fall within the region of a text-line are added to that line's component list. Components lying between two consecutive lines are processed once both lines have been extracted; each is assigned to the closest text-line based on the attributes of the extracted lines and on the sizes and positions of the components. Our experimental results on Tamil, Hindi, and English historical documents show that our approach manages to separate multi-skew text blocks into lines at high success rates.
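The dynamic-programming step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the energy map has already been computed (for example from a signed distance transform) and that lower values mark the middle of a text-line. The function name `min_energy_horizontal_path` and the rule that a path may shift by at most one row per column are assumptions borrowed from standard content-aware resizing.

```python
from typing import List, Sequence


def min_energy_horizontal_path(energy: Sequence[Sequence[float]]) -> List[int]:
    """Return one row index per column for the cheapest left-to-right path.

    Assumption (not from the paper): each step moves one column to the
    right and at most one row up or down.
    """
    rows, cols = len(energy), len(energy[0])
    # cost[r][c]: minimum cumulative energy of any path ending at (r, c)
    cost = [[0.0] * cols for _ in range(rows)]
    # back[r][c]: row of the predecessor cell in column c - 1
    back = [[0] * cols for _ in range(rows)]

    for r in range(rows):
        cost[r][0] = energy[r][0]

    for c in range(1, cols):
        for r in range(rows):
            # Candidate predecessors: rows r-1, r, r+1 clipped to the image
            candidates = range(max(0, r - 1), min(rows, r + 2))
            best_prev = min(candidates, key=lambda p: cost[p][c - 1])
            cost[r][c] = energy[r][c] + cost[best_prev][c - 1]
            back[r][c] = best_prev

    # Start from the cheapest cell in the last column and walk backwards
    end_row = min(range(rows), key=lambda r: cost[r][cols - 1])
    path = [end_row]
    for c in range(cols - 1, 0, -1):
        path.append(back[path[-1]][c])
    path.reverse()
    return path


# Toy energy map: the low values trace a slightly wavy "text-line"
toy_energy = [
    [5, 5, 5, 5],
    [5, 1, 5, 5],
    [1, 5, 1, 1],
    [5, 5, 5, 5],
]
print(min_energy_horizontal_path(toy_energy))
```

In a full pipeline, the components intersected by the returned path would form the extracted text-line, and the next path would be computed after that line's region has been excluded; those steps are omitted here.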

Keywords

Handwritten Document, State Estimation in Document Images, Text-Line Extraction.

Authors

S. Dhivyaprabha
Periyar Maniammai University, Thanjavur, India
G. Jagajothi
Periyar Maniammai University, Thanjavur, India
