
Word Alignment to Encourage Outsized English-Hindi Parallel Corpus


Affiliations
1 Sun Engineering College, Bhilai, India
2 Dr. CV Raman University, Bhilai, India
     



This work describes a methodology for understanding parallel English-Hindi sentences using word alignment. It belongs to natural language processing (NLP), in which natural language is processed to make it easier to understand. NLP is a branch of artificial intelligence (AI) that aims to develop human-like intelligence for natural language. Many previous approaches ignore word identities and consider only sentence lengths, which does not allow corresponding words to be identified exactly; the proposed system is therefore useful for aligning a large parallel corpus at the word level. The methodology is the foundation for building a parallel English-Hindi word dictionary after syntactic and semantic analysis of the English-Hindi source text. The method is applied here to English and Hindi sentences, but it can also be used for other languages. Large English-Hindi parallel corpora are not readily available. The approach rests on two strategies to address this problem: first, normalization of the tagged English and Hindi sentences; second, mapping of English-Hindi sentences using the parallel English-Hindi word dictionary. Fortunately, word alignment is well understood, and a few alignment algorithms are freely available.

Keywords

Tagging, Local Word Grouping, Word Mapping, Normalization, Part-of-Speech (POS) Tagging, Word Dictionary, Multiword Expressions, Mapping Score.
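
The two strategies named in the abstract (normalization of tagged sentences, then dictionary-based mapping) can be illustrated with a minimal sketch. This is not the authors' implementation. The `word/TAG` token format, the toy dictionary, the function names `normalize` and `align`, and the definition of the mapping score as the fraction of English tokens that receive a link are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionary (assumption): English word -> Hindi equivalents.
TOY_DICTIONARY: Dict[str, Set[str]] = {
    "ram": {"राम"},
    "eats": {"खाता"},
    "mango": {"आम"},
}


def normalize(sentence: str) -> List[str]:
    """Drop POS tags written as word/TAG, strip punctuation, lowercase."""
    tokens: List[str] = []
    for raw in sentence.split():
        word = raw.split("/")[0]              # "eats/VBZ" -> "eats"
        word = word.strip(".,!?;:।").lower()  # "।" is the Hindi danda
        if word:                              # skip pure-punctuation tokens
            tokens.append(word)
    return tokens


def align(
    english: str, hindi: str, dictionary: Dict[str, Set[str]]
) -> Tuple[List[Tuple[int, int]], float]:
    """Greedily link each English token to the first unused Hindi token
    listed for it in the dictionary.

    Returns (links, score): links are (english_index, hindi_index) pairs,
    and score is the assumed mapping score, i.e. the fraction of English
    tokens that received a link.
    """
    en_tokens = normalize(english)
    hi_tokens = normalize(hindi)
    links: List[Tuple[int, int]] = []
    used: Set[int] = set()
    for i, en in enumerate(en_tokens):
        candidates = dictionary.get(en, set())
        for j, hi in enumerate(hi_tokens):
            if j not in used and hi in candidates:
                links.append((i, j))
                used.add(j)
                break
    score = len(links) / len(en_tokens) if en_tokens else 0.0
    return links, score


if __name__ == "__main__":
    links, score = align(
        "Ram/NNP eats/VBZ a/DT mango/NN .",
        "राम आम खाता है ।",
        TOY_DICTIONARY,
    )
    print(links, score)
```

In this example the article "a" has no dictionary entry and the Hindi auxiliary "है" has no English counterpart, so both stay unaligned. The sketch also shows why word identities matter: the Hindi sentence places the object before the verb, so the links cross, which a purely length-based method cannot capture.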
Authors

Shweta Dubey
Sun Engineering College, Bhilai, India
Tarun Dhar Diwan
Dr. CV Raman University, Bhilai, India
