To Find the POS Tag of Unknown Words in Punjabi Language


Affiliations
1 B.I.S College of Engineering and Technology, Moga – 142001, India
2 LPU, Jalandhar, India
3 B.I.S College of Engineering and Technology, India
 

Authors

Blossom Manchanda
B.I.S College of Engineering and Technology, Moga – 142001, India
Mr. Ravishanker
LPU, Jalandhar, India
Sanjeev Kumar Sharma
B.I.S College of Engineering and Technology, India

Abstract


Accuracy on unknown words is one significant area of Part of Speech (POS) tagging where there is still room for improvement. Unknown words are those seen for the first time in the testing phase of the tagger, having never appeared in the training data, and all POS tagging methods suffer a significant decrease in accuracy on them. Because of their high information content, unknown words are disproportionately important relative to how often they occur, and their number increases when experimenting with corpora from different domains. In general, on POS tagging as well as other similar NLP tasks, accuracy on unknown words is about 10% lower than on words that have been seen in the training data (Brill, 1994). Unknown words also occur frequently, comprising approximately 5% of a test corpus (Mikheev, 1997).
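
The two quantities cited above, the share of test tokens that are unknown and the gap between accuracy on known and unknown words, can be measured with a few lines of code. The following is a minimal sketch, not the authors' method: the function name `unknown_word_report`, the tag labels, and the toy English data are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

TaggedWord = Tuple[str, str]  # (word, POS tag)


def unknown_word_report(
    train: Sequence[TaggedWord],
    gold: Sequence[TaggedWord],
    predicted_tags: Sequence[str],
) -> Dict[str, Optional[float]]:
    """Split test tokens into known/unknown and score each group.

    A test word counts as unknown if it never appears in the training data.
    """
    if len(gold) != len(predicted_tags):
        raise ValueError("gold and predicted_tags must have the same length")

    vocab: Set[str] = {word for word, _ in train}
    counts = {"known": [0, 0], "unknown": [0, 0]}  # [correct, total]

    for (word, gold_tag), pred_tag in zip(gold, predicted_tags):
        bucket = "known" if word in vocab else "unknown"
        counts[bucket][1] += 1
        counts[bucket][0] += int(gold_tag == pred_tag)

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        # None signals an empty group rather than a misleading 0.0.
        return numerator / denominator if denominator else None

    return {
        "unknown_rate": ratio(counts["unknown"][1], len(gold)),
        "known_accuracy": ratio(*counts["known"]),
        "unknown_accuracy": ratio(*counts["unknown"]),
    }


# Toy data (hypothetical, for illustration only).
train = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")]
gold = [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB"), ("quickly", "ADV")]
predicted = ["DET", "NOUN", "VERB", "NOUN"]

report = unknown_word_report(train, gold, predicted)
print(report)
```

In this toy case "cat" and "quickly" are unknown, so the unknown rate is 0.5, accuracy on known words is 1.0, and accuracy on unknown words is 0.5. On a real corpus the same split makes it possible to compare a tagger against the figures quoted above.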