To Find the POS Tag of Unknown Words in Punjabi Language
Tagging unknown words accurately is one significant area of part-of-speech (POS) tagging where there is still room for improvement. Unknown words are those seen for the first time in the testing phase of the tagger, having never appeared in the training data. All POS tagging methods suffer a significant decrease in accuracy on such words: in general, for POS tagging as well as other similar NLP tasks, accuracy on unknown words is about 10% lower than on words that have been seen in the training data (Brill, 1994). Unknown words are also frequent, comprising approximately 5% of a test corpus (Mikheev, 1997), and their number increases when experimenting with corpora from different domains. Because of their high information content, they are disproportionately important relative to how often they occur.
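The definition of an unknown word used here (a test token that never appears in the training data) can be made concrete with a short sketch. The Python below is only an illustration and not the method of this paper: the function name `split_accuracy`, the data layout (sentences as lists of `(word, tag)` pairs), and the result keys are all assumptions introduced for the example. It reports the share of unknown tokens in a test set and the tagging accuracy on known and unknown tokens separately.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout: a sentence is a list of (word, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def split_accuracy(
    train: List[TaggedSentence],
    test_gold: List[TaggedSentence],
    test_pred: List[List[str]],
) -> Dict[str, float]:
    """Report the unknown-word rate and accuracy on known vs. unknown tokens.

    A test token counts as unknown if its word form never occurs in `train`.
    `test_pred` holds one predicted tag per token, aligned with `test_gold`.
    """
    vocab = {word for sent in train for word, _ in sent}

    # counts[bucket] = [number of correct tags, number of tokens]
    counts = {"known": [0, 0], "unknown": [0, 0]}
    for gold_sent, pred_sent in zip(test_gold, test_pred):
        for (word, gold_tag), pred_tag in zip(gold_sent, pred_sent):
            bucket = "known" if word in vocab else "unknown"
            counts[bucket][0] += int(gold_tag == pred_tag)
            counts[bucket][1] += 1

    def ratio(num: int, den: int) -> float:
        # Avoid division by zero when a bucket is empty.
        return num / den if den else 0.0

    total = counts["known"][1] + counts["unknown"][1]
    return {
        "unknown_rate": ratio(counts["unknown"][1], total),
        "known_accuracy": ratio(*counts["known"]),
        "unknown_accuracy": ratio(*counts["unknown"]),
    }
```

Reporting the two accuracies separately is what makes figures such as those cited from Brill (1994) and Mikheev (1997) comparable across taggers and corpora, since a single overall accuracy hides how much of the error comes from unknown words.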