This paper presents a Hidden Markov Model (HMM) based chunker for Punjabi. Chunking is the process of segmenting text into syntactically correlated word groups, known as chunks, and then assigning a label to each chunk. A robust chunker is an important component of many applications that require Natural Language Processing (NLP). In this work, my goal is to develop an HMM-based chunker for the Punjabi language. The chunker is statistical: the Viterbi algorithm is used to find the most probable sequence of chunks, and the Baum-Welch algorithm is used to train the system on 25,000 lines of chunked Punjabi text. An annotated text file of 1,000 lines is used for testing. Chunk boundaries are found with an accuracy of about 80%; the accuracy of labelling is given as about 98% and as about 82%.
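To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding over chunk tags. It is not the system described in this paper: the tag set (`B-NP`, `I-NP`, `B-VP`), the use of part-of-speech tags as observations, and every probability value are hypothetical and chosen only for illustration. In practice these tables would come from training.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return (best_path, best_log_prob) for a non-empty observation sequence."""
    def lg(p):
        # Log with a small floor so zero probabilities do not raise an error.
        return math.log(max(p, floor))

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: lg(start_p.get(s, 0.0)) + lg(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + lg(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lg(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical toy model: chunk tags as hidden states, POS tags as observations.
STATES = ["B-NP", "I-NP", "B-VP"]
START_P = {"B-NP": 0.8, "I-NP": 0.0, "B-VP": 0.2}
TRANS_P = {
    "B-NP": {"B-NP": 0.1, "I-NP": 0.6, "B-VP": 0.3},
    "I-NP": {"B-NP": 0.1, "I-NP": 0.3, "B-VP": 0.6},
    "B-VP": {"B-NP": 0.6, "I-NP": 0.0, "B-VP": 0.4},
}
EMIT_P = {
    "B-NP": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
    "I-NP": {"DET": 0.1, "NOUN": 0.8, "VERB": 0.1},
    "B-VP": {"DET": 0.05, "NOUN": 0.05, "VERB": 0.9},
}

if __name__ == "__main__":
    tags, log_prob = viterbi(["DET", "NOUN", "VERB"], STATES, START_P, TRANS_P, EMIT_P)
    print(tags, math.exp(log_prob))
```

Working in log space avoids numerical underflow on long sentences, and a `B-` tag in the decoded sequence marks where a new chunk begins, so one pass yields both the chunk boundaries and their labels.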

Keywords

Baum-Welch Algorithm, Chunking, Hidden Markov Model, Viterbi Algorithm