Sentence Boundary Detection Using Maximum Entropy Model


Authors

Tarun Dhar Diwan
Deptt. of Engineering, Dr. C. V. Raman University, Bilaspur (C.G), India
Priti Verma
Dr. C. V. Raman University, Bilaspur (C.G), India
Kamal Mehta
Sagar Institute of Sciences and Technology, Bhopal, India

Abstract


The sentence boundary detection system described here comprises three independent applications: rule-based, Hidden Markov Model (HMM), and Maximum Entropy. The Maximum Entropy model is the central part of the system and achieves an error rate of less than 2% on part of the Wall Street Journal (WSJ) corpus using only eight binary features. The performance of the three applications is illustrated and discussed. Sentence Boundary Disambiguation (SBD) is the task of identifying the sentences within a paragraph or an article. Because the sentence is the basic textual unit immediately above the word and the phrase, SBD is an essential problem for many Natural Language Processing applications, including parsing, information extraction, machine translation, and document summarization. The accuracy of an SBD system directly affects the performance of these applications. Past research in this field has already achieved very high performance, however, so the area is no longer very active, and the problem can seem too simple to attract researchers' attention.

Keywords


Sentence Boundary Disambiguation, Maximum Entropy Model, Features, Generalized Iterative Scaling, Hidden Markov Model.
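
To make the maximum entropy approach named in the abstract more concrete, the following is a minimal sketch of how a classifier over a handful of binary features might label candidate sentence boundaries. It is not the authors' implementation. The abbreviation list, the five features, the toy training sentences, and all function names are assumptions made for illustration; the abstract does not list the paper's eight features. For a two-class decision the maximum entropy model reduces to logistic regression, and the sketch trains it with plain stochastic gradient ascent rather than the Generalized Iterative Scaling named in the keywords.

```python
import math

# Hypothetical abbreviation list (an assumption, not taken from the paper).
ABBREVIATIONS = {"mr", "mrs", "dr", "inc", "co", "corp", "st"}

# Toy training data (invented for this sketch): one sentence per string.
TRAIN = [
    "Mr. Smith joined Acme Corp. in May.",
    "The shares rose sharply.",
    "Dr. Jones sold the stock to Mrs. Lee.",
    "Did the price fall?",
    "It closed at 3.5 dollars on Friday.",
    "J. Brown bought shares of Widget Inc. yesterday.",
]


def candidates(tokens):
    """Indices of tokens ending in '.', '?' or '!' (potential boundaries)."""
    return [i for i, t in enumerate(tokens) if t[-1] in ".?!"]


def features(tokens, i):
    """Illustrative binary features for the candidate boundary at tokens[i]."""
    word = tokens[i][:-1]
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    return [
        1.0,                                                # bias
        1.0 if word.lower() in ABBREVIATIONS else 0.0,      # known abbreviation
        1.0 if nxt[:1].isupper() else 0.0,                  # next token capitalized
        1.0 if nxt == "" else 0.0,                          # end of text
        1.0 if len(word) == 1 and word.isupper() else 0.0,  # initial such as "J."
    ]


def prob(weights, feats):
    """P(boundary | features) under the binary maximum entropy (logistic) model."""
    z = sum(w * f for w, f in zip(weights, feats))
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def make_examples(sentences):
    """Turn gold sentences into (features, label) pairs for every candidate."""
    tokens, gold = [], set()
    for s in sentences:
        tokens.extend(s.split())
        gold.add(len(tokens) - 1)  # last token of each sentence is a true boundary
    return [(features(tokens, i), 1.0 if i in gold else 0.0)
            for i in candidates(tokens)]


def train(examples, epochs=200, lr=0.5):
    """Stochastic gradient ascent on the log-likelihood (not GIS)."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for feats, label in examples:
            err = label - prob(weights, feats)
            weights = [w + lr * err * f for w, f in zip(weights, feats)]
    return weights


def split_sentences(text, weights):
    """Split whitespace-tokenized text at candidates the model scores above 0.5."""
    tokens = text.split()
    sentences, start = [], 0
    for i in candidates(tokens):
        if prob(weights, features(tokens, i)) > 0.5:
            sentences.append(" ".join(tokens[start:i + 1]))
            start = i + 1
    if start < len(tokens):
        sentences.append(" ".join(tokens[start:]))
    return sentences


if __name__ == "__main__":
    model = train(make_examples(TRAIN))
    print(split_sentences("Dr. Lee met Mr. Brown. They left early.", model))
```

On this toy data the trained model keeps the periods after "Dr." and "Mr." inside the sentence and splits after "Brown." and "early.". A realistic system would need far more training text and a richer feature set than this sketch assumes.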