
Quantitative Analysis of English Corpus in Tourism and Health Domain


Affiliations
1 Department of Computer Science, DAV College, Jalandhar, India
 

Statistical analysis of a language is an essential part of any natural language processing activity, whether it is translation, transliteration, summarization, lexicon formation, keyboard design, or another task. In this paper, a domain-specific English corpus (health and tourism) provided by Computational Linguistic R & D at the Special Centre for Sanskrit Studies, J.N.U., is analyzed statistically. Frequency analysis and word-length analysis of the English text are performed, and unigram, bigram, trigram, and positional analyses of words are studied.

Keywords

Corpus, English, Statistical Analysis, Quantitative Analysis, Unigram, Bigram, Trigram.
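
The analyses named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the tokenizer, the function names, and the sample sentence are all assumptions made for illustration, and "positional analysis" is read here, as one possible interpretation, as counting which words occur in sentence-initial position.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens (assumed tokenization rule:
    runs of letters, optionally with one internal apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngram_counts(tokens, n):
    """Count word n-grams: n=1 gives unigrams, n=2 bigrams, n=3 trigrams.
    N-grams are taken over the whole token stream, so they may span
    sentence boundaries."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def word_length_counts(tokens):
    """Count how many tokens have each length in characters."""
    return Counter(len(token) for token in tokens)


def sentence_initial_counts(text):
    """Count the words that begin a sentence (one assumed reading of
    'positional analysis'). Sentences are split naively on . ! ?"""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = tokenize(sentence)
        if words:
            counts[words[0]] += 1
    return counts


# Hypothetical sample text, not taken from the corpus described above.
sample = "Regular exercise improves health. Regular travel improves mood."
tokens = tokenize(sample)

print(ngram_counts(tokens, 1).most_common(3))
print(ngram_counts(tokens, 2).most_common(3))
print(ngram_counts(tokens, 3).most_common(3))
print(sorted(word_length_counts(tokens).items()))
print(sentence_initial_counts(sample))
```

On a real corpus, the same counters would be filled by reading the text files instead of a literal string, and the raw counts could be divided by the total number of n-grams to obtain relative frequencies.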

Authors

Lalit Goyal
Department of Computer Science, DAV College, Jalandhar, India
