
n-Gram Character Analysis of English Text on Domain Specific Corpus


Affiliations
1 Department of Computer Science, DAV College, Jalandhar, India
 

Statistical analysis of a language is a vital part of natural language processing. It comprises a collection of methods used to process large amounts of data and report overall trends. In this paper, character frequency and word length analysis of English text is performed. Unigram, bigram, trigram, and positional analyses of characters are carried out on a domain-specific English corpus from the health domain. Miscellaneous analyses, such as the percentage occurrence of various numbers of distinct words and their coverage of the English corpus, are also reported.

Keywords

Corpus, English, Statistical Analysis, Quantitative Analysis, Unigram, Bigram, Trigram.
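
The character-level counts named in the abstract (unigram, bigram, trigram, positional, and distinct-word coverage) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the lowercase `[a-z]+` tokenization, the choice to count n-grams only within words, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a word is a maximal run of ASCII letters, case-folded.
    return re.findall(r"[a-z]+", text.lower())

def char_ngrams(text, n):
    """Count character n-grams within words (none spans a word boundary)."""
    counts = Counter()
    for word in tokenize(text):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

def positional_counts(text):
    """Count each letter at each 0-based position within a word."""
    counts = Counter()
    for word in tokenize(text):
        for pos, ch in enumerate(word):
            counts[(pos, ch)] += 1
    return counts

def coverage(text, k):
    """Percentage of word tokens covered by the k most frequent distinct words."""
    words = tokenize(text)
    if not words:
        return 0.0
    covered = sum(c for _, c in Counter(words).most_common(k))
    return 100.0 * covered / len(words)

# Hypothetical sample text, not taken from the paper's corpus.
sample = "the heart and the health"
unigrams = char_ngrams(sample, 1)
bigrams = char_ngrams(sample, 2)
trigrams = char_ngrams(sample, 3)
positions = positional_counts(sample)
```

On a real corpus, `bigrams.most_common(10)` would list the ten most frequent character bigrams, and calling `coverage` with increasing `k` would show how quickly a small set of distinct words accounts for most of the text.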

Authors

Lalit Goyal
Department of Computer Science, DAV College, Jalandhar, India
