n-Gram Character Analysis of English Text on Domain Specific Corpus
Statistical analysis of a language is a vital part of natural language processing. It refers to a collection of methods used to process large amounts of data and report overall trends. In this paper, character frequency and word length analyses of English text are performed. Unigram, bigram, trigram and positional analyses of characters in a domain-specific English corpus from the health domain are studied. Miscellaneous analyses, such as the percentage occurrence of various numbers of distinct words and their coverage of the English corpus, are also studied.
Keywords
Corpus, English, Statistical Analysis, Quantitative Analysis, Unigram, Bigram, Trigram.
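
The character-level analyses named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that words are runs of the letters a to z after lowercasing, and that n-grams are counted inside words without crossing word boundaries. The function names `char_ngrams`, `positional_counts` and `word_length_distribution` are hypothetical, and the sample sentence is made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic words (an assumed tokenization)."""
    return re.findall(r"[a-z]+", text.lower())


def char_ngrams(text, n):
    """Count character n-grams inside each word (never across word boundaries)."""
    counts = Counter()
    for word in tokenize(text):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def positional_counts(text, position):
    """Count characters at one word position, e.g. 0 for first, -1 for last."""
    return Counter(word[position] for word in tokenize(text))


def word_length_distribution(text):
    """Count how many words occur at each word length."""
    return Counter(len(word) for word in tokenize(text))


if __name__ == "__main__":
    # Made-up sample sentence, not taken from the paper's corpus.
    sample = "The patient has a fever and the doctor treats the fever."
    for n, label in ((1, "unigram"), (2, "bigram"), (3, "trigram")):
        print(label, char_ngrams(sample, n).most_common(3))
    print("first letters", positional_counts(sample, 0).most_common(3))
    print("word lengths", sorted(word_length_distribution(sample).items()))
```

Dividing each count by the total number of n-grams gives relative frequencies, which is the form in which such statistics are usually reported.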