
An Automated Error Detection System for Indian Language Using Statistical Approach


Affiliations
1 Assistant Professor, Department of Computer Science and Applications, Maharishi Markandeshvar Engineering College, Mullana, Ambala, India
2 Research Scholar, Department of Computer Science and Applications, DAV University, Jalandhar, India
3 Associate Professor, Department of Computer Science and Applications, DAV University, Jalandhar, India
 

A grammatical error detection system, also called a grammar checker or syntactic analyzer, is one of the advanced tools for natural language processing. It plays an important role in proofreading and in the development of many other natural language processing applications such as machine translation, summarization, and question answering. In this research article, we propose a framework for detecting grammatical errors using a statistical approach, specifically an N-gram approach. The corpus used to generate the n-grams is taken from the Indian Languages Corpora Initiative. This corpus is annotated using a morphological analyzer followed by a part-of-speech tagger. Bi-grams, tri-grams, and quad-grams of part-of-speech tags are generated from the annotated corpus. On testing the proposed algorithm on self-generated test data for the Punjabi language, precision was 100 percent, recall was 87.2 percent, and the F-measure was 93.16 percent.
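The general idea of detecting errors with n-grams of part-of-speech tags can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag names, the toy corpus, the sentence-boundary markers `<s>` and `</s>`, the `min_count` threshold, and the function names `build_model` and `find_suspect_ngrams` are all assumptions made for illustration. The sketch counts bi-, tri-, and quad-grams of tags in an already tagged corpus and flags any n-gram of a test sentence that was seen fewer than `min_count` times.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Tags = Sequence[str]
Gram = Tuple[str, ...]


def ngrams(tags: Tags, n: int) -> List[Gram]:
    """Return all contiguous n-grams of a POS-tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def build_model(corpus: Iterable[Tags],
                orders: Sequence[int] = (2, 3, 4)) -> Dict[int, Counter]:
    """Count bi-, tri- and quad-grams of POS tags over a tagged corpus."""
    model: Dict[int, Counter] = {n: Counter() for n in orders}
    for tags in corpus:
        # Boundary markers are an assumption of this sketch.
        padded = ["<s>", *tags, "</s>"]
        for n in orders:
            model[n].update(ngrams(padded, n))
    return model


def find_suspect_ngrams(tags: Tags, model: Dict[int, Counter],
                        min_count: int = 1) -> List[Tuple[int, int, Gram]]:
    """Return (order, start index, n-gram) for each n-gram of the sentence
    that occurs fewer than min_count times in the model."""
    padded = ["<s>", *tags, "</s>"]
    suspects = []
    for n, counts in model.items():
        for i, gram in enumerate(ngrams(padded, n)):
            if counts[gram] < min_count:
                suspects.append((n, i, gram))
    return suspects


# Toy tagged corpus with hypothetical tags (verb-final word order).
corpus = [
    ["PRON", "NOUN", "VERB"],
    ["NOUN", "NOUN", "VERB"],
    ["PRON", "ADJ", "NOUN", "VERB"],
]
model = build_model(corpus)

well_formed = find_suspect_ngrams(["PRON", "NOUN", "VERB"], model)
ill_formed = find_suspect_ngrams(["VERB", "PRON", "NOUN"], model)
print("well-formed suspects:", well_formed)
print("ill-formed suspects:", ill_formed)
```

In this sketch a sentence is treated as suspect when at least one of its tag n-grams is unseen in the corpus. A practical system would need a much larger corpus and probably a tuned threshold, since rare but valid tag sequences would otherwise be flagged.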

Keywords

Error Detection System, NLP, N-Gram, Syntactic Analyzer, Morphological Analyzer, POS Tagger.



Authors

Misha Mittal
Assistant Professor, Department of Computer Science and Applications, Maharishi Markandeshvar Engineering College, Mullana, Ambala, India
Vikas Verma
Research Scholar, Department of Computer Science and Applications, DAV University, Jalandhar, India
S.K. Sharma
Associate Professor, Department of Computer Science and Applications, DAV University, Jalandhar, India
