
Using Sentence Simplification to Generate Paraphrase for Low Resource Punjabi Language


Affiliations
1 Research Scholar, SBBS University, Jalandhar, India
2 Associate Professor, DAV University, Jalandhar, India
 

Abstract

Natural language processing is a growing area of computer science, and paraphrase generation remains difficult, especially for languages such as Hindi, Punjabi, and Urdu, which are morphologically rich and have limited resources. This article addresses paraphrase generation for Punjabi, a morphologically rich Indian language, using a sentence simplification approach. The authors employed several sentence simplification algorithms to simplify long Punjabi sentences and then applied antonym-synonym replacement to generate the paraphrases. On the test data, the sentence simplification component achieved a precision of 100%, a recall of 95%, and an F-measure of 97.43%. The system's performance was analyzed using several complexity measures, and a combination of lexical and syntactic simplification yielded the best results.
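The reported scores can be cross-checked with a few lines of Python. This is a minimal sketch that assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is F1; other weightings
    (F-beta) would give a different value.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (100% and 95%).
score = f_measure(1.00, 0.95)
print(f"{score * 100:.4f}%")  # prints 97.4359%
```

Under that assumption the computed value, 97.4359%, agrees with the reported 97.43% when truncated to two decimal places.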

Keywords

NLP, Punjabi Language Processing, Paraphrasing, Syntactic Simplification, Lexical Simplification.

Authors

Ravinder Mohan Jindal
Research Scholar, SBBS University, Jalandhar, India
Leekha Jindal
Research Scholar, SBBS University, Jalandhar, India
Sanjeev Kumar Sharma
Associate Professor, DAV University, Jalandhar, India
