Using Sentence Simplification to Generate Paraphrase for Low Resource Punjabi Language
Paraphrase generation is a difficult natural language processing task, especially for languages such as Hindi, Punjabi, and Urdu, which are morphologically rich and have limited resources. This article focuses on generating paraphrases for Punjabi, a morphologically rich Indian language, using a sentence simplification approach. The author applied several sentence simplification algorithms to shorten long Punjabi sentences and then used antonym-synonym replacement to generate the paraphrases. On a test data set, the sentence simplification component achieved a precision of 100%, a recall of 95%, and an F-measure of 97.43%. The system's performance was analyzed using several complexity measures, and a combination of lexical and syntactic simplification yielded the best results.
Keywords
NLP, Punjabi Language Processing, Paraphrasing, Syntactic Simplification, Lexical Simplification.
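The F-measure reported in the abstract can be related to its precision and recall with a short calculation. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract's F-measure is the balanced (F1) variant.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (100% and 95%).
score = f_measure(1.00, 0.95)
print(f"{score * 100:.2f}%")  # prints 97.44%
```

Under this assumption the value is about 97.436%, so the reported 97.43% is consistent with the result being truncated rather than rounded to two decimal places.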