
Neural Machine Translation for English-Malayalam


Affiliations
1 AU-KBC Research Centre, MIT Campus of Anna University, India
 

Neural Machine Translation (NMT) systems produce state-of-the-art translations for high-resource languages, but low-resource and morphologically rich languages remain a challenge. In this paper, we discuss existing techniques for handling morphologically rich and low-resource languages and present our experiments on developing an English-Malayalam NMT system. We preprocessed the data using two techniques: word segmentation with a morphological analyser and Byte Pair Encoding (BPE). The results show a significant improvement when word segmentation with the morphological analyser is applied.
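As a rough illustration of the BPE technique named in the abstract, the Python sketch below learns merge operations from a toy word-frequency table and applies them to a new word. It is not the authors' pipeline: the function names (`learn_bpe`, `apply_bpe`), the `</w>` end-of-word marker, and the English toy vocabulary are assumptions chosen only to show the general idea of repeatedly merging the most frequent adjacent symbol pair.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_symbols(symbols, pair):
    """Replace every occurrence of `pair` in a symbol sequence with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merge operations from a word -> count table."""
    # Each word starts as a sequence of characters plus an end-of-word marker
    # (the marker "</w>" is an assumption of this sketch).
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            key = tuple(merge_symbols(list(symbols), best))
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols


# Toy data (hypothetical counts, English only for readability).
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
toy_merges = learn_bpe(toy_counts, num_merges=5)
print(toy_merges)
print(apply_bpe("lowest", toy_merges))
```

BPE segments words purely from corpus statistics, whereas a morphological analyser segments them at linguistically motivated morpheme boundaries; the abstract reports that the latter gave the larger improvement for English-Malayalam.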

Keywords

Neural Machine Translation, Morphologically rich languages, Morph segmentation, Byte Pair Encoding.



Authors

Vijay Sundar Ram R
AU-KBC Research Centre, MIT Campus of Anna University, India
Sobha Lalitha Devi
AU-KBC Research Centre, MIT Campus of Anna University, India

