
Deep Bidirectional RNNs Using Gated Recurrent Units & Long Short-Term Memory Units for Building Acoustic Models for Automatic Speech Recognition


Authors

Madhuri Jain, Nishita Dutta, Dnyaneshwari Bhirud, Nikahat Mulla
Sardar Patel Institute of Technology, Andheri, Mumbai, Maharashtra, India



Abstract

Deep neural networks have become popular for training acoustic models for speech recognition. Considerable work has been done with a variety of architectures, from conventional convolutional neural networks to deep recurrent neural networks (RNNs). Prior research indicates that bidirectional RNNs are well suited to speech recognition, yielding greater accuracy than deep RNNs and unidirectional RNNs. Bidirectional RNNs are usually built with Long Short-Term Memory (LSTM) units, which have their own advantages and disadvantages; Gated Recurrent Units (GRUs) are an alternative. In this paper we experiment with and compare deep bidirectional models built with GRU units and with LSTM units.
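As a rough illustration of the kind of model compared here, the following is a minimal Keras sketch, not the paper's exact configuration: a deep bidirectional RNN acoustic model whose recurrent unit can be switched between GRU and LSTM, with a TimeDistributed Dense softmax producing per-frame class posteriors. The feature dimensionality (13 MFCCs), network depth, hidden size, and output alphabet of 29 classes are illustrative assumptions.

    # A minimal sketch, assuming MFCC input features and a character-level
    # output layer (e.g. 28 characters plus a CTC blank). All sizes are
    # illustrative, not the authors' reported configuration.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_birnn_acoustic_model(unit="gru", n_features=13, n_classes=29,
                                   hidden=256, depth=2):
        """Deep bidirectional RNN over variable-length feature sequences."""
        # Swap the recurrent unit: GRU vs. LSTM is the comparison at issue.
        rnn_cls = layers.GRU if unit == "gru" else layers.LSTM
        inputs = layers.Input(shape=(None, n_features))  # (time, MFCC dims)
        x = inputs
        for _ in range(depth):
            # The Bidirectional wrapper runs the unit forward and backward
            # in time and concatenates the two hidden sequences.
            x = layers.Bidirectional(
                rnn_cls(hidden, return_sequences=True))(x)
        # TimeDistributed Dense: per-frame softmax posteriors over classes.
        outputs = layers.TimeDistributed(
            layers.Dense(n_classes, activation="softmax"))(x)
        return models.Model(inputs, outputs)

    model = build_birnn_acoustic_model(unit="lstm")
    model.summary()

Training such a model end to end would typically pair these per-frame posteriors with a CTC loss, since the transcripts are unsegmented.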

Keywords

Acoustic Modeling, Automatic Speech Recognition, Bidirectional RNN, Convolutional Neural Networks, Deep Recurrent Neural Networks, Gated Recurrent Unit, Keras, Long Short-Term Memory (LSTM), MFCC, Recurrent Neural Networks, TimeDistributed Dense, TensorFlow, Spectrogram.



