
An End-to-End Trainable Capsule Network for Image-Based Character Recognition and its Application to Video Subtitle Recognition


Authors

Ahmed Tibermacine
Department of Computer Science, Biskra University, Algeria

Selmi Mohamed Amine
Department of Computer Science, Biskra University, Algeria

Abstract

The text presented in videos carries important information for a wide range of vision-based applications. The key modules for extracting this information are text detection followed by text recognition, and both are the subject of our study. In this paper, we propose an innovative end-to-end subtitle detection and recognition system for videos. Our system consists of three modules. Video subtitles are first detected by a novel image operator based on our blob extraction method. Each detected subtitle is then segmented into single characters by a simple technique applied to the binary image and passed to the recognition module. Lastly, a capsule neural network (CapsNet) trained on the Chars74K dataset is adopted for recognizing the characters. The proposed detection method is robust and performs well on video subtitle detection, as evaluated on a dataset we constructed. In addition, CapsNet demonstrates its validity and effectiveness for recognizing video subtitles. To the best of our knowledge, this is the first work in which capsule networks have been empirically investigated for character recognition of video subtitles.
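
Below is a minimal, hedged sketch in Python of the three-module pipeline the abstract describes. It is illustrative only: the paper's actual blob-extraction operator, segmentation rules, and trained CapsNet are not reproduced here, so the Otsu thresholding, the lower-third subtitle heuristic, the 28x28 input size, and the `capsnet` model handle are all assumptions standing in for the authors' components.

```python
import cv2
import numpy as np

def detect_subtitle_blobs(frame):
    """Module 1 (assumed form): binarize the frame and keep blobs whose
    geometry is plausible for subtitle characters."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Subtitles are usually high-contrast; Otsu thresholding stands in
    # for the paper's blob-extraction operator.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        # Heuristic filters: character-sized blobs in the lower third of the frame.
        if 8 < h < 80 and area > 30 and y > frame.shape[0] * 2 // 3:
            boxes.append((x, y, w, h))
    return binary, boxes

def segment_characters(binary, boxes, size=28):
    """Module 2: crop each retained blob as one character image and resize
    it to the CapsNet input size (28x28 is an assumption)."""
    ordered = sorted(boxes, key=lambda b: b[0])  # left-to-right reading order
    return [cv2.resize(binary[y:y + h, x:x + w], (size, size))
            for x, y, w, h in ordered]

def recognize(chars, capsnet, alphabet):
    """Module 3: classify each crop with a CapsNet trained on Chars74K.
    `capsnet` is a hypothetical model handle whose output is one capsule
    per class; the capsule length serves as the class score."""
    batch = np.stack(chars).astype("float32")[..., None] / 255.0
    caps = capsnet.predict(batch)           # shape (N, num_classes, caps_dim)
    scores = np.linalg.norm(caps, axis=-1)  # capsule length per class
    return "".join(alphabet[i] for i in scores.argmax(axis=-1))

# Usage (hypothetical): binary, boxes = detect_subtitle_blobs(frame)
# text = recognize(segment_characters(binary, boxes), capsnet, alphabet)
```

In Sabour et al.'s formulation, the length of each output capsule's activity vector encodes the probability that the corresponding class is present, which is why the sketch scores classes by vector norm rather than by a softmax output.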

Keywords

Capsule Networks, Convolutional Neural Networks, Subtitle Text Detection, Text Recognition.

References

  • X. Chen and A.L. Yuille, “Detecting and Reading Text in Natural Scenes”, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1-12, 2004.
  • C. Wolf and J.M. Jolion, “Extraction and Recognition of Artificial Text in Multimedia Documents”, Pattern Analysis and Applications, Vol. 6, pp. 309-326, 2004.
  • S.M. Lucas, “Text Locating Competition Results”, Proceedings of International Conference on Document Analysis and Recognition, pp. 80-84, 2005.
  • P. Viola and M. Jones, “Fast and Robust Classification using Asymmetric Adaboost and a Detector Cascade”, Advances in Neural Information Processing Systems, Vol. 14, pp. 1311-1318, 2001.
  • C. Yao, X. Bai, W. Liu, Y. Ma and Z. Tu, “Detecting Texts of Arbitrary orientations in Natural Images”, Proceedings of International Conference on Computer Vision and Pattern Recognition, pp. 1083-1090, 2012.
  • C. Yi and Y. Tian, “Text String Detection from Natural Scenes by Structure-Based Partition and Grouping”, IEEE Transactions on Image Processing, Vol. 20, pp. 2594-2605, 2011.
  • W. Huang, Z. Lin, J. Yang and J. Wang, “Text Localization in Natural Images using Stroke Feature Transform and Text Covariance Descriptors”, Proceedings of the IEEE International Conference on Computer Vision, pp. 1241-1248, 2013.
  • D. Chen, H. Bourlard and J.P. Thiran, “Text Identification in Complex Background using SVM”, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1-14, 2001.
  • S. Sabour, N. Frosst and G.E. Hinton, “Dynamic Routing Between Capsules”, Advances in Neural Information Processing Systems, Vol. 30, pp. 3856-3866, 2017.
  • J. Su, D.V. Vargas and K. Sakurai, “One Pixel Attack for Fooling Deep Neural Networks”, IEEE Transactions on Evolutionary Computation, Vol. 23, pp. 828-841, 2019.
  • J. Su, D.V. Vargas and K. Sakurai, “Attacking Convolutional Neural Network using Differential Evolution”, IPSJ Transactions on Computer Vision and Applications, Vol. 11, pp. 1-16, 2019.
  • G.E. Hinton, A. Krizhevsky and S.D. Wang, “Transforming Auto-Encoders”, Proceedings of International Conference on Artificial Neural Networks, pp. 44-51, 2011.
  • X. Wang, L. Huang and C. Liu, “A New Block Partitioned Text Feature for Text Verification”, Proceedings of International Conference on Document Analysis and Recognition, pp. 366-370, 2009.
  • R. Minetto, N. Thome, M. Cord, N.J. Leite and J. Stolfi, “T-HOG: An Effective Gradient-Based Descriptor for Single Line Text Regions”, Pattern Recognition, Vol. 46, pp. 1078-1090, 2013.
  • X. Ren, K. Chen, X. Yang, Y. Zhou, J. He and J. Sun, “A New Unsupervised Convolutional Neural Network Model for Chinese Scene Text Detection”, Proceedings of International Conference on Signal and Information Processing, pp. 428-432, 2015.
  • W. Huang, Y. Qiao and X. Tang, “Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees”, Proceedings of European Conference on Computer Vision, pp. 497-511, 2014.
  • A. Coates, B. Carpenter, C. Case, S. Satheesh, B. Suresh, T. Wang, D.J. Wu and A.Y. Ng, “Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning”, Proceedings of International Conference on Document Analysis and Recognition, pp. 440-445, 2011.
  • B. Shi, X. Bai and C. Yao, “An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, pp. 2298-2304, 2016.
  • B. Shi, X. Wang, P. Lyu, C. Yao and X. Bai, “Robust Scene Text Recognition with Automatic Rectification”, Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 4168-4176, 2016.
  • M. Jaderberg, K. Simonyan, A. Vedaldi and A. Zisserman, “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”, Proceedings of International Conference on Computer Vision and Pattern Recognition, pp. 1-7, 2014.
  • R. Katebi, Y. Zhou, R. Chornock and R. Bunescu, “Galaxy Morphology Prediction using Capsule Networks”, Monthly Notices of the Royal Astronomical Society, Vol. 486, pp. 1539-1547, 2019.
  • A.D. Kumar, “Novel Deep Learning Model for Traffic Sign Detection using Capsule Networks”, Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 432-443, 2018.
  • M. Wang, J. Xie, Z. Tan, J. Su, D. Xiong and L. Li, “Towards Linear Time Neural Machine Translation with Capsule Networks”, Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 551-557, 2018.
  • B. Mandal, S. Dubey, S. Ghosh, R. Sarkhel and N. Das, “Handwritten Indic Character Recognition using Capsule Networks”, Proceedings of IEEE International Conference on Applied Signal Processing, pp. 304-308, 2018.
  • W. Zhao, J. Ye, M. Yang, Z. Lei, S. Zhang and Z. Zhao, “Investigating Capsule Networks with Dynamic Routing for Text Classification”, Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 881-889, 2018.
  • J. Kim, S. Jang, E. Park and S. Choi, “Text Classification using Capsules”, Neurocomputing, Vol. 376, pp. 214-221, 2020.
  • C. Xia, C. Zhang, X. Yan, Y. Chang and P.S. Yu, “Zero-Shot User Intent Detection via Capsule Neural Networks”, Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 661-673, 2018.
  • C. Zhang, Y. Li, N. Du, W. Fan and P.S. Yu, “Joint Slot Filling and Intent Detection via Capsule Neural Networks”, Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 487-494, 2018.
  • Y. Wang, A. Sun, J. Han, Y. Liu and X. Zhu, “Sentiment Analysis by Capsules”, Proceedings of International Conference on World Wide Web, pp. 1165-1174, 2018.
  • H. Chao, L. Dong, Y. Liu and B. Lu, “Emotion Recognition from Multiband EEG Signals using CapsNet”, Sensors, Vol. 19, pp. 2212-2222, 2019.
  • Y. Kim, P. Wang, Y. Zhu and L. Mihaylova, “A Capsule Network for Traffic Speed Prediction in Complex Road Networks”, Proceedings of IEEE International Conference on Sensor Data Fusion: Trends, Solutions, Applications, pp. 1-6, 2018.
  • X. Ma, H. Zhong, Y. Li, J. Ma, Z. Cui and Y. Wang, “Forecasting Transportation Network Speed using Deep Capsule Networks with Nested LSTM Models”, IEEE Transactions on Intelligent Transportation Systems (Early Access), pp. 1-12, 2020.
  • M. Kim and S. Chi, “Detection of Centerline Crossing in Abnormal Driving using CapsNet”, Journal of Supercomputing, Vol. 75, pp. 189-196, 2019.
  • T. Iqbal, Y. Xu, Q. Kong and W. Wang, “Capsule Routing for Sound Event Detection”, Proceedings of IEEE International Conference on Signal Processing, pp. 2255-2259, 2018.
  • F. Vesperini, L. Gabrielli, E. Principi and S. Squartini, “Polyphonic Sound Event Detection by using Capsule Neural Networks”, IEEE Journal of Selected Topics in Signal Processing, Vol. 13, pp. 310-322, 2019.
  • A. Pal, A. Chaturvedi, U. Garain, A. Chandra, R. Chatterjee and S. Senapati, “CapsDeMM: Capsule Network for Detection of Munro’s Microabscess in Skin Biopsy Images”, Proceedings of IEEE International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 389-397, 2018.
  • T. Iesmantas and R. Alzbutas, “Convolutional Capsule Network for Classification of Breast Cancer Histology Images”, Proceedings of IEEE International Conference on Image Analysis and Recognition, pp. 853-860, 2018.
  • S. Prakash and G. Gu, “Simultaneous Localization and Mapping with Depth Prediction using Capsule networks for UAVS”, Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 191-198, 2018.
  • L. Annabi and M. G. Ortiz, “State Representation Learning with Recurrent Capsule Networks”, Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 208-215, 2018.
  • S. Garg, J. Alexander and T. Kothari, “Using Capsule Networks with Thermometer Encoding to Defend Against Adversarial Attacks”, Available at http://cs229.stanford.edu/proj2017/final-reports/5244416.pdf, Accessed in 2017.
  • K. Duarte, Y. Rawat and M. Shah, “VideoCapsuleNet: A Simplified Network for Action Detection”, Advances in Neural Information Processing Systems, pp. 7610-7619, 2018.
  • S.M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong and R. Young, “Robust Reading Competitions”, Proceedings of International Conference on Document Analysis and Recognition, pp. 682-687, 2003.
  • T.E. De Campos, B.R. Babu and M. Varma, “Character Recognition in Natural Images”, Proceedings of International Conference on Image and Video Formation, Preprocessing and Analysis, pp. 1-8, 2009.
  • D.P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization”, Proceedings of International Conference on Learning Representations, 2015.


