
Real Time Static and Dynamic Sign Language Recognition Using Deep Learning


Authors

P Jayanthi,1 Ponsy R K Sathia Bhama,1 K Swetha2 & S A Subash2

Affiliations

1 Department of Computer Technology, MIT, Anna University, Chennai 600 044, Tamil Nadu, India
2 Department of Information Technology, MIT, Anna University, Chennai 600 044, Tamil Nadu, India

Sign language recognition systems enable communication between deaf-mute people and hearing users. Spatial localization of the hands is a challenging task when the hands occupy only about 10% of the image. This is overcome by designing a real-time system that performs extraction, recognition, and classification within a single deep convolutional network. Recognition is performed on static image datasets with simple and complex backgrounds, and on a dynamic video dataset. The static image datasets are trained and tested using a 2D deep convolutional neural network, whereas the dynamic video dataset is trained and tested using a 3D deep convolutional neural network. Spatial augmentation is applied to increase the number of images in the static datasets, and key-frame extraction is used to select representative frames from the videos in the dynamic dataset. To improve performance and accuracy, a batch-normalization layer is added to the convolutional network. The accuracy is nearly 99% for the dataset with a simple background, 92% for the dataset with a complex background, and 84% for the video dataset. These results indicate that the system is effective for real-time recognition and interpretation of sign language gestures.
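To make the static-image pipeline concrete, the sketch below builds a small 2D convolutional classifier with a batch-normalization layer after each convolution, as the abstract describes. It is a minimal illustration, not the paper's architecture: the keywords indicate an Inception-style network, while this sketch uses a plain convolution stack for brevity, and the input size (64x64 RGB crops), filter counts, and 26-class output are illustrative assumptions.

```python
# Minimal sketch of a 2D CNN with batch normalization for static
# sign classification. Input shape, filter counts, and the 26-class
# output are illustrative assumptions, not the paper's exact design.
from tensorflow.keras import layers, models

def build_static_sign_cnn(input_shape=(64, 64, 3), num_classes=26):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),  # stabilizes and speeds up training
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),          # regularization against overfitting
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```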
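The abstract also mentions key-frame extraction for the dynamic video dataset. One common criterion, sketched below under the assumption of simple inter-frame differencing (the paper's exact key-frame criterion may differ), is to keep a frame only when it differs sufficiently from the last kept frame:

```python
# Hedged sketch: key-frame extraction by grayscale inter-frame
# difference. The threshold value is an illustrative assumption.
import cv2
import numpy as np

def extract_key_frames(video_path, diff_threshold=30.0):
    cap = cv2.VideoCapture(video_path)
    key_frames, prev_gray = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Keep the frame if it differs enough from the last key frame.
        if prev_gray is None or np.mean(cv2.absdiff(gray, prev_gray)) > diff_threshold:
            key_frames.append(frame)
            prev_gray = gray
    cap.release()
    return key_frames
```

The key frames returned by such a routine would then form the temporal input to the 3D convolutional network used for the dynamic dataset.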

Keywords

Deaf-Mute People, Human-Machine Interaction, Inception Deep-Convolution Network, Key Frame Extraction, Video Analytics.


