
Real Time Static and Dynamic Sign Language Recognition Using Deep Learning



Authors

P Jayanthi
Department of Computer Technology, MIT, Anna University, Chennai 600 044, Tamil Nadu, India
Ponsy R K Sathia Bhama
Department of Computer Technology, MIT, Anna University, Chennai 600 044, Tamil Nadu, India
K Swetha
Department of Information Technology, MIT, Anna University, Chennai 600 044, Tamil Nadu, India
S A Subash
Department of Information Technology, MIT, Anna University, Chennai 600 044, Tamil Nadu, India

Abstract


Sign language recognition systems enable communication between deaf-mute people and hearing users. Spatial localization of the hands is challenging when the hands occupy only about 10% of the entire image. This is overcome by designing an efficient real-time system that performs extraction, recognition, and classification within a single deep convolutional network. Recognition is performed on static image datasets with simple and complex backgrounds and on a dynamic video dataset. The static image datasets are trained and tested with a 2D deep convolutional neural network, whereas the dynamic video dataset is trained and tested with a 3D deep convolutional neural network. Spatial augmentation is applied to increase the number of images in the static datasets, and key-frame extraction is used to select the key frames from the videos in the dynamic dataset. To improve performance and accuracy, batch-normalization layers are added to the convolutional network. The accuracy is nearly 99% on the dataset with a simple background, 92% on the dataset with a complex background, and 84% on the video dataset. These results show that the system can recognize and interpret sign language gestures efficiently in real time.
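
To illustrate the static-image pipeline, the following is a minimal Keras sketch of a 2D convolutional classifier with a batch-normalization layer after each convolution, as the abstract describes. It is not the authors' exact architecture; the 64x64 input size, layer widths, and 26-class output are assumptions made for the example.

# Minimal sketch (not the authors' exact network): a small 2D CNN with
# batch normalization after each convolution, for static sign images.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 26           # assumption: one class per alphabet sign
INPUT_SHAPE = (64, 64, 3)  # assumption: 64x64 RGB crops

model = models.Sequential([
    layers.Conv2D(32, 3, padding="same", input_shape=INPUT_SHAPE),
    layers.BatchNormalization(),  # stabilizes and speeds up training
    layers.ReLU(),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same"),
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

For the dynamic video dataset, the same pattern extends to three dimensions by stacking key frames along a depth axis and replacing Conv2D/MaxPooling2D with Conv3D/MaxPooling3D.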
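Key-frame extraction can likewise be sketched. One plausible approach (not necessarily the one used in the paper) keeps a frame whenever it differs sufficiently from the last kept frame; the threshold value here is an assumption.

# Illustrative key-frame extraction: keep a frame when its mean absolute
# difference from the last kept frame exceeds a threshold.
import cv2
import numpy as np

def extract_key_frames(video_path, threshold=30.0):
    cap = cv2.VideoCapture(video_path)
    key_frames, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is None or np.mean(cv2.absdiff(gray, prev)) > threshold:
            key_frames.append(frame)  # retain this frame as a key frame
            prev = gray
    cap.release()
    return key_frames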

Keywords


Deaf-Mute People, Human-Machine Interaction, Inception Deep-Convolution Network, Key Frame Extraction, Video Analytics.