Sign language recognition systems are used for enabling communication between deaf-mute people and normal user. Spatial localization of the hands could be a challenging task when hands-only occupies 10% of the entire image. This is overcome by designing a real-time efficient system that is capable of performing the task of extraction, recognition, and classification within a single network with the use of a deep convolution network. The recognition is performed for static image dataset with a simple and complex background, dynamic video dataset. Static image dataset is trained and tested using a 2D deep-convolution neural network whereas dynamic video dataset is trained and tested using a 3D deep-convolution neural network. Spatial augmentation is done to increase the number of images of static dataset and key-frame extraction to extract the key-frames from the videos for dynamic dataset. To improve the system performance and accuracy Batch-Normalization layer is added to the convolution network. The accuracy is nearly 99% for dataset with a simple background, 92% for dataset with complex background, and 84% for the video dataset. By obtaining a good accuracy, the system is proved to be real-time efficient in recognizing and interpreting the sign language gestures.
Keywords
Deaf-Mute People, Human-Machine Interaction, Inception Deep-Convolution Network, Key Frame Extraction, Video Analytics.
User
Font Size
Information