Accelerating Data Mining Application in R Using CUDA C


Affiliations
1 Mumbai, Maharashtra, India
2 Oracle Financial Services Software, Mumbai, Maharashtra, India
3 IIM, Lucknow, Uttar Pradesh, India
4 Department of Electrical and Computer Engineering, Carnegie Mellon University, United States
 

Abstract

This paper presents an approach to accelerating a data mining application in R through parallel processing on NVIDIA Graphics Processing Units (GPUs). CUDA (Compute Unified Device Architecture) is among the most suitable and efficient tools for this purpose. We use the k-means clustering algorithm, a widely used unsupervised learning technique in data science applications, to demonstrate the effectiveness of CUDA C in terms of speed-up and reduced latency. The existing sequential k-means implementation for R, written in C, was converted into optimized code that exploits GPU parallelism. The C and CUDA C implementations are compared on the basis of execution time.
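To make the parallelization target concrete, here is a minimal sequential C sketch of one Lloyd (k-means) iteration, the kind of baseline the abstract describes. It is not the authors' code: the function names `nearest_centroid` and `lloyd_step` and the fixed sizes `N`, `D` and `K` are hypothetical, chosen only for illustration. The per-point assignment loop is the part that maps naturally onto a GPU, since each CUDA thread could compute `nearest_centroid` for one point. The centroid update, by contrast, needs a reduction across points.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical sizes for illustration: N points, D dimensions, K clusters. */
#define N 6
#define D 2
#define K 2

/* Assignment step for ONE point: index of the nearest centroid under squared
   Euclidean distance. It reads only point x and the centroids, so calls for
   different points are independent; a CUDA C port could give each GPU thread
   one point. */
int nearest_centroid(const double *x, double c[K][D]) {
    int best = 0;
    double best_d = DBL_MAX;
    for (int k = 0; k < K; k++) {
        double d = 0.0;
        for (int j = 0; j < D; j++) {
            double diff = x[j] - c[k][j];
            d += diff * diff;
        }
        if (d < best_d) {
            best_d = d;
            best = k;
        }
    }
    return best;
}

/* One Lloyd iteration: assign every point, then move each centroid to the
   mean of its assigned points. Returns how many labels changed (0 = converged). */
int lloyd_step(const double x[N][D], double c[K][D], int label[N]) {
    int changed = 0;
    for (int i = 0; i < N; i++) {            /* the data-parallel loop */
        int k = nearest_centroid(x[i], c);
        if (k != label[i]) {
            label[i] = k;
            changed++;
        }
    }
    double sum[K][D] = {{0.0}};
    int count[K] = {0};
    for (int i = 0; i < N; i++) {            /* update step: per-cluster sums */
        assert(label[i] >= 0 && label[i] < K);
        count[label[i]]++;
        for (int j = 0; j < D; j++)
            sum[label[i]][j] += x[i][j];
    }
    for (int k = 0; k < K; k++)
        if (count[k] > 0)                    /* an empty cluster keeps its old centroid */
            for (int j = 0; j < D; j++)
                c[k][j] = sum[k][j] / count[k];
    return changed;
}
```

A caller initializes every entry of `label` to -1, picks starting centroids, and calls `lloyd_step` until it returns 0. Keeping the assignment in its own function mirrors how the work would be split on a GPU: the independent per-point distance computations become the kernel body, and the mean update is handled separately.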


Keywords

CUDA, NVIDIA, GPU, Parallel Processing, R, CUDA C, Clustering, K-Means Algorithm, Lloyd Algorithm, Data Mining.

Authors

Sneha Shankar
Mumbai, Maharashtra, India
Sharwari Gadkari
Oracle Financial Services Software, Mumbai, Maharashtra, India
Himani Dudhat
Mumbai, Maharashtra, India
Nikita Chakraborty
IIM, Lucknow, Uttar Pradesh, India
Radhika Somthankar
Department of Electrical and Computer Engineering, Carnegie Mellon University, United States
