
Gain Ratio Based Feature Selection Method for Privacy Preservation


Affiliations
1 Department of Computer Science and Engineering, Avinashilingam Deemed University for Women, Tamil Nadu, India
2 Department of Computer Science and Engineering, Government College of Technology, Tamil Nadu, India
     



Privacy preservation in data mining aims to safeguard sensitive information from unsanctioned disclosure, thereby protecting individual data records and their privacy. Privacy preservation techniques include k-anonymity, l-diversity, t-closeness and data perturbation. In this paper, the k-anonymity privacy protection technique is applied to two high-dimensional datasets, Adult and Census. Because both datasets are high dimensional, the Gain Ratio feature subset selection method is first used to rank their attributes, and the low-ranking attributes are filtered out to form reduced datasets. K-anonymization is then applied to the reduced datasets. The privacy-preserved reduced datasets and the original datasets are compared on two data mining tasks, classification and clustering, using the naïve Bayesian and k-means algorithms respectively. Experimental results show that classification and clustering accuracy are comparable for the reduced k-anonymized datasets and the original datasets.
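The attribute-ranking step and the k-anonymity condition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`entropy`, `gain_ratio`, `rank_attributes`, `is_k_anonymous`) and the four toy records are hypothetical, and the sketch assumes the standard definition of gain ratio (information gain divided by split information) on discrete attributes. It only checks whether a table already satisfies k-anonymity; it does not perform the generalization or suppression needed to anonymize one.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` about `target`, divided by split information."""
    labels = [row[target] for row in rows]
    column = [row[attribute] for row in rows]
    # Weighted entropy of the class label within each attribute value.
    conditional = 0.0
    for value, count in Counter(column).items():
        subset = [row[target] for row in rows if row[attribute] == value]
        conditional += (count / len(rows)) * entropy(subset)
    split_info = entropy(column)
    if split_info == 0:  # attribute has a single value, so it cannot split the data
        return 0.0
    return (entropy(labels) - conditional) / split_info


def rank_attributes(rows, attributes, target):
    """Return attributes sorted from highest to lowest gain ratio."""
    return sorted(attributes, key=lambda a: gain_ratio(rows, a, target), reverse=True)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical toy records (made-up values, not taken from the Adult or Census data).
rows = [
    {"education": "BSc", "sex": "F", "income": ">50K"},
    {"education": "BSc", "sex": "M", "income": ">50K"},
    {"education": "HS", "sex": "F", "income": "<=50K"},
    {"education": "HS", "sex": "M", "income": "<=50K"},
]

ranking = rank_attributes(rows, ["sex", "education"], "income")
print(ranking)
print(is_k_anonymous(rows, ["education"], 2))
print(is_k_anonymous(rows, ["education", "sex"], 2))
```

In this toy table `education` fully determines `income` while `sex` carries no information about it, so a filter that drops the lowest-ranked attribute would discard `sex`. The reduced table is then 2-anonymous on `education` alone, whereas the full set of attributes is not.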

Keywords

Privacy Preservation, Data Mining, K-Anonymity, Feature Subset Selection, Gain Ratio.

Authors

R. Praveena Priyadarsini
Department of Computer Science and Engineering, Avinashilingam Deemed University for Women, Tamil Nadu, India
M. L. Valarmathi
Department of Computer Science and Engineering, Government College of Technology, Tamil Nadu, India
S. Sivakumari
Department of Computer Science and Engineering, Avinashilingam Deemed University for Women, Tamil Nadu, India
