Pearson Correlation Coefficient k-Nearest Neighbor Outlier Classification on Real-Time Data Set

Dr. D. Rajakumari; S. Karthika

Affiliations
1 Department of Computer Science, Nandha Arts and Science College, Erode, Tamilnadu, India

Detecting and classifying data that do not conform to expected behavior (outliers) plays a major role in a wide variety of applications, such as military surveillance, intrusion detection in cyber security, and fraud detection in online transactions. Accurate detection of outliers in high-dimensional data remains a major issue, and outlier prediction and classification must balance high accuracy against low computational time. Large, diverse feature sets require a reduction mechanism prior to classification. To achieve this, this paper proposes Distance-based Outlier Classification (DOC). The proposed work uses the Pearson Correlation Coefficient (PCC) to measure the correlation between data instances; minimum instance learning through PCC estimation reduces the dimensionality. The work is split into two phases, training and testing. During training, labeling the most frequent samples isolates them from the infrequent ones, which effectively reduces the data size. The testing phase employs the k-Nearest Neighbor (k-NN) scheme to classify the frequent samples. The dimensionality and the value of k are inversely proportional to each other, so selecting a large value of k yields a significant reduction in dimensionality. The combination of PCC-based instance learning and a high value of k reduces the dimensionality and the noise, respectively. A comparative analysis of the proposed PCC-k-NN against conventional algorithms such as Decision Tree, Naïve Bayes, Instance-Based K-means (IBK), and Triangular Boundary-based Classification (TBC) in terms of sensitivity, specificity, accuracy, precision, and recall demonstrates its effectiveness in outlier classification (OC). In addition, experimental validation of the proposed PCC-k-NN against state-of-the-art methods in terms of execution time confirms the trade-off between low time consumption and high accuracy.
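
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: using the Pearson Correlation Coefficient as the similarity between data instances and letting the k most correlated training instances vote on a label. The function names `pearson` and `knn_predict`, the toy data, the labels, and the handling of constant vectors are all assumptions made for illustration; none of them come from the paper, and the frequent/infrequent labeling step of the training phase is not modeled here.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant vector is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among the k most correlated training instances."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: pearson(pair[0], query),
        reverse=True,  # highest correlation first
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data invented for illustration only (not from the paper).
train_X = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [1.5, 3.0, 4.4, 6.1],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.1, 4.0, 2.2],
]
train_y = ["normal", "normal", "normal", "outlier", "outlier"]

print(knn_predict(train_X, train_y, [3.0, 6.0, 9.1, 12.0], k=3))
print(knn_predict(train_X, train_y, [9.0, 7.0, 5.0, 2.9], k=3))
```

In this toy setup the rising query correlates strongly with the three "normal" instances, while the falling query correlates with the two "outlier" instances, which then outvote the single remaining neighbor. A larger k smooths the vote over more neighbors, which is in the spirit of the abstract's remark that a high value of k reduces noise.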

Keywords

Data Mining, Distance-based Instance Learning, Outlier Detection, Outlier Classification, Pearson Correlation Coefficient, k-Nearest Neighbor.

Programmable Device Circuits and Systems
