
Handling Missing Information for Approximate Association Rule Mining


Affiliations
1 Department of Information Technology, A.D. Patel Institute of Technology, Gujarat Technological University (GTU), New V.V. Nagar-388121, India
     



Data mining typically searches the concatenation of multiple databases, which usually contain some missing data along with a variable percentage of inaccurate data, pollution, outliers, and noise. Data warehouses usually have some missing values because data are unavailable, and these gaps affect both the number and the quality of the generated rules. Missing values create a problem when extracting useful information from a data set, and handling them without affecting the quality of the data is a challenging task. Association rule algorithms identify patterns in a database. Approximate association rule mining allows data that approximately match a pattern to contribute toward the overall support of that pattern. This approach is also useful for processing missing data, which contribute probabilistically to the support of possibly matching patterns. An Apriori-like candidate-generation-and-test approach may encounter serious challenges when mining datasets with long patterns. The Hotspot algorithm is faster than some recently reported frequent pattern mining methods, and it can also mine many interesting patterns efficiently. The data mining process itself deals largely with prediction, estimation, classification, pattern recognition, and the development of association rules. The significance of the analysis therefore depends heavily on the accuracy of the database and on the sample data chosen for training and testing. The issue of missing data must be addressed, because ignoring it can introduce bias into the models being evaluated and lead to inaccurate data mining conclusions. The objective of this paper is to mine databases with missing information effectively.
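The idea that a record with missing values can contribute probabilistically to the support of a possibly matching pattern can be illustrated with a minimal Python sketch. The weighting scheme here is an assumption made for illustration, not the method of the paper: a missing attribute is weighted by the relative frequency of the wanted value among the observed values of that attribute, and attributes are treated as independent. The names `value_frequencies` and `probabilistic_support` and the toy records are hypothetical.

```python
from collections import Counter


def value_frequencies(records, attribute):
    """Relative frequency of each observed (non-missing) value of one attribute."""
    observed = [r[attribute] for r in records if r.get(attribute) is not None]
    counts = Counter(observed)
    total = len(observed)
    return {value: n / total for value, n in counts.items()} if total else {}


def probabilistic_support(records, pattern):
    """Support of `pattern` where missing values contribute fractionally.

    Assumed weighting (illustrative only): a record contributes 1 for an
    attribute that matches, 0 for one that mismatches, and the observed
    relative frequency of the wanted value for one that is missing (None).
    Per-attribute weights are multiplied, i.e. independence is assumed.
    """
    freqs = {a: value_frequencies(records, a) for a in pattern}
    total = 0.0
    for r in records:
        weight = 1.0
        for attribute, wanted in pattern.items():
            actual = r.get(attribute)
            if actual is None:
                # Missing value: partial contribution.
                weight *= freqs[attribute].get(wanted, 0.0)
            elif actual != wanted:
                # Known mismatch: the record cannot support the pattern.
                weight = 0.0
                break
        total += weight
    return total / len(records) if records else 0.0


# Toy data: None marks a missing value.
records = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": None},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": None, "play": "no"},
]
pattern = {"outlook": "sunny", "play": "no"}

print(probabilistic_support(records, pattern))
```

On this toy data only the first record matches the pattern exactly, so strict counting gives a support of 1/4. Under the assumed weighting, the two incomplete records each add 2/3 of a match, which raises the support to 7/12.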

Keywords

Data Cleansing, Data Mining, Knowledge Discovery, Missing Values, Preprocessing.

Authors

Dinesh J. Prajapati
Department of Information Technology, A.D. Patel Institute of Technology, Gujarat Technological University (GTU), New V.V. Nagar-388121, India
Jagruti H. Prajapati
Department of Information Technology, A.D. Patel Institute of Technology, Gujarat Technological University (GTU), New V.V. Nagar-388121, India
