Decision Trees for Uncertain Data

A. Pandian; J. Venkata Subramanian; S. Balamanikandan

Decision Trees for Uncertain Data

A. Pandian ¹, J. Venkata Subramanian ¹, S. Balamanikandan ²

Affiliations
1 Department of MCA, SRM University, Chennai, India
2 Thiruvalluvar University, Kallakurichi, India

Subscribe/Renew Journal

Abstract
References
Article Metrics
Refbacks

Classification based on decision trees is one of the important problems in data mining and has applications in many fields. In recent years, database systems have become highly distributed, and distributed system paradigms such as federated and peer-to-peer databases are being adopted. In this paper, we consider the problem of inducing decision trees in a large distributed network of high dimensional databases. Our work is motivated by the existence of distributed databases in healthcare and in bioinformatics, and by the vision that these databases are soon to contain large amounts of genomic data, characterized by its high dimensionality. Current decision tree algorithms would require high communication bandwidth when executed on such data, which is not likely to exist in large-scale distributed systems. We present an algorithm that sharply reduces the communication overhead by sending just a fraction of the statistical data. A fraction which is nevertheless sufficient to derive the exact same decision tree learned by a sequential learner on all the data in the network. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted that show that the resulting classifiers are more accurate than those using value averages. Since processing pdf's is computationally more costly than processing single values.

Keywords

Data Mining Distributed Algorithms, Decision Trees, Classification, High Dimension Data.

I-Scholar

Journal Help

User

Subscription Login to verify subscription

Notifications

Journal Content
Browse

Font Size

Information

Abstract Views: 250

PDF Views: 2

Decision Trees for Uncertain Data

Abstract Views: 250 | PDF Views: 2

Authors

A. Pandian
Department of MCA, SRM University, Chennai, India

J. Venkata Subramanian
Department of MCA, SRM University, Chennai, India

S. Balamanikandan
Thiruvalluvar University, Kallakurichi, India

Abstract

Keywords

Data Mining Distributed Algorithms, Decision Trees, Classification, High Dimension Data.

Username
Password
Remember me

Username
Password
Remember me

Data Mining and Knowledge Engineering

Data Mining and Knowledge Engineering

Decision Trees for Uncertain Data

Subscribe/Renew Journal

Keywords

Decision Trees for Uncertain Data

Authors

Abstract

Keywords