Panda, S. N.
- Profit and Quantity Oriented Two Efficient Approaches for Utility Pattern Mining
Authors
1 Department of Computer Science & Engineering at Rayat & Bahra Institute of Engineering & Bio-Technology, Mohali, IN
2 Department of Computer Science & Engineering at RIMIT Institute of Engg. & Technology, Punjab, IN
3 Regional Institute of Management & Technology, Mandi Gobindgarh, Punjab, IN
Source
Data Mining and Knowledge Engineering, Vol 3, No 4 (2011), Pagination: 200-206
Abstract
Traditional association rule mining treats the presence of an item in a transaction as a binary variable: the item is either purchased or not. However, a customer may purchase more than one unit of an item, and the unit cost may differ across items. Utility mining, a generalized form of the share mining model, was introduced to overcome this limitation. Developing an efficient algorithm is vital for utility mining because high utility itemsets cannot be identified by the pruning strategy used for frequent itemsets. In this paper, we present two efficient approaches for utility pattern mining with the aid of the FP-growth algorithm. The efficiency of utility pattern mining is achieved with two major concepts: 1) Incorporating the utility values after mining the frequent patterns (IUA-FP). Here, the patterns mined by the FP-growth algorithm are used to generate high utility patterns from their internal and external utility. 2) Incorporating the utility values before mining the frequent patterns (IUB-FP). Here, individual items that are less significant are removed from the input database by considering their frequency along with their internal and external utility. We then apply the FP-growth algorithm to the transformed database to mine high utility patterns. Experiments on both concepts use the synthetic dataset T10I4D100K, obtained from the IBM dataset generator, and the performance study shows that the two proposed approaches are efficient in mining high utility patterns.
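The "utility after mining" idea (IUA-FP) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy transactions, the `profit` table, the thresholds, and all function names are hypothetical, and a brute-force enumerator stands in for FP-growth so that the example stays self-contained. Internal utility is taken to be the purchased quantity and external utility the unit profit, which is the usual reading of those terms.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity (internal utility).
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
    {"b": 4},
]
# Hypothetical unit profits (external utility).
profit = {"a": 5, "b": 2, "c": 1}


def frequent_itemsets(transactions, min_support):
    """Brute-force stand-in for FP-growth: return {itemset: support count}."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            support = sum(1 for t in transactions if all(i in t for i in combo))
            if support >= min_support:
                result[frozenset(combo)] = support
    return result


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def high_utility_after_mining(transactions, profit, min_support, min_utility):
    """Mine frequent patterns first, then keep only those with enough utility."""
    kept = {}
    for itemset in frequent_itemsets(transactions, min_support):
        u = itemset_utility(itemset, transactions, profit)
        if u >= min_utility:
            kept[itemset] = u
    return kept


patterns = high_utility_after_mining(transactions, profit, min_support=2, min_utility=15)
```

In this toy run, item `b` is frequent (support 3) but is dropped because its total utility falls below the threshold, while `{a, b}` is kept. That illustrates why frequency alone does not determine utility.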
Keywords
Data Mining, Association Rule Mining, FP-Growth Algorithm, Frequent Patterns, Utility, Transaction Utility.
- CAK-NN Algorithm: Cluster and Attribute Weightage-Based Algorithm for Effective Classification
Authors
1 Department of Computer Science & Engineering at Rayat & Bahra Institute of Engineering & Bio-Technology, Mohali, IN
2 Department of Computer Science & Engineering, RIMIT Institute of Engg. & Technology, Punjab, IN
3 Regional Institute of Management & Technology, Mandi Gobindgarh, IN
Source
Data Mining and Knowledge Engineering, Vol 3, No 4 (2011), Pagination: 216-222
Abstract
The task of classification is to assign a new object to one of a given set of classes based on the attribute values of the object. The k-Nearest Neighbor (k-NN) method is one of the simplest classification methods used in data mining and machine learning. Although k-NN can be applied broadly, it has a few inherent problems, which is why researchers have proposed different extensions of k-NN, or even ensemble formulations of k-NN classifiers. In our proposed CAk-NN (cluster and attribute weighted k-NN) algorithm, a weight is assigned to every attribute of the training dataset so that distances can be matched more accurately. In addition, clustering the training dataset reduces the execution time of classification, and the resulting clusters are used to classify test instances. For this, we propose an attribute weighted k-means clustering algorithm that is used to partition the training dataset. The centroids of the resulting clusters then form a sub-sample of the input database, which is used for classification. For a test case, an attribute-weighted distance is computed between the test instance and the mean of each cluster of the training dataset. Based on the computed distances, the k nearest clusters are identified, and the class label is assigned if every one of these clusters belongs to the same class. Otherwise, the relevant data records from the k nearest clusters are retrieved and the k nearest neighbor data records among them are identified. Finally, the performance of the proposed CAk-NN algorithm is compared with the k-NN algorithm in terms of computation time and classification accuracy on the IRIS dataset.
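The classification step described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code. The clusters are assumed to be already computed, the weighted Euclidean form of the distance is an assumption, a cluster's class is assumed to be the majority label of its members, and the toy data, weights, and function names are all hypothetical.

```python
import math
from collections import Counter


def weighted_distance(x, y, weights):
    """Attribute-weighted Euclidean distance (the exact weighting form is an assumption)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def majority(labels):
    """Most common label in a list of labels."""
    return Counter(labels).most_common(1)[0][0]


def classify(test, clusters, weights, k):
    """clusters: list of clusters, each a list of (point, label) records."""
    # Rank clusters by weighted distance from the test instance to each cluster mean.
    ranked = sorted(
        clusters,
        key=lambda c: weighted_distance(test, centroid([p for p, _ in c]), weights),
    )
    nearest = ranked[:k]
    # If the k nearest clusters all carry the same class, assign it directly.
    cluster_labels = {majority([lbl for _, lbl in c]) for c in nearest}
    if len(cluster_labels) == 1:
        return cluster_labels.pop()
    # Otherwise fall back to record-level k-NN inside those clusters only.
    records = [r for c in nearest for r in c]
    records.sort(key=lambda r: weighted_distance(test, r[0], weights))
    return majority([lbl for _, lbl in records[:k]])


# Hypothetical toy data: three pre-computed clusters and per-attribute weights.
clusters = [
    [([1.0, 1.0], "A"), ([1.2, 0.8], "A")],
    [([5.0, 5.0], "B"), ([5.2, 4.8], "B")],
    [([1.0, 5.0], "A"), ([0.8, 5.2], "A")],
]
weights = [1.0, 0.5]

direct = classify([1.1, 1.0], clusters, weights, k=1)    # single nearest cluster decides
fallback = classify([4.0, 4.5], clusters, weights, k=2)  # nearest clusters disagree
```

The second call exercises the fallback path: the two nearest clusters carry different classes, so only their records are searched, which is where the time saving over plain k-NN on the full training set would come from.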