Verma, Shrish
- On Studying the Effect of Sample Size in Evaluation of Bug Classifiers
Abstract Views :412 |
PDF Views:121
Authors
Affiliations
1 Computer Science & Engineering, National Institute of Technology Raipur
2 Electronics & Tel. Communication Engg. National Institute of Technology Raipur
Source
Indian Journal of Science and Technology, Vol 6, No 1 (2013), Pagination: 3849-3855
Abstract
Sampling is an important and necessary step in mining large databases, and it is especially useful where the performance of mining operations is critical. This study examines the effect of sample size on the classification of software bugs. Experiments are performed using a number of classification algorithms with a variety of sample sizes on the software bug repositories of three large open source software projects: Android, Mozilla and MySQL. The study explores the relationship between sample size and two primary classification performance parameters, accuracy and F-measure. The experiments show that F-measure is affected more by the sample size than accuracy.
Keywords
Sampling, Sample Size, Classification, Software Bug, Performance, Classifier Evaluation
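As a reminder of how the two performance parameters named in the abstract above are computed, here is a minimal sketch in Python. It assumes a binary (bug class vs. rest) setting and the standard definitions of accuracy and F-measure; the function names and the confusion counts are hypothetical and are not taken from the paper.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts for a rare bug class (10 of 100 samples).
tp, fp, fn, tn = 5, 5, 5, 85
print("accuracy :", accuracy(tp, fp, fn, tn))
print("F-measure:", f_measure(tp, fp, fn))
```

With these illustrative counts the accuracy is 0.9 while the F-measure is 0.5. F-measure ignores true negatives, so it depends only on the counts for the class of interest. When that class is rare, a handful of extra or missing samples can shift it noticeably while accuracy barely moves.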
- A Comparative Study of Software Bug Clustering Using Lingo and STC Web Clustering Algorithms
Abstract Views :402 |
PDF Views:4
Authors
Affiliations
1 National Institute of Technology, Raipur-492001, CG, IN
Source
Data Mining and Knowledge Engineering, Vol 3, No 13 (2011), Pagination: 793-802
Abstract
Software bug classification is an important and popular problem in software engineering. Recently, a number of algorithms and techniques have been presented to automate this process. Software bug data contains a number of attributes such as bug-id, summary (title), description, comments, status and version. Most of the important attributes hold text data. Lingo and STC (Suffix Tree Clustering) are both popular text clustering algorithms used in web mining. In this paper, Lingo and STC are used to classify software bugs. A classification-using-clustering methodology is used to create software bug classes from software bug clusters: first the clusters are created, and then appropriate labels are assigned to the clusters, which serve as their class labels. Both algorithms are implemented as part of the Carrot2 framework. The software bug repository data is integrated and passed to the Carrot2 framework for applying the Lingo and STC algorithms. The two algorithms are compared on the software bug classification task using various clustering parameters, including the number of clusters generated and the purity and entropy of the clusters created.
Keywords
Software Bug Classification, Lingo Clustering, STC Clustering, Software Bug Clustering, Software Bug Repository.
- TARPIN: Discovering Temporal Association Rules Using P-Tree Based Incremental Algorithm
Abstract Views :202 |
PDF Views:2
Authors
Affiliations
1 National Institute of Technology, Raipur-492001, CG, IN
Source
Data Mining and Knowledge Engineering, Vol 1, No 8 (2009), Pagination: 392-404
Abstract
Association rule mining is a very popular data mining method, and a number of organizations use it to find frequent item-sets of products and thereby improve their benefits. Many of the available association rule mining algorithms take multiple scans of the database. The complexity of these algorithms depends primarily on the number of database scans, so reducing the number of scans improves their time complexity. The purpose of the proposed algorithm is to reduce the number of database scans needed to discover temporal association rules by applying the P-Tree algorithm, which takes just one scan of the database to find the association rules.
A new pattern tree algorithm for mining temporal association rules in databases is introduced. The algorithm uses P-Tree (Pattern-Tree) structures to find temporal association rules. According to the different time periods associated with transactions in a temporal database, it creates a number of P-Trees and, using the time information in each transaction, inserts the transaction into the appropriate tree. It then uses the P-Tree association rule mining algorithm to find the frequent sets in each P-Tree, and these frequent items are merged with their time periods to give association rules with valid time periods. The proposed algorithm is divided into two phases. In the first phase, all items within the transactions are inserted into the different P-Trees, from which the frequent item-sets are extracted. In the second (merge) phase, these frequent items are merged and the time periods associated with them are listed, indicating the periods in which the items are frequent. The algorithm is implemented in C++ on the Linux platform, and the results are compared with the existing popular algorithm PPM (Progressive Partition Miner) for discovering temporal association rules.
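The two phases described in the abstract (per-period mining, then merging frequent item-sets with the periods in which they hold) can be illustrated with a small Python sketch. This is not the P-Tree algorithm. Plain brute-force counting stands in for the tree structure, and the function name, the `max_size` cap and the sample transactions are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_itemsets_by_period(transactions, min_support, max_size=2):
    """Map each item-set to the time periods in which it is frequent.

    transactions: iterable of (period, items) pairs.
    min_support:  minimum fraction of a period's transactions containing the set.
    """
    # Partition transactions by period (stand-in for one P-Tree per period).
    by_period = defaultdict(list)
    for period, items in transactions:
        by_period[period].append(frozenset(items))

    valid_periods = defaultdict(list)
    for period in sorted(by_period):
        baskets = by_period[period]
        # Phase 1: count candidate item-sets within this period only.
        counts = Counter()
        for basket in baskets:
            for size in range(1, max_size + 1):
                for combo in combinations(sorted(basket), size):
                    counts[combo] += 1
        # Phase 2 (merge): record the period for each frequent item-set.
        for itemset, count in counts.items():
            if count / len(baskets) >= min_support:
                valid_periods[itemset].append(period)
    return dict(valid_periods)


sample = [
    ("2009-Q1", ["bread", "milk"]),
    ("2009-Q1", ["bread", "milk", "eggs"]),
    ("2009-Q2", ["bread", "milk"]),
    ("2009-Q2", ["bread", "eggs"]),
]
for itemset, periods in frequent_itemsets_by_period(sample, 0.6).items():
    print(itemset, "frequent in", periods)
```

In this toy data, `bread` is frequent in both quarters, while `milk` and the pair `(bread, milk)` are frequent only in the first. The output is therefore a set of frequent item-sets, each tagged with the time periods in which it is valid.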
Keywords
Temporal Association Rules, Temporal Data Mining, P-Tree, Incremental Data Mining.
- An Open Source Framework for Data Pre-Processing of Online Software Bug Repositories
Abstract Views :211 |
PDF Views:4
Authors
Affiliations
1 National Institute of Technology, Raipur-492001, CG, IN
Source
Data Mining and Knowledge Engineering, Vol 1, No 7 (2009), Pagination: 329-338
Abstract
Software bug repositories are a great source of knowledge. They contain a lot of useful information related to software development, software design and common error patterns for a software project. Most projects use a bug tracking system to manage the bugs associated with the software. These bug tracking systems work as online bug repositories, which can be accessed by project members situated at different locations. All project members can read and update software bug information in these online repositories. To extract knowledge from these online software bug repositories, some mechanism is required to extract, parse and save the data locally for analysis. In this paper, a framework for preprocessing online software bug repositories for data mining is proposed and implemented using open source APIs (Application Programming Interfaces). The performance of the implemented framework is also evaluated in terms of the time taken to fetch and parse software bug data from online repositories.
Keywords
Software Bug Repositories, Fetching Bug Repositories, Parsing Software Bugs, Data Preprocessing of Bug Repositories.
- Quantitative Evaluation of Web user Session Dissimilarity measures using Medoids based Relational Fuzzy clustering
Abstract Views :146 |
PDF Views:0
Authors
Affiliations
1 Department of Computer Science and Engineering, National Institute of Technology Raipur – 492010, Chhattisgarh, IN
2 Department of Electronics and Telecommunication, National Institute of Technology, Raipur - 492010, Chhattisgarh, IN
3 Department of Information Technology, Indian Institute of Information Technology, Allahabad - 211011, Uttar Pradesh, IN
Source
Indian Journal of Science and Technology, Vol 9, No 28 (2016), Pagination:
Abstract
Background/Objectives: Proficient relational clustering of web users’ sessions depends not only on the character of the clustering algorithm but is also profoundly influenced by the dissimilarity measure used. Determining the right dissimilarity measure to capture the actual access behaviour of the web user is therefore imperative for meaningful clustering.
Methods: In this paper, the concept of an augmented session is used to derive different augmented session dissimilarity measures. The quantitative performance of the different session dissimilarity measures is evaluated using a relational fuzzy c-medoid clustering approach. A cluster quality ratio based on intra-cluster and inter-cluster distances is used for the evaluation.
Findings: The experimental results demonstrate that augmented web user session dissimilarity in general, and intuitive augmented session dissimilarity in particular, performed better than the other dissimilarity measures.
Improvements: It is argued that augmented session similarity measures are more realistic and represent session similarities based on the web user’s habits, interests and expectations, as compared to simple binary session similarity measures.
Keywords
Augmented user Sessions, Cluster Evaluation, Dissimilarity Measures, Fuzzy Clustering, Page Relevance, Web User Sessions.
- An Empirical Analysis on Reducing Open Source Software Development Tasks using Stack Overflow
Abstract Views :119 |
PDF Views:0
Authors
Affiliations
1 Department of Information Technology, National Institute of Technology Raipur, Raipur - 492010, Chhattisgarh, IN
2 Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur - 492010, Chhattisgarh, IN
3 Department of Electronics and Telecommunication, National Institute of Technology Raipur, Raipur - 492010, Chhattisgarh, IN
Source
Indian Journal of Science and Technology, Vol 9, No 21 (2016), Pagination:
Abstract
Objectives: A cross repository analysis between Open Source Software (OSS) and a Community Question Answering (CQA) site is presented in order to speed up the development process of OSS.
Methods/Analysis: OSS development is becoming popular because the source code, the developer specifications and the bug lists are made available online to the public. Anyone can contribute to the development of the software by referring to these files. Similarly, Stack Overflow is an interactive CQA site that hosts programming related questions with their answers online and has turned into a repository of software engineering knowledge. To track the correlation of such sites with software development tasks, we use the two repositories to find the semantic similarity between bugs posted on OSS projects and Question and Answer (Q&A) posts on Stack Overflow. The semantic similarity is analyzed by integrating the contents of the repositories using a text mining approach. The relationship between a bug and a Q&A post is established through the semantic similarity and metadata features.
Findings: The statistics of our analysis are presented for five OSS projects in terms of number of bugs and average bug fix time. The statistical results show that the bug fix time can be reduced by posting the bugs to Stack Overflow.
Application/Improvement: The presented approach can be used to find similar Q&A posts for a reported OSS bug, and it helps developers of OSS projects resolve bugs quickly by leveraging the programming skills of users in the form of Q&A posts.
Keywords
Open Source Software, Community Question Answering, Stack Overflow, Cross Repository Analysis, Bug Tracking System, Bug Fixing.
- Analysis of Electroencephalography (EEG) Signals using Visualization Techniques
Abstract Views :155 |
PDF Views:0
Authors
Affiliations
1 National Institute of Technology, Raipur - 492010, Chhattisgarh, IN
Source
Indian Journal of Science and Technology, Vol 9, No 48 (2016), Pagination:
Abstract
Past research has recognized the feasibility of machine learning and data mining in the analysis of biomedical device data. The proposed work analyses biomedical Electroencephalography (EEG) device data using visualization techniques, which have the potential to extract information from huge data sets. The presented work includes the extraction of visual patterns using different visualization techniques. The pattern identification task is performed in several stages, such as outlier removal from the EEG data. After outlier detection, the data is divided into manageable segments to avoid overcrowding of data points. A segment is then selected at random, and prominent patterns are identified by applying visual mapping techniques.
Keywords
Corrogram, EEG, Parallel Coordinates, Radar Chart, Sampling, Visual Data Mining.
- Applying Auto Regression Techniques on Amyotrophic Lateral Sclerosis Patients EEG Dataset with P300 Speller
Abstract Views :170 |
PDF Views:0
Authors
Affiliations
1 National Institute of Technology, Raipur - 492010, Chhattisgarh, IN