On Studying the Effect of Sample Size in Evaluation of Bug Classifiers

Naresh Kumar Nagwani; Shrish Verma

doi:10.17485/ijst/2013/v6i1/30553

Affiliations
1 Computer Science & Engineering, National Institute of Technology Raipur
2 Electronics & Telecommunication Engineering, National Institute of Technology Raipur

Abstract

Sampling is an important, and often necessary, step in mining large databases, and it is particularly useful for mining operations in which performance is critical. This study examines the effect of sample size on the classification of software bugs. To analyze this effect, experiments are performed with a number of classification algorithms over a range of sample sizes, using the bug repositories of three large open-source software projects: Android, Mozilla and MySQL. The study explores how sample size relates to two primary classification performance measures, accuracy and F-measure. The experiments show that F-measure is affected more by sample size than accuracy is.
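The experimental loop summarized above can be sketched as follows: draw samples of several sizes, train a classifier on each, and record accuracy and F-measure. This is a minimal Python illustration under assumed details, not the authors' setup. It assumes that each bug record is a (summary, category) pair and that samples are drawn by simple random sampling without replacement. It also assumes a two-thirds/one-third holdout split and a macro-averaged F-measure; other averaging schemes, such as class-weighted averaging, are also common. The helper names, the majority-class stand-in classifier and the toy data are hypothetical. Any real classifier can be plugged in through the `fit` argument.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record format: (bug summary text, category label).
Record = Tuple[str, str]
Predictor = Callable[[str], str]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of test records whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_measure(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean over classes of F1 = 2PR / (P + R) (assumed averaging scheme)."""
    pairs = list(zip(y_true, y_pred))
    labels = set(y_true) | set(y_pred)
    total = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            total += 2 * precision * recall / (precision + recall)
    return total / len(labels)


def majority_baseline(train: List[Record]) -> Predictor:
    """Hypothetical stand-in classifier: always predicts the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: majority


def evaluate_sample_sizes(
    records: List[Record],
    sizes: Sequence[int],
    fit: Callable[[List[Record]], Predictor],
    train_fraction: float = 2 / 3,
    seed: int = 0,
) -> Dict[int, Tuple[float, float]]:
    """For each sample size n (2 <= n <= len(records)), draw n records at random
    without replacement, train on the first part of the sample, test on the rest,
    and return {n: (accuracy, macro F-measure)}."""
    rng = random.Random(seed)
    results: Dict[int, Tuple[float, float]] = {}
    for n in sizes:
        sample = rng.sample(records, n)  # raises ValueError if n > len(records)
        cut = max(1, min(n - 1, int(n * train_fraction)))  # keep train and test non-empty
        train, test = sample[:cut], sample[cut:]
        predict = fit(train)
        y_true = [label for _, label in test]
        y_pred = [predict(text) for text, _ in test]
        results[n] = (accuracy(y_true, y_pred), macro_f_measure(y_true, y_pred))
    return results


if __name__ == "__main__":
    # Made-up, imbalanced toy data: 400 "bug" reports and 100 "enhancement" reports.
    data = [(f"report {i}", "enhancement" if i % 5 == 0 else "bug") for i in range(500)]
    for n, (acc, f1) in evaluate_sample_sizes(data, [30, 100, 300], majority_baseline).items():
        print(f"sample size {n}: accuracy={acc:.3f}, macro F-measure={f1:.3f}")
```

With the majority-class stand-in on imbalanced data, accuracy tracks the share of the majority class in each test split. Macro F-measure stays well below it, because the minority class contributes an F-score of zero. This shows that the two measures can respond differently to the same sample. Whether this mirrors the study's finding depends on the actual classifiers and averaging used, which this sketch does not reproduce.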

Keywords

Sampling, Sample Size, Classification, Software Bug, Performance, Classifier Evaluation

References

Android Bug Repository, available at https://code.google.com/p/android/issues/list

Antoniol G, Ayari K, Penta M D (2008) Is it a Bug or an Enhancement? A Text-based Approach to Classify Change Requests. Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research (CASCON '08), New York, USA, 304-318.

Chang C C, Lin C J (2001) LIBSVM - A Library for Support Vector Machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/

EL-Manzalawy Y (2005) WLSVM: Integrating libsvm into WEKA environment. Software available at http://www.cs.iastate.edu/~yasser/wlsvm/

Ferzund J, Ahsan S N, Wotawa F (2009) Software Change Classification using Hunk Metrics. Proceedings of IEEE International Conference on Software Maintenance (ICSM 2009), Edmonton, Canada, 471-474.

Fluri B, Giger E, Gall H C (2008) Discovering Patterns of Change Types. Proceedings of the 23rd International Conference on Automated Software Engineering (ASE), L’Aquila, Italy, 463-466.

Grottke M, Trivedi K S (2005) A Classification of Software Faults. Journal of Reliability Engineering Association of Japan, 27(7), 425-438.

Guo Y, Sampath S (2008) Web Application Fault Classification - An Exploratory Study. Proceedings of the International Symposium on Empirical Software Engineering and Measurement (ESEM 2008), Kaiserslautern, Germany, 303-305.

Jalbert N, Weimer W (2008) Automated Duplicate Detection for Bug Tracking Systems. IEEE International Conference on Dependable Systems & Networks, Anchorage, Alaska, 52-61.

Kyriakopoulou A, Kalamboukis T (2006) Text Classification Using Clustering. Proceedings of the 17th European Conference on Machine Learning and the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Berlin, Germany, 28-38.

Li W (1992) Random Texts Exhibit Zipf’s-Law-Like Word Frequency Distribution. IEEE Transactions on Information Theory, 38(6), 1842-1845.

McCallum A, Nigam K (1998) A Comparison of Event Models for Naive Bayes Text Classification. Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98) Workshop on Learning for Text Categorization, Madison, Wisconsin, 41-48.

Mozilla (an open-source browser) Bug Repository, available at https://bugzilla.mozilla.org/

MySQL (a free relational database management system) Bug Repository, available at http://bugs.mysql.com/

Nagwani N K, Verma S (2012) A Frequent Term Based Approach for Generating Discriminative Terms in Software Bug Repositories. IEEE 1st International Conference on Recent Advances in Information Technology (RAIT – 2012), Dhanbad, Jharkhand, India, 433-435.

Nagwani N K, Verma S (2012) CLUBAS: An Algorithm and Java Based Tool for Software Bug Classification Using Bug Attributes Similarities. Journal of Software Engineering and Applications, 5(6), 436-447.

Quinlan R (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA, ISBN 1-55860-238-0, 1-16.

Reed W J (2001) The Pareto, Zipf and other power laws. Economics Letters, 74(1), 15-19.

Vapnik V (1995) The Nature of Statistical Learning Theory. Springer-Verlag, ISBN 0-387-94559-8, 138-167.

Weka, available at http://www.cs.waikato.ac.nz/ml/weka/


Indian Journal of Science and Technology
