
On Studying the Effect of Sample Size in Evaluation of Bug Classifiers


Affiliations
1 Computer Science & Engineering, National Institute of Technology Raipur
2 Electronics & Tel. Communication Engg. National Institute of Technology Raipur
 

Sampling is an important and necessary step in mining large databases, and it is especially useful where the performance of mining operations is critical. This study examines the effect of sample size on the classification of software bugs. To analyze this effect, experiments are performed with a number of classification algorithms and a variety of sample sizes, using the software bug repositories of three large open source projects: Android, Mozilla and MySQL. The relationship between sample size and two primary classification performance measures, accuracy and F-measure, is explored. The experiments indicate that F-measure is affected more by sample size than accuracy is.

Keywords

Sampling, Sample Size, Classification, Software Bug, Performance, Classifier Evaluation
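
The two measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' experimental setup: the function names, the labels "bug" and "other", and the idea of scoring random subsamples of already-made predictions are all assumptions chosen for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall (F1) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def metrics_by_sample_size(y_true, y_pred, sizes, positive, seed=0):
    """Score one random subsample per size; returns {size: (accuracy, F1)}.

    Hypothetical helper: subsampling prediction pairs is an assumption,
    not the procedure described in the paper.
    """
    rng = random.Random(seed)
    indices = list(range(len(y_true)))
    results = {}
    for n in sizes:
        chosen = rng.sample(indices, n)
        t = [y_true[i] for i in chosen]
        p = [y_pred[i] for i in chosen]
        results[n] = (accuracy(t, p), f_measure(t, p, positive))
    return results


# One missed "bug" among four reports: accuracy is 0.75, F-measure is 0.0.
truth = ["bug", "other", "other", "other"]
guess = ["other", "other", "other", "other"]
print(accuracy(truth, guess), f_measure(truth, guess, "bug"))
```

The small example at the end shows one plausible reason the two measures can diverge: when the class of interest is rare in a sample, a single error on that class barely changes accuracy but can move F-measure a great deal.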


Authors

Naresh Kumar Nagwani
Computer Science & Engineering, National Institute of Technology Raipur
Shrish Verma
Electronics & Tel. Communication Engg. National Institute of Technology Raipur

DOI: https://doi.org/10.17485/ijst%2F2013%2Fv6i1%2F30553