Malik, Maaz Rasheed
- A Model Vector Machine Tree Classification for Software Fault Forecast Model (TSMO/TSVM)
Authors
Affiliations
1 Dept. of Information Communication Engineering, Guilin University of Electronic Technology, Guilin, CN
2 School of Control & Computer Engineering, North China Electric Power University, Beijing, CN
Source
International Journal of Advanced Networking and Applications, Vol 12, No 4 (2021), Pagination: 4650-4655

Abstract
Many researchers have worked on software fault forecast models because fault forecasting is very important in software development projects. Earlier researchers have examined defective dataset models with the help of metrics and classification methods. Classification plays a major role in the software fault forecast model and is a central problem in data mining; machine learning, as an approach to knowledge acquisition and information extraction, has been studied widely in this context. The input to a classifier is a training set of examples, each labeled with a class name. Classification separates data samples into target classes: software modules are categorized as defect-prone or not defect-prone. Because the class categories are known in advance, classification is a supervised learning approach. In this research, software fault forecast dataset models are examined with the help of tree and vector machine classification. The proposed model is a tree vector machine, used to increase the positive accuracy and efficiency of the software fault forecast model. Multiple tree classifiers were used to obtain more accurate results and were compared with one another. In the experimental analysis, J48, random forest, and random tree performed well in both accuracy and efficiency, whereas REP Tree, Hoeffding Tree, and Decision Stump performed poorly across all measures. The analysis of the experimental results showed that no single tree classifier is best on every measure.

Keywords
Software, Fault Forecast, Classification, Defect-prone, Support Vector Machine, J48, Random Tree.
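The comparison described in the abstract can be sketched as follows. The paper evaluates WEKA's J48, Random Forest, Random Tree, REP Tree, Hoeffding Tree, and Decision Stump; this minimal sketch instead uses scikit-learn analogues on a synthetic stand-in dataset (DecisionTreeClassifier approximates J48, and a depth-1 tree approximates Decision Stump), so the setup and numbers are illustrative assumptions, not the paper's actual experiment.

```python
# Sketch: 10-fold cross-validated accuracy of tree-family classifiers
# on a synthetic, imbalanced "defect dataset" stand-in.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 500 modules, 20 metrics, ~20% defect-prone
# (class imbalance is typical of real fault datasets).
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

classifiers = {
    "decision_tree (J48-like)": DecisionTreeClassifier(random_state=0),
    "decision_stump": DecisionTreeClassifier(max_depth=1, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    # Mean accuracy over 10 folds, as is conventional in WEKA-based studies.
    scores[name] = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Swapping in other estimators (e.g. `ExtraTreeClassifier` as a rough Random Tree analogue) follows the same pattern, which is why the paper can compare many tree classifiers under one evaluation protocol.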
- An Empirical and Comparatively Research on Under-Sampling & Over-Sampling Defect-Prone Data-Sets Model in Light of Machine Learning
Authors
Affiliations
1 School of Control & Computer Engineering, North China Electric Power University, Beijing, CN
2 Dept. of Information Communication Engineering, Guilin University of Electronic Technology, Guilin, CN
Source
International Journal of Advanced Networking and Applications, Vol 12, No 5 (2021), Pagination: 4719-4724

Abstract
Few researchers have addressed class imbalance in the analysis of datasets. Two types of class imbalance occur: between-class imbalance, in which some classes have many more samples than others, and within-class imbalance, in which some subsets of one class have fewer samples than other subsets of the same class. Over-sampling and under-sampling techniques play significant roles in tackling the class imbalance problem, and many variants of both exist for imbalanced dataset models. This research applies two sampling techniques to imbalanced dataset models: over-sampling using the SMOTE technique and under-sampling using SpreadSubsample. All experimental results are measured with class-imbalance evaluation metrics, namely precision, recall, F-measure, and area under the ROC curve, and 12 different classifiers are used to compare the two sampling techniques. The overall analysis showed that the rate of correctly classified instances is higher for over-sampling than for under-sampling in a few classifiers. On TP-rate and positive accuracy, stacking was the worst classifier in these experiments, and multi-class classification and LMT could not increase the TP-rate under under-sampling. Compared with using no sampling technique, both techniques improved results overall, but over-sampling is the more valuable approach for addressing the class imbalance issue.

Keywords
Software prediction, Under-sampling, Over-sampling, Sampling, Class imbalance, Defect-Prone.
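The two sampling strategies from the abstract can be illustrated with a minimal sketch. The paper uses WEKA's SMOTE and SpreadSubsample filters; the hand-rolled functions below (synthetic minority samples interpolated toward nearest minority neighbours, and a random majority subsample) only demonstrate the idea on toy data and are not the WEKA implementations.

```python
# Minimal sketch: SMOTE-style over-sampling vs. random under-sampling
# on an imbalanced two-class toy dataset (180 majority vs. 20 minority).
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

def smote_oversample(X_min, n_new, k=5):
    """Create n_new synthetic minority samples by interpolating between
    each sample and one of its k nearest minority-class neighbours."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)           # idx[:, 0] is the point itself
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]  # random neighbour (skip self)
        gap = rng.random()                  # position along the segment
        new.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(new)

def random_undersample(X_maj, n_keep):
    """Keep a random subset of the majority class (SpreadSubsample-like)."""
    keep = rng.choice(len(X_maj), size=n_keep, replace=False)
    return X_maj[keep]

X_maj = rng.normal(0.0, 1.0, size=(180, 4))   # majority class
X_min = rng.normal(3.0, 1.0, size=(20, 4))    # minority class

# Over-sampling grows the minority to 180; under-sampling shrinks
# the majority to 20 - either way the classes end up balanced.
X_min_balanced = np.vstack([X_min, smote_oversample(X_min, n_new=160)])
X_maj_balanced = random_undersample(X_maj, n_keep=20)
print(len(X_min_balanced), len(X_maj_balanced))
```

In practice the `imbalanced-learn` library provides production implementations (`SMOTE`, `RandomUnderSampler`) with the same interpolation idea; the trade-off the paper measures is that over-sampling keeps all majority information while under-sampling discards it.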