Zaman, Majid
- A Framework for Class Imbalance Problem Using Hybrid Sampling
Authors
1 University of Kashmir, IN
2 Dept. of Computer Science, University of Kashmir, IN
Source
Artificial Intelligent Systems and Machine Learning, Vol 10, No 4 (2018), Pagination: 83-89
Abstract
Skewed class distributions arise naturally in most datasets generated by real-world applications; such datasets are commonly known as class-imbalanced datasets. In them, the examples of one class are far fewer than the examples of the other class(es). Multi-class classification with imbalanced datasets has attracted much attention from the data mining and machine learning research communities in recent years. The main aim of this paper is to improve the classification performance on the minority class without reducing the classification performance on the majority class(es). The problem is studied as a single problem, whereas researchers have usually treated it as two separate problems: the multi-class problem and the imbalance problem.
This paper addresses the problem by devising a novel framework based on a data-level solution (Random Hybrid Sampling) and a well-known binarization algorithm (OVO-Binarization). The performance improvement of our framework is then shown using several performance measures, such as Precision, Recall, F1-score and G-Mean, on benchmark datasets imported from the UCI machine learning repository.
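The combination named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the balancing target (the mean size of the two classes in each pair) is an assumption, since the abstract does not state one. For every pair of classes (the OVO step), the larger class is randomly under-sampled and the smaller class is randomly over-sampled with replacement.

```python
import random
from collections import Counter
from itertools import combinations


def random_hybrid_sample(X, y, seed=0):
    """Balance a labelled subset by random under- and over-sampling.

    Assumption: every class is resampled to the mean class size
    (floor division); the paper may use a different target.
    """
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = sum(len(rows) for rows in by_class.values()) // len(by_class)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            # Random under-sampling: draw without replacement.
            chosen = rng.sample(rows, target)
        else:
            # Random over-sampling: keep all rows, add random duplicates.
            chosen = rows + rng.choices(rows, k=target - len(rows))
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


def ovo_balanced_subsets(X, y, seed=0):
    """Yield ((class_a, class_b), (X_bal, y_bal)) for each class pair."""
    for a, b in combinations(sorted(set(y)), 2):
        pair = [(xi, yi) for xi, yi in zip(X, y) if yi in (a, b)]
        Xp = [xi for xi, _ in pair]
        yp = [yi for _, yi in pair]
        yield (a, b), random_hybrid_sample(Xp, yp, seed)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    X = [[i] for i in range(30)]
    y = ["a"] * 20 + ["b"] * 8 + ["c"] * 2
    for pair, (Xb, yb) in ovo_balanced_subsets(X, y):
        print(pair, dict(Counter(yb)))
```

Each balanced pair would then be used to train one binary classifier, with the pairwise predictions combined by voting; that stage is omitted here.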
Keywords
Class Imbalance, Classification, Over-Sampling, Under-Sampling, OVO-Binarization, Performance Metrics.
- A Comprehensive Review on Class Imbalance Problem
Authors
1 University of Kashmir, IN
2 Dept. of Computer Science, University of Kashmir, IN
Source
Artificial Intelligent Systems and Machine Learning, Vol 10, No 3 (2018), Pagination: 59-65
Abstract
Classifying imbalanced data with standard learning algorithms, which assume relatively equal misclassification costs and a relatively balanced underlying class distribution, runs into serious drawbacks. This paper presents a comprehensive review of learning from class-imbalanced data. Our aim is to provide a review of the class imbalance problem, the state-of-the-art techniques, and the performance metrics used for evaluation under class imbalance. The class imbalance problem in the presence of multiple classes is also discussed.
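To make the point about evaluation metrics concrete, the sketch below contrasts plain accuracy with G-mean, taken here as the geometric mean of per-class recalls. The abstract does not say which metrics the review covers, so G-mean is chosen only as a common example; the function names and the toy labels are illustrative assumptions.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall of each class that occurs in y_true."""
    recalls = {}
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = hits / total
    return recalls


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    # Toy labels: 95 majority examples (0) and 5 minority examples (1).
    y_true = [0] * 95 + [1] * 5
    always_majority = [0] * 100
    print(accuracy(y_true, always_majority))  # high despite ignoring class 1
    print(g_mean(y_true, always_majority))    # zero, since recall of class 1 is zero
```

A classifier that always predicts the majority class scores well on accuracy but gets a G-mean of zero, which is why imbalance-aware measures are preferred in this setting.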
Keywords
Class Imbalance, Classification, Multi-Class, Performance Measures.
- Role of Effectual Predictors of Academic Achievement: An Analytical Study on Select Course Adopted
Authors
1 Department of Computer Science, University of Kashmir, IN
2 University of Kashmir, IN
Source
Artificial Intelligent Systems and Machine Learning, Vol 10, No 4 (2018), Pagination: 98-101
Abstract
The contemporary challenges in higher education that receive the most attention include academic achievement, teaching, learning activities and the overall development of students. The intent of this investigation is to discover important factors within a specific course (Bachelor of Arts) that may indicate which variables/predictors are likely to affect and optimize academic performance. To that end, quantitative and statistical techniques such as discriminant analysis and analysis of variance (ANOVA) were used to explore the characteristics of students that are responsible for their success. These techniques were applied in the realm of academic mining because of their novelty in discovering valid patterns in educational settings. In this study, the association between individual variables of the course and students' overall performance was examined to gain insight into which elements account for student performance. A real dataset acquired from the University of Kashmir was investigated, and it was found that economics as a subject played a predominant role in the overall performance of the students.
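As a reminder of what a one-way ANOVA computes, the sketch below derives the F statistic from between-group and within-group sums of squares. It is a generic textbook calculation, not the study's analysis: the function name is hypothetical and the scores are invented toy numbers, not data from the University of Kashmir dataset.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


if __name__ == "__main__":
    # Invented scores for three hypothetical groups of students.
    scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(one_way_anova_f(scores))
```

A large F value indicates that the group means differ by more than the within-group spread would explain; judging significance additionally requires the F distribution with (k - 1, n - k) degrees of freedom, which is left out of this sketch.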
Keywords
Knowledge Discovery, Educational Data Mining, Discriminant, ANOVA, Correlation, Structure Matrix.