An Approach for Sub Selecting Variables that have Higher Influence on the Outcome in Developing Predictive Model using Staff Turnover
Predictive models are built by learning the combined effects of several independent variables that directly or indirectly influence the outcome, i.e., the response or dependent variable. In practice, collected data cover a large number of independent variables that are presumed to be outcome-sensitive but may or may not actually be related to the outcome. Some independent variables have a large impact on the outcome, while others have little or none. The presence of independent variables that are irrelevant to the outcome can degrade the performance of the predictive model. It is therefore desirable to identify the independent variables that most influence the outcome, so that the model stays lean and efficient. In this work, we used a dataset on employee turnover and explored how to identify a subset of outcome-sensitive variables, eliminating variables that hinder the development of effective predictive models. By sub-selecting the influential independent variables, we developed lean and efficient predictive models that allowed us to act on an actionable subset of the variables to reduce staff turnover, thereby saving the organization effort and cost.
Predictive model, Sensitive parameter, Dimensionality.
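The abstract does not state which selection technique was used, so the following is only a minimal sketch of the general idea of sub-selecting outcome-sensitive variables. It assumes a simple correlation filter: each candidate variable is scored by the absolute Pearson correlation with a binary "left the company" outcome, and the top-ranked variables are kept. The data are synthetic, and every name (`satisfaction`, `overtime_hours`, `desk_number`, `left`, `rank_variables`, `select_top`) is hypothetical and does not come from the paper.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / sqrt(var_x * var_y)


def rank_variables(columns, outcome):
    """Return (name, score) pairs sorted by |correlation| with the outcome."""
    scores = {name: abs(pearson(values, outcome)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top(columns, outcome, k):
    """Keep the names of the k variables most strongly related to the outcome."""
    return [name for name, _ in rank_variables(columns, outcome)[:k]]


# Synthetic staff data (assumption: invented purely for illustration).
rng = random.Random(0)
n = 1000
satisfaction = [rng.random() for _ in range(n)]          # 0 = unhappy, 1 = happy
overtime_hours = [rng.uniform(0, 20) for _ in range(n)]  # weekly overtime
desk_number = [rng.randint(1, 300) for _ in range(n)]    # irrelevant by construction

# Outcome: leaving is driven by low satisfaction and heavy overtime, plus noise.
left = [
    1 if (0.6 - s) + 0.03 * (o - 10) + rng.gauss(0, 0.15) > 0 else 0
    for s, o in zip(satisfaction, overtime_hours)
]

columns = {
    "satisfaction": satisfaction,
    "overtime_hours": overtime_hours,
    "desk_number": desk_number,
}

ranking = rank_variables(columns, left)
selected = select_top(columns, left, k=2)

for name, score in ranking:
    print(f"{name:15s} |r| = {score:.3f}")
print("selected:", selected)
```

A univariate filter like this is cheap but ignores interactions between variables; model-based scores such as permutation importance are a common alternative when interactions matter.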