
Normalization and Feature Selection Using Ensemble Methods for Crop Yield Prediction


Affiliations
1 Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, India
     



This study proposes an ensemble-based strategy for feature selection and data standardization in machine learning, with the aim of improving model performance and interpretability. To keep datasets consistent, it employs average filling and weighted K-means clustering. Weighted K-means assigns distinct weights to samples based on their distances to the cluster centers, offering a more precise representation of the data distribution. Average filling replaces each missing value with the mean of the corresponding feature, ensuring a complete dataset for subsequent analysis. For feature selection, the study adopts an ensemble approach that combines Random Forest (RF) with Logistic Regression (LR) and ElasticNet. RF captures feature importance through tree-based analysis, while the LR and ElasticNet coefficients provide additional evidence of feature relevance. Combining the three aims to give a comprehensive view of feature importance within the dataset. Principal Component Analysis (PCA) is employed to reduce dataset complexity while preserving key properties: by identifying orthogonal components that best explain the variation in the data, it yields a compact representation that supports feature selection. In the final stage, a Support Vector Machine (SVM) is used for classification. SVM establishes decision boundaries that maximize the margin between classes, and the model, trained on the selected features, classifies new instances.

Keywords

Dataset Normalization, Feature Selection, Weighted K-Means Clustering, Decision Tree Regressor, Random Forest.



Authors

A. Chitradevi
Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, India
N. Tajunisha
Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, India
