
Normalization and Feature Selection Using Ensemble Methods for Crop Yield Prediction




Authors

A. Chitradevi
Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, India
N. Tajunisha
Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, India

Abstract


This study proposes an ensemble-based machine learning strategy for both feature selection and data standardization to enhance model performance and interpretability. To maintain consistency across datasets, it employs average filling and weighted K-means clustering. Weighted K-means assigns distinct weights to samples based on their distances to the cluster centers, offering a more precise representation of the data distribution, while average filling replaces missing values with the mean of the corresponding features, ensuring a complete dataset for subsequent analysis. For feature selection, the study adopts an ensemble approach that combines Random Forest (RF) with Logistic Regression (LR) and ElasticNet: RF captures feature importance through tree-based analysis, while LR and ElasticNet contribute coefficient-based measures of feature relevance. Combining the three is intended to give a comprehensive view of feature importance within the dataset. Principal Component Analysis (PCA) is then applied to reduce dataset complexity while preserving its key properties; by identifying the orthogonal components that best explain the variation in the data, PCA yields a compact representation that supports more effective feature selection. In the final stage, a Support Vector Machine (SVM) performs the classification: it constructs decision boundaries that maximize the margin between classes and, using the selected features, categorizes new instances effectively.
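
Below is a minimal sketch of the pipeline the abstract describes, written with scikit-learn. The synthetic dataset, the cluster count, the number of retained features, the PCA dimensionality, and the RBF-kernel SVM are illustrative assumptions rather than settings reported in the paper, and the paper's weighted K-means step is approximated here by ordinary K-means with distance-derived sample weights.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, ElasticNet
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split


def preprocess(X, n_clusters=3):
    """Average filling (mean imputation) plus a distance-based sample weight.

    The weighted K-means described in the paper is approximated with ordinary
    K-means: each sample receives a weight that shrinks with its distance to
    its cluster center, so atypical samples count for less downstream.
    """
    X = SimpleImputer(strategy="mean").fit_transform(X)   # average filling
    X = StandardScaler().fit_transform(X)                 # keep features comparable
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    return X, 1.0 / (1.0 + dist)


def ensemble_feature_scores(X, y):
    """Average normalized RF importances with |coefficients| from LR and ElasticNet."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    lr = LogisticRegression(max_iter=1000).fit(X, y)
    en = ElasticNet(alpha=0.01).fit(X, y)                 # labels treated as numeric targets

    def scale(v):                                         # rescale each score vector to [0, 1]
        v = np.abs(v)
        return v / v.max() if v.max() > 0 else v

    return (scale(rf.feature_importances_)
            + scale(np.abs(lr.coef_).mean(axis=0))
            + scale(en.coef_)) / 3.0


if __name__ == "__main__":
    # Synthetic stand-in for a crop-yield dataset with a categorical yield class.
    X, y = make_classification(n_samples=500, n_features=20, n_informative=6, random_state=0)
    X[np.random.default_rng(0).random(X.shape) < 0.05] = np.nan   # inject missing values

    X, w = preprocess(X)
    scores = ensemble_feature_scores(X, y)
    keep = np.argsort(scores)[-10:]                       # keep the 10 highest-scoring features
    Z = PCA(n_components=5).fit_transform(X[:, keep])     # remove residual redundancy

    X_tr, X_te, y_tr, y_te, w_tr, _ = train_test_split(Z, y, w, random_state=0)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr, sample_weight=w_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))

The sample weights produced in preprocessing are reused when fitting the SVM, which is one plausible way to carry the weighted-clustering information through to the classifier; the paper may combine these stages differently.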

Keywords


Dataset Normalization, Feature Selection, Weighted K-Means Clustering, Decision Tree Regressor, Random Forest.
