Normalization and feature selection using ensemble methods for crop yield prediction
Chitradevi, A.
Normalization and feature selection using ensemble methods for crop yield prediction - Vol.14(3), Jan - Chennai ICT Academy 2024 - 3293-3303p.
This machine learning study proposes an ensemble-based strategy for
both feature selection and data standardization to enhance model
performance and interpretability. To maintain consistency across
datasets, it employs average filling and weighted K-means clustering.
Weighted K-means assigns distinct values to samples based on their
distances to cluster centers, offering a more precise representation of
the data distribution. Meanwhile, average filling replaces missing
values with the average of the corresponding feature, ensuring a
complete dataset for subsequent analysis. For feature selection, the
study adopts an ensemble approach that combines Random Forest (RF)
with Logistic Regression (LR) and ElasticNet. RF captures feature
importance through tree-based analysis, while LR and ElasticNet
provide additional insight into feature relevance through their
coefficients. This combination aims to provide a comprehensive
understanding of feature importance within the dataset. Principal
Component Analysis (PCA) is employed to reduce dataset complexity
while preserving key properties: by identifying orthogonal components
that best explain the variation in the data, PCA enables an efficient
representation and supports feature selection. In the final stage,
Support Vector Machines (SVM) are used for classification. SVM
establishes decision boundaries that maximize the margin between
classes. Using the selected features, the SVM model classifies new
instances.
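The average-filling step and the idea of pooling feature importance from several models can be illustrated with a minimal sketch. This is not the study's implementation. The function names `average_fill` and `combine_importances` are hypothetical, and the pooling rule (scale each model's absolute scores to sum to 1, then average) is an assumption, since the abstract does not state how the RF, LR, and ElasticNet scores are combined.

```python
from statistics import mean


def average_fill(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def combine_importances(score_lists):
    """Average per-feature scores after scaling each model's scores to sum to 1.

    Assumed pooling rule for illustration only; absolute values are used so
    that signed coefficients (LR, ElasticNet) and non-negative tree
    importances (RF) are comparable.
    """
    n_features = len(score_lists[0])
    combined = [0.0] * n_features
    for scores in score_lists:
        total = sum(abs(s) for s in scores)
        if total == 0:
            continue  # this model contributes nothing to any feature
        for j, s in enumerate(scores):
            combined[j] += abs(s) / total
    return [c / len(score_lists) for c in combined]


# Toy data: None marks a missing value.
filled = average_fill([[1.0, None], [3.0, 4.0], [None, 8.0]])
pooled = combine_importances([[0.5, 0.5], [1.0, 3.0]])
```

Here `filled` has every gap replaced by its column mean, and `pooled` ranks the second feature above the first because the second model weights it more heavily.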
Computer Engineering