
Machine Learning Approach-Based Big Data Imputation Methods for Outdoor Air Quality Forecasting


Affiliations
1 Department of Mathematics, Srinivasa Ramanujan Centre, SASTRA Deemed to be University, Kumbakonam 612 001, Tamil Nadu, India
2 Department of Computer Science and Engineering, Srinivasa Ramanujan Centre, SASTRA Deemed to be University, Kumbakonam 612 001, Tamil Nadu, India
 

Missing data are a common problem in ambient air quality databases, and the problem is worse in small towns and cities. It is a significant concern for environmental epidemiology: these settings have high pollution exposure levels worldwide, and gaps in the datasets obstruct health investigations that could later inform local and international policy. When a substantial number of observations contain missing values and are excluded from the analysis, the smaller sample size inflates the standard errors, which may significantly affect the final result. In general, the performance of a missing value imputation algorithm depends on the size of the database and the percentage of missing values within it. This paper proposes and demonstrates an ensemble-imputation-classification framework that rebuilds air quality information and forecasts air quality, using a dataset from Beijing, China. Various single and multiple imputation procedures are used to fill in the missing records. An ensemble of diverse classifiers is then applied to the imputed data to determine the air pollution level. The proposed model aims to reduce the error rate and improve accuracy. Extensive testing on datasets with actual missing values shows that the proposed methodology, which combines multiple imputation with ensemble techniques, significantly improves the accuracy of the air quality forecasting model compared with conventional single imputation techniques.

Keywords

Air Quality, Big Data Analytics, Classification, Ensemble, Multiple Imputation.
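
The impute-then-ensemble workflow summarized in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the synthetic two-feature readings, the 20% gap rate, the helper names (make_data, knock_out, impute, fit_ensemble), and the three toy classifiers are all hypothetical. The "multiple imputation" here is a simplified stand-in that draws missing entries from each column's marginal mean and standard deviation; a full procedure would condition on the other variables.

```python
import random
import statistics
from collections import Counter


def make_data(n, rng):
    """Synthetic two-feature readings; label 1 = 'polluted' (hypothetical)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 120.0 if label else 40.0
        X.append([rng.gauss(centre, 25.0), rng.gauss(centre * 1.5, 35.0)])
        y.append(label)
    return X, y


def knock_out(X, rate, rng):
    """Replace a fraction of entries with None to mimic sensor gaps."""
    return [[None if rng.random() < rate else v for v in row] for row in X]


def impute(X, rng=None):
    """rng=None: single (column-mean) imputation; else one stochastic draw."""
    stats = []
    for col in zip(*X):
        seen = [v for v in col if v is not None]
        stats.append((statistics.fmean(seen), statistics.stdev(seen)))

    def fill(j):
        mean, sd = stats[j]
        return mean if rng is None else rng.gauss(mean, sd)

    return [[fill(j) if v is None else v for j, v in enumerate(row)] for row in X]


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_centroid(X, y):
    """Nearest-centroid classifier."""
    cents = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        cents[label] = [statistics.fmean(c) for c in zip(*pts)]
    return lambda x: min(cents, key=lambda c: dist2(x, cents[c]))


def fit_knn(k):
    """k-nearest-neighbour classifier factory."""
    def fit(X, y):
        def predict(x):
            near = sorted(zip(X, y), key=lambda p: dist2(x, p[0]))[:k]
            return Counter(t for _, t in near).most_common(1)[0][0]
        return predict
    return fit


LEARNERS = [fit_centroid, fit_knn(1), fit_knn(7)]  # toy "diverse" learners


def fit_ensemble(imputed_sets, y):
    """Fit every learner on every imputed copy; predict by majority vote."""
    models = [fit(X, y) for X in imputed_sets for fit in LEARNERS]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


rng = random.Random(0)
X_train, y_train = make_data(300, rng)
X_test, y_test = make_data(100, rng)
X_gappy = knock_out(X_train, 0.2, rng)  # gaps in training features only

single = fit_ensemble([impute(X_gappy)], y_train)
multiple = fit_ensemble([impute(X_gappy, rng) for _ in range(5)], y_train)
acc_single = accuracy(single, X_test, y_test)
acc_multiple = accuracy(multiple, X_test, y_test)
print(f"single imputation:   {acc_single:.2f}")
print(f"multiple imputation: {acc_multiple:.2f}")
```

On clean synthetic data such as this, the two scores may well be close. The sketch is meant to show the shape of the workflow (several imputed copies, several classifiers, one pooled vote), not to reproduce the accuracy gains reported in the paper.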


Authors

Narasimhan D
Department of Mathematics, Srinivasa Ramanujan Centre, SASTRA Deemed to be University, Kumbakonam 612 001, Tamil Nadu, India
Vanitha M
Department of Computer Science and Engineering, Srinivasa Ramanujan Centre, SASTRA Deemed to be University, Kumbakonam 612 001, Tamil Nadu, India

