A Decadal Study of PM2.5 Concentrations over Delhi using MERRA-2 and Ground Measurements: Predictive Insights via Machine Learning


Affiliations
1 Department of Civil Engineering, Institute of Engineering and Technology, Lucknow, UP 226 021, India
2 Department of Civil Engineering, Integral University, Lucknow, UP 226 026, India
3 Centre for Atmospheric Sciences, Indian Institute of Technology, Hauz Khas, New Delhi 110 016, India
4 Indian Institute of Tropical Meteorology, Ministry of Earth Sciences, New Delhi 110 060, India

This study investigates the spatial and temporal variations of PM2.5 concentrations in Delhi from 2014 to 2023. It uses ground-based measurements from the Central Pollution Control Board (CPCB) and MERRA-2 reanalysis data. PM2.5 concentrations are strongly and positively correlated (r > 0.90) across all districts. This points to city-wide drivers such as vehicular emissions, industrial activity, and regional weather patterns. Seasonally, PM2.5 concentrations peak in winter, which is attributed to lower temperatures, reduced wind speeds, and increased emissions from heating sources.

To improve PM2.5 estimates, several machine learning (ML) models were applied: an Extra Trees Regressor, a Random Forest Regressor, a Light Gradient Boosting Machine (LGBM) Regressor, and a Stacking Regressor. The models take MERRA-2 aerosol sub-parameters as inputs, namely dust, organic carbon, black carbon, sea salt, and sulfate. The Stacking Regressor performed best. It achieved an R² of 0.67 and substantially improved the correlation with CPCB measurements (r = 0.86).

Compared with the original MERRA-2 data, the ML models reduced the magnitude of the mean bias from -39.4 µg/m³ to about 10.4 µg/m³. They also cut the root mean squared error (RMSE) from 71.1 µg/m³ to below 40 µg/m³. In addition, the fraction of predictions within a factor of two of the observations rose from 0.61 for MERRA-2 to above 0.89 for all ML models.

These findings show that combining machine learning models with MERRA-2 sub-parameters yields considerably more accurate PM2.5 estimates. Such estimates provide a more reliable basis for designing targeted and effective air quality management strategies in Delhi.
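The abstract reports several statistics comparing predicted PM2.5 with CPCB observations: mean bias, RMSE, r, R², and the fraction within a factor of two. The minimal sketch below computes them, assuming the conventional definitions used in air-quality model evaluation; the abstract does not give the exact formulas. The function name `evaluation_metrics` is hypothetical. The sketch assumes mean bias = mean(predicted - observed), so the negative MERRA-2 bias corresponds to underestimation.

```python
import math
from typing import Dict, Sequence


def evaluation_metrics(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Score predicted PM2.5 against observed PM2.5 (both in µg/m³).

    Hypothetical helper using assumed textbook definitions (not taken from the paper):
      MB   = mean(pred - obs)               # negative => underestimation
      RMSE = sqrt(mean((pred - obs) ** 2))
      r    = Pearson correlation coefficient
      R2   = 1 - SS_res / SS_tot            # coefficient of determination
      FAC2 = share of pairs with 0.5 <= pred / obs <= 2
    """
    n = len(obs)
    if n == 0 or n != len(pred):
        raise ValueError("obs and pred must be non-empty and the same length")

    # Residuals drive MB, RMSE and the R2 numerator.
    errors = [p - o for o, p in zip(obs, pred)]
    ss_res = sum(e * e for e in errors)
    mb = sum(errors) / n
    rmse = math.sqrt(ss_res / n)

    # Pearson r from centred sums; R2 reuses the observed total sum of squares.
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    # Assumes neither series is constant (otherwise r and R2 are undefined).
    r = cov / math.sqrt(ss_tot * var_p)
    r2 = 1.0 - ss_res / ss_tot

    # Pairs with a non-positive observation are counted as failing the ratio test.
    fac2 = sum(1 for o, p in zip(obs, pred) if o > 0 and 0.5 <= p / o <= 2.0) / n

    return {"MB": mb, "RMSE": rmse, "r": r, "R2": r2, "FAC2": fac2}
```

Take purely illustrative values obs = [100, 200, 300, 400] and pred = [110, 190, 320, 900]. Here MB is 130 µg/m³ and FAC2 is 0.75, because only the last prediction (ratio 2.25) falls outside the factor-of-two band. A single large error can dominate MB and RMSE while FAC2 stays comparatively high. This is one reason to report several complementary metrics rather than one.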

Keywords

PM2.5 concentrations; Delhi; Machine learning models; Air pollution; MERRA-2

Authors

Sumit Singh
Department of Civil Engineering, Institute of Engineering and Technology, Lucknow, UP 226 021, India
Vikash Singh
Department of Civil Engineering, Integral University, Lucknow, UP 226 026, India
Ajay Kumar
Department of Civil Engineering, Institute of Engineering and Technology, Lucknow, UP 226 021, India
Amarendra Singh
Centre for Atmospheric Sciences, Indian Institute of Technology, Hauz Khas, New Delhi 110 016, India
Atul Kumar Srivastava
Indian Institute of Tropical Meteorology, Ministry of Earth Sciences, New Delhi 110 060, India
Virendra Pathak
Department of Civil Engineering, Institute of Engineering and Technology, Lucknow, UP 226 021, India
