Air Quality Index Prediction Using Machine Learning and Deep Learning: A Comparative Analysis of Dehradun and Kashipur in Uttarakhand, India

Divyanshu Bhatt *

Department of Information Technology, College of Technology, G. B. Pant University of Agriculture and Technology, Pantnagar-263145 (U.S. Nagar, Uttarakhand), India.

Shikha Goswami

Department of Information Technology, College of Technology, G. B. Pant University of Agriculture and Technology, Pantnagar-263145 (U.S. Nagar, Uttarakhand), India.

Govind Verma

Department of Information Technology, College of Technology, G. B. Pant University of Agriculture and Technology, Pantnagar-263145 (U.S. Nagar, Uttarakhand), India.

Binay Kumar Pandey

Department of Information Technology, College of Technology, G. B. Pant University of Agriculture and Technology, Pantnagar-263145 (U.S. Nagar, Uttarakhand), India.

*Author to whom correspondence should be addressed.


Abstract

Predicting the Air Quality Index (AQI) is important for environmental monitoring, public health protection, and pollution-control planning. This study compares seven classical machine learning models, namely Linear Regression, Decision Tree, Random Forest, Support Vector Regressor (SVR), K-Nearest Neighbors (KNN), Gradient Boosting, and XGBoost, and two deep learning architectures, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), for AQI prediction in Dehradun and Kashipur, Uttarakhand, India. AQI values were computed using the CPCB sub-index methodology across six major pollutants: PM2.5, PM10, SO₂, CO, NO₂, and O₃. Model performance was assessed using hold-out testing and five-fold walk-forward time-series cross-validation. Results show that ensemble and neighbor-based methods substantially outperform linear and deep learning approaches for the available dataset sizes. In Dehradun, Random Forest achieved the best hold-out performance (R² = 99.50%, RMSE = 4.60). Under walk-forward validation, KNN achieved the highest R² (91.37%), while Random Forest achieved the lowest RMSE (13.43). In Kashipur, Random Forest and Gradient Boosting exceeded an R² of 95% in hold-out testing, and XGBoost, KNN, Random Forest, and Gradient Boosting all achieved an R² of approximately 96% under walk-forward validation. LSTM and GRU captured temporal AQI patterns but were less accurate than the best classical models, with R² values between 75% and 83%. The study concludes that walk-forward validation provides a more reliable estimate of AQI forecasting performance than random train-test splits, and that KNN and ensemble learning methods are promising approaches for air quality forecasting in Himalayan foothill cities.

Keywords: Air Quality Index (AQI), machine learning, deep learning, K-Nearest Neighbors, Random Forest, XGBoost, time-series forecasting, Uttarakhand
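
The walk-forward validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes an expanding training window and equally sized test folds; the abstract does not state whether the authors used an expanding or a sliding window. The function names `walk_forward_splits` and `rmse`, the persistence baseline, and the sample AQI values are all hypothetical and serve only to show how each fold trains on the past and tests on the period that follows.

```python
import math


def walk_forward_splits(n_samples, n_folds=5):
    """Yield (train_indices, test_indices) pairs for expanding-window validation.

    Every test fold lies strictly after its training window, so no future
    observation leaks into training (unlike a random train-test split).
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        # The training window grows by one fold at each step.
        train_end = n_samples - (n_folds - k + 1) * fold_size
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical daily AQI values, for illustration only (not study data).
    aqi = [82, 90, 95, 110, 104, 98, 120, 131, 125, 117, 109, 102]
    for fold, (train_idx, test_idx) in enumerate(walk_forward_splits(len(aqi)), 1):
        # Persistence baseline: repeat the last observed training value.
        last_seen = aqi[train_idx[-1]]
        predicted = [last_seen] * len(test_idx)
        actual = [aqi[i] for i in test_idx]
        print(f"fold {fold}: train={len(train_idx)} "
              f"test={len(test_idx)} RMSE={rmse(actual, predicted):.2f}")
```

In practice the persistence baseline would be replaced by one of the models under comparison, fitted on the training indices of each fold and scored on the test indices.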


How to Cite

Bhatt, Divyanshu, Shikha Goswami, Govind Verma, and Binay Kumar Pandey. 2026. “Air Quality Index Prediction Using Machine Learning and Deep Learning: A Comparative Analysis of Dehradun and Kashipur in Uttarakhand, India”. Advances in Research 27 (4):116-30. https://doi.org/10.9734/air/2026/v27i41661.
