An outlier detection framework for Air Quality Index prediction using linear and ensemble models
Pradeep Kumar Dongre, Viral Patel, Upendra Bhoi, Nilesh N. Maltare
The Air Quality Index (AQI) is a key indicator for assessing air quality and its associated health impacts. Accurate AQI calculations are crucial for reliable air quality assessments, but outliers in air quality data can distort these calculations, leading to inaccurate predictions. This paper presents a comprehensive framework for air quality prediction that integrates multiple outlier detection methods with machine learning models, focusing on enhancing the accuracy and robustness of predictions. The study investigates various outlier detection techniques, including the Interquartile Range (IQR), robust Z-score, and Mahalanobis distance, and evaluates their impact when integrated into machine learning models. Unlike traditional approaches that remove outliers without considering seasonal effects, this research proposes retaining extreme data points after seasonal validation to improve model generalization and prediction accuracy for unseen data. The framework is evaluated using a dataset from Jaipur city, testing multiple machine learning models, including linear regression, ensemble methods, and K-Nearest Neighbor (KNN) regression. Results show that the integrated framework significantly improves model performance, with the Extra Trees Regressor achieving the best results (MAE = 11.9161, RMSE = 16.1660, and R² = 0.8884) after refinement, compared to baseline performance (MAE = 12.6765, RMSE = 17.8452, and R² = 0.8737). This study demonstrates the empirical effectiveness of the proposed framework and provides practical guidelines for air quality prediction in real-world applications.
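As a rough illustration of two of the outlier detection techniques named in the abstract, the sketch below flags values with the IQR rule and with a median/MAD-based robust Z-score. It is not the authors' implementation: the function names, the fence multiplier `k=1.5`, the cutoff `3.5`, and the sample readings are all assumptions chosen for illustration, and the Mahalanobis distance and the seasonal validation step are not shown.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


def robust_z_outliers(values, threshold=3.5):
    """Flag values whose median/MAD-based Z-score exceeds an assumed threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # All deviations collapse to zero; nothing can be scored as extreme.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard Z-score
    # under normality.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Hypothetical AQI readings, not taken from the Jaipur dataset.
sample_aqi = [80, 85, 90, 95, 100, 105, 110, 400]
iqr_flags = iqr_outliers(sample_aqi)
robust_flags = robust_z_outliers(sample_aqi)
```

On these hypothetical readings both rules flag only the last value. Under the approach described in the abstract, such a point would be checked against seasonal patterns before any decision to drop it.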
Year of publication: | 2025 |
---|---|
Authors: | Patel, Viral ; Bhoi, Upendra ; Maltare, Nilesh N. |
Subject: | Air Quality Index | Ensemble learning | Extra Tree Regressor | Linear regression | Machine learning | Outlier detection | Predictive analytics | Prognoseverfahren | Forecasting model | Künstliche Intelligenz | Artificial intelligence | Luftverschmutzung | Air pollution | Regressionsanalyse | Regression analysis | Theorie | Theory | Luftreinhaltung | Air pollution control |
Similar items by subject
- Predictive analysis of air pollution using machine learning techniques. Israfil, Mohd. Afaque, (2022)
- Smart "predict, then optimize". Elmachtoub, Adam N., (2022)
- Suriyan Jomthanachai, (2024)