Climate Change Data Portal
DOI | 10.1016/j.atmosenv.2020.117649 |
Advancing methodologies for applying machine learning and evaluating spatiotemporal models of fine particulate matter (PM2.5) using satellite data over large regions | |
Just A.C.; Arfer K.B.; Rush J.; Dorman M.; Shtein A.; Lyapustin A.; Kloog I. | |
Publication date | 2020 |
ISSN | 1352-2310 |
Volume | 239 |
Abstract | Reconstructing the distribution of fine particulate matter (PM2.5) in space and time, even far from ground monitoring sites, is an important exposure science contribution to epidemiologic analyses of PM2.5 health impacts. Flexible statistical methods for prediction have demonstrated the integration of satellite observations with other predictors, yet these algorithms are susceptible to overfitting the spatiotemporal structure of the training datasets. We present a new approach for predicting PM2.5 using machine-learning methods and evaluating prediction models for the goal of making predictions where they were not previously available. We apply extreme gradient boosting (XGBoost) modeling to predict daily PM2.5 on a 1 × 1 km2 resolution for a 13-state region in the Northeastern USA for the years 2000–2015 using satellite-derived aerosol optical depth and implement a recursive feature selection to develop a parsimonious model. We demonstrate excellent predictions of withheld observations but also contrast an RMSE of 3.11 μg/m3 in our spatial cross-validation withholding nearby sites versus an overfit RMSE of 2.10 μg/m3 using a more conventional random ten-fold splitting of the dataset. As the field of exposure science moves forward with the use of advanced machine-learning approaches for spatiotemporal modeling of air pollutants, our results show the importance of addressing data leakage in training, overfitting to spatiotemporal structure, and the impact of the predominance of ground monitoring sites in dense urban sub-networks on model evaluation. The strengths of our resultant modeling approach for exposure in epidemiologic studies of PM2.5 include improved efficiency, parsimony, and interpretability with robust validation while still accommodating complex spatiotemporal relationships. © 2020 Elsevier Ltd |
Keywords | Aerosol optical depth; Air pollution; MAIAC; PM2.5; Spatial cross-validation |
Language | English |
Scopus keywords | Forecasting; Machine learning; Particles (particulate matter); Satellites; Security of data; Urban growth; Fine particulate matter (PM2.5); Machine learning approaches; Machine learning methods; Satellite observations; Spatial cross validations; Spatio-temporal models; Spatio-temporal relationships; Spatio-temporal structures; Predictive analytics; atmospheric pollution; data set; health impact; machine learning; methodology; numerical model; particulate matter; pollution exposure; pollution monitoring; satellite data; spatiotemporal analysis; aerosol; air pollutant; article; cross validation; exposure science; feature selection; optical depth; particulate matter; prediction; United States |
Source journal | ATMOSPHERIC ENVIRONMENT |
Document type | Journal article |
Item identifier | http://gcip.llas.ac.cn/handle/2XKMVOVA/249008 |
Author affiliations | Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States; The Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva, Israel; NASA Goddard Space Flight Center, Greenbelt, MD, United States |
Recommended citation (GB/T 7714) | Just A.C., Arfer K.B., Rush J., et al. Advancing methodologies for applying machine learning and evaluating spatiotemporal models of fine particulate matter (PM2.5) using satellite data over large regions[J], 2020, 239. |
APA | Just A.C., Arfer K.B., Rush J., Dorman M., Shtein A., ... & Kloog I. (2020). Advancing methodologies for applying machine learning and evaluating spatiotemporal models of fine particulate matter (PM2.5) using satellite data over large regions. ATMOSPHERIC ENVIRONMENT, 239. |
MLA | Just A.C., et al. "Advancing methodologies for applying machine learning and evaluating spatiotemporal models of fine particulate matter (PM2.5) using satellite data over large regions". ATMOSPHERIC ENVIRONMENT 239 (2020). |
Files in this item | There are no files associated with this item. |
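
The abstract contrasts a conventional random ten-fold split with a spatial cross-validation that withholds nearby sites. The sketch below illustrates that distinction only; it is not the authors' code. The site names, coordinates, the 1-degree grid-cell blocking rule, and all function names are hypothetical and chosen for illustration. The point it shows is that random splitting can place observations from the same monitoring site in both the training and test sets, whereas assigning whole spatial blocks to a fold keeps neighbouring sites together.

```python
import math
import random


def random_folds(n_obs, k, seed=0):
    """Conventional random k-fold: observations are assigned to folds
    without regard to which monitoring site they come from."""
    rng = random.Random(seed)
    order = list(range(n_obs))
    rng.shuffle(order)
    folds = [0] * n_obs
    for position, obs_index in enumerate(order):
        folds[obs_index] = position % k
    return folds


def spatial_block_folds(site_coords, obs_sites, k, cell_deg=1.0, seed=0):
    """Assign whole grid cells (hypothetical blocking rule) to folds, so
    every site in a cell, and every observation at it, is withheld together."""
    def cell_of(site):
        lon, lat = site_coords[site]
        return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))

    cells = sorted({cell_of(site) for site in site_coords})
    random.Random(seed).shuffle(cells)
    cell_fold = {cell: i % k for i, cell in enumerate(cells)}
    return [cell_fold[cell_of(site)] for site in obs_sites]


def shared_sites(folds, obs_sites, test_fold):
    """Sites that contribute observations to both train and test for a fold."""
    test = {s for f, s in zip(folds, obs_sites) if f == test_fold}
    train = {s for f, s in zip(folds, obs_sites) if f != test_fold}
    return test & train


# Hypothetical monitoring sites as (longitude, latitude); A and B are neighbours.
site_coords = {
    "A": (-74.0, 40.7),
    "B": (-73.9, 40.8),
    "C": (-71.1, 42.4),
    "D": (-75.2, 39.9),
    "E": (-77.0, 38.9),
}
# Four daily observations per site.
obs_sites = [site for site in site_coords for _ in range(4)]

k = 2
rand = random_folds(len(obs_sites), k)
spat = spatial_block_folds(site_coords, obs_sites, k)

for fold in range(k):
    print("fold", fold,
          "| random split, sites in both train and test:",
          sorted(shared_sites(rand, obs_sites, fold)),
          "| spatial split:",
          sorted(shared_sites(spat, obs_sites, fold)))
```

With the spatial assignment, the set of shared sites is empty for every fold by construction. A model evaluated this way is scored on locations it has not seen, which is the situation the abstract describes as predicting where measurements were not previously available.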
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.