SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Environmental Factors and Lung Cancer: A Predictive Spatial Approach


Workshop: Ninth Computational Approaches for Cancer Workshop (CAFCW23)

Authors: Wenhuan Tan, Xiange Wang, and Silvia Crivelli (Lawrence Berkeley National Laboratory (LBNL)) and Xinlian Liu (National Institutes of Health (NIH), National Institute of Child Health and Human Development; Lawrence Berkeley National Laboratory (LBNL))


Abstract: The prevalence of lung cancer has increased substantially over the past few decades. Studies have established that the environment is the primary cause of most lung cancer cases, yet the disease may result from the combined impact of multiple environmental factors.

Our objective is to investigate the relationship between lung cancer incidence and physical ambient factors, including climatology, air quality, meteorological conditions, and soil vegetation. To address missing county-level data on lung cancer cases in 2020, we use a Bayesian spatio-temporal modeling approach, implemented with R-INLA, to map geographic variation in lung cancer mortality rates for subnational areas.
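For readers unfamiliar with Bayesian disease mapping, one common spatio-temporal formulation is sketched below. It is an illustrative assumption, not necessarily the specification used in this work: the symbols $y_{it}$ (observed cases in county $i$ and year $t$), $E_{it}$ (expected cases), $\theta_{it}$ (relative risk), $x_{it}$ (environmental covariates), and the random-effect terms are all introduced here for illustration.

```latex
\begin{align}
  y_{it} \mid \theta_{it} &\sim \mathrm{Poisson}\left(E_{it}\,\theta_{it}\right), \\
  \log \theta_{it} &= \alpha + x_{it}^{\top}\beta + u_i + v_i + \gamma_t + \delta_{it}.
\end{align}
```

In this sketch, $u_i$ is a spatially structured county effect, $v_i$ an unstructured county effect, $\gamma_t$ a temporal effect, and $\delta_{it}$ a space-time interaction. Under a model of this kind, counties with missing counts can be handled by treating $y_{it}$ as unobserved and reading off its posterior predictive distribution.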

Our predictive model is built on multiple independent variables obtained from various satellite sources, covering the period from 1960 to 2016. Climate data, such as heatwaves and extreme temperatures, come from the National Oceanic and Atmospheric Administration (NOAA) and the PRISM Climate Group (PRISM). Air quality indicators, such as PM2.5, NO2, and ozone levels, are sourced from NASA's Earth Data. Observational meteorological data, encompassing temperature, dew point, wind direction, wind speed, cloud cover, cloud layers, ceiling height, visibility, current weather, and precipitation amount, are obtained from the EPA's high-resolution gridded dataset. Soil vegetation and cropland data, derived from satellite imagery, are acquired from the United States Department of Agriculture (USDA). We also explore additional geophysical data available through the Google Earth Engine platform.

Our predictive model reveals a positive association between multiple environmental factors and lung cancer incidence that strengthens over the years. We apply a linear model with group fixed effects to 2012-2016 data to assess the relative risk of lung cancer and to generate an environmental vulnerability map for 2017-2021. This work highlights the potential of AI and integrated data analysis for interpreting and predicting complex health phenomena such as lung cancer.
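As a rough illustration of the two quantities named above, the following Python sketch fits a linear model with group fixed effects using the within (demeaning) estimator and computes a crude relative risk as observed over expected cases. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the choice of grouping variable, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_group_fixed_effects(y, X, groups):
    """Within estimator: demean y and X inside each group, then run OLS.

    Demeaning removes one intercept per group (the group fixed effect),
    so the returned coefficients describe within-group variation only.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    y_dm = y.copy()
    X_dm = X.copy()
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y[mask].mean()
        X_dm[mask] -= X[mask].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta


def relative_risk(observed, population):
    """Observed cases divided by cases expected under the overall rate."""
    observed = np.asarray(observed, dtype=float)
    population = np.asarray(population, dtype=float)
    overall_rate = observed.sum() / population.sum()
    expected = overall_rate * population
    return observed / expected


# Synthetic example (hypothetical data): 4 groups, 5 observations each,
# two covariates standing in for environmental factors.
rng = np.random.default_rng(0)
demo_groups = np.repeat(np.arange(4), 5)
demo_X = rng.normal(size=(20, 2))
demo_alpha = np.array([1.0, -2.0, 0.5, 3.0])  # one intercept per group
demo_y = demo_X @ np.array([0.3, -0.7]) + demo_alpha[demo_groups]
demo_beta = fit_group_fixed_effects(demo_y, demo_X, demo_groups)
demo_rr = relative_risk([10, 30], [1000, 1000])
```

Because the synthetic outcome contains no noise, the estimated coefficients recover the values used to generate it. In a real analysis the groups might be states or years, and a relative risk above 1 would mark an area with more cases than its population share implies.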




