SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

ACM Student Research Competition Poster Archive

Utilizing Large Language Models for Disease Phenotyping in Obstructive Sleep Apnea

Student: Ifrah Khurram (San Juan Bautista School of Medicine, Lawrence Berkeley National Laboratory (LBNL))
Supervisor: Silvia Crivelli (Lawrence Berkeley National Laboratory (LBNL))

Abstract: Obstructive sleep apnea (OSA) affects millions of people and is linked to severe complications, yet understanding of its influence on comorbidities lags. Complications can be avoided by using continuous positive airway pressure (CPAP) machines, but the machines are expensive and physicians cannot identify which patients are at risk. Large language models (LLMs) have recently made impressive advances in sequence modeling, and clinical applications are quickly emerging. However, the medical relevance of pre-trained LLM latent spaces remains uncertain.

This study evaluates 12 pre-trained clinical LLMs by how well their embeddings cluster OSA-related phenotypes and comorbidities (atrial fibrillation, coronary artery disease, heart failure, hypertension, stroke, and type 2 diabetes). Using 40 A100 GPUs on NERSC’s Perlmutter, document-level embeddings of 331,793 MIMIC-IV discharge reports were computed for each LLM. K-Means models fit to these embeddings were ranked by the clustering entropy of the phenotype classes, and the ranking guided model selection. The top models successfully grouped patients into subsets with similar histories and outcomes. This work will support ongoing OSA research by identifying phenotypes and will assist physicians by informing CPAP allocation.
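
The ranking step described above can be illustrated with a minimal sketch. The abstract does not give the exact definition of clustering entropy used in the study, so the code below assumes one common form: the size-weighted mean Shannon entropy (in bits) of the phenotype labels inside each cluster, where lower values indicate purer clusters. The function names `clustering_entropy` and `rank_models`, the model names, and the toy labels are all hypothetical and are not taken from the poster.

```python
import math
from collections import Counter, defaultdict


def clustering_entropy(cluster_ids, class_labels):
    """Size-weighted mean Shannon entropy (bits) of class labels per cluster.

    Assumed definition, not necessarily the one used in the study.
    0.0 means every cluster contains a single phenotype class.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one sample is required")

    # Count how often each class label occurs inside each cluster.
    counts_by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        counts_by_cluster[cluster][label] += 1

    n_total = len(cluster_ids)
    weighted_entropy = 0.0
    for counts in counts_by_cluster.values():
        size = sum(counts.values())
        entropy = -sum(
            (k / size) * math.log2(k / size) for k in counts.values()
        )
        weighted_entropy += (size / n_total) * entropy
    return weighted_entropy


def rank_models(assignments_by_model, class_labels):
    """Return (model_name, entropy) pairs sorted from purest to most mixed."""
    scores = {
        name: clustering_entropy(ids, class_labels)
        for name, ids in assignments_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Toy example: four reports, two phenotype classes, two hypothetical models.
labels = ["hypertension", "hypertension", "stroke", "stroke"]
assignments = {
    "model_a": [0, 1, 0, 1],  # each cluster mixes both classes
    "model_b": [0, 0, 1, 1],  # each cluster holds a single class
}
ranking = rank_models(assignments, labels)
```

In this toy case `model_b` ranks first because its clusters separate the two classes completely. In practice the cluster assignments would come from a K-Means model fit to each LLM's document embeddings, a step omitted here.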

ACM-SRC Semi-Finalist: no

