Unsupervised Deep Learning of Electronic Health Records to Characterize Heterogeneity Across Alzheimer Disease and Related Dementias: Cross-Sectional Study

Published in JMIR Aging, 2025

Recommended citation: West, M., Cheng, Y., He, Y., Leng, Y., Magdamo, C., Hyman, B. T., Dickson, J. R., Serrano-Pozo, A., Blacker, D., & Das, S. (2025). Unsupervised Deep Learning of Electronic Health Records to Characterize Heterogeneity Across Alzheimer Disease and Related Dementias: Cross-Sectional Study. JMIR Aging, 8(1). https://doi.org/10.2196/65178

We applied unsupervised deep learning to electronic health records (EHRs) from 3,454 memory clinic patients at Massachusetts General Hospital to identify subtypes of Alzheimer disease and related dementias (ADRD). Using both structured International Classification of Diseases (ICD) diagnostic codes and large language model-derived embeddings of clinical notes, we identified patient clusters with distinct clinical profiles. Two ADRD subtypes showed consistent patterns across both data types: one characterized by psychiatric manifestations, with higher female prevalence (1.59×), and another characterized by cardiovascular and motor complications, with higher male prevalence (1.75×). These findings show that combining EHR data modalities can reveal clinically meaningful disease heterogeneity, with potential applications to precision medicine in dementia care.
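
As a rough illustration of how a sex prevalence ratio such as the 1.59× figure might be computed for one cluster, the sketch below divides the proportion of female patients inside a cluster by the proportion among all other patients. This is a minimal sketch under assumptions: the exact definition of the ratio used in the paper is not given on this page, and the function name `prevalence_ratio`, the toy cluster labels, and the toy sex values are hypothetical, not study data.

```python
from typing import Sequence


def prevalence_ratio(labels: Sequence[int], sexes: Sequence[str],
                     cluster: int, sex: str) -> float:
    """Proportion of `sex` inside `cluster` divided by the proportion outside it.

    Assumed definition for illustration only; the study may define it differently.
    """
    if len(labels) != len(sexes):
        raise ValueError("labels and sexes must have the same length")
    inside = [s for l, s in zip(labels, sexes) if l == cluster]
    outside = [s for l, s in zip(labels, sexes) if l != cluster]
    if not inside or not outside:
        raise ValueError("cluster and its complement must both be non-empty")
    p_in = sum(s == sex for s in inside) / len(inside)
    p_out = sum(s == sex for s in outside) / len(outside)
    if p_out == 0:
        raise ValueError("reference proportion is zero; ratio undefined")
    return p_in / p_out


# Toy, made-up data: cluster assignments and recorded sex for eight patients.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_sexes = ["F", "F", "F", "M", "F", "M", "M", "F"]
ratio = prevalence_ratio(toy_labels, toy_sexes, cluster=0, sex="F")
print(f"{ratio:.2f}")
```

In practice the cluster labels would come from whichever clustering step is applied to the ICD-code or clinical-note representations; the toy labels here only stand in for that output.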

Download paper here

Download Appendix 1 here