Genitourinary Cancer—Prostate, Testicular, and Penile
Prostate cancer risk in African American men evaluated via digital histopathology multi-modal deep learning models developed on NRG Oncology phase III clinical trials.
Background: Artificial intelligence (AI) tools can display racial bias as a result of existing systemic health inequities and biased datasets. We have previously developed multi-modal AI (MMAI) prognostic models based on digital pathology images from five phase III randomized radiotherapy prostate cancer trials; these models outperform NCCN risk groups for prediction of distant metastasis (DM), biochemical failure (BF), prostate cancer-specific mortality (PCSM), and overall survival (OS). In this study, we assessed the algorithmic fairness of the locked MMAI models between African American (AA) and non-AA populations in these five randomized trials. Methods: Patients enrolled in NRG/RTOG 9202, 9408, 9413, 9910, and 0126 with digitized biopsy histopathology slides were included. The locked MMAI models were applied, and subgroup analyses were conducted by comparing distributions of clinical variables and MMAI scores (medians reported for continuous variables and proportions for categorical variables) and by evaluating the prognostic ability of the MMAI models among AA and non-AA men. Model performance was compared using DM as the primary endpoint and BF, PCSM, and OS as secondary endpoints, with Fine-Gray models (death without the event as a competing risk) or Cox proportional hazards models. Kaplan-Meier or cumulative incidence estimates were computed and compared using the log-rank test or Gray's test, respectively. Results: This study included 5,624 men: 932 (17%) AA, 4,503 (80%) white, and 189 (3%) of other races. AA men had a younger median age (69 vs 71 years [yr]), a higher median baseline PSA (12 vs 10 ng/mL), and more T1-T2a disease (62% vs 57%), Gleason score < 7 (42% vs 36%), Gleason score 8-10 (15% vs 12%), NCCN low-risk disease (12% vs 10%), and NCCN high-risk disease (41% vs 33%). AA and non-AA men had estimated 5-yr BF rates of 27% and 27%, 5-yr DM rates of 5% and 5%, 10-yr PCSM rates of 5% and 7%, and 10-yr OS rates of 58% and 60%, respectively.
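The nonparametric survival methods named in the Methods can be illustrated with a minimal, self-contained Python sketch. It runs on toy data (not trial data), and the helper names (`kaplan_meier`, `cumulative_incidence`, `logrank_test`) are hypothetical; this is not the authors' analysis code, and Gray's test and the Fine-Gray and Cox regression models are not implemented here. The cumulative incidence function uses the Aalen-Johansen form, in which a competing event (e.g., death without DM) removes a patient from the risk set without being counted as the event of interest.

```python
import math
from itertools import groupby

# Hypothetical helpers for toy data; not the trial analysis code.


def _tied_groups(records):
    """Yield (time, records_at_that_time) after sorting records by time."""
    for t, grp in groupby(sorted(records), key=lambda r: r[0]):
        yield t, list(grp)


def kaplan_meier(times, events):
    """Kaplan-Meier survival steps; events: 1 = event, 0 = censored."""
    at_risk, surv, steps = len(times), 1.0, []
    for t, grp in _tied_groups(zip(times, events)):
        d = sum(e for _, e in grp)
        if d:
            surv *= 1 - d / at_risk
            steps.append((t, surv))
        at_risk -= len(grp)  # events and censorings both leave the risk set
    return steps


def cumulative_incidence(times, causes, cause=1):
    """Aalen-Johansen cumulative incidence for one cause.

    causes: 0 = censored, 1 = event of interest,
    2 = competing event (e.g. death without the event).
    """
    at_risk, surv, cif, steps = len(times), 1.0, 0.0, []
    for t, grp in _tied_groups(zip(times, causes)):
        d_any = sum(1 for _, c in grp if c != 0)
        d_cause = sum(1 for _, c in grp if c == cause)
        if d_cause:
            # surv is the all-cause event-free survival just before t
            cif += surv * d_cause / at_risk
            steps.append((t, cif))
        if d_any:
            surv *= 1 - d_any / at_risk
        at_risk -= len(grp)
    return steps


def logrank_test(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns (chi2, p), 1 df."""
    groups = list(groups)
    n = [groups.count(0), groups.count(1)]
    o_minus_e = var = 0.0
    for _t, grp in _tied_groups(zip(times, events, groups)):
        d = [sum(e for _, e, g in grp if g == k) for k in (0, 1)]
        d_tot, n_tot = d[0] + d[1], n[0] + n[1]
        if d_tot and n_tot > 1:
            # observed minus expected events in group 1 under equal hazards
            o_minus_e += d[1] - d_tot * n[1] / n_tot
            # hypergeometric variance of the group-1 event count
            var += d_tot * n[0] * n[1] * (n_tot - d_tot) / (n_tot**2 * (n_tot - 1))
        for k in (0, 1):
            n[k] -= sum(1 for _, _, g in grp if g == k)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e**2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
```

Treating competing deaths as censored and reporting one minus the Kaplan-Meier estimate would overstate the cumulative incidence of the event of interest, which is why competing-risk estimates are the natural choice for endpoints such as DM, BF, and PCSM.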
The median (interquartile range) score of the model optimized for 5-yr DM (5-yr DM MMAI) was 0.044 (0.037–0.059) in AA men and 0.043 (0.036–0.057) in non-AA men. Similarly, for all other MMAI models, the differences in median scores between AA and non-AA men ranged from 0.001 to 0.02. The 5-yr DM MMAI model showed a strong prognostic signal for all endpoints (hazard ratio [HR] per one standard deviation increase: 1.6 for DM, 1.4 for BF, 1.6 for PCSM, and 1.3 for OS; all p < 0.001), with comparable HRs in AA and non-AA men across the entire cohort (e.g., HR for DM 1.4 vs 1.6). Similar results were observed for the MMAI model optimized for 10-yr PCSM. Conclusions: To our knowledge, this is the first comparative analysis of a digital pathology AI prognostic model in AA vs non-AA patients with prostate cancer. The prognostic performance of the AI models was comparable between subgroups. Our data support the use of these models across racial groups, although further validation in AA cohorts is ongoing.
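The HRs above are reported per one standard deviation (SD) increase in MMAI score. Assuming this means the score was standardized by its cohort SD before being entered into the Cox or Fine-Gray model (the SD itself is not given here), the sketch below relates the per-SD HR to the raw-score coefficient and shows how it scales over larger score differences. For Fine-Gray models, $h$ denotes the subdistribution hazard rather than the cause-specific hazard.

```latex
\begin{align*}
z &= \frac{x - \bar{x}}{s_x}, \qquad h(t \mid z) = h_0(t)\,\exp(\beta_z z),\\
\mathrm{HR}_{\mathrm{SD}} &= \frac{h(t \mid z + 1)}{h(t \mid z)} = \exp(\beta_z),\\
\beta_z z &= \frac{\beta_z}{s_x}\,(x - \bar{x}) \;\Rightarrow\; \beta_x = \frac{\beta_z}{s_x},\\
\mathrm{HR}(\Delta x) &= \exp(\beta_x\,\Delta x) = \mathrm{HR}_{\mathrm{SD}}^{\,\Delta x / s_x}.
\end{align*}
```

For example, under this log-linear proportional hazards assumption, an HR of 1.6 per SD for DM implies an HR of $1.6^2 = 2.56$ for a patient whose score is 2 SD higher.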