[Nature Communications] A Large Model for Non-invasive and Personalized Management of Breast Cancer from Multiparametric MRI

The Smart Lab team at the Hong Kong University of Science and Technology, working with a consortium of institutions including Harvard University, Shenzhen People’s Hospital, the PLA Middle Military Command General Hospital, the Shenzhen Institute of Advanced Technology, The University of Hong Kong, The Chinese University of Hong Kong, and Yunnan Cancer Hospital, has developed a novel approach to non-invasive and personalized management of breast cancer patients. Published in Nature Communications, the study introduces a large Mixture-of-Modality-Experts (MOME) model that integrates multiparametric MRI (mpMRI) data and was developed on the largest breast MRI dataset reported to date. The model enables comprehensive, non-invasive diagnosis, grading, and treatment prediction for breast cancer.

MOME Cover

Figure 1: Overview of the multiparametric breast MRI-based study design. a. Data collection from three hospitals, covering populations in northern, southeastern, and southwestern China. b. The generation of multiparametric breast MRI, where T2-weighted MRI, diffusion-weighted MRI, and DCE-MRI were mainly used in this study. c. MOME takes multiparametric MRI as input; then, building on pre-trained foundation model parameters, a mixture of sparse modality experts and soft modality experts is leveraged for unimodal feature extraction and multimodal information integration. d. MOME can be used for malignancy screening, molecular subtyping, and NACT response prediction, offering non-invasive personalized management for breast cancer patients.

Introduction

Breast cancer remains the leading cause of cancer-related mortality among women worldwide. Personalized management of breast cancer relies on accurate detection of malignant lesions, molecular subtyping, and prediction of patient responses to various treatments. Breast magnetic resonance imaging (MRI), as a highly sensitive imaging modality, plays a critical role in breast cancer screening and staging. However, traditional MRI interpretation primarily relies on single modalities, such as dynamic contrast-enhanced MRI (DCE-MRI). While DCE-MRI offers high sensitivity, its specificity is limited. The integration of multiparametric MRI (mpMRI), including DCE-MRI, T2-weighted imaging (T2WI), and diffusion-weighted imaging (DWI), holds promise for improving diagnostic performance. However, the high dimensionality and heterogeneity of mpMRI data present significant challenges for the design of AI models.

To address these challenges, a collaborative research team collected data from 5,205 patients across three hospitals in northern, southeastern, and southwestern China, constructing the largest mpMRI dataset for breast cancer to date. Leveraging this dataset, the team proposed the Mixture-of-Modality-Experts (MOME) model. Built upon a pre-trained foundation model, MOME inherits the Transformer’s long-range modeling capability for multimodal information fusion, enabling effective learning, modeling, and inference on high-dimensional mpMRI data (Figure 1). In distinguishing benign from malignant lesions, MOME achieved performance on par with four senior radiologists and significantly outperformed a junior radiologist: in the reader study, it reached an AUROC of 0.913, an AUPRC of 0.948, an F1 score of 0.905, and an MCC of 0.723.
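To make the architecture concrete, the sketch below illustrates the general mixture-of-modality-experts idea in PyTorch: tokens from each MRI sequence are routed to a modality-specific ("sparse") expert, while a shared ("soft") expert mixes information across modalities. This is a minimal illustration under assumed names and dimensions, not the authors' released implementation.

```python
# Minimal sketch of a mixture-of-modality-experts Transformer block.
# Illustrative only: module names, dimensions, and the exact routing
# scheme are assumptions, not the authors' released code.
import torch
import torch.nn as nn

class ModalityExpertsBlock(nn.Module):
    def __init__(self, dim=768, n_heads=12, n_modalities=3):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        # One "sparse" expert FFN per MRI sequence (e.g., DCE, T2WI, DWI);
        # each token is routed to the expert of its own modality.
        self.sparse_experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_modalities)
        ])
        # A shared "soft" expert applied to every token, mixing
        # cross-modal information.
        self.soft_expert = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tokens, modality_ids):
        # tokens: (B, N, dim); modality_ids: (B, N) with values in {0, 1, 2}
        h = self.norm1(tokens)
        x = tokens + self.attn(h, h, h, need_weights=False)[0]  # shared attention
        h = self.norm2(x)
        out = torch.zeros_like(h)
        for m, expert in enumerate(self.sparse_experts):
            mask = (modality_ids == m).unsqueeze(-1)  # hard routing by modality
            out = out + expert(h) * mask
        return x + out + self.soft_expert(h)  # sparse + soft expert outputs
```

Because routing here is keyed to the modality tag rather than learned per token, tokens of an absent sequence can simply be omitted at inference, which is what makes the flexible-inference behavior described later possible.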

Further investigations revealed that MOME offers significant clinical value for personalized breast cancer management. For instance, MOME was able to avoid unnecessary biopsies for 7.3% of patients with benign BI-RADS category 4 lesions. It also achieved an AUROC of 0.709 for triple-negative breast cancer classification and an AUROC of 0.694 for predicting pathological complete response (pCR) to neoadjuvant chemotherapy from pre-treatment mpMRI. Moreover, thanks to its mixture-of-modality-experts design, MOME supports scalable and interpretable multimodal inference: it adaptively handles missing modalities at inference time and provides decision explanations by highlighting lesions and assessing modality contributions.

Dataset Characteristics

This study utilized a total of 5,220 multiparametric breast MRI examinations from 5,205 patients, collected over ten years from three institutions: The Fifth Medical Center of Chinese PLA General Hospital (DS1), Shenzhen People’s Hospital (DS2), and Yunnan Cancer Hospital (DS3). DS1 included 1,824 examinations (November 2012–July 2017), DS2 had 735 (December 2018–March 2022), and DS3 comprised 2,661 (November 2015–October 2022).

For malignancy classification, DS1 was divided into training (n=1,167), validation (n=150), and internal testing sets (n=507), with no patient overlap. DS2 and DS3 were used for external testing. Additionally, subsets of DS1 were used for triple-negative breast cancer (TNBC) subtyping (n=1,005) and neoadjuvant chemotherapy (NACT) response prediction (n=358), with performance evaluated via five-fold cross-validation.
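Since one patient can contribute multiple examinations, split assignment must be grouped by patient to honor the no-patient-overlap constraint. Below is a minimal sketch with scikit-learn using hypothetical record fields; the paper's exact splitting procedure may differ.

```python
# Sketch of patient-level splitting so that no patient appears in more
# than one fold. The examination records below are hypothetical.
from sklearn.model_selection import GroupKFold

exams = [
    {"patient_id": "P001", "malignant": 1},
    {"patient_id": "P001", "malignant": 1},  # same patient, second exam
    {"patient_id": "P002", "malignant": 0},
    {"patient_id": "P003", "malignant": 1},
    {"patient_id": "P004", "malignant": 0},
    {"patient_id": "P005", "malignant": 0},
]
patient_ids = [e["patient_id"] for e in exams]
labels = [e["malignant"] for e in exams]

gkf = GroupKFold(n_splits=5)  # five folds, grouped by patient ID
for train_idx, test_idx in gkf.split(exams, labels, groups=patient_ids):
    train_p = {patient_ids[i] for i in train_idx}
    test_p = {patient_ids[i] for i in test_idx}
    assert train_p.isdisjoint(test_p)  # no patient overlap between splits
```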

At 5,220 mpMRI examinations, this represents the largest multiparametric breast MRI dataset reported to date.

Main Experimental Results

Comparable to radiologists

The study compared the malignancy diagnosis performance of MOME with that of six radiologists on internal testing set 2 (n=200). The radiologists included one with less than five years of breast MRI experience (reader 1), two with five to ten years of experience (readers 2 and 3), and three with over ten years of experience (readers 4, 5, and 6).

MOME performed comparably to four radiologists (readers 2, 3, 5, and 6), with no statistically significant differences in F1 and MCC scores. It outperformed reader 1, achieving significantly higher F1 (+0.065, 95% CI 0.019–0.117) and MCC (+0.228, 95% CI 0.090–0.384). However, MOME’s performance was slightly lower than that of reader 4 in F1 (-0.051, 95% CI -0.087 to -0.019) and MCC (-0.145, 95% CI -0.240 to -0.048).

MOME achieved an AUROC of 0.913 (95% CI 0.864–0.952) and an AUPRC of 0.948 (95% CI 0.911–0.977), with five of the six radiologists’ operating points lying on or below the curves, indicating that, at an appropriate decision threshold, MOME performs comparably to radiologists.
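For readers unfamiliar with the reported metrics, the following sketch shows how each can be computed with scikit-learn; the label and score arrays are placeholders, and the 0.5 operating point is arbitrary.

```python
# Sketch of the evaluation metrics reported in the study.
# y_true (0 = benign, 1 = malignant) and y_score (model probability)
# are placeholder arrays; the 0.5 threshold is a tunable operating point.
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             f1_score, matthews_corrcoef)

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_score = np.array([0.2, 0.9, 0.7, 0.4, 0.8, 0.1, 0.6, 0.3])
y_pred = (y_score >= 0.5).astype(int)

auroc = roc_auc_score(y_true, y_score)
auprc = average_precision_score(y_true, y_score)  # AUPRC
f1 = f1_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
# Partial AUROC restricted to at most 10% false-positive rate
# (i.e., at least 90% specificity):
pauroc = roc_auc_score(y_true, y_score, max_fpr=0.1)
```

Note that scikit-learn's `max_fpr` partial AUC is McClish-standardized to the [0.5, 1] range, which may differ from the raw pAUROC convention used in the paper.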

MOME Performance

Figure 2: Discriminative malignancy detection performance of MOME. MOME achieved MCC (a) and F1 (b) scores comparable to those of four experienced radiologists out of six readers, and significantly outperformed one junior radiologist, on internal testing set 2 (n=200). MOME also showed high AUROC (c), pAUROC (d), and AUPRC (e) on internal testing set 2 (n=200). Moreover, MOME outperformed other unimodal and multimodal methods in both AUROC (f) and AUPRC (g) on the combination of DS1 internal testing set 1 and DS2 (n=1042).

Generality across hospitals

MOME demonstrated strong and generalizable performance in distinguishing malignant from benign tumors across different hospitals (Fig. 3). On internal testing set 1 (n=307), MOME achieved an AUROC of 0.912 (95% CI: 0.877–0.944) and an AUPRC of 0.942 (95% CI: 0.907–0.970). The partial AUROC (pAUROC) was 0.735 (95% CI: 0.666–0.811) at 90% sensitivity and 0.800 (95% CI: 0.706–0.880) at 90% specificity, indicating high accuracy in identifying breast cancer patients. External validation on DS2 and DS3, which differed in MRI protocols, demographics, and case distributions, further confirmed MOME’s robustness. On DS2, MOME achieved an AUROC of 0.899 (95% CI: 0.877–0.922) and an AUPRC of 0.887 (95% CI: 0.847–0.923), with a pAUROC of 0.740 (95% CI: 0.695–0.788) at 90% sensitivity and 0.753 (95% CI: 0.698–0.805) at 90% specificity. On DS3, MOME achieved an AUROC of 0.806 (95% CI: 0.790–0.822) and an AUPRC of 0.807 (95% CI: 0.785–0.827), with a pAUROC of 0.617 (95% CI: 0.594–0.642) at 90% sensitivity and 0.621 (95% CI: 0.600–0.643) at 90% specificity. These results highlight MOME’s strong discriminatory ability and robustness across varying hospital settings.

MOME Performance

Figure 3: Malignancy diagnosis performance of MOME across different hospitals. The results correspond to the ROC curve (a), ROC curve with partial AUC (b), and precision-recall curve (c) on DS1 internal testing set; the ROC curve (d), ROC curve with partial AUC (e), and precision-recall curve (f) on DS2; and the ROC curve (g), ROC curve with partial AUC (h), and precision-recall curve (i) on DS3.

Handling missing sequences

When sequences are missing, MOME still performs well using only the DCE sequence, achieving an AUROC of 0.877 and an AUPRC of 0.926 (Table 1). With test-time augmentation, AUROC improves further across datasets: 0.886 (internal testing set 1), 0.897 (internal testing set 2), 0.881 (DS2), and 0.790 (DS3) (Table 1). These results show that while MOME benefits from complete multiparametric input for optimal performance, it remains robust even with missing sequences.
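The sketch below shows one way such flexible inference can work when expert routing is keyed to modality tags: tokens for an absent sequence are simply never produced, so only the remaining experts fire. The `tokenize`, `backbone`, and `head` calls are hypothetical stand-ins, not MOME's actual API.

```python
# Sketch of flexible inference with missing sequences: an absent
# modality contributes no tokens, so only the remaining modality
# experts are used. The model methods here are hypothetical.
import torch

@torch.no_grad()
def predict(model, volumes):
    """volumes maps modality name -> tensor, or None if missing,
    e.g. {"DCE": dce, "T2WI": None, "DWI": None} for DCE-only input."""
    tokens, modality_ids = [], []
    for m, name in enumerate(("DCE", "T2WI", "DWI")):
        vol = volumes.get(name)
        if vol is None:
            continue  # missing sequence: contribute no tokens
        t = model.tokenize(vol, modality=m)  # hypothetical tokenizer
        tokens.append(t)
        modality_ids.append(torch.full(t.shape[:2], m, dtype=torch.long))
    x = torch.cat(tokens, dim=1)
    ids = torch.cat(modality_ids, dim=1)
    return model.head(model.backbone(x, ids))  # hypothetical backbone/head
```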

Table 1: AUROC and AUPRC performance in the modality ablation study of MOME.

MOME Missing

Model decision interpretation

MOME provides interpretability by highlighting lesions and analyzing the contribution of each modality. Local interpretation is conducted using Integrated Gradients and Shapley values for individual cases. Integrated Gradients shows that MOME correctly focuses on breast lesions when diagnosing malignant (Figs. 4a, 4g) and benign (Figs. 4b, 4h) cases, with consistent behavior across DS1 and DS2. The Shapley value analysis reveals the contributions of different modalities to the final prediction (Figs. 4e, 4f, 4k, 4l). For malignant cases, DCE and DWI play key roles, whereas for benign cases, DCE and T2WI are more influential. For a broader perspective, global interpretation is derived from the global Shapley values (Figs. 4m, 4n): DCE has the highest overall contribution to malignancy detection, while DWI and T2WI are primarily responsible for identifying malignant and benign cases, respectively. These patterns remain consistent across DS1 and DS2.
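With only three input sequences, per-modality Shapley values can be computed exactly by enumerating all 2^3 coalitions, as the sketch below illustrates; `predict` is assumed to accept any subset of modalities (see the flexible-inference sketch above) and to return a fixed baseline (e.g., dataset prevalence) for the empty set. Integrated Gradients can be obtained analogously with a library such as Captum. This is an illustrative computation, not the paper's exact procedure.

```python
# Sketch of exact per-modality Shapley values: with three sequences,
# all 2^3 coalitions can be enumerated directly. Illustrative only;
# `predict` must handle any subset, including the empty coalition.
from itertools import combinations
from math import factorial

MODALITIES = ("DCE", "T2WI", "DWI")

def shapley_values(predict, volumes):
    n = len(MODALITIES)
    phi = {}
    for m in MODALITIES:
        others = [o for o in MODALITIES if o != m]
        total = 0.0
        for k in range(n):  # sizes of coalitions that exclude m
            for coal in combinations(others, k):
                # Classic Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                keep = set(coal) | {m}
                v_with = predict({name: volumes[name] if name in keep
                                  else None for name in MODALITIES})
                v_without = predict({name: volumes[name] if name in coal
                                     else None for name in MODALITIES})
                total += weight * (v_with - v_without)
        phi[m] = total  # marginal contribution of modality m
    return phi
```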

MOME Decision Explanations

Figure 4: Decision interpretation of MOME. The illustrations correspond to DCE subtraction 3D visualizations overlaid with saliencies in red computed by Integrated Gradients (a,b,g,h), the zoomed-in axial views of DCE subtraction, DWI, and T2WI (c,d,i,j), the local Shapley values (e,f,k,l), and the global Shapley values of the DS1 internal testing set (m) and DS2 (n). Four typical cases are shown: a BI-RADS 5 patient with a malignant lesion (a,c,e) and a BI-RADS 4 patient with a benign lesion (b,d,f) from the DS1 internal testing set, and a BI-RADS 5 patient with a malignant lesion (g,i,k) and a BI-RADS 4 patient with a benign lesion (h,j,l) from DS2.

Personalized management

MOME enhances personalized management in breast cancer care by aiding malignancy detection, reducing unnecessary biopsies, and supporting treatment decisions. Decision curve analysis on DS1, DS2, and DS3 (Figs. 5a–d) shows that MOME provides a high net benefit across a wide range of preference thresholds, making it a valuable tool for malignancy screening. For BI-RADS 4 patients, MOME helps minimize unnecessary biopsies: by adjusting its operating point, it correctly downgraded 7.3% of benign BI-RADS 4 cases (8 out of 109) without missing any malignancies. Decision curve analysis (Fig. 5e) confirms that MOME offers a higher net benefit than the standard “biopsy all” approach; compared to ResNet34-DCE and the Feature Fuse model (the best-performing alternatives in Fig. 2), MOME was the only model able to reduce unnecessary biopsies on DS2. MOME also supports triple-negative breast cancer (TNBC) subtyping (Fig. 5f) and neoadjuvant chemotherapy (NACT) response prediction (Fig. 5g). Using five-fold cross-validation, it achieved an AUROC of 0.709±0.067 for TNBC subtyping (n=1,005) and 0.694±0.029 for NACT response prediction (n=358). Notably, TNBC patients are more likely to achieve a complete response after NACT. These findings demonstrate MOME’s potential in non-invasive personalized breast cancer management, including malignancy screening, biopsy recommendations, and treatment decision support.
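For context, the net benefit plotted in a decision curve follows the standard Vickers-Elkin formula, which weighs true positives against false positives at each threshold probability p_t. A minimal sketch with placeholder data:

```python
# Sketch of decision curve analysis with the standard net-benefit
# formula: NB(p_t) = TP/n - FP/n * p_t / (1 - p_t). The label and
# score arrays below are placeholders for illustration.
import numpy as np

def net_benefit(y_true, y_score, p_t):
    pred = y_score >= p_t              # call positive above threshold p_t
    n = len(y_true)
    tp = np.sum(pred & (y_true == 1))  # true positives
    fp = np.sum(pred & (y_true == 0))  # false positives
    return tp / n - fp / n * p_t / (1 - p_t)

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_score = np.array([0.2, 0.9, 0.7, 0.4, 0.8, 0.1, 0.6, 0.3])

thresholds = np.linspace(0.05, 0.95, 19)
model_nb = [net_benefit(y_true, y_score, t) for t in thresholds]
# "Biopsy all" baseline: every patient is treated as positive.
all_nb = [net_benefit(y_true, np.ones_like(y_score), t) for t in thresholds]
```

A model adds clinical value wherever its curve lies above both the "biopsy all" baseline and the zero line, which is exactly the comparison shown in Figure 5.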

MOME Personalized Management

Figure 5: Potential of non-invasive personalized treatment based on MOME. The decision curves on DS1 internal testing set 1 (a), DS1 internal testing set 2 (b), DS2 (c), and DS3 (d) show a high net benefit over a wide range of preference thresholds when using MOME for malignancy screening. The decision curve also shows a high net benefit from reducing biopsies for BI-RADS 4 patients on DS2 (e). MOME further demonstrated potential in TNBC patient subtyping (f) and NACT response prediction (g), as shown by the ROC curves.

Conclusion

Overall, MOME represents a discriminative, robust, and scalable multimodal model, paving the way for non-invasive personalized management of breast cancer based on multiparametric imaging data.

For more details, please refer to the original paper or contact Smart Lab. You may also find more information about Dr. Luo’s research on his homepage.

Acknowledgments

This work was conducted in collaboration with the following institutions:

  1. Department of Computer Science and Technology, The Hong Kong University of Science and Technology, Hong Kong, China.
  2. Department of Biomedical Informatics, Harvard University, Boston, U.S.A.
  3. Department of Radiology, Shenzhen People’s Hospital, Shenzhen, China.
  4. Department of Radiology, PLA Middle Military Command General Hospital, Wuhan, China.
  5. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
  6. Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China.
  7. Department of Imaging and Interventional Radiology, The Chinese University of Hong Kong, Hong Kong, China.
  8. Department of Oncology, Yunnan Cancer Hospital, Kunming, China.
  9. Department of Radiology, 5th Medical Center of Chinese PLA General Hospital, Beijing, China.
  10. The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China.
  11. Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.
  12. Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong, China.

References

[1] Luo, L., …, & Chen, H. (2024). Towards non-invasive and personalized management of breast cancer patients from multiparametric MRI via a large mixture-of-modality-experts model. arXiv e-prints. doi:10.48550/arXiv.2408.12606.
[2] Luo, L., Wang, X., Lin, Y., Ma, X., Tan, A., Chan, R., … & Chen, H. (2024). Deep learning in breast cancer imaging: A decade of progress and future directions. IEEE Reviews in Biomedical Engineering.
[3] Luo, L., Chen, H., Wang, X., Dou, Q., Lin, H., Zhou, J., … & Heng, P. A. (2019). Deep angular embedding and feature correlation attention for breast MRI cancer analysis. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part IV 22 (pp. 504-512). Springer International Publishing.
[4] Zhou, J., Luo, L., Dou, Q., Chen, H., Chen, C., Li, G. J., … & Heng, P. A. (2019). Weakly supervised 3D deep learning for breast cancer classification and localization of the lesions in MR images. Journal of Magnetic Resonance Imaging, 50(4), 1144-1151.
[5] Bejnordi, B. E., Veta, M., Van Diest, P. J., Van Ginneken, B., Karssemeijer, N., Litjens, G., … & CAMELYON16 Consortium. (2017). Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA, 318(22), 2199-2210.
[6] Chen, H., Dou, Q., Wang, X., Qin, J., & Heng, P. (2016, February). Mitosis detection in breast cancer histology images via deep cascaded networks. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 30, No. 1).
[7] Lin, H., Chen, H., Dou, Q., Wang, L., Qin, J., & Heng, P. A. (2018, March). ScanNet: A fast and dense scanning framework for metastatic breast cancer detection from whole-slide image. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 539-546). IEEE.
[8] Xiang, H., Wang, X., Xu, M., Zhang, Y., Zeng, S., Li, C., … & Lin, X. (2023). Deep learning-assisted diagnosis of breast lesions on US images: A multivendor, multicenter study. Radiology: Artificial Intelligence, 5(5), e220185.