Non-Invasive and Personalized Management of Breast Cancer Patients through a Large Mixture of Modality Experts Model for Multiparametric MRI

The Smart Lab team at the Hong Kong University of Science and Technology, in collaboration with Harvard University, Shenzhen People’s Hospital, the PLA Middle Military Command General Hospital, the Shenzhen Institute of Advanced Technology, The University of Hong Kong, The Chinese University of Hong Kong, Yunnan Cancer Hospital, and other institutions, has proposed a novel approach for non-invasive and personalized management of breast cancer patients using multiparametric MRI (mpMRI). The team developed a Large Mixture of Modality Experts model (MOME) that integrates mpMRI information into a single comprehensive model for non-invasive, personalized breast cancer diagnosis, grading, and treatment-response prediction.

MOME Cover

Figure 1: Overview of the Large Mixture of Modality Experts model (MOME) architecture. MOME integrates multiparametric MRI data to enhance the diagnostic and predictive capabilities for breast cancer management.

Introduction

The research team collected data from over 5,200 patients across three hospitals in northern, southeastern, and southwestern China, creating the world’s largest multiparametric breast MRI dataset for model development and evaluation. MOME is built as a foundation model, inheriting the long-range modeling capability of the Transformer architecture to learn from high-dimensional mpMRI data, and it introduces a mixture-of-modality-experts design to achieve multimodal integration and flexible inference.
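
To make the mixture-of-modality-experts idea concrete, below is a minimal, hypothetical PyTorch sketch (not the authors’ released code): the three mpMRI sequences (DCE, DWI, T2WI) share a self-attention layer, but each modality’s tokens are routed to its own feed-forward expert, and absent sequences are simply omitted, which illustrates the flexible inference with missing modalities described above. Layer sizes and routing details are illustrative assumptions.

```python
# Minimal, hypothetical sketch of a mixture-of-modality-experts Transformer layer.
# NOT the authors' implementation; dimensions and routing are illustrative only.
import torch
import torch.nn as nn


class MoMELayer(nn.Module):
    def __init__(self, dim=256, heads=8, modalities=("DCE", "DWI", "T2WI")):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # One feed-forward "expert" per MRI sequence.
        self.experts = nn.ModuleDict({
            m: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for m in modalities
        })

    def forward(self, tokens):
        # tokens: dict mapping modality name -> (batch, seq_len, dim) tensor.
        # Missing modalities are simply absent from the dict (flexible inference).
        names = list(tokens)
        lengths = [tokens[m].shape[1] for m in names]
        x = torch.cat([tokens[m] for m in names], dim=1)

        # Shared self-attention across all available modality tokens.
        x = x + self.attn(self.norm1(x), self.norm1(x), self.norm1(x))[0]

        # Route each modality's tokens to its own expert feed-forward network.
        outs, start = [], 0
        for m, length in zip(names, lengths):
            chunk = x[:, start:start + length]
            outs.append(chunk + self.experts[m](self.norm2(chunk)))
            start += length
        return dict(zip(names, outs))


# Toy usage: one case with DCE and T2WI available but DWI missing.
layer = MoMELayer()
feats = {"DCE": torch.randn(1, 64, 256), "T2WI": torch.randn(1, 32, 256)}
out = layer(feats)  # same keys and shapes as the input dict
```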

MOME demonstrated outstanding performance in breast cancer detection, matching the diagnostic accuracy of four experienced radiologists while significantly outperforming a less experienced radiologist; in the comparative study with radiologists, it achieved an AUROC of 0.913. Overall, the study indicates that MOME can reduce unnecessary biopsies for BI-RADS 4 patients by 7.3% and effectively identify triple-negative breast cancer. In addition, MOME supports scalable and interpretable inference: it reasons adaptively when modalities are missing and explains its decisions by highlighting lesions and quantifying the contribution of each modality.

Dataset Characteristics

This study included a total of 5,205 patients and 5,220 multiparametric breast MRI examinations collected over ten years from three hospitals. Dataset 1 (DS1) comprised 1,824 examinations from November 2012 to July 2017. Dataset 2 (DS2) included 735 examinations from December 2018 to March 2022. Dataset 3 (DS3) contained 2,661 examinations from November 2015 to October 2022.

For the malignancy classification task, DS1 was divided into a training set (n=1,167) and a validation set (n=150) for model development, an internal test set 1 (n=307) for evaluation, and an internal test set 2 (n=200) for the comparison with radiologists. There were no overlapping patients between datasets. DS2 and DS3 were used for external testing.

Additionally, 1,005 and 358 patients from DS1 were used for triple-negative breast cancer (TNBC) subtype classification and neoadjuvant chemotherapy (NACT) response prediction, respectively, with performance reported using five-fold cross-validation.
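
For readers unfamiliar with this protocol, the subtype-classification and response-prediction results reported later (mean ± standard deviation AUC) come from five-fold cross-validation. The sketch below illustrates that evaluation scheme with a placeholder classifier and synthetic data standing in for MOME and the mpMRI features; it is not the study’s pipeline.

```python
# Illustrative five-fold cross-validation loop (placeholder model, synthetic data);
# MOME itself is a deep multimodal network, not the logistic regression used here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1005, n_features=32, weights=[0.8, 0.2], random_state=0)

aucs = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    probs = clf.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], probs))

# Reported as mean ± standard deviation across the five held-out folds.
print(f"AUC = {np.mean(aucs):.3f} ± {np.std(aucs):.3f}")
```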

Main Experimental Results

Performance comparable to radiologists

We compared MOME’s diagnostic performance on internal test set 2 (n=200) with that of six radiologists: one with less than five years of experience, two with 5-10 years, and three with more than ten years of experience. MOME showed no statistically significant difference in F1 score or MCC from four of the radiologists. Notably, it significantly outperformed the least experienced radiologist in both F1 score (difference 0.065, 95% CI 0.019-0.117) and MCC (0.228, 95% CI 0.090-0.384), while its performance was slightly lower than that of the remaining radiologist (F1: -0.051, 95% CI -0.087 to -0.019; MCC: -0.145, 95% CI -0.240 to -0.048). MOME achieved an AUC of 0.913 (95% CI 0.864-0.952) and an AUPRC of 0.948 (95% CI 0.911-0.977), indicating diagnostic capability on par with that of experienced radiologists. The comparison with the radiologists across metrics is shown in Figure 2.
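
The metrics used in this comparison (F1, MCC, AUROC, AUPRC) and 95% confidence intervals of the kind quoted above can be computed from predicted probabilities and labels; the bootstrap shown below is one common way to obtain such intervals and is only an assumption about how the numbers were derived, run here on synthetic stand-in data.

```python
# Illustrative computation of the reported metrics with bootstrap 95% CIs.
# The resampling scheme is an assumption for illustration, not the paper's exact protocol.
import numpy as np
from sklearn.metrics import (average_precision_score, f1_score,
                             matthews_corrcoef, roc_auc_score)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                                    # stand-in labels (n = 200)
y_prob = np.clip(y_true * 0.6 + rng.normal(0.3, 0.25, size=200), 0, 1)   # stand-in scores
y_pred = (y_prob >= 0.5).astype(int)                                     # fixed operating point

def bootstrap_ci(metric, *args, n_boot=1000, alpha=0.05):
    """Point estimate plus a percentile bootstrap (1 - alpha) confidence interval."""
    n = len(args[0])
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        try:
            stats.append(metric(*[a[idx] for a in args]))
        except ValueError:   # a resample may contain only one class
            continue
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(*args), lo, hi

print("AUROC", bootstrap_ci(roc_auc_score, y_true, y_prob))
print("AUPRC", bootstrap_ci(average_precision_score, y_true, y_prob))
print("F1   ", bootstrap_ci(f1_score, y_true, y_pred))
print("MCC  ", bootstrap_ci(matthews_corrcoef, y_true, y_pred))
```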

MOME Performance

Figure 2: MOME’s discriminative ability in malignancy detection. MOME’s performance in terms of MCC (a) and F1 (b) scores was comparable to that of four experienced radiologists while significantly exceeding that of the less experienced radiologist. MOME also performed strongly on the AUC (c), pAUC (d), and AUPRC (e) metrics. In addition, MOME outperformed other single-modality and multi-modality methods on both the DS1 (f,g) and DS2 (h,i) datasets in terms of AUC and AUPRC.

Generalizability across hospitals

On internal test set 1 (n=307), MOME achieved an AUC of 0.912 (95% CI: 0.876-0.942) and an AUPRC of 0.942 (95% CI: 0.906-0.970) (Figure 3a,c). At 90% sensitivity, the pAUC was 0.735 (95% CI: 0.666-0.811); at 90% specificity, the pAUC was 0.800 (95% CI: 0.706-0.880) (Figure 3b). These results indicate MOME’s high accuracy in identifying breast cancer patients. External validation was performed on DS2 and DS3, which differed from the internal dataset in MRI protocols, patient demographics, and the distribution of malignant cases. Specifically, MOME achieved an AUC of 0.899 (95% CI: 0.877-0.922) and an AUPRC of 0.887 (95% CI: 0.847-0.923) on DS2. At 90% sensitivity, the pAUC was 0.740 (95% CI: 0.695-0.788); at 90% specificity, the pAUC was 0.753 (95% CI: 0.698-0.805). On DS3, MOME’s AUC was 0.806 (95% CI: 0.790-0.822) and its AUPRC was 0.807 (95% CI: 0.785-0.827). At 90% sensitivity, the pAUC was 0.617 (95% CI: 0.594-0.642); at 90% specificity, the pAUC was 0.621 (95% CI: 0.600-0.643). Together, these results indicate that MOME possesses strong discriminative ability and generalizes well across hospitals.
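
The partial AUC (pAUC) values quoted here restrict the ROC integral to the clinically relevant high-sensitivity or high-specificity region. One way to compute both variants from a ROC curve is sketched below; normalizing by the width of the restricted range is only one convention and may not match the paper’s exact definition, so absolute values can differ.

```python
# Illustrative partial-AUC (pAUC) computation for the two restricted ROC regions.
import numpy as np
from sklearn.metrics import roc_curve

def _trapezoid(yv, xv):
    # Trapezoidal rule, spelled out to avoid depending on a specific NumPy version.
    return float(np.sum(np.diff(xv) * (yv[1:] + yv[:-1]) / 2.0))

def pauc_high_specificity(y_true, y_score, max_fpr=0.1):
    """Area under the ROC curve restricted to FPR <= max_fpr (specificity >= 90%)."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    grid = np.linspace(0.0, max_fpr, 200)
    return _trapezoid(np.interp(grid, fpr, tpr), grid) / max_fpr

def pauc_high_sensitivity(y_true, y_score, min_tpr=0.9):
    """High-sensitivity region: specificity integrated over TPR >= min_tpr."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    grid = np.linspace(min_tpr, 1.0, 200)
    spec = 1.0 - np.interp(grid, tpr, fpr)   # specificity as a function of sensitivity
    return _trapezoid(spec, grid) / (1.0 - min_tpr)

# Toy check on synthetic scores (stand-ins for MOME's predicted probabilities).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 307)
s = np.clip(y * 0.5 + rng.normal(0.3, 0.25, 307), 0, 1)
print(pauc_high_specificity(y, s), pauc_high_sensitivity(y, s))
```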

MOME Performance

Figure 3: (a), (c) ROC curves and precision-recall curves for the internal test set of DS1; (b) partial AUC ROC curves for the internal test set of DS1; (d), (f) ROC curves and precision-recall curves for DS2; (e) partial AUC ROC curves for DS2; (g), (i) ROC curves and precision-recall curves for DS3; (h) partial AUC ROC curves for DS3.

Model decision interpretation

MOME accurately highlights tumors and reveals the contribution of each modality. Using integrated gradients, we observe that MOME correctly focuses on the breast tumor when diagnosing malignant (Figures 4a and 4g) or benign (Figures 4b and 4h) cases, and this behavior is consistent across DS1 and DS2. Shapley values quantify the contribution of each modality to the final prediction (Figures 4e, f, k, and l). Notably, DCE and DWI play more significant roles in identifying malignant patients, while DCE and T2WI contribute more to identifying benign patients. As shown by the global Shapley values (Figures 4m and 4n), DCE contributes the most to the diagnostic decision overall, whereas DWI primarily aids in diagnosing malignant cases and T2WI is more instrumental in diagnosing benign cases, a decision pattern that is consistent across DS1 and DS2.
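
With only three sequences, per-modality Shapley values can be computed exactly by enumerating all modality subsets and rerunning the model with the remaining sequences dropped, which MOME’s missing-modality inference permits. The sketch below shows that enumeration with a hypothetical `predict` stand-in and made-up scores; it illustrates the Shapley computation itself, not the authors’ exact attribution pipeline.

```python
# Exact Shapley values over the three MRI sequences, by enumerating all subsets.
# `predict` is a hypothetical stand-in for running MOME on a chosen subset of sequences.
from itertools import combinations
from math import factorial

MODALITIES = ("DCE", "DWI", "T2WI")

def predict(subset):
    # Placeholder value function: malignancy score when only `subset` is provided.
    toy_scores = {(): 0.50, ("DCE",): 0.78, ("DWI",): 0.64, ("T2WI",): 0.55,
                  ("DCE", "DWI"): 0.86, ("DCE", "T2WI"): 0.81, ("DWI", "T2WI"): 0.67,
                  ("DCE", "DWI", "T2WI"): 0.90}
    return toy_scores[tuple(sorted(subset))]

def shapley(modality):
    """Average marginal contribution of `modality` over all subsets of the others."""
    others = [m for m in MODALITIES if m != modality]
    n = len(MODALITIES)
    value = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            value += weight * (predict(subset + (modality,)) - predict(subset))
    return value

for m in MODALITIES:
    print(m, round(shapley(m), 3))
# The three values sum to predict(all) - predict(none) = 0.40 in this toy example.
```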

MOME Decision Explanations

Figure 4: Decision explanations of MOME. Images correspond to DCE subtraction 3D renderings (a,b,g,h); DCE subtraction, DWI, and T2WI axial views (c,d,i,j); local Shapley values (e,f,k,l); and global Shapley values for the DS1 internal test set (m) and DS2 (n). Shown are four representative cases: a BI-RADS 5 malignant tumor patient (a,c,e) and a BI-RADS 4 benign tumor patient (b,d,f) from the DS1 internal test set, as well as a BI-RADS 5 malignant tumor patient (g,i,k) and a BI-RADS 4 benign tumor patient (h,j,l) from DS2.

Personalized management

We first analyzed the net benefit of using MOME to detect patients with malignant tumors. Decision curve analyses on DS1 internal test set 1, DS1 internal test set 2, DS2, and DS3 (Figures 5a, b, c, d) show that MOME provides a high net benefit across a wide range of threshold probabilities, demonstrating its potential for decision support in malignancy screening. We also examined how MOME could reduce unnecessary biopsies for BI-RADS 4 patients. By adjusting MOME’s operating point, we explored the trade-off between the number of correctly downgraded benign cases and the rate of missed true positives. Using an operating point derived from DS1 and applying it to DS2, we found that 7.3% of BI-RADS 4 benign cases (8 of 109) could be spared biopsy without missing any of the 86 cancers. Decision curve analysis (Figure 5g) further shows that MOME’s net benefit exceeds that of the biopsy-all strategy routinely applied to BI-RADS 4 patients across a wide threshold range, suggesting that MOME-guided, personalized biopsy recommendations are preferable for these patients.

We also investigated MOME’s performance in triple-negative breast cancer subtype classification (Figure 5f, AUC = 0.709 ± 0.067, n = 1,005) and in predicting pathological complete response to neoadjuvant chemotherapy (Figure 5e, AUC = 0.694 ± 0.029, n = 358). Notably, patients with triple-negative breast cancer are more likely to achieve a pathological complete response after neoadjuvant chemotherapy. Together, these results suggest that MOME can contribute to non-invasive, personalized breast cancer management, including malignancy screening, biopsy recommendation, and treatment decision support.
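
The decision-curve results rest on the standard net-benefit formula, net benefit = TP/N − FP/N × p_t/(1 − p_t), evaluated across threshold probabilities p_t; the biopsy analysis additionally fixes an operating point on one dataset and applies it to another. The short sketch below illustrates both calculations with placeholder labels and scores and is not the paper’s evaluation code.

```python
# Net benefit for a decision curve, plus the biopsy-sparing count at a fixed operating point.
# Data, cohort size, and thresholds below are placeholders, not the study's actual values.
import numpy as np

def net_benefit(y_true, y_prob, p_t):
    """Standard decision-curve net benefit at threshold probability p_t."""
    treat = y_prob >= p_t
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    n = len(y_true)
    return tp / n - fp / n * p_t / (1.0 - p_t)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 195)                                  # stand-in BI-RADS 4 cohort labels
p = np.clip(y * 0.5 + rng.normal(0.35, 0.2, 195), 0, 1)      # stand-in predicted probabilities

# Decision curve: net benefit of the model vs. the biopsy-all default across thresholds.
for p_t in (0.05, 0.10, 0.20, 0.30):
    nb_model = net_benefit(y, p, p_t)
    nb_all = np.mean(y) - (1 - np.mean(y)) * p_t / (1 - p_t)   # "biopsy everyone" strategy
    print(f"p_t={p_t:.2f}  model={nb_model:.3f}  biopsy-all={nb_all:.3f}")

# Operating point chosen so that no cancer falls below it (here picked on the same toy data);
# benign cases scoring under that threshold are the biopsies that could be spared.
threshold = p[y == 1].min()
spared = np.sum((y == 0) & (p < threshold))
print(f"spared benign biopsies: {spared} / {np.sum(y == 0)}")
```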

MOME Decision Explanations

Figure 5: Potential of MOME for non-invasive personalized management. (a-d) Decision curve analyses for malignancy screening on DS1 internal test set 1, DS1 internal test set 2, DS2, and DS3, showing high net benefit across a wide threshold range; (e) decision curve analysis showing the net benefit of using MOME to reduce biopsies for BI-RADS 4 patients in DS2; (f) ROC curve for triple-negative breast cancer subtype classification; (g) ROC curve for predicting response to neoadjuvant chemotherapy.

Conclusion

Overall, MOME represents a discriminative, robust, and scalable multimodal model, paving the way for non-invasive personalized management of breast cancer based on multiparametric imaging data.

For more details, please refer to the original paper or contact the Smart Lab. You may also find more information about Dr. Luo’s research on his homepage.

Acknowledgments

This work was supported by the following institutions:

  1. Department of Computer Science and Technology, The Hong Kong University of Science and Technology, Hong Kong, China.
  2. Department of Biomedical Informatics, Harvard University, Boston, U.S.A.
  3. Department of Radiology, Shenzhen People’s Hospital, Shenzhen, China.
  4. Department of Radiology, PLA Middle Military Command General Hospital, Wuhan, China.
  5. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
  6. Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China.
  7. Department of Imaging and Interventional Radiology, The Chinese University of Hong Kong, Hong Kong, China.
  8. Department of Oncology, Yunnan Cancer Hospital, Kunming, China.
  9. Department of Radiology, 5th Medical Center of Chinese PLA General Hospital, Beijing, China.
  10. The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China.
  11. Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.
  12. Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong, China.

References

[1] Luo, L., …, Chen, H. (2024). Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model. arXiv e-prints. doi:10.48550/arXiv.2408.12606.
[2] Luo, L., Wang, X., Lin, Y., Ma, X., Tan, A., Chan, R., … & Chen, H. (2024). Deep learning in breast cancer imaging: A decade of progress and future directions. IEEE Reviews in Biomedical Engineering.
[3] Luo, L., Chen, H., Wang, X., Dou, Q., Lin, H., Zhou, J., … & Heng, P. A. (2019). Deep angular embedding and feature correlation attention for breast MRI cancer analysis. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part IV 22 (pp. 504-512). Springer International Publishing.
[4] Zhou, J., Luo, L., Dou, Q., Chen, H., Chen, C., Li, G. J., … & Heng, P. A. (2019). Weakly supervised 3D deep learning for breast cancer classification and localization of the lesions in MR images. Journal of Magnetic Resonance Imaging, 50(4), 1144-1151.
[5] Bejnordi, B. E., Veta, M., Van Diest, P. J., Van Ginneken, B., Karssemeijer, N., Litjens, G., … & CAMELYON16 Consortium. (2017). Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. Jama, 318(22), 2199-2210.
[6] Chen, H., Dou, Q., Wang, X., Qin, J., & Heng, P. (2016, February). Mitosis detection in breast cancer histology images via deep cascaded networks. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1).
[7] Lin, H., Chen, H., Dou, Q., Wang, L., Qin, J., & Heng, P. A. (2018, March). Scannet: A fast and dense scanning framework for metastatic breast cancer detection from whole-slide image. In 2018 IEEE winter conference on applications of computer vision (WACV) (pp. 539-546). IEEE.
[8] Xiang, H., Wang, X., Xu, M., Zhang, Y., Zeng, S., Li, C., … & Lin, X. (2023). Deep Learning-assisted diagnosis of breast lesions on us images: a multivendor, multicenter study. Radiology: Artificial Intelligence, 5(5), e220185.