Recently, a review paper titled “Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions,” led by the HKUST Smart Lab, was published in IEEE Reviews in Biomedical Engineering (IEEE RBME, IF=17.2). This comprehensive survey explores current developments in healthcare foundation models (HFMs) across the medical imaging, language, bioinformatics, and multimodal subfields. It covers over 200 methodology papers, more than 100 relevant datasets, and 16 application domains. The paper further discusses the significant challenges currently facing HFMs and offers insights into future directions for foundation models in healthcare. For the latest HFM papers and resources, please refer to our website: Awesome Foundation Models for Advancing Healthcare.
Paper Link: https://arxiv.org/abs/2404.03264
Overall process of Healthcare Foundation Models, from data to foundation models and applications.
Introduction
In the past decade, the development of artificial intelligence (AI), particularly deep learning (DL), has brought disruptive advances to medical technology. By learning from medical data, AI models can disentangle the relevant information within it and assist in medical practice. For several high-impact clinical conditions, AI models have achieved performance close to that of medical experts, demonstrating tremendous application potential in the field. However, a significant gap remains between expert AI models designed for specific medical tasks and the diversity of medical scenarios and needs, hindering their widespread application in medical practice. Thus, a long-standing open question in healthcare is, “Can we build general AI models to benefit various medical tasks?”
Recent research on foundation models offers a promising answer to this question, enabling AI models to learn general capabilities and apply them across various medical scenarios. In the relevant subfields of healthcare AI, including language, vision, bioinformatics, and multimodal domains, healthcare foundation models (HFMs) have shown impressive successes. These models can solve complex clinical problems, laying the groundwork for improving the efficiency and effectiveness of medical practice and thereby advancing the healthcare field.
The emergence of HFMs is attributed to the continuous accumulation of medical data, advancements in AI algorithms, and improvements in computational infrastructure. However, significant deficiencies still exist in data, algorithms, and computational infrastructure, which remain the root of various challenges faced by HFMs. Thus, HFMs present a new future filled with opportunities and challenges. This review aims to address the following questions: 1) What is the current progress of foundation models in healthcare? 2) What challenges are foundation models currently facing in healthcare? 3) What potential future directions should we focus on and explore to further advance HFMs? Through the exploration of these questions, this paper outlines the current progress of HFMs and provides a clear vision for their future development.
A Brief History of Healthcare Foundation Models
According to the definition by Bommasani et al., “foundation models” are models pre-trained on large-scale data that can be adapted to a wide variety of tasks. A sociological hallmark of the foundation model era is the widespread acceptance of applying a single foundation model to a multitude of different tasks. A representative turning point occurred at the end of 2018 with the introduction of the BERT model in natural language processing (NLP). Following this, pre-trained models were widely applied as foundation models in NLP and gradually expanded to other related areas.
Since 2018, the number of research papers on foundation models in healthcare has seen exponential growth.
With the rapid development of foundation models, AI in healthcare has gradually shifted from expert models to general models. In early 2019, BioBERT, built on BERT, emerged as a language foundation model (LFM) in the healthcare domain. By the end of 2022, ChatGPT had garnered widespread attention for its powerful generality, bringing the benefits of foundation models to more healthcare professionals and further sparking research interest in HFMs; in August 2023 alone, over 200 papers related to ChatGPT in healthcare were published. For visual foundation models (VFMs), many preliminary works focused on pre-training or transfer learning, and the significant impact of the Segment Anything Model (SAM) triggered a research surge in general visual models for healthcare. In bioinformatics, AlphaFold2 took first place in protein structure prediction at CASP14 in 2020, igniting great interest in bioinformatics foundation models (BFMs) and further advancing research on RNA, DNA, proteins, and more. In early 2021, OpenAI released the CLIP model, achieving significant performance in large-scale cross-modal learning of vision and language; because healthcare data are inherently multimodal, this technology was quickly applied to integrate images, omics, text, and other modalities into powerful multimodal foundation models (MFMs). As of February 2024, the number of representative papers on foundation models across the four subfields has grown exponentially, with new paradigms and technologies developing rapidly within HFMs.
Current Methods in Healthcare Foundation Models
Figure 1 illustrates how HFMs learn representations from large-scale, diverse healthcare data and then adapt to various healthcare applications. We therefore summarize LFMs, VFMs, BFMs, and MFMs from the perspectives of pre-training and adaptation. Based on their learning characteristics, we categorize pre-training paradigms into the following types:
- Generative Learning (GL) trains models to generate meaningful information from features, thereby acquiring representation capabilities.
- Contrastive Learning (CL) pulls similar instances closer together in feature space while pushing dissimilar instances apart.
- Hybrid Learning (HL) combines different learning methods to learn data representations.
- Supervised Learning (SL) trains models on labeled data to predict specified outcomes.

We categorize adaptation paradigms into the following types:
- Fine-Tuning (FT) adjusts the internal parameters of a pre-trained model.
- Adapter Tuning (AT) adds new parameters (adapters) to a pre-trained model and trains only these additions.
- Prompt Engineering (PE) designs or learns prompts that guide a pre-trained model to execute desired tasks.
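To make the CL paradigm concrete, here is a minimal, self-contained sketch of an InfoNCE-style contrastive objective in PyTorch. The batch size, embedding dimension, and temperature are illustrative assumptions, not settings from any surveyed work.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.07):
    """z1, z2: (batch, dim) embeddings of two views of the same instances.
    Matched rows (the diagonal) are pulled together; all other pairs are pushed apart."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(z1.size(0))   # positive pair for row i is column i
    return F.cross_entropy(logits, targets)

# Toy usage: embeddings of two augmented views of the same batch of images
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(info_nce_loss(z1, z2))
```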
A Sankey diagram illustrating the connections between pre-training paradigms, HFM subfields, and adaptation paradigms. “Non” indicates that a work directly applies existing pre-trained models to its tasks rather than pre-training its own models.
Through research across the four fields, we identified five observations regarding the current progress of HFMs:
- General foundation models can adapt to some healthcare domains. In the language, vision, and multimodal subfields (though not in bioinformatics), over one-third of the surveyed work directly adapts existing pre-trained models to healthcare tasks. Compared with bioinformatics, vision and language in healthcare are relatively unified with their general-domain counterparts, so some successful general-domain pre-trained models transfer to healthcare tasks. Bioinformatics, by contrast, lacks general pre-trained models tailored to its highly specific omics data, resulting in only two works studying adaptation methods that predict from LFM embeddings.
- Most LFMs directly adapt existing pre-trained language models to healthcare tasks. We speculate that this is because language is a human-created construct with strong transferability, so LFMs pre-trained in general domains carry that transferability over to healthcare tasks.
- Most pre-training efforts focus on learning from unlabeled data. Given the vast amount of pre-training data, supervised learning would require a prohibitive amount of manual labeling. Consequently, most work concentrates on self-supervised paradigms, such as GL, CL, or HL, to learn general representations at low labeling cost.
- Supervised learning is still used for VFM pre-training. The continuity of visual information makes it difficult to disentangle the semantics within image contexts using self-supervised paradigms alone. Some work therefore leverages supervised learning to provide more direct optimization objectives, driving models to disentangle the internal semantics of images and learn meaningful features. Supervised learning has also been applied in general scenarios for the language, bioinformatics, and multimodal domains, demonstrating its potential as a general pre-training paradigm for healthcare.
- Fine-tuning remains widely used for adaptation. Over half of the work uses fine-tuning paradigms, as fine-tuning provides a stable learning process. Newer techniques such as LoRA are being explored to achieve parameter-efficient adaptation (a minimal sketch follows this list).
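Below is a minimal sketch of the LoRA idea applied to a PyTorch linear layer: the pre-trained weights are frozen and only a low-rank update is trained. The rank, scaling factor, and initialization follow common practice and are assumptions here, not the survey's prescription.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, in_dim, out_dim, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)   # stands in for a pre-trained layer
        for p in self.base.parameters():
            p.requires_grad_(False)              # frozen during adaptation
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))  # zero init: update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the low-rank factors A and B are trained
```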
A tree diagram summarizing key methods in HFMs.
As shown in Figure 4, based on relevant reviews and our research, we further classify key technologies in healthcare foundation models from a methodological perspective:
- For pre-training methods, supervised learning (SL) and self-supervised learning (SSL) are the two main branches. Due to the high data demands of HFMs, SSL pre-training methods are widely researched. The GL paradigm can be divided based on the type of generative prediction goals into parameter transformation prediction, sequence prediction, context prediction, cross-modal prediction, and hybrid prediction. The CL paradigm can be classified based on the types of contrastive pairs into instance discrimination, inter-group discrimination, relational discrimination, cross-modal discrimination, and hybrid discrimination.
- For adaptation methods, FT, PE, and AT are widely used to adapt pre-trained HFMs to downstream tasks. FT can be subdivided, by the number and type of parameters tuned, into full-parameter fine-tuning, partial-parameter fine-tuning, and reparameterization-based fine-tuning; the last typically employs low-rank adaptation (LoRA) and its variants for efficient tuning without increasing the inference-time parameter count. PE can be categorized by prompt type into handcrafted prompts, learnable prompts, automatically synthesized prompts, and few-shot prompting. AT can be divided by adapter placement into sequential adapters, parallel adapters, and hybrid adapters (a minimal adapter sketch follows this list).
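To illustrate adapter placement, here is a minimal sketch of a sequential adapter in PyTorch: a small bottleneck module with a residual connection attached after a frozen pre-trained block. The bottleneck width and activation are illustrative assumptions; parallel and hybrid adapters differ mainly in where the module attaches relative to the frozen block.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual preserves base behavior

class SequentialAdapterBlock(nn.Module):
    """Wraps a frozen pre-trained block; only the adapter is trained."""
    def __init__(self, pretrained_block, dim):
        super().__init__()
        self.block = pretrained_block
        for p in self.block.parameters():
            p.requires_grad_(False)
        self.adapter = Adapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))          # sequential placement: after the block

# Toy usage with a stand-in "pre-trained" block
block = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
model = SequentialAdapterBlock(block, dim=256)
out = model(torch.randn(4, 256))
```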
Current Challenges Faced
Data, algorithms, and computational infrastructure serve as the three pillars of artificial intelligence, providing opportunities for the development of HFMs. However, due to their current developmental levels, they remain the root of various challenges faced by HFMs.
- Data
Data scarcity is a core challenge faced by HFMs, as the general capabilities of foundation models rely on learning from large-scale, diverse datasets. However, inherent issues such as ethics, diversity, heterogeneity, and high costs in healthcare data hinder the construction of large-scale datasets, posing significant challenges to the current development of HFMs.
- Ethical Issues: The collection, sharing, and use of healthcare data must comply with ethical requirements, ruling out the internet-scale data collection seen in other fields and complicating the construction of large datasets.
- Diversity Issues: The long-tail distributions of data modalities and diseases in healthcare make it difficult to collect balanced large-scale datasets, leaving healthcare datasets lacking in diversity.
- Heterogeneity Issues: The characteristics of healthcare data vary across populations, regions, and medical centers, producing data-space shifts, target shifts, and concept shifts in practical applications of HFMs and making effective generalization difficult.
- Cost Issues: Because healthcare data are highly specialized, collection requires expensive equipment, and labeling and curation demand considerable time from professionals, making the construction of large-scale healthcare datasets costly.
- Algorithms
Although algorithms have been studied for decades alongside the development of AI, the unprecedented data volume, model scale, and application scope of the foundation model era expose new algorithmic challenges.
- Accountability Issues: Due to the close relationship between healthcare and human life, accountability becomes particularly important in HFMs, including sub-issues such as model interpretability, fairness, and safety. While some research on responsible AI exists, the unprecedented scale and application range of foundation models make the use of these technologies more challenging.
- Reliability Issues: The reliability of model outputs determines whether people can trust HFMs. Issues such as hallucination in large models and outdated knowledge continue to undermine confidence in using foundation models in healthcare settings.
- Capability Issues: The capabilities of foundation models still fall short of clinical demands. First, a model's knowledge capacity may not span the diverse knowledge of different modalities such as language, vision, and bioinformatics. Second, the functionality of foundation models remains narrow, making it difficult to meet complex multimodal, multi-task clinical needs. Finally, most foundation models lack reasoning capabilities and struggle to provide sound inference processes for their outputs, which undermines trust.
- Adaptability Issues: The adaptability of foundation models determines their generalization to different scenarios; however, the transferability of foundation models and their scalability on different devices remain weak.
- Computational Infrastructure
The large-scale attributes of foundation models (including parameter count and data volume) necessitate unprecedented large-scale computational infrastructure for training and inference processes, leading to new challenges.
- Computational Issues: The time and computational resource costs required for training or adapting HFMs to downstream tasks are high, exceeding the budgets of most researchers and organizations.
- Environmental Issues: Developing such large-scale foundation models incurs significant environmental costs, and the long-term deployment of foundation models will continue to consume environmental resources. Therefore, reducing environmental impact in the construction and deployment of foundation models and promoting sustainable development has become an important requirement.
Future Potential Research Directions
Future directions for healthcare foundation models. In this paper, we discuss the transformations in roles, implementation methods, applications, and areas of focus for HFMs.
- Exploring New Paradigms Beyond AI Replacing Humans
While traditional paradigms tend to leverage AI to automate certain healthcare tasks, thereby replacing manual human work, the development of HFMs reveals that human-AI collaboration paradigms offer greater practical value. Specifically:
- Further exploring human-AI collaboration to enhance healthcare service capabilities and to complete more challenging healthcare tasks.
- Further exploring supervised medical intelligence that aligns with healthcare needs, enabling AI to operate under the supervision of human experts to meet the accountability requirements of practical healthcare.
- Further optimizing human-AI collaboration patterns: as foundation models develop stronger emergent capabilities, collaboration patterns should be refined for more efficient and well-structured cooperation.
- From Static Models to Dynamic Models
Although static AI models have demonstrated effectiveness in specific healthcare tasks, real-world healthcare practice requires coordination of diverse clinical requirements and data modalities from multiple departments. Therefore, constructing dynamic AI models is an important future direction for HFMs. Specifically:
- Enhancing the representation capabilities of foundation models by constructing robust dynamic neural network architectures and researching continual learning and model editing techniques to represent broader data distributions and adapt to real-world data distribution changes.
- Improving task adaptability of models by reducing adaptation costs, using less data and computation to enhance the flexibility of foundation models, while designing more effective methods to boost the emergent capabilities of HFMs to leverage the rich knowledge learned from large-scale data.
- Enhancing the scalability of models by developing effective scaling methods for foundation models to dynamically adapt to computational environments and achieve efficient inference on resource-constrained devices.
- From Ideal Settings to Complex Real-World Scenarios
Previous applications of AI in healthcare often assumed operation on specific problems under ideal conditions, failing to address the complexities and uncertainties of the real world. Exploring the application of HFMs in practical healthcare has therefore become an important future direction. Specifically:
- Researching adaptability to multiple domains, to achieve broad application across different data distributions.
- Researching adaptability to multiple and open tasks, enabling dynamic adaptation to healthcare's inherently multi-task scenarios and the ability to respond to unknown inputs in open-world settings.
- Researching multimodal processing capabilities, enabling models to handle healthcare's multimodal scenarios while addressing key multimodal analysis issues such as missing modalities and modality fusion.
- From Capability Focus to Trust Focus
As the performance of foundation models in healthcare continues to improve, attention will gradually shift from exploring the capabilities of HFMs to trusting their behavior. Specifically:
- Building interpretable HFMs by developing machine learning principles and interpretability evidence that can cope with the unprecedented scale of foundation models, helping researchers and users understand these models.
- Building secure HFMs by designing mechanisms that withstand external attacks and enhance the reliability of the models themselves, and by establishing sound accountability mechanisms to increase user trust.
- Building sustainable HFMs by adopting low-carbon, environmentally friendly methods to construct efficient, low-cost foundation models, allowing HFMs to develop with minimal environmental impact.
Conclusion
This review provides a potential answer to the question, “Can we build general AI models to benefit various medical tasks?” An increasing number of healthcare practices are benefiting from the development of HFMs, achieving advanced intelligent healthcare services. Although HFMs are gradually demonstrating their immense application value, there remains a lack of clear understanding regarding the challenges, new opportunities, and potential future directions faced by foundation models in healthcare practice. Therefore, this review first provides a comprehensive overview and analysis of HFMs, including methods, data, and applications, aiding in the understanding of the current progress of HFMs. It then delves into key challenges in data, algorithms, and computational infrastructure, elucidating the shortcomings of HFMs. Finally, it anticipates future directions regarding roles, implementation methods, applications, and areas of focus, showcasing the immense prospects of HFMs in advancing healthcare. More specific content from the review can be found in the original paper (https://arxiv.org/abs/2404.03264).
Paper Citation
Yuting He, Fuxiang Huang, Xinrui Jiang, Yuxiang Nie, Minghao Wang, Jiguang Wang, and Hao Chen, “Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions,” IEEE Reviews in Biomedical Engineering, 2024.