[Nature Biomedical Engineering] A generalizable pathology foundation model using a unified knowledge distillation pretraining framework

The HKUST Smart Lab team, in collaboration with several partner institutions, recently completed a project on pathology foundation models. Published in Nature Biomedical Engineering, the work presents a comprehensive evaluation of foundation models in pathology and introduces a unified knowledge distillation pretraining framework to address their current limitations.

Introduction

Pathology, often referred to as the “gold standard” in cancer diagnosis, has seen significant advancements with the development of artificial intelligence (AI). However, existing models are often task-specific and lack the generalizability needed for complex clinical workflows. To address this limitation, a collaborative research team led by Professor Hao Chen at HKUST’s Smart Lab, along with institutions such as Southern Medical University, Shanghai AI Lab, and others, has introduced the Generalizable Pathology Foundation Model (GPFM).

GPFM represents a milestone in pathology AI, excelling across 72 clinical tasks spanning six major categories, including diagnosis, prognosis, and question answering (QA). The findings have been published in Nature Biomedical Engineering, highlighting the potential of this model to address the limitations of current “specialized” AI systems.

GPFM

Existing pathology AI systems often excel in specific tasks but struggle with broader adaptability. These challenges include:

  • Task-specificity: Models may achieve high accuracy in tissue classification but fail in survival prediction or report generation.
  • Generalization gap: Without a unified evaluation framework, the adaptability of models across diverse pathology tasks remains untested.
  • Complex clinical applications: The lack of a “comprehensive AI” hinders the deployment of AI into integrated clinical workflows.

To systematically evaluate these limitations, the team developed the first framework for generalizable pathology AI, assessing models on six task categories across 72 clinical benchmarks. Existing models achieved an average rank of only 3.7, with the best model leading in just six tasks (see Figure 2).
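
The average-rank metric used in this evaluation can be illustrated with a short sketch. The function name `average_ranks`, the model and task names, and the scores below are hypothetical and are not taken from the paper; the sketch assumes a higher metric is better and that tied models share the mean of their positions.

```python
from collections import defaultdict


def average_ranks(scores):
    """Mean rank per model across tasks.

    scores: {task: {model: metric}}, where a higher metric is better.
    Rank 1 is best; tied models share the mean of the positions they occupy.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for by_model in scores.values():
        ordered = sorted(by_model.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j to cover every model tied with the one at position i.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            shared_rank = ((i + 1) + (j + 1)) / 2
            for model, _ in ordered[i:j + 1]:
                totals[model] += shared_rank
                counts[model] += 1
            i = j + 1
    return {model: totals[model] / counts[model] for model in totals}


# Made-up scores for illustration only.
toy_scores = {
    "task_a": {"model_x": 0.91, "model_y": 0.88, "model_z": 0.85},
    "task_b": {"model_x": 0.70, "model_y": 0.74, "model_z": 0.70},
}
ranks = average_ranks(toy_scores)
```

A model that ranks first on every task would score exactly 1.0, so an average rank of 3.7 means a model is, on average, closer to fourth place than to first.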

Method

The success of GPFM lies in three critical components:

  • A Unified Pathology Evaluation Framework: The team introduced a comprehensive benchmark to test the true generalizability of pathology models. This framework covers six task types: whole-slide classification, survival analysis, pathology QA, ROI classification, pathology report generation, and pathology image retrieval.

  • Knowledge Distillation: The researchers implemented a novel, dual-engine knowledge distillation framework:

    Expert Distillation: Leveraging the strengths of high-performing models such as UNI and Phikon to integrate specialized knowledge.
    Self-Distillation: Facilitating cross-scale alignment of tissue features, enhancing generalization across microscopic and macroscopic levels.

  • Large-Scale Pretraining: GPFM was pretrained on an extensive dataset of 190 million image-level samples, sourced from over 95,000 whole-slide images (WSIs) spanning 34 tissue types. This scale supports the model’s robustness and adaptability to unseen clinical data.
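
The two distillation signals described above can be sketched in a framework-free way. This is a minimal illustration under stated assumptions, not the loss used in the paper: it assumes cosine distance for aligning a student feature with each expert feature, a DINO-style cross-entropy between teacher and student distributions for self-distillation, features already projected to a common dimension, and non-zero feature vectors. All function names, temperatures, and numbers are hypothetical.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - peak - log_total for x in scaled]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def expert_distillation_loss(student_feat, expert_feats):
    """Mean cosine distance between the student feature and each expert feature."""
    return sum(cosine_distance(student_feat, e) for e in expert_feats) / len(expert_feats)


def self_distillation_loss(student_logits, teacher_logits,
                           student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy of the student distribution against a sharper teacher one.

    In a real setup the two inputs would come from different crops or scales
    of the same tissue region (an assumption of this sketch).
    """
    teacher_probs = [math.exp(v) for v in log_softmax(teacher_logits, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))


# Made-up features and logits for illustration only.
student_feature = [0.2, 0.9, -0.1]
expert_features = [[0.1, 1.0, 0.0], [0.3, 0.8, -0.2]]  # e.g. two frozen expert encoders
expert_term = expert_distillation_loss(student_feature, expert_features)
self_term = self_distillation_loss([1.5, 0.2, -0.7], [2.0, 0.5, -1.0])
loss = expert_term + self_term  # equal weighting is an arbitrary choice here
```

The expert term pulls the student toward representations the expert models already provide, while the self-distillation term encourages consistent predictions across views of the same tissue.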

GPFM

Clinical Validation

GPFM was validated against state-of-the-art models across a diverse set of clinical scenarios, demonstrating significant improvements in accuracy and generalizability. In whole-slide image classification, a critical diagnostic task, GPFM achieved an average rank of 1.22 across 36 tasks, outperforming the previous leader UNI (average rank: 3.60). The model also recorded a mean AUC of 0.891, surpassing UNI by 1.6% (P < 0.001).

WSI Classification Results

For survival prediction, which requires modeling complex prognostic data, GPFM held an average rank of 2.1 across 15 tasks, securing a top-2 position in 13 of them. It achieved a C-Index score of 0.665, representing a 3.4% improvement over UNI (P < 0.001).
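
For readers unfamiliar with the C-Index reported above, the following sketch shows one common way to compute it (Harrell's concordance index). It is an illustrative implementation, not the evaluation code from the paper; the function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if the sample was censored
    risks:  predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when sample i had an observed event
            # strictly before sample j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up patients for illustration only.
toy_times = [5, 8, 12, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which gives context for a score of 0.665.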

Survival Analysis Results

In the domain of ROI classification, GPFM achieved the best average rank of 1.88 across 16 tasks, outperforming Prov-Gigapath (rank: 3.09), with the highest mean AUC of 0.946 (+0.2%, P < 0.001). Beyond these core tasks, GPFM demonstrated strong performance in additional areas such as pathology QA, image retrieval, and pathology report generation. These results showcase GPFM’s versatility and robustness, highlighting its potential for broad clinical application. For a detailed breakdown of these tasks, refer to the published study.

ROI Classification Results

Translational Potential

Building on the capabilities of GPFM, the team developed SmartPath, a next-generation diagnostic tool designed for intraoperative workflows. SmartPath is tailored to support diagnosis in five high-incidence cancers, including lung, breast, and gastrointestinal cancers.

Currently under active deployment, SmartPath aims to accelerate the adoption of digital pathology by improving diagnostic accuracy and streamlining clinical workflows.


Resources

For more details, please see our paper, “A generalizable pathology foundation model using a unified knowledge distillation pretraining framework,” published in Nature Biomedical Engineering.

Citation:
Ma, J., Guo, Z., Zhou, F. et al. A generalizable pathology foundation model using a unified knowledge distillation pretraining framework. Nat. Biomed. Eng (2025). https://doi.org/10.1038/s41551-025-01488-4