[IEEE-TMI] HMIL: Hierarchical Multi-Instance Learning for Fine-Grained Whole Slide Image Classification

A study conducted by HKUST Smart Lab in collaboration with Peking University Shenzhen Hospital has been accepted by IEEE Transactions on Medical Imaging. The study introduces a Hierarchical Multi-Instance Learning (HMIL) framework designed to address the challenges of fine-grained classification in whole slide images (WSIs) for precision oncology. This innovative approach leverages hierarchical label correlations to improve the accuracy of cancer diagnosis and personalized treatment strategies.

Background

In computational pathology, the classification of WSIs is a critical task for accurate cancer diagnosis and treatment planning. WSIs, with their gigapixel resolution, present significant challenges due to the need to distinguish subtle morphological variations within the same broad category of cancer subtypes. Multi-Instance Learning (MIL) has emerged as a leading approach for WSI classification. However, current MIL methods face limitations in addressing fine-grained classification tasks. Some approaches treat the problem as a flat multi-class classification, ignoring the hierarchical relationships between cancer subtypes. Others rely on instance-level annotations to first detect a limited number of instances and then classify cancer subtypes, which is labor-intensive and limits scalability. These limitations highlight the need for a more structured and efficient framework to leverage hierarchical information without relying on extensive instance-level annotations.

Difference of Methods

Figure 1: Comparison among prior works and our proposed HMIL framework in fine-grained WSI analysis. Left: Conventional flat classification methods, which form fine-grained classification as a multi-class classification task. Middle: Prior hierarchical classification methods, which typically leverage detector-enriched instance feature for hierarchical classification. Right: Our HMIL framework relaxed the need for detectors, introducing hierarchical alignment at both instance and bag level to improve fine-grained classification.

The HMIL framework proposed in this study aimed to address these limitations by incorporating hierarchical alignment at both the instance and bag levels. This approach provides a more structured and informative learning process, enabling the model to differentiate closely related cancer subtypes more effectively.

Method

Label Hierarchy Construction

The HMIL framework leverages a hierarchical label structure to capture the relationships between different cancer subtypes. The label hierarchy is constructed based on domain knowledge and expert annotations, reflecting the hierarchical relationships between cancer subtypes. This hierarchical information is then incorporated into the model to guide the learning process and improve the classification accuracy of fine-grained cancer subtypes.

Hierarchy

Figure 2: Illustration of the label hierarchy construction. The label hierarchy is constructed based on domain knowledge and expert annotations, reflecting the hierarchical relationships between different cancer subtypes.

The HMIL Framework

Illustrated below, the HMIL framework operates in a dual-branch structure, with a coarse branch for coarse-grained classification and a fine branch for fine-grained classification. The framework incorporates a class-wise attention mechanism that aligns hierarchical information at both the instance and bag levels. Additionally, supervised contrastive learning is introduced to enhance the discriminative capability of the model for fine-grained classification.

HMIL

Figure 3: Overview of the HMIL framework. The HMIL framework consists of a dual-branch structure with a coarse branch for coarse-grained classification and a fine branch for fine-grained classification. The framework incorporates a class-wise attention mechanism that aligns hierarchical information at both the instance and bag levels. Supervised contrastive learning is introduced to enhance the discriminative capability of the model for fine-grained classification.

Experimental Results

Main Results

The HMIL framework was evaluated on three datasets: the publicly available histology datasets BRACS and PANDA, and a large-scale cytology cervical cancer (CCC) dataset collected by the researchers. The results in the table below demonstrate that HMIL outperforms state-of-the-art methods on all three datasets, achieving significant improvements in classification accuracy for fine-grained cancer subtypes.

Table 1: Evaluation of performance on the histology WSI datasets BRACS. We report the results in the form of Mean ± Std. The best and second best results are highlighted in red and blue, respectively. Results 1

Table 2: Evaluation of performance on the histology WSI datasets PANDA. We report the results in the form of Mean ± Std. The best and second best results are highlighted in red and blue, respectively. Results 2

Table 3: Evaluation of performance on the cytology WSI dataset CCC. We report the results in the form of Mean ± Std. The best and second best results are highlighted in red and blue, respectively. Results 3

We also showed the class-wise AUC performance of HMIL on three datasets in the following figure, which further demonstrates a more balanced performance across different cancer subtypes compared to the top-performing baseline methods.

Results 4

Figure 4: Class-wise AUC performance of HMIL on three datasets. The HMIL framework achieves a more balanced performance across different cancer subtypes compared to the top-performing baseline methods.

Ablation Study

Extensive ablation studies were conducted to evaluate the effectiveness of different components of the HMIL framework. The results demonstrated that the hierarchical alignment mechanism and supervised contrastive learning significantly contribute to the performance improvements achieved by HMIL. The ablation study results are summarized in the research question below.

Q1: Is current detection-based MIL framework and small-scale pretrained model sufficient for fine-grained WSI classification? From the results in the table below, we can see that DSMIL, a representative pretrained MIL framework, is not sufficient for fine-grained WSI classification probably due to the limited number of training samples. The rest of the methods all taken advantage of the instance-level annotations, however, the performance is still not satisfactory. The HMIL framework, which does not rely on instance-level annotations, achieves the best performance across all methods, demonstrating the effectiveness of the proposed hierarchical alignment.

Table 4: Comparison for fine-grained cervical cancer classification task, with the best results highlighted in bold. Results 6

Q2: Which part of the alignment mechanism contributes more to the fine-grained classification performance? Shown in the table below, the results indicate that plainly adding a corase branch cannot improve the performance of the fine-grained classification. The hierarchical alignment mechanism at both the instance and bag levels plays a crucial role in enhancing the classification accuracy of fine-grained cancer subtypes.

Table 5: Evaluation of hierarchical guidance on model performance. \cmark denotes applying the corresponding module to the model. Best results are highlighted in bold. Results 5

Q3: How does the dynamic weighting strategy affect the performance of the HMIL framework? To further investigate the impact of the dynamic weighting strategy, we conducted an ablation study by comparing the performance of HMIL with and without the dynamic weighting strategy. We firstly set the weights of the coarse and fine branches to be fixed, and the resulting loss function is: \(\begin{aligned} \mathcal{L} &= a \cdot (\mathcal{L}^{(f)}_{ce} + \mathcal{L}_{ia} + \mathcal{L}_{ba} ) + b \cdot \mathcal{L}_{reg} + \mathcal{L}^{(c)}_{ce} \end{aligned}\)

However, from the results in the table below, we can see that the performance of HMIL with fixed weights is not satisfactory. We then utilized our dynamic weighting strategy and compared the performance of HMIL with different temperature settings. The results in the table below demonstrate that the dynamic weighting strategy with proper temperature settings in supervised contrastive learning can significantly improve the performance of the HMIL framework. The best performance is achieved when the temperature is set to 0.1.

Table 6: Ablation studies of the proposed HMIL framework for loss function, with the best results highlighted in bold. Results 7

Conclusion

The success of HMIL underscores the importance of incorporating hierarchical information into MIL frameworks and highlights the potential of reducing reliance on instance-level annotations in fine-grained WSI classification tasks. By leveraging hierarchical label correlations and incorporating supervised contrastive learning, HMIL achieves state-of-the-art performance across multiple datasets. This approach not only improves the accuracy of cancer diagnosis but also provides valuable insights for personalized treatment strategies.

For more details, the source code of the HMIL framework is available at this link.

References

C. Jin et al., “HMIL: Hierarchical Multi-Instance Learning for Fine-Grained Whole Slide Image Classification ,” in IEEE Transactions on Medical Imaging, doi: 10.1109/TMI.2024.3520602.