---
language:
- en
license: cc-by-nc-4.0
pipeline_tag: image-segmentation
tags:
- medical-imaging
- vision-language-models
- clip
- unimedclip
- biomedical
- healthcare
---
# MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation
This repository hosts the **official trained model checkpoints** for **MedCLIPSeg**, presented in the paper [MedCLIPSeg: Probabilistic Vision-Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation](https://huggingface.co/papers/2602.20423).

**Authors:** Taha Koleilat, Hojat Asgariandehkordi, Omid Nejati Manzari, Berardino Barile, Yiming Xiao, Hassan Rivaz.

MedCLIPSeg is a vision–language framework for **medical image segmentation** built on top of **CLIP**. It adapts CLIP for robust, data-efficient, and uncertainty-aware segmentation through probabilistic cross-modal attention and bidirectional interaction between image and text tokens.

The released checkpoints correspond exactly to the experiments reported in our paper and are provided **for evaluation and reproducibility purposes only**.
## 🧠 Model Overview
- **Backbone**: UniMedCLIP ViT-B/16
- **Task**: Medical Image Segmentation
- **Modalities**: Ultrasound, MRI, Endoscopy, Dermoscopy, X-ray
- **Training Regimes**:
  - Data-efficiency evaluation
  - Fully supervised learning
  - Domain generalization
## 🚀 Reproducing Paper Results
### Step 1: Download the Checkpoints
Download the checkpoints from the [MedCLIPSeg Hugging Face repository](https://huggingface.co/TahaKoleilat/MedCLIPSeg/tree/main). Then create a directory named `outputs_medclipseg` at the project root and place the downloaded checkpoint folders inside it, so that the directory structure matches the following layout:
```text
outputs_medclipseg/
├── BUSI/
├── BTMRI/
├── ISIC/
├── Kvasir/
├── Covid19/
├── EUS/
└── ...
```
Each folder contains the trained **UniMedCLIP-based MedCLIPSeg checkpoints** for that dataset.
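A small shell function can check the layout and catch a misplaced folder before you run evaluation. This is a minimal sketch and is not part of the MedCLIPSeg release. The name `check_layout` is hypothetical. The function only checks the six dataset folders named in the tree above. It does not expand the `...` entry, so add any other dataset folders you downloaded.

```bash
# Hypothetical helper (not shipped with MedCLIPSeg): verify that the
# checkpoint folders listed in the README exist under the output root.
check_layout() {
  local root="${1:-outputs_medclipseg}"
  # Only the dataset folders named in the README tree; extend as needed.
  local datasets=(BUSI BTMRI ISIC Kvasir Covid19 EUS)
  local missing=0
  local ds

  if [[ ! -d "$root" ]]; then
    echo "missing root directory: $root" >&2
    return 1
  fi

  for ds in "${datasets[@]}"; do
    if [[ ! -d "$root/$ds" ]]; then
      echo "missing: $root/$ds" >&2
      missing=$((missing + 1))
    fi
  done

  # Exit status is non-zero when any expected folder is absent.
  [[ "$missing" -eq 0 ]]
}
```

After sourcing the function, you can run `check_layout && bash scripts/reproduce_eval.sh` from the project root. Evaluation then starts only if every listed folder is present.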
### Step 2: Run Evaluation
Run the following script to reproduce the results reported in the paper:
```bash
bash scripts/reproduce_eval.sh
```
This script automatically loads the corresponding checkpoints and evaluates them on the appropriate test sets.
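To keep a record of the console output from a reproduction run, you can wrap the script in a small function. The sketch below is an assumption and is not part of the repository. `run_eval` and the `logs/` directory are hypothetical names. The function must be called from the project root, where `scripts/reproduce_eval.sh` and `outputs_medclipseg/` are located.

```bash
# Hypothetical wrapper (not part of the repository): run the evaluation
# script from the project root and keep a timestamped copy of its output.
run_eval() {
  local script="scripts/reproduce_eval.sh"
  local log_dir="${1:-logs}"   # hypothetical log location
  local log_file

  if [[ ! -f "$script" ]]; then
    echo "run from the project root: $script not found" >&2
    return 1
  fi
  if [[ ! -d outputs_medclipseg ]]; then
    echo "outputs_medclipseg/ not found; see Step 1" >&2
    return 1
  fi

  mkdir -p "$log_dir"
  log_file="$log_dir/reproduce_eval_$(date +%Y%m%d_%H%M%S).log"

  # The subshell scopes pipefail, so the function returns the script's
  # exit status rather than tee's, without changing the caller's options.
  ( set -o pipefail; bash "$script" 2>&1 | tee "$log_file" )
}
```

The pipeline runs in a subshell with `pipefail`. A failing evaluation is therefore reported as a non-zero exit status instead of being masked by `tee`.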
## 📊 Outputs
Evaluation outputs (segmentations and uncertainty maps) are written to:
```text
outputs_medclipseg//seg_results/
outputs_medclipseg//unc_results/
```
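For a quick overview of what an evaluation run wrote, the sketch below counts the files in every `seg_results/` and `unc_results/` directory under the output root. It is a hypothetical helper: `summarize_outputs` is not part of the repository. It uses `find` to locate these directories rather than assuming the exact path between `outputs_medclipseg/` and them.

```bash
# Hypothetical helper: print "<directory>\t<file count>" for every
# seg_results/ and unc_results/ directory found under the output root.
summarize_outputs() {
  local root="${1:-outputs_medclipseg}"
  local dir n

  if [[ ! -d "$root" ]]; then
    echo "no such directory: $root" >&2
    return 1
  fi

  # NUL-delimited paths keep directory names with spaces intact.
  while IFS= read -r -d '' dir; do
    n=$(find "$dir" -type f | wc -l | tr -d '[:space:]')
    printf '%s\t%s\n' "$dir" "$n"
  done < <(find "$root" -type d \( -name seg_results -o -name unc_results \) -print0 | sort -z)
}
```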
## 📚 Acknowledgment of Foundation Models
The underlying vision–language models used in this repository were introduced in prior work. We gratefully acknowledge the original authors:
- **PubMedCLIP**: Eslami, Sedigheh, Gerard De Melo, and Christoph Meinel. "Does CLIP benefit visual question answering in the medical domain as much as it does in the general domain?" arXiv preprint arXiv:2112.13906 (2021).
- **UniMedCLIP**: Khattak, Muhammad Uzair, et al. "UniMed-CLIP: Towards a unified image-text pretraining paradigm for diverse medical imaging modalities." arXiv preprint arXiv:2412.10372 (2024).
For completeness and reproducibility, this repository also includes the **original pretrained checkpoints** of these foundation models under the `checkpoints/` directory, exactly as released by their respective authors.

All MedCLIPSeg checkpoints are **adaptations built on top of these pretrained models** and are released strictly for **research and non-commercial use**, in accordance with their respective licenses.
## 📖 Citation
If you use these checkpoints in your research, please cite:
```bibtex
@article{koleilat2026medclipseg,
  title={MedCLIPSeg: Probabilistic Vision-Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation},
  author={Koleilat, Taha and Asgariandehkordi, Hojat and Manzari, Omid Nejati and Barile, Berardino and Xiao, Yiming and Rivaz, Hassan},
  journal={arXiv preprint arXiv:2602.20423},
  year={2026}
}
```