---
language:
- en
license: cc-by-nc-4.0
pipeline_tag: image-segmentation
tags:
- medical-imaging
- vision-language-models
- clip
- unimedclip
- biomedical
- healthcare
---

# MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation

This repository hosts the **official trained model checkpoints** for **MedCLIPSeg**, presented in the paper [MedCLIPSeg: Probabilistic Vision-Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation](https://huggingface.co/papers/2602.20423).

**Authors:** Taha Koleilat, Hojat Asgariandehkordi, Omid Nejati Manzari, Berardino Barile, Yiming Xiao, Hassan Rivaz.

MedCLIPSeg is a vision–language framework for **medical image segmentation** built on top of **CLIP**. It adapts CLIP for robust, data-efficient, and uncertainty-aware segmentation through probabilistic cross-modal attention and bidirectional interaction between image and text tokens.

The released checkpoints correspond exactly to the experiments reported in our paper and are provided **for evaluation and reproducibility purposes only**.

## 🧠 Model Overview

- **Backbone**: UniMedCLIP ViT-B/16
- **Task**: Medical Image Segmentation
- **Modalities**: Ultrasound, MRI, Endoscopy, Dermoscopy, X-ray
- **Training Regimes**:
  - Data Efficiency Evaluation
  - Fully supervised learning
  - Domain generalization

## 🚀 Reproducing Paper Results

### Step 1: Download the Checkpoints

Download the checkpoints [here](https://huggingface.co/TahaKoleilat/MedCLIPSeg/tree/main), then create a directory named `outputs_medclipseg` at the root of the project and place the downloaded checkpoint folders inside it so that the directory structure matches the following layout:

```
outputs_medclipseg/
├── BUSI/
├── BTMRI/
├── ISIC/
├── Kvasir/
├── Covid19/
├── EUS/
└── ...
```

Each folder contains the trained **UniMedCLIP-based MedCLIPSeg checkpoints** for that dataset.

### Step 2: Run Evaluation

Run the following script to reproduce the results reported in the paper:

```bash
bash scripts/reproduce_eval.sh
```

This script automatically loads the corresponding checkpoints and evaluates them on the appropriate test sets.

## 📊 Outputs

Evaluation outputs (segmentations and uncertainty maps) are written to:

```text
outputs_medclipseg//seg_results/
outputs_medclipseg//unc_results/
```

### 📚 Acknowledgment of Foundation Models

The underlying vision–language models used in this repository were introduced in prior work. We gratefully acknowledge the original authors:

- **PubMedCLIP**
  Eslami, Sedigheh, Gerard De Melo, and Christoph Meinel. "Does clip benefit visual question answering in the medical domain as much as it does in the general domain?." arXiv preprint arXiv:2112.13906 (2021).
- **UniMedCLIP**
  Khattak, Muhammad Uzair, et al. "Unimed-clip: Towards a unified image-text pretraining paradigm for diverse medical imaging modalities." arXiv preprint arXiv:2412.10372 (2024).

For completeness and reproducibility, this repository also includes the **original pretrained checkpoints** of these foundation models under the `checkpoints/` directory, exactly as released by their respective authors.

All MedCLIPSeg checkpoints are **adaptations built on top of these pretrained models** and are released strictly for **research and non-commercial use**, in accordance with their respective licenses.

## 📖 Citation

If you use these checkpoints in your research, please cite:

```bibtex
@article{koleilat2026medclipseg,
  title={MedCLIPSeg: Probabilistic Vision-Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation},
  author={Koleilat, Taha and Asgariandehkordi, Hojat and Manzari, Omid Nejati and Barile, Berardino and Xiao, Yiming and Rivaz, Hassan},
  journal={arXiv preprint arXiv:2602.20423},
  year={2026}
}
```
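
As a supplement to Step 1 of the reproduction instructions, the `outputs_medclipseg` layout can be sanity-checked before running the evaluation script. The following is a minimal Python sketch and is not part of the official repository: the function name `missing_dataset_dirs` and the constant `LISTED_DATASETS` are hypothetical, and the folder list covers only the names shown explicitly in the Step 1 layout (the `...` entry there suggests further folders, which are not checked here).

```python
from pathlib import Path

# Dataset folders named explicitly in the Step 1 layout. The "..." entry in
# that layout suggests the real list is longer, so this tuple is an
# assumption and is knowingly incomplete.
LISTED_DATASETS = ("BUSI", "BTMRI", "ISIC", "Kvasir", "Covid19", "EUS")


def missing_dataset_dirs(root="outputs_medclipseg", expected=LISTED_DATASETS):
    """Return the expected dataset folder names that are absent under root."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing checkpoint folders:", ", ".join(missing))
    else:
        print("All listed checkpoint folders are present.")
```

Run it from the project root. An empty result only confirms that the listed folders exist; it does not verify the checkpoint files inside them.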