MMFM-BIOMED @ CVPR 2026

About

Biomedical data spans diverse modalities across biological scales, from molecular genomics and cellular microscopy to tissue pathology, organ-level radiology, and patient-level electronic health records. While each modality provides unique insights, integrating these heterogeneous data sources remains a significant obstacle to building a comprehensive biomedical understanding.

The Multimodal Foundation Models for Biomedicine (MMFM-BIOMED) workshop brings together experts across disciplines to tackle this challenge. The workshop explores two critical questions:

Technical Challenges

What are the core limitations of existing multimodal learning techniques when applied to biomedical data? Challenges include aligning modalities with different spatial and temporal resolutions; handling extreme data imbalances between well-annotated and sparse modalities; and maintaining modality-specific context while enabling knowledge transfer across domains.

New Opportunities

What transformative opportunities do multimodal foundation models unlock in biomedicine? Potential breakthroughs include multi-scale disease diagnosis that combines radiology images with pathology slides; personalized treatment that integrates wearable sensor data with genomic profiles; and context-aware operations that synchronize surgical videos with patient records.

Invited Speakers

James Zou (Stanford University)

Emily Fox (Stanford University)

Maria Brbic (EPFL)

Mengdi Wang (Princeton University)

Tentative Schedule

Time	Session
8:45-8:50	Opening Remarks
8:50-9:00	Spotlight Talk 1: Challenges in Retrieval and Context Engineering for Multimodal Patient Timelines (Ryan Nayebi)
9:00-9:10	Spotlight Talk 2: Automated Report-Derived Oncology VQA Benchmark for Evaluating Vision-Language Models on 3D Medical Imaging (Bo Liu)
9:10-9:20	Spotlight Talk 3: Reality vs. Priors: Atypical Anatomy Breaks Vision-Language Models (Leon Mayer)
9:20-9:30	Spotlight Talk 4: Too Sure to Be Right: High-Confidence Wrong and Latent Failure Signals in Medical VQA (Haohao Zhu)
9:30-10:00	Invited Talk 1: Multimodal Generative Modeling of Cellular Complexity (Maria Brbic)
10:00-10:30	Invited Talk 2: Unraveling Disease: How Do We Learn Causality in Drug Discovery? (Emily Fox)
10:30-11:30	Poster & Discussion
11:30-12:00	Invited Talk 3: Learning the language of sleep and the Virtual Biotech (James Zou)
12:00-12:30	Invited Talk 4: LabOS: The AI-XR Co-Scientist That Sees and Works With Humans (Mengdi Wang)
12:30-12:40	Closing Remarks

Call for Papers

We invite short, non-archival paper submissions (4 pages maximum, excluding references, using the CVPR 2026 template) that explore both challenges and opportunities in multimodal foundation models for biomedicine. We encourage two types of submissions:

Challenges in Current Techniques

Papers highlighting limitations in existing methods, especially simple, intuitive approaches that unexpectedly lead to negative results.

Multimodal foundation models - pre-training, post-training, alignment
Agentic framework - design, evaluation, control
Benchmark and evaluation - failure modes, reproducibility

Opportunities with MMFMs

Papers showcasing novel applications, especially in underexplored areas such as drug discovery and surgery.

Selected papers will be featured as posters, and four will receive spotlight oral presentations.

Submission platform: OpenReview (MMFM-BIOMED)

Submission deadline: ~~May 1, 2026~~

Notification date: ~~May 10, 2026~~

Workshop date: June 3, 2026

Reminder: Please bring your poster (following the CVPR 2026 poster requirements) to the workshop.

Accepted Papers

All accepted papers participate in the workshop poster session. Four papers are selected for oral presentations. We will update the poster floor location soon. Please design and print your poster and bring it with you to the workshop.

Note: Two posters share each poster board, so each paper should use only one half (one side) of its assigned board. Please find your assigned Poster Board ID in the table below.

Poster guidelines: follow the official CVPR 2026 poster printing information for workshop dimensions (half size), templates, logistics, and optional onsite printing.

MMFM-BIOMED 2026 accepted oral and poster papers
Poster Board ID	Format	Title	Authors
258	Oral	Automated Report-Derived Oncology VQA Benchmark for Evaluating Vision-Language Models on 3D Medical Imaging	Bo Liu, Hanxue Gu, Xiangru Li, Zheren Zhu, Jacob Ellison, Kang Wang, Janine M. Lupo, Yang Yang, Hui Lin
258	Oral	Reality vs. Priors: Atypical Anatomy Breaks Vision-Language Models	Leon Mayer, Piotr Kalinowski, Caroline Maria Ebersbach, Marcel Knopp, Tim Rädsch, Evangelia Christodoulou, Annika Reinke, Fiona R. Kolbinger, Lena Maier-Hein
259	Oral	Challenges in Retrieval and Context Engineering for Multimodal Patient Timelines	Ryan D'Cunha, Ryan Nayebi, Jason Alan Fries
259	Oral	Too Sure to Be Right: High-Confidence Wrong and Latent Failure Signals in Medical VQA	Haohao Zhu, Jiayu Zhou
260	Poster	BSMAD: Bridging Semantic and Structural Manifolds for Robust Cross-Modality Medical Anomaly Detection	Shih-Chih Lin, Chia-Lin Lee, Yun-Tung Chu, Jia-Xian Jian, Wei-Chieh Sun, Fang-Yi Lin
260	Poster	PSF-Med: A Clinician-Audited Benchmark for Paraphrase Sensitivity in Medical Vision-Language Models	Binesh Sadanandan, Vahid Behzadan, Lekshmy Jayan, Arun Gopinatha Kurup
261	Poster	SCOPE: Self-Consistent Patch Reconstruction with Pathology-Aware Prototype Alignment for Anatomical Neglect in CXR Report Generation	Ngo Xuan Cuong
261	Poster	Fine-Tuning BiomedParse for Stroke Detection and Segmentation on CT: A Comparison with Gemini 2.5 Pro and GPT-5	Artun Gunturkun, Halil Ibrahim Gulluk, Özgür Ilker Koska, Olivier Gevaert
262	Poster	Exploring Prompt Alignment with Clinical Factors in Zero-Shot Segmentation VLMs for NSCLC Tumor Segmentation	Suraj Pai, Thibault Heintz, Cosmin Ciausu, Marion Tonneau, Hugo Aerts, Raymond Mak
262	Poster	Position: Why Medical AI for the Global South Must Be Built on Different Technical Principles	Azmine Toushik Wasi, Mohsin Mahmud Topu, Mahfuz Ahmed Anik, Md. Manjurul Ahsan
263	Poster	Multimodal Predictors of Heterogeneity of Treatment Effects in Transcranial Direct Current Stimulation for Knee Osteoarthritis Pain: A Machine Learning Analysis	Seoyoung Kim, Yeri Kim, Chiyoung Lee, Juyoung Park, Allison Huff, Huanrui Yang, Heewon Kim
263	Poster	Does Language Shift Break Medical Vision-Language Models? Indonesian Radiology Visual Question Answering Case Study	Pieter Christy Yan Yudhistira, Dzaki Rafif Malik, Novanto Yudistira
264	Poster	Directional Pathology–Omics Discordance from Frozen Whole-Slide Foundation Embeddings Across Lung and Kidney Cancer	Kuan-Ting Wu
264	Poster	When Pretraining Fails to Transfer: Diagnosing Representation Failure in fMRI Foundation Models	Vijay Srinivas P., Amrutha Veluppal
265	Poster	Self-Supervised Vision Transformers for CBCT-Based Detection of Temporomandibular Joint Osteoarthritis	Shradhdha Trivedi, Vrundan Sojitra
265	Poster	Label-Efficient Multimodal Trust Mapping for MRI	Jacob Ellison, Amir Sadikov, Duan Xu, Janine M. Lupo
266	Poster	RadDiff: Describing Differences in Radiology Image Sets with Natural Language	Xiaoxian Shen, Yuhui Zhang, Sahithi Ankireddy, Xiaohan Wang, Maya Varma, Henry Guo, Curtis Langlotz, Serena Yeung-Levy
266	Poster	When Prompts Mislead: Textual Dominance and Diagnostic Bias in MLLMs	Inhyuk Park, Doohyun Park
267	Poster	JASPR: Joint Spatial Representation learning of histology and spatial genomics for improved virtual genomic screening and clinical prognostication	Marija Pizurica, Eric Zimmermann, Neil Tenenholtz, James Brian Hall, Olivier Gevaert, Ava P. Amini, Lorin Crawford, Kristen A. Severson
267	Poster	Foundation Model–Driven Interpretable Discovery of Prognostic Signals Beyond Clinical Records in Histology and Spatial Proteomics	Yutong Sun, Kyeong Joo Jung, James D. Brooks, Chrystal Chadwick, Sanghee Cho, Soumya Ghose, Elizabeth McDonough, Jianwei Qiu, Robert West, Fiona Ginty, Raghu Machiraju, Parag Mallick
268	Poster	Multimodal Alignment Improves Generalizability of Genomic Biomarker Prediction in Computational Pathology	Ekaterina Redekop, Eric Zimmermann, Ava P. Amini, Alex Xijie Lu, Neil Tenenholtz, James Brian Hall, Lorin Crawford, Kristen A. Severson
268	Poster	Predicting Brain Tumor Recurrence from Multimodal, Longitudinal Patient Data	Divyanshu Tak, Diya Sreedhar, Hugo Aerts, Benjamin H. Kann
269	Poster	On the Trade-offs of LLM-Augmented Multimodal Foundation Models for Clinical Image Classification	Zhaohui Liang, Niccolo Marini, Sivaramakrishnan Rajaraman, Zhiyun Xue, Sameer Antani
269	Poster	Objective-Aligned Direct Answer SFT for Robust Multi-Frame Medical VQA	Site Li, Jianyi Hao, Xiaofeng Liu
270	Poster	When Multimodal Models Stop Listening: Modality Imbalance Induces Dominant-Modality Bias in Biomedicine	Dineth Jayakody
270	Poster	A Landmark-Based Panoramic Radiograph Dataset for Periodontal Severity and Comprehensive Dental Screening	Nayoon Kwon, Jongwon Choi, Jung Seok Lee
271	Poster	An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation	Xiaofeng Liu, Qianru Zhang, Thibault Marin, Menghua Xia, Chi Liu, Georges El Fakhri, Jinsong Ouyang

Previous Workshop

1st edition: MMFM-BIOMED @ CVPR 2025

Multimodal Foundation Models for Biomedicine:
Challenges and Opportunities

Organizers

Yuhui Zhang

Xiaohan Wang

Yuzhe Yang

Joyce Yan-Ran Wang

Dongxia Wu

Jeffrey K Jopling

Hoifung Poon

Serena Yeung-Levy
