What if we could use diffusion models to generate laparoscopic surgery images for training surgeons?
Problem
Simply asking DALL-E to “generate a laparoscopic surgery” does not work: it produces cartoons.
Approach
- formulate the problem as text: “grasper grasp gallbladder”
- encode the text into latents
- run diffusion with late fusion of the text latents
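The three steps can be illustrated with a toy noise predictor. This is a minimal sketch under loud assumptions: `encode_text` is a hypothetical hashed bag-of-words stand-in for a real pretrained text encoder, the layer sizes are arbitrary, and "late fusion" is read in its common sense (image and text are processed separately and only combined right before the output head). It is not the architecture used in this work.

```python
import math
import random
import zlib

DIM = 8  # hypothetical latent size, chosen only for illustration


def encode_text(prompt: str, dim: int = DIM) -> list[float]:
    """Hypothetical toy text encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in prompt.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def image_branch(x_t: list[float], t: int, num_steps: int = 1000) -> list[float]:
    """Processes the noisy image latent on its own; the text never enters here."""
    return [math.tanh(v) for v in x_t] + [t / num_steps]


class LateFusionDenoiser:
    """Toy noise predictor: image and text features are computed separately and
    only concatenated right before the final linear head (late fusion)."""

    def __init__(self, dim: int = DIM, seed: int = 0):
        rng = random.Random(seed)
        in_features = dim + 1 + dim  # image features + timestep + text latent
        self.head = [[rng.gauss(0.0, 0.1) for _ in range(in_features)] for _ in range(dim)]

    def __call__(self, x_t: list[float], t: int, text_latent: list[float]) -> list[float]:
        fused = image_branch(x_t, t) + text_latent  # list concatenation is the fusion point
        return [sum(w * f for w, f in zip(row, fused)) for row in self.head]


if __name__ == "__main__":
    model = LateFusionDenoiser()
    eps_hat = model([0.5] * DIM, 500, encode_text("grasper grasp gallbladder"))
    print(eps_hat)
```

Fusing late keeps the image pathway text-agnostic until the last layer; an early-fusion variant would instead inject the text latent into `image_branch` itself.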
Data: CholecT45
Weighting
Scoring: Perception Prioritized Weighting + Prioritization for Signal-to-Noise
(Ho et al., 2020)
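Perception Prioritized (P2) weighting (Choi et al., 2022) rescales the per-timestep denoising loss by 1 / (k + SNR(t))^γ, so low-noise, high-SNR steps count less. A minimal sketch, assuming the linear beta schedule of Ho et al. (2020) and the hyperparameters k = 1, γ = 1; the notes do not state which schedule or values were actually used.

```python
def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> list[float]:
    """Cumulative products alpha_bar_t for the linear beta schedule of Ho et al. (2020)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def snr(alpha_bar: float) -> float:
    """Signal-to-noise ratio of x_t = sqrt(alpha_bar) x_0 + sqrt(1 - alpha_bar) eps."""
    return alpha_bar / (1.0 - alpha_bar)


def p2_weights(alpha_bars: list[float], k: float = 1.0, gamma: float = 1.0) -> list[float]:
    """lambda_t = 1 / (k + SNR(t))**gamma; gamma = 0 recovers the unweighted simple loss."""
    return [1.0 / (k + snr(ab)) ** gamma for ab in alpha_bars]


def weighted_mse(eps_pred: list[float], eps_true: list[float], t: int,
                 weights: list[float]) -> float:
    """Epsilon-prediction MSE at timestep t, scaled by that timestep's weight."""
    mse = sum((p - q) ** 2 for p, q in zip(eps_pred, eps_true)) / len(eps_true)
    return weights[t] * mse


if __name__ == "__main__":
    w = p2_weights(linear_alpha_bars())
    print(w[0], w[500], w[-1])  # weights grow as t increases and SNR falls
```

Setting `gamma=0.0` makes every weight 1.0, which is the plain simple loss of Ho et al. (2020); raising γ shifts emphasis toward the noisier steps where coarse content is formed.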
Text
“[subject] [verb] [object] in [surgical phase]”
“grasper grasp gallbladder in preparation”
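The prompt template can be expressed as a small formatting function. This is a sketch: the `Triplet` type and `build_prompt` name are hypothetical, and replacing underscores with spaces assumes that some label names are stored in snake_case.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """Hypothetical container for one annotated frame."""
    subject: str  # instrument, e.g. "grasper"
    verb: str     # action, e.g. "grasp"
    obj: str      # target anatomy, e.g. "gallbladder"
    phase: str    # surgical phase, e.g. "preparation"


def build_prompt(t: Triplet) -> str:
    """Fill "[subject] [verb] [object] in [surgical phase]"."""
    words = f"{t.subject} {t.verb} {t.obj} in {t.phase}"
    return words.replace("_", " ")  # assumption: labels may use underscores


if __name__ == "__main__":
    print(build_prompt(Triplet("grasper", "grasp", "gallbladder", "preparation")))
```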
Model
Elucidated Imagen. DALL-E performs very poorly here; Imagen-class models work better, though why is still an open question.
Added Value of Generated Images for Physicians
Train a Classifier
Rendezvous network: train a classifier to recognize the procedure on data augmented with generated images; 5% improvement over training without them.
Medical Expert Survey
“yo mr doctor man, can you spot which of these are generated?”
Experts spotted the generated images with a 45% success rate.