farmingvillein t1_ixncbjn wrote
Check out https://twitter.com/cxbln/status/1595652302123454464:
> 🎉Introducing RoentGen, a generative vision-language foundation model based on #StableDiffusion, fine-tuned on a large chest x-ray and radiology report dataset, and controllable through text prompts!
(Not your full problem, but you may find it helpful!)
More generally, you could probably use the same image-to-text techniques that get used to validate a stablediffusion model.
Or for a really quick-and-dirty solution, you could try using a model like theirs to generate training data (image, text pairs) and train an image => txt model (which they do a variant of in the paper).
Viewing a single comment thread. View all comments