mikonvergence OP t1_jao1h1c wrote on March 2, 2023 at 8:36 PM

Reply to comment by plocco-tocco in [P] A minimal framework for image diffusion (including high-resolution) by mikonvergence

Thank you! Yes, in principle, you can generate segmentation maps using the code from the course by treating the segmentation map as the output. I'm not sure how that would compare to a non-diffusion segmentation model with the same backbone network, but it would definitely be interesting to explore!

Please remember that the diffusion process generally expects data bounded in the [-1, +1] range, so the framework automatically shifts images from the assumed [0, 1] limits to that range (via input_T and output_T). So if you go beyond binary and use more classes within a single channel, make sure the ground-truth output values still lie within [0, 1] (alternatively, you can split each class confidence into a separate channel, but each channel should still be bounded to that range).

But yeah, for binary, it should work with no special adjustment!
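To make the range note concrete, here is a minimal NumPy sketch of both multi-class options. All function names below are hypothetical and only illustrate the idea: `to_diffusion_range` and `from_diffusion_range` stand in for the kind of shift that input_T and output_T perform and are not the framework's actual implementation.

```python
import numpy as np

def labels_to_unit_range(labels, num_classes):
    """Single-channel option: spread class ids 0..num_classes-1 evenly over [0, 1]."""
    return labels.astype(np.float32) / (num_classes - 1)

def labels_to_one_hot(labels, num_classes):
    """Multi-channel option: one channel per class, values 0 or 1, channels first."""
    return np.eye(num_classes, dtype=np.float32)[labels].transpose(2, 0, 1)

def to_diffusion_range(x):
    """Assumed stand-in for the input shift: [0, 1] -> [-1, +1]."""
    return x * 2.0 - 1.0

def from_diffusion_range(x):
    """Assumed stand-in for the output shift: [-1, +1] -> [0, 1]."""
    return (x + 1.0) / 2.0

def unit_range_to_labels(x, num_classes):
    """Snap values in [0, 1] back to the nearest class id."""
    return np.rint(np.clip(x, 0.0, 1.0) * (num_classes - 1)).astype(np.int64)

# Toy 2x2 label map with 4 classes.
labels = np.array([[0, 1], [2, 3]])
num_classes = 4

single_channel = labels_to_unit_range(labels, num_classes)  # values in [0, 1]
one_hot = labels_to_one_hot(labels, num_classes)            # shape (4, 2, 2)

shifted = to_diffusion_range(single_channel)                # values in [-1, +1]
recovered = unit_range_to_labels(from_diffusion_range(shifted), num_classes)
```

With the single-channel encoding the class ids are treated as ordered, evenly spaced intensities, so a small error in the generated value can flip to a neighbouring class. The one-hot variant avoids that ordering at the cost of more channels.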

plocco-tocco t1_jao43p9 wrote on March 2, 2023 at 8:53 PM

Thanks for the input. I have seen some papers claiming SOTA in image segmentation using diffusion so I am also curious to see how they perform.

I have another question, if you don't mind. How difficult would it be to extend the code for image-to-image translation so that it works on 3D data (64x64x64 for example)?

mikonvergence OP t1_jao5zyg wrote on March 2, 2023 at 9:04 PM

There could be a few simple ways to extend this to 64x64x64, and each would have its pros and cons. The two key decisions concern the data format (perhaps there is a way to compress or reformat the data so it's more digestible than raw 64x64x64) and the type of underlying architecture (most importantly, do we use a 2D or 3D CNN, or a different type of topology altogether).

A trivial approach would be to use a 2D architecture with 64 channels instead of the usual 3, which could be implemented very easily with the existing framework. I suspect that would be quite hard to train, though it might still be worth a try.
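As a rough sketch of the data side of that trivial approach, the snippet below reinterprets a 64x64x64 volume as a 64-channel 2D image. The helper names and the choice of depth axis are assumptions made for illustration; the model itself would then need its input and output channel counts set to 64, and how that is configured depends on the framework.

```python
import numpy as np

def volume_to_channels(volume, depth_axis=0):
    """Move the chosen depth axis to the front so the volume reads as (C, H, W)."""
    return np.moveaxis(volume, depth_axis, 0)

def channels_to_volume(image, depth_axis=0):
    """Inverse of volume_to_channels: put the channel axis back where it came from."""
    return np.moveaxis(image, 0, depth_axis)

# Random stand-in for a real volume, already scaled to the assumed [0, 1) range.
rng = np.random.default_rng(0)
volume = rng.random((64, 64, 64), dtype=np.float32)

# Treat the last axis as "depth", so each 2D slice along it becomes one channel.
image = volume_to_channels(volume, depth_axis=2)  # (64 channels, 64, 64)
batch = image[None]                               # (1, 64, 64, 64): batch, channels, height, width

# After sampling, a generated batch element can be mapped back to a volume.
restored = channels_to_volume(batch[0], depth_axis=2)
```

Note that a 2D convolution mixes all 64 channels at every layer with no notion of neighbouring slices, which is one plausible reason this setup could be harder to train than a 3D CNN.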

This is an area of active research (beyond DreamFusion and other popular papers, I'm not very familiar with it), so different solutions still need to be explored, and if you discover something that works reasonably well, that will be really exciting!