kingdroopa OP t1_j4v5o38 wrote on January 18, 2023 at 2:09 PM

Reply to comment by new_name_who_dis_ in [D] Suggestion for approaching img-to-img? by kingdroopa

Could you recommend any SOTA models using U-Net?

Anjum48 t1_j4v8mpm wrote on January 18, 2023 at 2:30 PM

+1 for UNets. Since IR will be a single channel, you could use a single-class semantic-segmentation-type model (i.e. a UNet with a 1-channel output passed through a sigmoid). Something like this would get you started:

import segmentation_models as sm
model = sm.Unet('resnet34', classes=1, activation='sigmoid')

Edit: Forgot the link for the package I'm referencing: https://github.com/qubvel/segmentation_models

Many of the most popular encoders/backbones are implemented in that package.

Edit 2: Is the FOV important? If you could crop or resize the images so that the RGB & IR FOVs are equivalent, that would make things a lot simpler.
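To make the FOV point concrete, here is a rough sketch of how much of the wider image you would keep so that it covers the same angle as the narrower one. It assumes ideal pinhole cameras with a shared optical centre and axis and no lens distortion, which real RGB/IR rigs only approximate. The function names and the example numbers are made up for illustration.

```python
import math

def matching_crop_size(size_px, wide_fov_deg, narrow_fov_deg):
    """Pixels to keep (centred) from the wide-FOV image so it spans the narrow FOV.

    Assumes an ideal pinhole model: image extent is proportional to tan(FOV / 2).
    """
    wide = math.tan(math.radians(wide_fov_deg) / 2)
    narrow = math.tan(math.radians(narrow_fov_deg) / 2)
    return round(size_px * narrow / wide)

def centre_crop_bounds(size_px, crop_px):
    """Start and end pixel indices of a centred crop along one axis."""
    start = (size_px - crop_px) // 2
    return start, start + crop_px

# Hypothetical example: a 640-px-wide RGB image with a 90 degree horizontal FOV,
# paired with an IR camera whose horizontal FOV is 60 degrees.
crop_px = matching_crop_size(640, 90, 60)
bounds = centre_crop_bounds(640, crop_px)
```

After cropping, you would resize the crop to the IR resolution so the two images line up pixel for pixel. If the cameras are offset or the lenses distort noticeably, a proper calibration and registration step would be needed instead.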

kingdroopa OP t1_j4vafrc wrote on January 18, 2023 at 2:43 PM

Thanks a lot! Will look into it, but it seems like the U-Net outputs are segmentation masks, whilst I want it to actually output (generate) IR image equivalents of the RGB image. Is there some idea that I'm missing, perhaps?

Anjum48 t1_j4vc9kp wrote on January 18, 2023 at 2:56 PM

The Unet I described will output a continuous number between 0 and 1 for each pixel, which you can use as a proxy for your IR image (with the IR targets scaled to the same range).

People often apply a threshold to this image (e.g. 0.5) to create a mask, which might be where you are getting confused.
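A tiny illustration of the difference, as a sketch only: the same per-pixel outputs can either be rescaled to grey levels (the IR-like image) or thresholded (the segmentation mask). It uses plain Python lists to stay dependency-free; in practice you would do this on NumPy arrays or tensors. The 8-bit range and the function names are assumptions, not part of the package above.

```python
def normalise_ir(ir, max_value=255):
    """Scale raw IR intensities to [0, 1] so they can be used as training targets."""
    return [[v / max_value for v in row] for row in ir]

def to_ir_image(pred, max_value=255):
    """Scale per-pixel sigmoid outputs in [0, 1] back to integer grey levels."""
    return [[round(p * max_value) for p in row] for row in pred]

def to_mask(pred, threshold=0.5):
    """Binarise the same outputs, as is usually done for segmentation."""
    return [[1 if p > threshold else 0 for p in row] for row in pred]

# Hypothetical 2x2 network output (values after the sigmoid).
pred = [[0.0, 0.2], [0.8, 1.0]]

ir_image = to_ir_image(pred)  # keeps the continuous grey levels
mask = to_mask(pred)          # keeps only above/below the threshold
```

So for image generation you would skip the thresholding step and train against the normalised IR image with a regression-style loss (e.g. L1 or MSE) rather than a segmentation loss.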

kingdroopa OP t1_j4vh0sq wrote on January 18, 2023 at 3:28 PM

Ahh, I see. Thanks! I'll write it down in my TODO list. Might have to investigate seg masks a bit more :)