new_name_who_dis_ t1_j4v5bet wrote

Architecturally, some form of U-Net is probably best. It’s the architecture of choice for tasks like segmentation, so I imagine it would work well for IR too.

6

kingdroopa OP t1_j4v5o38 wrote

Could you recommend any SOTA models using U-NET?

2

Anjum48 t1_j4v8mpm wrote

+1 for UNets. Since IR will be a single channel, you could use a single-class semantic segmentation-type model (i.e. a UNet with a 1-channel output passed through a sigmoid). Something like this would get you started:

import segmentation_models as sm  # the package linked below
model = sm.Unet('resnet34', classes=1, activation='sigmoid')

Edit: Forgot the link for the package I'm referencing: https://github.com/qubvel/segmentation_models

Many of the most popular encoders/backbones are implemented in that package

Edit 2: Is the FOV important? If you could resize the images so that the RGB & IR FOV are equivalent then that would make things a lot simpler
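As a rough sketch of what matching the FOV could look like: if the two cameras share roughly the same optical centre and behave like ideal pinhole cameras (both are assumptions, not something stated in this thread), the wider image can be centre-cropped to the narrower FOV and then resized. The helper name `crop_box_for_fov` and the example FOV values are hypothetical.

```python
import math

def crop_box_for_fov(width, height, wide_fov_deg, narrow_fov_deg):
    """Centred crop (left, top, right, bottom) of the wide-FOV image that
    covers the narrower horizontal FOV. Assumes an ideal pinhole camera,
    square pixels, a shared optical centre, and no lens distortion."""
    # For a pinhole camera the image extent is proportional to tan(FOV / 2).
    scale = math.tan(math.radians(narrow_fov_deg) / 2) / math.tan(
        math.radians(wide_fov_deg) / 2
    )
    crop_w = round(width * scale)
    crop_h = round(height * scale)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h

# Hypothetical example: a 640x480 RGB frame with a 90 degree FOV,
# cropped to match an IR camera with a 45 degree FOV.
box = crop_box_for_fov(640, 480, 90, 45)
```

The cropped region would then be resized to the IR resolution so the two images line up pixel for pixel. Real cameras usually also need a calibration or registration step, which this sketch skips.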

4

kingdroopa OP t1_j4vafrc wrote

Thanks a lot! Will look into it, but seems like the U-NET outputs are segmentation masks, whilst I want it to actually output (generate) IR image equivalents of the RGB image. Is there some idea that I'm missing, perhaps?

2

Anjum48 t1_j4vc9kp wrote

The Unet I described will output a continuous number for each pixel between 0 & 1, which you can use as a proxy for your IR image.

People often apply a threshold to this image (e.g. 0.5) to create a mask, which might be where you are getting confused.
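To make the difference concrete, here is a minimal sketch in plain Python (no model involved; the logit values are made up for illustration) showing the same sigmoid output used two ways: kept continuous as an IR-intensity proxy, or thresholded into a binary mask.

```python
import math

def sigmoid(x):
    """Squash a raw network output (logit) into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw outputs (logits) for a 2x3 patch of pixels.
logits = [[-2.0, 0.0, 2.0],
          [-0.5, 0.5, 4.0]]

# Continuous per-pixel values, as a 1-channel sigmoid head would produce.
probs = [[sigmoid(v) for v in row] for row in logits]

# Regression-style use: rescale to 8-bit intensities as an IR proxy.
ir_proxy = [[round(p * 255) for p in row] for row in probs]

# Segmentation-style use: threshold at 0.5 to get a binary mask.
mask = [[1 if p > 0.5 else 0 for p in row] for row in probs]
```

For IR generation you would skip the thresholding step and train against the normalised IR image directly, most likely with a regression-style loss rather than a segmentation loss (that choice is an assumption on my part, not something specified above).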

2

kingdroopa OP t1_j4vh0sq wrote

Ahh, I see. Thanks! I'll write it down in my TODO list. Might have to investigate seg masks a bit more :)

1