Comments


new_name_who_dis_ t1_j4v5bet wrote

Architecturally, some form of U-Net is probably best. It's the architecture of choice for tasks like segmentation, so I imagine it would work well for IR as well.

6

kingdroopa OP t1_j4v5o38 wrote

Could you recommend any SOTA models using U-Net?

2

Anjum48 t1_j4v8mpm wrote

+1 for U-Nets. Since IR will be a single channel, you could use a single-class semantic-segmentation-style model (i.e. a U-Net with a 1-channel output passed through a sigmoid). Something like this would get you started:

import segmentation_models as sm
model = sm.Unet('resnet34', classes=1, activation='sigmoid')

Edit: Forgot the link for the package I'm referencing: https://github.com/qubvel/segmentation_models

Many of the most popular encoders/backbones are implemented in that package

Edit 2: Is the FOV important? If you could resize the images so that the RGB & IR FOVs are equivalent, that would make things a lot simpler.

4

kingdroopa OP t1_j4vafrc wrote

Thanks a lot! Will look into it, but it seems like the U-Net outputs segmentation masks, whilst I want it to actually output (generate) IR equivalents of the RGB images. Is there some idea that I'm missing, perhaps?

2

Anjum48 t1_j4vc9kp wrote

The U-Net I described will output a continuous value between 0 and 1 for each pixel, which you can use as a proxy for your IR image.

People often apply a threshold to this image (e.g. 0.5) to create a mask, which might be where you are getting confused.
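
For example, building on the snippet above, here's a minimal untested sketch that trains the 1-channel U-Net as a regressor against the IR image scaled to [0, 1] (MSE loss) instead of thresholding the output. The data arrays, shapes and training settings are just placeholders:

    import os
    os.environ['SM_FRAMEWORK'] = 'tf.keras'   # may be needed depending on the installed version
    import numpy as np
    import segmentation_models as sm

    # Dummy stand-ins for real data: N aligned RGB/IR pairs (shapes are made up)
    rgb = np.random.randint(0, 256, size=(16, 256, 256, 3), dtype=np.uint8)
    ir = np.random.rand(16, 256, 256).astype('float32')

    preprocess = sm.get_preprocessing('resnet34')
    x = preprocess(rgb.astype('float32'))
    y = (ir / ir.max())[..., None]            # scale IR to [0, 1] and add a channel dim

    # Same 1-channel sigmoid U-Net as above, trained as a per-pixel regressor
    model = sm.Unet('resnet34', classes=1, activation='sigmoid')
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    model.fit(x, y, batch_size=4, epochs=50, validation_split=0.1)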

2

kingdroopa OP t1_j4vh0sq wrote

Ahh, I see. Thanks! I'll write it down in my TODO list. Might have to investigate seg masks a bit more :)

1

ML4Bratwurst t1_j4vfax7 wrote

I think one important issue here is the "misalignment" of the images. Have you tried cropping and resizing the images so that they show the same region? You wouldn't need a GAN then.
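
Something as simple as this might already get both images showing roughly the same region (untested; the crop margins and file names are placeholders you'd have to tune for your cameras):

    import cv2

    rgb = cv2.imread('rgb.png')                       # hypothetical file names
    ir = cv2.imread('ir.png', cv2.IMREAD_GRAYSCALE)

    # Crop the RGB frame down to (roughly) the IR field of view, then resize so
    # both images end up with the same shape. The margins are made-up placeholders.
    top, bottom, left, right = 60, 60, 80, 80
    rgb_cropped = rgb[top:rgb.shape[0] - bottom, left:rgb.shape[1] - right]
    rgb_aligned = cv2.resize(rgb_cropped, (ir.shape[1], ir.shape[0]))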

3

kingdroopa OP t1_j4vguxt wrote

The GAN models I've tested are based on the 'unaligned' approach (e.g. CycleGAN). I still haven't tried cropping and resizing the images to make them show the same region. My immediate thought is that the top and bottom of both images might disappear, but perhaps that's still OK?

1

tdgros t1_j4vipol wrote

If the two cameras are rigidly fixed, then you can calibrate them like one calibrates a stereo pair and at least align the orientations and intrinsics. Points very far from the camera will be well aligned; the ones very close will remain misaligned.

The calibration process will involve you picking point correspondences by hand, but the maths for the correction is very simple after that.
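
As a rough illustration of how simple the correction maths gets once you have correspondences, here's an untested sketch that fits a plain homography to a few hand-picked point pairs with OpenCV (a simplification of a full stereo calibration; the coordinates and file names are made up):

    import cv2
    import numpy as np

    # Hand-picked corresponding points (pixel coords), RGB image -> IR image.
    # These coordinates are placeholders; you'd click real ones for your rig.
    pts_rgb = np.float32([[120, 80], [900, 95], [130, 650], [910, 640]])
    pts_ir = np.float32([[60, 40], [580, 50], [65, 420], [585, 415]])

    # Fit the homography and warp the RGB image into the IR camera's frame.
    H, _ = cv2.findHomography(pts_rgb, pts_ir)
    rgb = cv2.imread('rgb.png')                       # hypothetical file names
    ir = cv2.imread('ir.png', cv2.IMREAD_GRAYSCALE)
    rgb_aligned = cv2.warpPerspective(rgb, H, (ir.shape[1], ir.shape[0]))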

5

Latter_Security9389 t1_j4v5b5e wrote

Have you already tried different variants of GAN for more stable training?

2

kingdroopa OP t1_j4v5qug wrote

I've tried CycleGAN, CUT (an improvement on CycleGAN), NEGCUT (similar to CUT) and ACL-GAN.

2

BlazeObsidian t1_j4v495i wrote

Autoencoders like VAEs should work better than other models for image-to-image translation. Maybe you can try different VAE models and compare their performance.

I was wrong.

1

kingdroopa OP t1_j4v5t9a wrote

Hmm, interesting! Do you have any papers/articles/sources supporting this claim?

2

BlazeObsidian t1_j4var74 wrote

Sorry, I was wrong. Modern deep VAEs can match SOTA GAN performance for image super-resolution (https://arxiv.org/abs/2203.09445), but I don't have evidence for recoloring.

But diffusion models have been shown to outperform GANs on multiple image-to-image translation tasks, e.g. https://deepai.org/publication/palette-image-to-image-diffusion-models

You could probably reframe your problem as an image colorization task (https://paperswithcode.com/task/colorization), and the SOTA there is still Palette, linked above.

1

kingdroopa OP t1_j4vbaxk wrote

Thanks :) I noticed Palette uses paired images, whilst mine are a bit unaligned. Would you consider it a paired image set, or unpaired? They look very similar, but don't share pixel information in the top/bottom of the images.

1

BlazeObsidian t1_j4vc61q wrote

That depends on the extent to which the pixel information is misaligned, I think. If cropping your images is not a solution and a large portion of your images have this issue, the model wouldn't be able to generate the right pixel information for the misaligned sections. But it's worth giving Palette a try if the misalignment is not significant.

2

ML4Bratwurst t1_j4vfm36 wrote

Maybe you could also convert the RGB image to grayscale and use it in an additional supervised loss term, for regularization and possibly more stable training.
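
Roughly like this (untested PyTorch sketch; the weighting is arbitrary and adv_loss stands in for whatever adversarial generator loss you're already computing):

    import torch.nn.functional as F

    def rgb_to_gray(rgb):
        # rgb: (N, 3, H, W) tensor in [0, 1]; standard luminance weights
        r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
        return 0.299 * r + 0.587 * g + 0.114 * b

    def generator_loss(adv_loss, fake_ir, rgb, gray_weight=0.1):
        # fake_ir: (N, 1, H, W) generated IR image in [0, 1]
        # gray_weight: made-up weight for the grayscale regularization term
        gray_loss = F.l1_loss(fake_ir, rgb_to_gray(rgb))
        return adv_loss + gray_weight * gray_loss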

1

kingdroopa OP t1_j4vgmvb wrote

Interesting! I will for sure write that down in my TODO list, thanks!

2

nmkd t1_j4vg72g wrote

You cannot just translate visible light to IR. No matter what machine learning you use, this is physically impossible.

0

kingdroopa OP t1_j4vgho9 wrote

Correct, it's not physically possible. This is a research project to find out to what degree it IS possible :)

2

nmkd t1_j4vh7cl wrote

Okay, in that case, I'll try to be a bit more helpful lol.

I think you absolutely need to use something like YOLO for object identification/classification.

  • Humans and animals are warmer than the environment

  • Cars and other vehicles are warmer than the environment

  • Glass blocks IR but not visible light

You could get the overall "look" with just image-based networks, but to make it really convincing (more like COD's thermal vision) you need classification in order to make the objects that are supposed to be hot actually look hot.
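
As a toy illustration of that last point, something like this could post-process a generated IR image so detected warm objects read as hot (untested; assumes the ultralytics YOLO package, and the class list and intensity boost are arbitrary placeholders):

    import cv2
    import numpy as np
    from ultralytics import YOLO          # assumes the ultralytics package

    detector = YOLO('yolov8n.pt')         # pretrained COCO detector
    HOT = {'person', 'car', 'bus', 'truck', 'dog', 'cat'}   # classes to render "hot"

    rgb = cv2.imread('rgb.png')                                   # hypothetical file names
    fake_ir = cv2.imread('fake_ir.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # Brighten the regions of detected warm objects in the generated IR image
    for box in detector(rgb)[0].boxes:
        if detector.names[int(box.cls)] in HOT:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            fake_ir[y1:y2, x1:x2] = np.clip(fake_ir[y1:y2, x1:x2] + 60.0, 0, 255)

    cv2.imwrite('fake_ir_hot.png', fake_ir.astype(np.uint8))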

1