I think you absolutely need something like YOLO for object detection/classification, because how hot something should look depends on what it is:

  • Humans and animals are warmer than the environment

  • Cars and other vehicles (at least ones that are running or were recently) are warmer than the environment

  • Glass blocks thermal (long-wave) IR but not visible light, so windows should look opaque instead of see-through

You could get the overall "look" with just image-to-image networks, but to make it really convincing (more like COD's thermal vision) you need classification so that the objects that are supposed to be hot actually look hot.
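
To make that concrete, here's a rough sketch of how detector output could drive the heat map. It's pure Python and assumes you already have labeled boxes from YOLO or whatever detector you pick. The function name `fake_thermal`, the class names, and the heat values are all made up for illustration (as far as I know stock COCO-trained models have no "window" class, so that one would need a custom model).

```python
# Hypothetical per-class target heat in [0, 1]; the values are illustrative guesses.
CLASS_HEAT = {"person": 0.9, "dog": 0.85, "car": 0.7}
# Hypothetical classes drawn as a flat, cool surface, since glass is opaque to thermal IR.
OPAQUE_CLASSES = {"window": 0.2}


def fake_thermal(gray, detections, ambient_scale=0.3):
    """Build a fake heat map from a grayscale image and labeled boxes.

    gray: 2D list of luminance values in [0, 1].
    detections: list of (label, x0, y0, x1, y1), with x1 and y1 exclusive.
    Returns a 2D list of heat values in [0, 1].
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    # Background: dimmed luminance stands in for ambient temperature.
    heat = [[v * ambient_scale for v in row] for row in gray]
    for label, x0, y0, x1, y1 in detections:
        # Clamp the box to the image bounds.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                if label in OPAQUE_CLASSES:
                    # Overwrite whatever is visible through the glass.
                    heat[y][x] = OPAQUE_CLASSES[label]
                elif label in CLASS_HEAT:
                    # Warm objects can only raise the heat, never lower it.
                    heat[y][x] = max(heat[y][x], CLASS_HEAT[label])
    return heat


if __name__ == "__main__":
    gray = [[0.5] * 4 for _ in range(4)]
    boxes = [("person", 0, 0, 2, 2), ("window", 2, 2, 4, 4)]
    for row in fake_thermal(gray, boxes):
        print(row)
```

Filling whole boxes will look blocky. Segmentation masks instead of boxes would probably look a lot better, and then you'd run the heat map through a color palette.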