ats678 t1_jc4lem7 wrote

In the same fashion as LLMs, I think Large Vision Models and multimodal intersections with LLMs are the next big thing.

Apart from that, I think techniques such as model quantisation and model distillation are going to become extremely relevant in the short term. If the trend of making models ever larger keeps running at this pace, it will be necessary to find ways to run them without a ridiculous amount of resources. In particular, I can see people pre-training large multimodal models and then distilling them for specific tasks.
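To make the two ideas concrete, here's a minimal plain-Python sketch (not from the comment; function names and the temperature-scaled KL form are my own illustration, roughly following the standard distillation setup):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax: higher T -> softer distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened outputs,
    scaled by T^2 (the usual convention). Zero when the student
    matches the teacher exactly."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q)) * T * T

def quantize_int8(weights):
    """Symmetric int8 quantisation: map floats onto [-127, 127]
    with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]
```

The point of both tricks is the same: keep (most of) the big model's behaviour while paying far less in memory and compute at inference time.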

1

ats678 t1_issnre9 wrote

From an applied research perspective, there's still a lot of work being done with GANs, or reusing concepts from adversarial learning (I believe some diffusion models actually use a type of adversarial loss during training). Although diffusion models have been shown to perform extremely well on various tasks, there's still a lot of work to be done to make them usable in practical contexts: the hardware requirements to train them are extremely expensive (Stable Diffusion, for instance, was trained on 256 GPUs), and the resulting models are also very large to deploy for inference. These are all factors that, in an applied context, might make you use a GAN instead of a diffusion model (at least for now, you never know what people will find out in the next couple of months!)
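For anyone unfamiliar with what "adversarial loss" means here, this is a minimal plain-Python sketch of the classic GAN objective (my own illustration, using the standard non-saturating generator loss; the inputs are the discriminator's probability outputs):

```python
import math

def bce(preds, targets):
    """Binary cross-entropy over a batch of probabilities."""
    eps = 1e-7
    total = 0.0
    for p, t in zip(preds, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(preds)

def discriminator_loss(d_real, d_fake):
    # Discriminator wants real samples scored 1 and generated samples 0.
    return bce(d_real, [1.0] * len(d_real)) + bce(d_fake, [0.0] * len(d_fake))

def generator_loss(d_fake):
    # Non-saturating form: the generator is rewarded when D(G(z)) -> 1,
    # i.e. when its samples fool the discriminator.
    return bce(d_fake, [1.0] * len(d_fake))
```

Each training step alternates: update the discriminator on `discriminator_loss`, then update the generator on `generator_loss`, which is the adversarial game that some diffusion pipelines reportedly borrow a piece of.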

1