Submitted by Fabulous-Let-822 t3_115btl3 in MachineLearning
With the advent of stable diffusion/midjourney/dalle and upcoming text-to-video models from Google and Meta, what will be major challenges in computer vision? It feels like once text-to-video models get released, visual reasoning will be mostly solved, and the only thing left to do is to improve model accuracy/efficiency from there. I am fairly new to Computer Vision and would love to learn new possible areas of research. Thank you in advance!
Ulfgardleo t1_j912luv wrote
computer vision is a much broader problem domain than text to image or text to video. AFAIK 3D pose estimation under occlusions is an unsolved problem, still.