Submitted by Fabulous-Let-822 t3_115btl3 in MachineLearning
With the advent of stable diffusion/midjourney/dalle and upcoming text-to-video models from Google and Meta, what will be major challenges in computer vision? It feels like once text-to-video models get released, visual reasoning will be mostly solved, and the only thing left to do is to improve model accuracy/efficiency from there. I am fairly new to Computer Vision and would love to learn new possible areas of research. Thank you in advance!
ml-research t1_j90suwh wrote
Finding open problems