
Yuli-Ban t1_ixxkdhd wrote

Generative AI is going to lead the pack for the year. Even if a weak proto-AGI is unveiled, it'll be like Gato: it won't change anything beyond showing us that "it is possible to generalize AI models." And while that would be exciting in its own right for many reasons, in the immediate near term, it's generative AI and biomedical AI that are going to really make waves.

Generative AI is going to have the most immediate effect of all. We should see DALL-E 3 and its equivalents this coming year. Audio-generative models should be commercialized as well, as should text-to-video.

As for the biomedical space, I foresee AI models in 2023 that can run genetic diagnostics fast enough to produce treatment options customized entirely to an individual patient, or at least to some general standard, so that a person could be afflicted with something and, within 48 hours, already be undergoing effective treatment. Think of how the mRNA vaccines were designed in only a day or two, even though they took months to actually roll out. Similarly, I can see diagnostic models that predict with high accuracy whether you're going to develop some disorder, have some weakness, or are predisposed to a condition. I can see those being rolled out to clinics and hospitals within a year.

As for proto-AGI, I expect we'll see some large generalist model released with the ability to interpolate knowledge between tasks (i.e. teach it to do one thing, and it can apply that knowledge to a similar but distinct task). And we'll geek out about it, but it's probably going to remain a purely academic endeavor.

I say focus more on generative AI for right now. Proto-AGI is exciting only because it's a stepping stone to bigger and better things; by itself, it's just a unified bundle of different AI methodologies. I'm more interested in seeing what December 2023 has in store for us in terms of Midjourney, DALL-E, and Stability.

My hard prediction for what should be feasible by December 1st, 2023, and something I'm sure would appeal to this sub: you know those old avatar programs where you could get an avatar to say certain things? We ought to be able to make far more advanced versions of that as a culmination of loads of different abilities. Say, for instance, you want your own waifu to talk to. You could reasonably generate said waifu, have it animated by AI, and have an NLG model converse with you, or, conversely, input text for the avatar to speak, or input text describing an action for the avatar to perform.

Imagine prompting the AI to generate the waifu in a room that has an M60 machine gun, then prompting it to "pick up the gun and shoot it, but it fires roses and party favors." The image/video module would process that and play it out, like an interactive text-to-video program. Of course, you could reprompt it, enhance it, and subtly alter it until you get exactly the video you want.
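Mechanically, none of that requires a single new breakthrough so much as glue code between models that already exist in isolation. Here's a minimal sketch of the loop I have in mind; every function in it is a hypothetical stand-in for a whole class of model, not a real library call:

```python
# Hypothetical pipeline for the avatar idea above. Each function is a
# placeholder for an existing class of model (diffusion, LLM, TTS,
# text-to-video); none of these names refer to a real library.

def avatar_session(appearance_prompt: str) -> None:
    avatar = text_to_image(appearance_prompt)       # generate the character
    while True:
        user_text = input("> ")
        reply = dialogue_model(user_text)           # NLG produces a response
        clip = text_to_video(avatar, action=reply)  # animate the avatar
        voice = text_to_speech(reply)               # voice the reply
        play(clip, voice)                           # present it to the user
```

The hard part isn't any single module; it's keeping the avatar visually consistent from clip to clip, which is the same consistency problem I get into below with comics.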

On a similar note, image synthesis ought to be much more advanced. Playing with image synthesis now, I can already see the limitations of CLIP, so a future generation of it might resolve a lot of the current issues by giving us the ability to:

  • Prompt with much larger windows, such as prompts over a thousand characters long, with only minor drops in coherence the further along you go.
  • Ultra-specify prompts, such as going in and marking specific parts of an image to change, with vastly greater accuracy (think of what DreamStudio does, but even better). This could solve the issue of faces and hands: say a generation of John, Paul, George, and Ringo comes through, but their faces are wonky and some of their fingers are fused together. Mark the image where need be, and the model then focuses specifically on those parts and nails them. Or maybe it does the faces perfectly but everything else is messed up, so you mark it to redo the rest of the image while keeping the faces the way they are. (A minimal sketch of today's embryonic version of this workflow follows this list.)
  • Contextual transfer, or decon-recon (deconstruct/reconstruct): input an image and break its parts down into a new prompt or base image to extract things like art style, pose, and so on, then reconstruct a new image from that data. For example, putting the Mona Lisa smile on different people without "Mona Lisa" herself bleeding into the new image.
  • Save subjects more easily: what DreamBooth does, but streamlined. The biggest issue I have with Midjourney and DALL-E 2, for example, is that textual inversion is completely impossible with them, and even Stable Diffusion takes a good bit of training to understand a new subject, and even then not always perfectly. If CLIP 2.0 or some successor method comes out in 2023 as I expect, it should be as easy as uploading a few images, processing them for a minute or two, giving the subject a name, and voilà. Which, to be fair, is roughly how DreamBooth works already, but I'm expecting it to be more intuitive.
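The mask-and-redo idea in the second bullet already exists in embryonic form as diffusion inpainting. A minimal sketch using the current diffusers library (the checkpoint name and 512x512 sizing are just common defaults, and the file names are placeholders):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load a Stable Diffusion checkpoint fine-tuned for inpainting.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("band_photo.png").convert("RGB").resize((512, 512))
# White pixels in the mask mark regions to regenerate (the wonky faces
# and fused fingers); black pixels are kept untouched.
mask = Image.open("faces_and_hands_mask.png").convert("RGB").resize((512, 512))

fixed = pipe(
    prompt="four musicians on a crosswalk, detailed realistic faces and hands",
    image=image,
    mask_image=mask,
).images[0]
fixed.save("band_photo_fixed.png")
```

What I'm predicting is essentially this loop made precise and automatic, so the model nails the marked regions on the first or second pass instead of after a dozen rerolls.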

And more to the point, I'm expecting it to be of such good quality that you could use it to create cohesive comics. I've read some comics that were made possible with generative AI, and while they're certainly neat proofs of concept, they leave a lot to be desired.

When you can draw a doodle of a character, upload it to Stable Diffusion 3.5 or Midjourney 7, and then generate more panels featuring exactly that character, with only minor deformation in contextually complicated situations, we'll definitely be in a new paradigm.
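For what it's worth, the mechanism for the doodle-to-panel step already exists as img2img; what's missing is the character consistency. A rough sketch with recent versions of the diffusers library ("Stable Diffusion 3.5" and "Midjourney 7" above are guesses, so this uses what ships today; file names are placeholders):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

doodle = Image.open("character_doodle.png").convert("RGB").resize((512, 512))

# strength controls how far the model may drift from the doodle:
# low values keep the sketch's layout, high values re-imagine it.
panel = pipe(
    prompt="comic panel, the same character arguing in a rainy alley at night",
    image=doodle,
    strength=0.6,
    guidance_scale=7.5,
).images[0]
panel.save("panel_01.png")
```

Run today, this gives you a panel in roughly the doodle's pose, but rerunning it for panel two gets you a sibling of the character rather than the character. Closing that gap is the paradigm shift.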


Frumpagumpus t1_ixxmnbg wrote

Any specific papers or news tidbits that make you so bullish on medical AI?


AsuhoChinami t1_ixy2afi wrote

I dunno, but it sounds wonderful and I trust Yuli's opinion, since I've known him for about 10 years and he's pretty grounded and far from a blind optimist.
