Submitted by Marcus_111 t3_11xjr72 in singularity
Zealousideal_Ad3783 t1_jd3g3sr wrote
Unfortunately when you ask it to adjust the images, it just does a whole new prompt (see the UMich example they showed). I’m looking forward to when we have a chatbot that can keep images exactly the same and just add the element you want.
CleanThroughMyJorts t1_jd3t4lq wrote
InstructPix2Pix demoed something along these lines. I don't think it's public yet, but at least we know it's doable right now
mudman13 t1_jd5cixi wrote
That's up and running and is quite cool, but ControlNet has more variability.
MysteryInc152 t1_jd3v3kp wrote
There are foundation models that do these kinds of things. You can connect them to a language model to get the kind of effect you're thinking about.
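The "connect an image model to a language model" idea above can be sketched as a toy loop: the LLM turns each chat message into an edit, and the system keeps an accumulating scene description so the image stays consistent across turns instead of being regenerated from a fresh prompt. This is a minimal illustrative sketch (in the spirit of tool-using systems like Visual ChatGPT); the names `ImageState`, `parse_edit_request`, and `chat_edit_loop` are hypothetical stand-ins, not a real API, and the actual LLM and image-model calls are stubbed out.

```python
from dataclasses import dataclass, field


@dataclass
class ImageState:
    """Persistent scene state, so edits accumulate across turns
    instead of each request producing a whole new prompt."""
    base_prompt: str
    edits: list = field(default_factory=list)

    def full_prompt(self) -> str:
        # The image model always sees the original prompt plus every
        # accepted edit, keeping the rest of the scene unchanged.
        return ", ".join([self.base_prompt] + self.edits)


def parse_edit_request(user_message: str) -> str:
    # Stand-in for the language model: extract the requested change.
    # A real system would have the LLM emit a structured edit command.
    return user_message.removeprefix("please ").strip()


def chat_edit_loop(state: ImageState, messages: list[str]) -> str:
    for msg in messages:
        state.edits.append(parse_edit_request(msg))
        # A real pipeline would call an image-editing model here,
        # conditioned on state.full_prompt() and the previous image.
    return state.full_prompt()


state = ImageState("a brick campus building at sunset")
result = chat_edit_loop(
    state,
    ["please add a flag on the roof", "make the sky cloudy"],
)
print(result)
# → a brick campus building at sunset, add a flag on the roof, make the sky cloudy
```

The point of the design is that conversation state lives outside the image model: the chatbot only ever appends deltas, which is what lets a back-and-forth refine one image rather than restart from scratch each turn.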
Marcus_111 OP t1_jd3ggzz wrote
It's just a tweak away for Bing. The underlying DALL-E 2 can already do this.
Zealousideal_Ad3783 t1_jd3n2sb wrote
I don’t think that’s true. What I’m talking about is having a whole back-and-forth conversation, in natural language, with Bing that allows you to perfect an image to your liking. The same way you’d be able to do it with a human graphic designer.
ThoughtSafe9928 t1_jd3pon2 wrote
Yeah that’s a feature that GPT-4 is capable of.
Zealousideal_Ad3783 t1_jd3pqxj wrote
Uh, no, it only outputs text
ThoughtSafe9928 t1_jd3pv7n wrote
I mean you can either use what’s publicly available and decide that’s what it’s capable of or watch the developer showcase and see what the model itself is capable of.
Zealousideal_Ad3783 t1_jd3q0rs wrote
GPT-4 only outputs text in both the publicly available version and the demo version
ThoughtSafe9928 t1_jd3q55v wrote
It appears humans like me are also prone to hallucinations - I genuinely think I must have dreamed that capability immediately after the dev showcase.
😹😹😹
DaffyDuck t1_jd5dune wrote
It will be able to take a photo as input and generate text based on the image.
ThoughtSafe9928 t1_jdeueib wrote
According to this article that was just released yesterday, the unrestricted model of GPT-4 can produce images.
https://arxiv.org/abs/2303.12712
On page 16, "the model appears to have a genuine ability for visual tasks, rather than just copying code from similar examples in the training data. The evidence below strongly supports this claim, and demonstrates that the model can handle visual concepts, despite its text-only training"
I'm still not sure whether my initial assumption was from information I gleaned somewhere or because I hallucinated it. Regardless, GPT-4 can indeed output images.