SeymourBits t1_jdlwrgi wrote

This is the most accurate comment I've come across. The entire system is only as good and granular as the CLIP text description that's passed into GPT-4 which then has to "imagine" the described image, often with varying degrees of hallucinations. I've used it and can confirm it is currently not possible to operate anything close to a GUI with the current approach.


SeymourBits t1_jdlkln7 wrote

I second this. I was able to extract fairly useful results from Neo but it took a huge amount of prompt trial and error, eventually getting decent/stable results but not in the same ballpark as GPT3+. The dolly training results here seem good, if not expected. I'm now ready to move to a superior model like LLaMA/Alpaca though. What are you running?