
ReginaldIII t1_iy5w27q wrote

I would argue that for the images in the blue car post, while the cars themselves reached a good fidelity and stopped improving, the backgrounds kept improving and grounded the cars in their scenes better.

I think that because this treads into human subjective perception and aesthetic and compositional preferences, this sort of idea can only be tested by a wide-scale blind comparative user study.

This is similar to how such studies are conducted in lossy compression research.
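
To make that concrete, here is a rough sketch of how the votes from a blind A/B comparison could be checked against pure chance. It assumes each participant sees a pair of images in randomized order (say, one generated with a negative prompt and one without) and picks the one they prefer. The function name and the vote counts are made up for illustration, and an exact sign test is only one of several reasonable ways to analyze this kind of study.

```python
from math import comb


def sign_test_p_value(wins: int, trials: int) -> float:
    """Exact two-sided binomial (sign) test against a 50/50 null.

    wins   -- number of votes for variant A (hypothetical study data)
    trials -- total number of votes, ties excluded
    """
    if trials <= 0 or not 0 <= wins <= trials:
        raise ValueError("need 0 <= wins <= trials and trials > 0")
    # Probability of every possible vote count if raters had no preference.
    probs = [comb(trials, k) * 0.5 ** trials for k in range(trials + 1)]
    observed = probs[wins]
    # Add up all outcomes that are at least as unlikely as the observed one.
    # The small tolerance guards against floating point ties.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Made-up numbers: 64 of 100 raters preferred the negative-prompt image.
    print(sign_test_p_value(64, 100))
```

A small p-value would only say that raters preferred one variant more often than chance would explain. It says nothing about why, so the background versus subject question would still need separate ratings.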

> It's entirely possible that putting in "close up photo of a plate of food, potatoes, meat stew, green beans, meatballs, indian women dressed in traditional red clothing, a red rug, donald trump, naked people kissing" will amplify some of what you want and cut out some of what's (presumably) a bunch of irrelevant or low-quality SEO spam.

I think the nature of the datasets and language models means you will always need a specialized negative prompt, matched to where your image sits in the latent space, to tune that image to its optimum output for whatever composition you are aiming for. It lets you nudge the image around. How much wiggle room that area of the latent manifold gives for variation will vary greatly.
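
For anyone wondering what the nudging looks like mechanically, here is a toy sketch. As far as I understand the common Stable Diffusion implementations, the negative prompt takes the place of the empty "unconditional" prompt in classifier-free guidance, so each denoising step is pushed away from the negative prediction and toward the positive one. The function name and the two-element vectors below are placeholders I made up; in a real pipeline these would be the model's noise predictions for the latent.

```python
def guided_noise(noise_positive, noise_negative, guidance_scale):
    """Classifier-free guidance with a negative prompt (toy version).

    noise_positive -- noise predicted when conditioning on the prompt
    noise_negative -- noise predicted when conditioning on the negative prompt
                      (stands in for the usual empty-prompt prediction)
    guidance_scale -- how hard to push away from the negative prediction
    """
    if len(noise_positive) != len(noise_negative):
        raise ValueError("predictions must have the same length")
    # Start at the negative prediction and move along the direction
    # that points from it toward the positive prediction.
    return [
        neg + guidance_scale * (pos - neg)
        for pos, neg in zip(noise_positive, noise_negative)
    ]


if __name__ == "__main__":
    positive = [1.0, 0.0]          # placeholder values, not real model output
    empty_negative = [0.0, 0.0]    # what an empty negative prompt might give
    custom_negative = [0.5, 0.0]   # a negative prompt that overlaps the prompt
    print(guided_noise(positive, empty_negative, 7.5))
    print(guided_noise(positive, custom_negative, 7.5))
```

In this toy, swapping the negative prediction changes both the starting point and the direction of the push, which is why the same negative prompt can help one image and do little for another.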
