Albino_Jackets t1_jb5cq6x wrote on March 6, 2023 at 3:41 PM

The duplicates aren't perfect duplicates and are added to create more robust model results. Like how an image of a giraffe rotated 90 degrees is still a giraffe even though the patterns are no longer the same. Same thing applies with the Stallone pic: the noise and errors help the model deal with suboptimal image quality.
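
To make the giraffe example concrete, here is a minimal sketch of rotation as augmentation in general. It assumes an image is just a nested list of pixel values; the `rotate_90_clockwise` helper and the toy 2x2 "image" are hypothetical and say nothing about how any particular dataset was built.

```python
def rotate_90_clockwise(image):
    """Rotate a 2D grid of pixels 90 degrees clockwise."""
    # Reverse the row order, then transpose: the old first column
    # (read bottom to top) becomes the new first row.
    return [list(row) for row in zip(*image[::-1])]

# Hypothetical toy "image" and label, for illustration only.
original = [[1, 2],
            [3, 4]]
label = "giraffe"

rotated = rotate_90_clockwise(original)

# The pixel layout changes, but the label attached to the sample does not.
augmented_samples = [(original, label), (rotated, label)]
```

The point is only that the pixel grid differs between the two samples while the label stays the same.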

von-hust OP t1_jb5fjqo wrote on March 6, 2023 at 4:00 PM

The Stallone pic is generated by SD; I'm misunderstanding something. There are false positives, but they shouldn't be "rotated 90 degrees" as you say. The dups mostly match raw CLIP feature duplicates.
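
As a rough illustration of what matching on feature duplicates can look like, here is a minimal sketch. It assumes each image has already been turned into a feature vector and that two images count as duplicates when their cosine similarity passes a threshold. The toy vectors, the `find_duplicate_pairs` helper, and the 0.95 threshold are all assumptions for illustration, not the actual method or values used here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_duplicate_pairs(features, threshold=0.95):
    """Return index pairs whose feature vectors are nearly identical.

    The threshold is an assumed value; a brute-force O(n^2) scan like
    this only suits small sets.
    """
    pairs = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if cosine_similarity(features[i], features[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# Toy stand-ins for image feature vectors (not real CLIP output).
toy_features = [
    [1.0, 0.0, 0.0],    # image 0
    [0.99, 0.01, 0.0],  # image 1: near-copy of image 0
    [0.0, 1.0, 0.0],    # image 2: unrelated
]

duplicate_pairs = find_duplicate_pairs(toy_features)
```

Near-copies end up with almost the same vector, so they pass the threshold, while a 90-degree rotation would typically shift the features enough not to.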

InterlocutorX t1_jb6iw7y wrote on March 6, 2023 at 8:25 PM

>The duplicates aren't perfect duplicates and are added to create more robust model results

This is incorrect, and anyone who looks at the LAION-5B aesthetic set can tell pretty easily. It has plainly visible identical copies of images.

https://imgur.com/a/Mg2xZcT

And the noisy Stallone was an SD image, not an image from the dataset.

[I looked at the images it has for Henry Cavill and 6 out of 24 images are the exact same Witcher promo shot. Which is a quarter of the images it has of Cavill.]

Feel free to look for yourself:

https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/