Submitted by von-hust t3_11jyrfj in MachineLearning
Albino_Jackets t1_jb5cq6x wrote
The duplicates aren't perfect duplicates; they're added to make the model more robust. An image of a giraffe rotated 90 degrees is still a giraffe even though the pixel patterns no longer match. The same applies to the Stallone pic: the noise and errors help the model handle suboptimal image quality.
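(For readers unfamiliar with the augmentation idea being described here: rotation and noise are standard transforms applied to training images. A minimal NumPy sketch on a toy array, not taken from any actual LAION pipeline:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 4x6 array, so the rotation is easy to verify.
img = np.arange(24).reshape(4, 6)

# Rotation augmentation: a giraffe rotated 90 degrees is still a giraffe.
# np.rot90 rotates counterclockwise, so the shape becomes (6, 4).
rotated = np.rot90(img)

# Noise augmentation: simulate suboptimal image quality.
noisy = img + rng.normal(0, 1.0, img.shape)
```

The label stays the same under both transforms, which is what makes them usable as augmentations.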
von-hust OP t1_jb5fjqo wrote
The Stallone pic is generated by SD, so I think you're misunderstanding something. There are false positives, but they shouldn't be "rotated 90 degrees" as you say. The dups mostly match raw CLIP-feature duplicates.
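(The "raw CLIP feature duplicates" idea amounts to flagging image pairs whose CLIP embeddings have very high cosine similarity. A minimal sketch with random vectors standing in for precomputed CLIP features; the function name and threshold are illustrative, not from the OP's actual code:)

```python
import numpy as np

def near_duplicates(embeddings, threshold=0.96):
    """Return index pairs whose cosine similarity exceeds the threshold.

    `embeddings` is an (n, d) array of precomputed feature vectors
    (e.g. CLIP image embeddings).
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T  # pairwise cosine similarities
    pairs = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= threshold:
                pairs.append((i, j))
    return pairs

rng = np.random.default_rng(1)
base = rng.normal(size=(3, 8))
# Row 3 is a slightly perturbed copy of row 0, i.e. a near-duplicate.
embs = np.vstack([base, base[0] + 0.01 * rng.normal(size=8)])
print(near_duplicates(embs))
```

The point is that a near-duplicate in embedding space is a tiny perturbation, not a 90-degree rotation, which CLIP features would not map to nearly the same vector.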
InterlocutorX t1_jb6iw7y wrote
>The duplicates aren't perfect duplicates and are added to create more robust model results
This is incorrect and anyone who looks at the LAION5b aesthetic set can tell pretty easily. It's got easily viewable identical copies of images.
And the noisy Stallone was an SD image, not an image from the dataset.
I looked at the images it has for Henry Cavill, and 6 out of 24 are the exact same Witcher promo shot. That's a quarter of the images it has of Cavill.
Feel free to look for yourself: