[R] We found nearly half a billion duplicated images on LAION-2B-en. Submitted by von-hust t3_11jyrfj on March 6, 2023 at 1:20 PM in MachineLearning 36 comments 375
LetterRip t1_jb5bgvj wrote on March 6, 2023 at 3:32 PM
Greatly appreciated. You might run it on aesthetic and 5B also.

von-hust OP t1_jb5ef3f wrote on March 6, 2023 at 3:52 PM
I would, but I don't have the CLIP features. I'll release some training code so that others can train their own indices. The method should scale to 5B, even on a single node; you'll just need more RAM.