Submitted by OnlineGrab t3_y3s4ar in MachineLearning
master3243 t1_isharj4 wrote
Impressive, how big is the dataset? Huggingface says n<2k which seems incredibly small.
Also, what is an individual sample point? A Gundam image and its name?
OnlineGrab OP t1_ishhqxx wrote
Thanks! There are 1,565 images in the dataset. The original Pokemon project used an even smaller one (fewer than 1K images).
Each row is a Gundam image + a text description. The original project used BLIP to auto-caption the images, but that didn't really work for this dataset, so instead I asked BLIP to only describe the colors and inserted them into a generic description: "A robot, humanoid, futuristic, <colors>". One could likely get better results with more fine-grained captions.
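A rough sketch of that captioning step, for illustration only. This is not the actual project code: the helper name `build_caption`, the list-of-strings input, and the normalization are all assumptions, and the BLIP call that would produce the color words is left out.

```python
# Template taken from the description above; everything else here is hypothetical.
BASE_CAPTION = "A robot, humanoid, futuristic"


def build_caption(colors):
    """Insert color words (e.g. extracted by a captioning model) into the generic caption."""
    # Normalize each color: trim whitespace, lowercase, skip blanks and repeats (order kept).
    cleaned = []
    for color in colors:
        color = color.strip().lower()
        if color and color not in cleaned:
            cleaned.append(color)
    # Fall back to the bare generic caption when no colors were detected.
    if not cleaned:
        return BASE_CAPTION
    return BASE_CAPTION + ", " + ", ".join(cleaned)


print(build_caption(["white", "Blue", " red ", "blue"]))
```

Each dataset row would then pair an image with the string this returns; swapping in richer per-image descriptions would only change what gets passed in.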