External_Oven_6379 OP t1_itysdcc wrote

Thank you for your input. Since I'm working on the project by myself, I have no one to bounce ideas off of, so this is the first time I'm getting input from an experienced audience. I don't remember exactly when I made that architecture decision, but I know I also had OpenAI's CLIP on the table and must have concluded that the approach I mentioned would work better... how wrong I was!


External_Oven_6379 OP t1_itpny3q wrote

Thank you for your input!

I checked the scale of the VGG19 feature embedding: all values lie in the range [0, 9.7]. In that case, should the values of the one-hot vector be 0 and 9.7 instead of 0 and 1? (A sketch of the check I ran is below.)
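A minimal sketch of how I compared the two scales, assuming a standard torchvision VGG19 with features taken from the second fully connected layer, and a placeholder image and class count; the rescaling/normalization options at the end are just the alternatives I'm weighing, not a settled choice:

```python
import torch
from torchvision import models

# Assumed feature extractor: VGG19 up to the second fc layer (4096-d activations).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()
extractor = torch.nn.Sequential(
    vgg.features, vgg.avgpool, torch.nn.Flatten(),
    *list(vgg.classifier[:5]),  # stop after the second Linear + ReLU
)

with torch.no_grad():
    img = torch.rand(1, 3, 224, 224)   # placeholder image batch
    feat = extractor(img)              # VGG19 embedding, non-negative values

num_classes = 20                       # placeholder number of texture categories
label = torch.zeros(num_classes)
label[3] = 1.0                         # one-hot label, values in {0, 1}

# Option A: rescale the one-hot vector to the feature range
label_scaled = label * feat.max()

# Option B: L2-normalize both vectors so their magnitudes match
feat_norm = torch.nn.functional.normalize(feat, dim=-1)
label_norm = torch.nn.functional.normalize(label.unsqueeze(0), dim=-1)

print(feat.min().item(), feat.max().item(), label_scaled.max().item())
```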

The labels are textures such as floral or leopard. You're right that they are not necessarily orthogonal, but it's difficult to estimate the correlation among these classes, so one-hot vectors were the most accessible option for me.

I read about CLIP when starting this. My impression was that CLIP's input consists of images paired with free-form text descriptions, e.g. "flowers in the middle of a blue floor" (which is not categorical). Could categorical text be used instead? Something like the sketch below is what I had in mind.
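A rough sketch of feeding categorical labels to CLIP as short prompts, assuming the Hugging Face transformers CLIP checkpoint; the model name, label list, prompt template "a photo of a {label} texture", and image path are all my own placeholders:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; any CLIP variant with a text and image encoder would do.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["floral", "leopard", "striped", "plaid"]
prompts = [f"a photo of a {label} texture" for label in labels]  # categorical label -> text

image = Image.open("example_fabric.jpg")  # placeholder image path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Similarity between the image embedding and each label-prompt embedding
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs.squeeze().tolist())))
```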
