
Nir_Kap t1_iu5xfm5 wrote

Very interesting, one of the best works I've seen in a while. I have a question: how do you explain the low performance of the fine-tuned models?


YonatanBitton OP t1_iua6jz7 wrote

Thank you :) Random chance with 10-12 candidates is quite low, 17%-24%, so the fine-tuned model performance of 55% is well above chance. However, humans still perform much better. A possible explanation for this gap is that the dataset is challenging: it contains complex social and cultural cues that are hard for current models, which were not trained on similar tasks. We explored this direction in the last section (Table 6), where easier classes such as "visually salient" (more similar to the model's pre-training task) reach 67%, while more difficult ones that differ from pre-training, such as "visually non-salient", reach only 36%.
