
CMDRJohnCasey t1_j0z0e6h wrote

I'm not sure about the quality of these datasets. As in: outside of the challenge/shared task, they are worth nothing.

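The reason is that, to assemble the negative (non-sarcasm) set, they usually resort to data that are clearly distinguishable from the sarcastic examples by either style (news vs. non-news) or topic (e.g. politics-related vs. non-politics-related). A model can then score well by picking up on style or topic rather than on sarcasm itself.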
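To make that concrete, here is a rough sketch of the failure mode, not anything taken from an actual shared task. The six training sentences are made up, and `tokens`, `fit` and `predict` are hypothetical helper names for a deliberately naive word-count classifier. The sarcastic examples are all about politics and the non-sarcastic ones are news-style headlines on other topics, so the classifier only ever needs to learn the topic.

```python
from collections import Counter

# Made-up toy data (an assumption for illustration only):
# label 1 = sarcastic, all political; label 0 = not sarcastic, all news headlines.
train = [
    ("oh great another senate vote, just what we needed", 1),
    ("wow the senate really nailed that election promise", 1),
    ("love how the election fixed everything overnight", 1),
    ("storm closes coastal roads on monday", 0),
    ("city council opens new library branch", 0),
    ("local team wins regional football final", 0),
]


def tokens(text):
    """Lowercase, drop commas, split on whitespace."""
    return text.lower().replace(",", " ").split()


def fit(examples):
    """Count how often each token occurs in each class."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(tokens(text))
    return counts


def predict(counts, text):
    """Vote per token: +1 if it is more frequent in class 1, -1 if in class 0."""
    score = 0
    for tok in tokens(text):
        score += (counts[1][tok] > counts[0][tok]) - (counts[0][tok] > counts[1][tok])
    return 1 if score > 0 else 0


counts = fit(train)

# Looks perfect on the data it was built from.
train_accuracy = sum(predict(counts, t) == y for t, y in train) / len(train)

# A sincere political sentence is flagged as sarcastic (topic match).
sincere_politics = predict(counts, "the senate vote on the election bill passed")

# A sarcastic non-political sentence is flagged as not sarcastic (topic mismatch).
sarcastic_other = predict(counts, "what a fantastic monday, roads closed again")

print(train_accuracy, sincere_politics, sarcastic_other)
```

The classifier never sees anything that resembles sarcasm. It separates the two sets by vocabulary alone, which is why a high score inside the task says little about performance outside it.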

Some forms of sarcasm can be detected (e.g. hyperbole), but others are completely indiscernible without knowing the author's context: if I say "I love Sundays", you need to know something about my Sundays to tell whether I'm being sarcastic or not.
