Comments

vzq t1_j0y5mye wrote

A sarcasm detector. Boy, that’s useful.

127

Business-Ad6451 OP t1_j0y5pmo wrote

I detect sarcasm.

42

2blazen t1_j0ym9sj wrote

I think he's just a tiny bit skeptical, considering that's like the biggest challenge of NLP. Probably thousands of people have tried it already, but even GPT-3 doesn't seem to ace sarcasm yet

11

lno666 t1_j0yrabb wrote

Oh I am sure it’s a trivial task to solve…

19

truffleblunts t1_j0ysg3c wrote

Considering the average reddit user routinely fails to detect sarcasm I think the bots are quite a ways away

12

RageOnGoneDo t1_j0z4z9g wrote

Eh, disagree. It's called "artificial intelligence", not "artificial Dunning-Kruger effect"

4

2blazen t1_j0zq28h wrote

To be fair, ChatGPT very confidently bullshits about everything, even about 2+2 being equal to 3. But I agree, AI being able to detect sarcasm shouldn't be far off. It definitely won't be solved by BERT, though

2

truffleblunts t1_j0yw0z2 wrote

That's amusing because his comment is written in a way that is often sarcastic, but the comment is literally true: it absolutely would be useful to have such a bot.

So, was he really being sarcastic? Good luck to you and your bot :P

6

Just_CurioussSss t1_j0ziigv wrote

I mean, it could be, if it always has context on every domain area.

Better semantic search can help solve this problem as it allows us to augment that project with an external knowledge base. At Marqo (the startup I work for), we created a demo where GPT provides up-to-date news summarisation through the use of Marqo as a knowledge base:

https://medium.com/creator-fund/building-search-engines-that-think-like-humans-e019e6fb6389

This could be applied to OP's project. You can visit Marqo: https://github.com/marqo-ai/marqo

1

jayqd3 t1_j0yx85f wrote

Hello.

Sarcasm is algorithmically challenging. It is an antithetic form of human expression. You have to take into account the phenomenon of linguistic ellipsis, which means that words, phrases, and clauses are understood via world knowledge and pragmatics. As you have probably found in your research, typical ML implementations produce average results.

Before going into the specifics of the embeddings, I believe you have to check your dataset. There is a difference between a headlines dataset produced by publishers and other forms of short text, like tweets, which are user-generated content. You have to think about how intended sarcasm, perceived sarcasm, irony, hashtags, emoticons, and other written linguistic expressions present in the domain of sentiment analysis shape the problem.

It would be very interesting to see how an LLM performs on this task. I hope you make progress.

15

the__itis t1_j0zcafk wrote

If you ever need a data set, I’ll happily donate my mother.

14

Fenzik t1_j0ytx7v wrote

If you’re worried about the dimensionality of the embeddings, why not do some dimensional reduction on them?
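A minimal sketch of what that could look like with NumPy (the 768 dimensions match the BERT embeddings discussed in this thread, but the matrix here is random stand-in data and k=64 is an arbitrary target dimension, not anything from OP's actual pipeline):

```python
import numpy as np

# Stand-in for an embedding matrix: rows are texts, columns are
# the 768 embedding dimensions discussed in the thread.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 768))

# Center, then keep only the top-k singular directions (PCA-style reduction).
k = 64
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = U[:, :k] * S[:k]  # each row is now a 64-dim embedding

print(X_reduced.shape)  # -> (200, 64)
```

Note this shrinks the number of columns (embedding dimensions), not the number of rows.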

5

Business-Ad6451 OP t1_j0ywfde wrote

Because if I use SVD, the resulting matrix would have rank ≤ min{m, n} for an m×n embedding matrix. But I want to reduce the 26000×768 matrix to 10000×768, which can't be done using SVD.
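That rank argument can be checked numerically: a truncated SVD keeps the matrix's shape and only lowers its rank, so it indeed cannot shrink the number of rows (toy sizes below, not the actual 26000×768 matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 8))  # stand-in for the m x n embedding matrix

# Rank-k truncated SVD reconstruction: same shape, lower rank.
k = 4
U, S, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * S[:k]) @ Vt[:k]

print(A_k.shape, np.linalg.matrix_rank(A_k))  # -> (50, 8) 4
```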

2

jobeta t1_j0yw8sh wrote

New to this: Are there some labelled datasets for sarcasm?

3

Business-Ad6451 OP t1_j0ywigc wrote

Yes. Available on Kaggle.

3

CMDRJohnCasey t1_j0z0e6h wrote

I'm not sure about the quality of these datasets. As in: outside of the challenge/shared task, they are worth nothing.

The reason is that to assemble data for the negative (non-sarcasm) set, they usually resort to data that are clearly distinguishable either by style (news vs. non-news) or by topic (e.g. politics-related vs. non-politics-related).

Some forms of sarcasm can be detected (e.g. hyperbole), but others are completely indiscernible without knowing the author's context (if I said "I love Sundays", you would need some context about my Sundays to understand whether I'm being sarcastic or not).

10

alcibiades27 t1_j0zg9an wrote

Maybe 85% of detecting sarcasm involves knowing the speaker's actual opinion on the topic, so the listener can assess the low probability of earnestness.

The other 15% is present in the overdirect phrasing of the counterintuitive opinion.

So, where a low probability of a speaker's earnestness is present: if they employ very clear verbiage, and especially emphatic punctuation or enthusiastic modifiers, it is more likely sarcasm than some other alternative (e.g., a changed opinion or an acknowledgement of nuance).

Clarity alone puts it at maybe a 60% chance, while counterintuitive enthusiasm pushes it to near certainty.

A vegan about a steak:

i. "Doesn't that look delicious!"

vs.

ii. "That actually kind of looks delicious...."

Both are counterintuitive statements for the speaker, but the emphasis and certainty of statement i. versus the uncertainty present in statement ii. make it clear which is more likely sarcastic.
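As a toy illustration only (every weight, marker list, and threshold below is invented, not taken from any real model), the emphasis-scoring part of this heuristic could look like:

```python
import re

def emphasis_score(text: str) -> float:
    """Toy score: emphatic punctuation and enthusiastic modifiers push a
    counterintuitive statement toward 'likely sarcastic'; hedging pulls it back."""
    score = 0.6  # baseline for a clearly phrased counterintuitive claim
    if "!" in text:
        score += 0.3  # emphatic punctuation
    if re.search(r"\b(absolutely|totally|just|perfect)\b", text, re.I):
        score += 0.1  # enthusiastic modifiers
    if "..." in text or re.search(r"\b(actually|kind of|maybe)\b", text, re.I):
        score -= 0.4  # uncertainty markers suggest earnestness
    return round(max(0.0, min(1.0, score)), 2)

print(emphasis_score("Doesn't that look delicious!"))               # -> 0.9
print(emphasis_score("That actually kind of looks delicious...."))  # -> 0.2
```

The two example sentences from the comment above land on opposite ends of the scale, matching the intuition that emphatic certainty signals sarcasm while hedging signals earnestness.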

Great question, by the way!

3

astrange t1_j0z2ea3 wrote

This seems like an extremely difficult problem. Humans fail to recognize sarcastic journalism all the time; I expect only the original authors could tell for some of it.

(For instance, famous alleged-fraudster SBF has a lot of articles in places like the NYT which most readers think are "good press" for him, but I'm fairly sure are actually the journalists lowkey making fun of him.)

2

Oheligud t1_j0zb9x0 wrote

Yeah, sure you made one.

2

TimeQuestions t1_j0zrnqz wrote

The issue isn't the sarcasm; the issue is in the definition of sarcasm's parameters, because an actual person is putting those parameters on it. Reddit, Facebook... any of these platforms intent on machine learning need to understand: are they pursuing mimicking machine growth, or human growth? And if it's human, then which side am I, or anyone else, pushing it toward? If machine, then what category? "Be honest" is what I tell myself.

1

TimeQuestions t1_j0zscbo wrote

A good way to begin is to algorithmically ask a series of specific questions, for example about the end user or bot user, intent, query... command. After picking up keyword grouping pairs like "well that", the bot auto-sends the end user a "clarify meaning" prompt.

1

Nether_Portals t1_j0zwg4d wrote

Professor Frink professor Frink, he'll make you laugh he'll make you think...

bUt I dOn'T tHiNk He CaN dEtEcT sArCaSiM

UnLeSs ThEy UsE /s

1

ddofer t1_j1375g8 wrote

  1. There are some nice big datasets; I cleaned an existing one from Reddit for use, in fact!

https://www.kaggle.com/datasets/danofer/sarcasm

  2. Regarding this being a challenging task: it's not as hard as you'd think. There's a much harder related problem, though, in humor: predicting how sarcastic, or how funny, something is. That's much harder! LLMs do very badly at it.

We presented a paper about this, predicting winning jokes in games of Cards Against Humanity, at EMNLP :)

"Cards Against AI: Predicting Humor in a Fill-in-the-blank Party Game"

https://arxiv.org/abs/2210.13016

https://github.com/ddofer/CAH
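For anyone new to the data, loading a sarcasm dataset like the Kaggle one linked above could look roughly like this; the `label`/`comment` column names are an assumption about the file layout, and the sample here is inline stand-in data, not the real file:

```python
import csv
import io

# Inline stand-in for a sarcasm CSV (real column names may differ).
sample = """label,comment
1,"Oh great, another Monday"
0,The meeting is at 3pm
1,"Wow, what a surprise"
"""

rows = list(csv.DictReader(io.StringIO(sample)))
sarcastic = [r["comment"] for r in rows if r["label"] == "1"]

print(len(rows), len(sarcastic))  # -> 3 2
```

With the real file you would pass an open file handle instead of the `StringIO` wrapper.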

1