vzq t1_j0y5mye wrote on December 20, 2022 at 7:31 AM

A sarcasm detector. Boy, that’s useful.

Business-Ad6451 OP t1_j0y5pmo wrote on December 20, 2022 at 7:32 AM

I detect sarcasm.

PropOnTop t1_j0ylemy wrote on December 20, 2022 at 11:11 AM

Oh, really?

2blazen t1_j0ym9sj wrote on December 20, 2022 at 11:22 AM

I think he's just a tiny bit skeptical, considering that's like the biggest challenge in NLP. Probably thousands of people have tried it already, but even GPT-3 doesn't seem to ace sarcasm yet.

lno666 t1_j0yrabb wrote on December 20, 2022 at 12:21 PM

Oh I am sure it’s a trivial task to solve…

truffleblunts t1_j0ysg3c wrote on December 20, 2022 at 12:34 PM

Considering the average Reddit user routinely fails to detect sarcasm, I think the bots are quite a ways away.

RageOnGoneDo t1_j0z4z9g wrote on December 20, 2022 at 2:26 PM

Eh, disagree. It's called "artificial intelligence", not "artificial Dunning-Kruger effect".

2blazen t1_j0zq28h wrote on December 20, 2022 at 4:52 PM

To be fair, ChatGPT very confidently bullshits about everything, even about 2+2 being equal to 3. But I agree: AI being able to detect sarcasm shouldn't be far away. However, it definitely won't be solved by BERT.

truffleblunts t1_j0yw0z2 wrote on December 20, 2022 at 1:10 PM

That's amusing because his comment is written in a way that is often sarcastic, but the comment is literally true: it absolutely would be useful to have such a bot.

So, was he really being sarcastic? Good luck to you and your bot :P

Cactus_TheThird t1_j0yrbpy wrote on December 20, 2022 at 12:22 PM

Well aren't you a smart little bot.

grudev t1_j0z85sy wrote on December 20, 2022 at 2:50 PM

200% precision!

-gh0stRush- t1_j0zf5rz wrote on December 20, 2022 at 3:40 PM

https://www.youtube.com/watch?v=FRK8uPlXOYo

Just_CurioussSss t1_j0ziigv wrote on December 20, 2022 at 4:02 PM

I mean. It could be if it always has context on every domain area.

Better semantic search can help solve this problem as it allows us to augment that project with an external knowledge base. At Marqo (the startup I work for), we created a demo where GPT provides up-to-date news summarisation through the use of Marqo as a knowledge base:

https://medium.com/creator-fund/building-search-engines-that-think-like-humans-e019e6fb6389

This could be applied to OP's project. You can visit Marqo: https://github.com/marqo-ai/marqo

jayqd3 t1_j0yx85f wrote on December 20, 2022 at 1:21 PM

Hello.

Sarcasm is algorithmically challenging. It is an antithetic form of human expression. You have to take into account the phenomenon of linguistic ellipsis, which means that words, phrases and clauses are understood via world knowledge and pragmatics. As you have probably researched, typical ML implementations produce average results. Before going into the specs of the embeddings, I believe you have to check your dataset. There is a difference between a headlines dataset produced by publishers and other forms of short text, like tweets, that are user-generated content. You have to think about how intended sarcasm, perceived sarcasm, irony, hashtags, emoticons and other written linguistic expressions present in the domain of sentiment analysis shape the problem. It would be very interesting to see how an LLM performs on this task. I hope you make progress.

the__itis t1_j0zcafk wrote on December 20, 2022 at 3:20 PM

If you ever need a data set, I’ll happily donate my mother.

Fenzik t1_j0ytx7v wrote on December 20, 2022 at 12:49 PM

If you’re worried about the dimensionality of the embeddings, why not do some dimensionality reduction on them?

Business-Ad6451 OP t1_j0ywfde wrote on December 20, 2022 at 1:14 PM

Because if I use SVD, the matrix I would get would have a rank ≤ min{m, n}, assuming I had an m×n embedding matrix. But I want to reduce the 26000×768 matrix to 10000×768, which can't be done using SVD.
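
To make the shape argument concrete, here is a minimal NumPy sketch of what a truncated SVD does to an embedding matrix. Everything in it is an assumption for illustration: the matrix is a small random stand-in rather than the actual 26000×768 embeddings, and the names `embeddings`, `reduced`, `approx` and the sizes `m`, `n`, `k` are hypothetical.

```python
import numpy as np

# Small random stand-in for an embedding matrix (hypothetical sizes;
# the matrix discussed in the thread is 26000 x 768).
rng = np.random.default_rng(0)
m, n, k = 260, 76, 10
embeddings = rng.standard_normal((m, n))

# Thin SVD: U is m x min(m, n), S holds min(m, n) singular values,
# and Vt is min(m, n) x n.
U, S, Vt = np.linalg.svd(embeddings, full_matrices=False)

# Keep the top-k components. Each of the m rows gets a k-dimensional
# representation, so the row count is unchanged and the columns shrink.
reduced = U[:, :k] * S[:k]   # shape (m, k)

# Rank-k approximation expressed in the original n-dimensional space.
approx = reduced @ Vt[:k]    # shape (m, n), rank at most k

print(reduced.shape, approx.shape)
```

On the stand-in, the number of rows stays at `m` while the number of columns drops from `n` to `k`. That is consistent with the point above: SVD on its own shrinks the feature dimension or the rank, so going from 26000 rows to 10000 rows would need a different step.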

jobeta t1_j0yw8sh wrote on December 20, 2022 at 1:12 PM

New to this: Are there some labelled datasets for sarcasm?

Business-Ad6451 OP t1_j0ywigc wrote on December 20, 2022 at 1:14 PM

Yes. Available on Kaggle.

CMDRJohnCasey t1_j0z0e6h wrote on December 20, 2022 at 1:49 PM

I'm not sure about the quality of these datasets. As in: outside of the challenge/shared task, they are worth nothing.

The reason is that to assemble data for the negative (non-sarcasm) set, they usually resort to data that are clearly distinguishable either by style (news vs. non-news) or by topic (e.g. politics-related vs. non-politics-related).

Some forms of sarcasm can be detected (e.g. hyperbole), but others are completely indiscernible without knowing the context of the author (if I said "I love Sundays", you would need some context about my Sunday to understand whether I'm being sarcastic or not).

AnyString3053 t1_j0yx2s6 wrote on December 20, 2022 at 1:20 PM

Can you share the link, please?

Business-Ad6451 OP t1_j0z03e7 wrote on December 20, 2022 at 1:46 PM

https://www.kaggle.com/datasets/rmisra/news-headlines-dataset-for-sarcasm-detection

RageOnGoneDo t1_j0z51t9 wrote on December 20, 2022 at 2:26 PM

google.com/

alcibiades27 t1_j0zg9an wrote on December 20, 2022 at 3:47 PM

Maybe 85% of detecting sarcasm involves knowing the speaker's actual opinion on the topic, so the listener can assess the low probability of earnestness.

The other 15% is present in the overly direct phrasing of the counterintuitive opinion.

So, where there is a low probability that the speaker is being earnest: if they employ very clear verbiage, and especially emphatic punctuation or enthusiastic modifiers, it is more likely sarcasm than some other alternative (e.g., changing opinion or acknowledging nuance).

Clarity alone gives maybe a 60% chance, while counterintuitive enthusiasm raises it to near certainty.

A vegan about a steak:

i. "Doesn't that look delicious!"

vs.

ii. "That actually kind of looks delicious...."

Both are counterintuitive statements for the speaker, but the emphasis and certainty of statement i. versus the uncertainty present in option ii. make it clear which is more likely sarcastic.

Great question, by the way!

astrange t1_j0z2ea3 wrote on December 20, 2022 at 2:05 PM

This seems like an extremely difficult problem. Humans fail to recognize sarcastic journalism all the time; I expect only the original authors could tell for some of it.

(For instance, famous alleged-fraudster SBF has a lot of articles in places like the NYT which most readers think are "good press" for him, but I'm fairly sure are actually the journalists lowkey making fun of him.)

Oheligud t1_j0zb9x0 wrote on December 20, 2022 at 3:13 PM

Yeah, sure you made one.

I_will_delete_myself t1_j0zjt4c wrote on December 20, 2022 at 4:11 PM

It’s not gonna be a simple text classifier.

TimeQuestions t1_j0zrnqz wrote on December 20, 2022 at 5:02 PM

The issue isn’t the sarcasm; the issue is in the definition of sarcasm and its parameters. An actual person is putting parameters on the sarcasm. Reddit, Facebook… any of these platforms intent on machine learning need to understand: are they pursuing mimicking machine growth or human growth? If it’s human, then which side am I, or anyone else, pushing it toward? If machine, then what category? "Be honest" is what I tell myself.

TimeQuestions t1_j0zscbo wrote on December 20, 2022 at 5:07 PM

A good way to begin is to algorithmically ask a series of specific questions, for example: end user or bot user? Intent? Query? … Command: after picking up keyword pairs like “well that”, the bot automatically replies to the end user with a request to “clarify meaning”.

Fueled_by_sugar t1_j0zuldx wrote on December 20, 2022 at 5:21 PM

Are you Sheldon Cooper?

Business-Ad6451 OP t1_j0zv6tx wrote on December 20, 2022 at 5:25 PM

Nah. Physics was never my strong suit.

Nether_Portals t1_j0zwg4d wrote on December 20, 2022 at 5:33 PM

Professor Frink professor Frink, he'll make you laugh he'll make you think...

bUt I dOn'T tHiNk He CaN dEtEcT sArCaSiM

UnLeSs ThEy UsE /s

ddofer t1_j1375g8 wrote on December 21, 2022 at 9:38 AM

There are some nice big datasets; in fact, I cleaned an existing one from Reddit for use!

https://www.kaggle.com/datasets/danofer/sarcasm

Regarding this being a challenging task: it's not as hard as you'd think. There's a much harder related problem in humor, though: judging how sarcastic or funny something is. LLMs do very badly at it.

We presented a paper about this, and predicting winning jokes in games of Cards Against Humanity at EMNLP :)

"Cards Against AI: Predicting Humor in a Fill-in-the-blank Party Game"

https://arxiv.org/abs/2210.13016

https://github.com/ddofer/CAH

Sarcasm Detection model [R].
