Comments

deepfates t1_iwor6fj wrote

Substack sometimes throws a pop-up that looks like a paywall, because they want your email, but it has a subtle link saying "Let me read it first".

maybe you hit that?

15

marcus_hk t1_iwp6dtu wrote

A model that takes actions to minimize uncertainty will appear to be curious. Intelligent sampling of the input space is the way to go.
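For anyone who wants something concrete, here is a minimal sketch of one common form of "intelligent sampling": uncertainty sampling, where the next input to query is the one the model is least sure about. The entropy criterion, the function names, and the toy probabilities are all illustrative assumptions, not something taken from the article.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(pool_probs):
    """Index of the pool item whose predicted distribution has the highest entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))

# Hypothetical predicted class probabilities for three unlabeled inputs.
pool_probs = [
    [0.98, 0.02],  # model is confident
    [0.55, 0.45],  # close to a coin flip
    [0.80, 0.20],  # somewhat confident
]

# The input the model would "ask about" next under this criterion.
query_index = most_uncertain(pool_probs)
print(query_index)
```

Repeatedly picking the highest-entropy input, labeling it, and retraining is what can make such a model look curious from the outside.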

15

dat_cosmo_cat t1_iwpav2u wrote

> [this year], large language models (LLMs) finally got good.

Every year it's like déjà vu with this shit since 2018.

43

yldedly t1_iwpdtne wrote

"Wow, this test accuracy is way better!" "Ok, how does it do on OOD data?" "Hmm, not great. Let's train a bigger model."

"Wow, this test accuracy is way better!" "Ok, how does it do on OOD data?" "Hmm, not great. Let's..."

27

Meddhouib10 t1_iwpot62 wrote

Can anyone send me the main papers describing the instructed and action-driven models, please?

−1

Worried_Zombie_2190 t1_iwqezs2 wrote

Is there any available reading or documentation concerning the above?

I would appreciate it if someone could help me.

−1

dat_cosmo_cat t1_iwqnbt1 wrote

The ubiquity of pretrained BERT + ResNet models in commercial software applications (and the measurable lift they deliver) is proof that they've been "good enough" for years. Sometimes these articles can come off as a bit naive about the impact that the technology has already had, or about how widely it is used beyond the specific application that is most observable / accessible to the author.

10

---AI--- t1_iwtd2xx wrote

But they are useful. Look at the thousands of real-world uses. Look at Grammarly, translation, protein folding, and so on. How can you possibly deny it?!

> not fundamentally better

In just the last two years, the models went from scoring 43 on this system of testing to 75. How much more of a fundamental improvement are you after?!

1

dat_cosmo_cat t1_iwteguv wrote

You and I are literally saying the same things. These models have been in prod on every major software platform since BERT.

We don't even need to look at offline eval metrics anymore. If you're an actual MLE / data scientist, you likely have pipelines set up that directly measure the engagement / attributable sales differences and report the real business impact across millions of users each time a new model is released.

I work on a team that has made millions of dollars building applications on top of LLMs since 2018, so when I see the claim "LLMs finally got good this year" it's hard not to laugh. That is what I am getting at.

Edit*: did you read the article?

5