Submitted by Singularian2501 t3_zr2en7 in MachineLearning

Paper: https://arxiv.org/abs/2212.01349

Github: https://github.com/facebookresearch/NPM

Abstract:

>Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 9 closed-set tasks and 7 open-set tasks demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better at dealing with rare patterns (word senses or facts) and at predicting rare or nearly unseen words (e.g., non-Latin script).
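
The core idea, roughly: instead of a softmax over a fixed vocabulary, the model scores spans of an external corpus, representing the [MASK] with two query vectors (one for the start of the phrase, one for the end). Below is a minimal toy sketch of that prediction step, with hypothetical names and shapes and a brute-force scan in place of the paper's fast similarity index; it is not the authors' code.

```python
import numpy as np

def fill_mask(q_start, q_end, corpus_vecs, corpus_tokens,
              top_k=50, max_span=8):
    """Toy nonparametric mask filling: choose the corpus span whose
    start/end token embeddings best match the two [MASK] query
    vectors, rather than taking a softmax over a fixed vocabulary."""
    start_scores = corpus_vecs @ q_start   # (N,) score of each corpus token as a span start
    end_scores = corpus_vecs @ q_end       # (N,) score of each corpus token as a span end
    best_score, best_span = -np.inf, None
    for i in np.argsort(start_scores)[-top_k:]:  # top-k candidate start positions
        for j in range(i, min(i + max_span, len(corpus_tokens))):
            score = start_scores[i] + end_scores[j]
            if score > best_score:
                best_score, best_span = score, corpus_tokens[i:j + 1]
    return best_span
```

The real model searches the corpus with an approximate nearest-neighbor index rather than this brute-force loop, which is how it stays speed-competitive with larger parametric models (see the discussion below).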


271

Comments


Dankmemexplorer t1_j123o1b wrote

time to train gpt-4 on my mom's laptop

58

farmingvillein t1_j12s4mn wrote

Unfortunately it's still really slow to run (for now); however:

> the speed of NPM is still on par with the speed of significantly larger parametric models that NPM outperforms

15

yaosio t1_j15h0xa wrote

They also say there's room for improvement, but they didn't explore that in this paper. Just think: one day we'll have the power of ~~the sun~~ GPT-3 in the palm of our hand. Could be really soon, could be far away, but it's coming.

4

red75prime t1_j1899a0 wrote

GPT-3: Sure, I can tell you the power output of the sun. It would be 3.8 x 10^26 W, or 3.234 kW. I'm glad to help.

2

rjromero t1_j12aza8 wrote

> We use the model architecture and initial weights of RoBERTa large (Liu et al., 2019), consisting of 354M parameters. Training is done for 100,000 steps, using thirty-two 32GB GPUs.

354M parameters? At FP32 that's about 1.4 GB. It's tiny.
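
A quick back-of-envelope check of that figure (plain arithmetic, not from the paper):

```python
params = 354_000_000       # RoBERTa-large parameter count
fp32_bytes = 4 * params    # 4 bytes per parameter at FP32
print(f"{fp32_bytes / 1e9:.1f} GB")  # -> 1.4 GB (decimal; ~1.3 GiB)
```

Note the weights aren't the whole footprint here: at inference time NPM also needs the dense index over the reference corpus it retrieves from.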

55

vwings t1_j13pguc wrote

It was expected, right? A retrieval system should be much more efficient than storing phrases in neural net weights as GPT does...

6

CatalyzeX_code_bot t1_j11aphv wrote

Found relevant code at https://github.com/facebookresearch/NPM + all code implementations here

--

To opt out from receiving code links, DM me

7

Singularian2501 OP t1_j11bgj5 wrote

The GitHub link is broken. That's also the reason I didn't include it in the post. The paper is not mine! I also searched Papers with Code, but they don't have a working link either.

Edit: the link is working now: https://github.com/facebookresearch/NPM

14

Purplekeyboard t1_j12lik7 wrote

Ok, but how does it compare in the real world to GPT-3?

6

master3243 t1_j12nmgc wrote

There's no way for a paper to just include a table of "real-world comparison with GPT-3".

For now, there needs to be some benchmark that systematically tests for the things we care about. That's exactly why I deeply respect researchers dedicated to creating better and more useful benchmarks: their work immensely accelerates the field, yet they mostly don't get the attention they (IMO) deserve.

31

Purplekeyboard t1_j12uk1s wrote

But what I'm asking is: how well do the benchmarks match real-world performance? Because I've seen claims that other language models were supposedly close to or equal to GPT-3 on this or that benchmark, but try interacting with them and the difference is striking. It's like the difference between talking to a college grad student and talking to the meth-addled homeless guy who shouts at lampposts.

12

valdanylchuk t1_j137hla wrote

From the paper:

>Extension for generation. It is currently non-trivial to use NPM for generation, since it is an encoder-only model. Future work can explore autoregressive generation as done in Patel et al. (2022) or use NPM for editing (Schick et al., 2022; Gao et al., 2022).

So, don't expect to talk to it just yet.

7

yaosio t1_j17p2bx wrote

There was a thread a while back about one benchmark being filled with spelling errors, grammar errors, and wrong answers. In many cases there were multiple correct answers, but one was picked as the correct answer for no particular reason. Creating a benchmark for the subjective task of "is this text good?" seems to be pretty hard. It's even harder when the people creating the benchmark have a poor grasp of language.

If I were to ask a language model "Describe an apple," there are many correct answers, none more correct than the others. Multiple independent humans would have to go over the answers and make subjective decisions on whether the LLM answered well. This becomes much more difficult with better LLMs, because the prompts and answers have to become more complex, which makes reviewing the answers harder and more time-consuming.

1

blose1 t1_j12voe0 wrote

GPT-3 is yesterday's news; the SOTA is ChatGPT, and it runs circles around real-world GPT-3 on every possible task.

−18

RealGrande t1_j12zfl6 wrote

ChatGPT is a fine-tuned version of GPT-3 (well, GPT-3.5, but pretty much the same barring some improvements).

16

blose1 t1_j14q7ul wrote

Have you actually tried both on the same tasks? It seems like a lot of people here read a paper and some blog and draw their conclusions without even using the tool. I've used both on the same tasks, compared on hundreds of real-world cases, and yes, it's fine-tuned GPT-3, but with human-assisted RL, and it runs circles around GPT-3 in question answering, CoT, and code generation.

2

ShowerVagina t1_j13gxva wrote

GPT-3 is still the best for general use, or for story writing. NovelAI is good, but still not as good as GPT-3.

1

blose1 t1_j14qfir wrote

Have you compared both yourself on question answering, CoT, and code generation?

1

machinelearner77 t1_j1437x9 wrote

Looks like cool stuff... but if you put a code link in the abstract and publish your paper, it should be a functioning link...

3

[deleted] t1_j15c4qw wrote

[deleted]

3

PengsoonThePenguin t1_j16rrtg wrote

I guess an easy explanation is that the model works solely from retrieval over the corpus. Every prediction has to be explained by the corpus.

3

drd13 t1_j1h3gvy wrote

Similarly to T5 (and BERT), the model is pre-trained by predicting randomly masked spans of words. However, the way these spans are predicted is different.

In T5, masked words are generated one by one autoregressively (i.e., a softmax over the vocabulary generates each word in turn). Here, a set of candidate spans covering the whole training corpus is created first, and the model looks at all the candidate spans and chooses the one it thinks is best (trained with a contrastive loss).
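
A minimal sketch of what such an in-batch contrastive objective can look like (illustrative PyTorch with hypothetical names, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_vecs, span_vecs, positive_idx, tau=0.07):
    """query_vecs:   (B, d) representations of the masked positions
       span_vecs:    (M, d) candidate span representations drawn from the
                     same batch -- the in-batch approximation to full
                     corpus retrieval mentioned in the abstract
       positive_idx: (B,)   index of each query's gold span in span_vecs"""
    logits = query_vecs @ span_vecs.T / tau       # (B, M) similarities, temperature-scaled
    return F.cross_entropy(logits, positive_idx)  # pull the gold span up, push the rest down
```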

2

gbfar t1_j16478a wrote

I see lots of potential applications for this. I wonder if we could reliably control text generation by tweaking the reference corpus.

1