Submitted by Singularian2501 t3_y4tp4b in MachineLearning

Paper: https://arxiv.org/abs/2205.05131

Github: https://github.com/google-research/google-research/tree/master/ul2

Blog: https://ai.googleblog.com/2022/10/ul2-20b-open-source-unified-language.html

Abstract:

>Existing pre-trained models are generally geared towards a particular class of problems, and to date there is still no consensus on what the right architecture and pre-training setup should be. This paper presents a unified framework for pre-training models that are universally effective across datasets and setups. We begin by disentangling architectural archetypes from pre-training objectives -- two concepts that are commonly conflated. Next, we present a generalized and unified perspective on self-supervision in NLP and show how different pre-training objectives can be cast as one another and how interpolating between objectives can be effective. We then propose Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms. We further introduce a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes. We conduct extensive ablative experiments comparing multiple pre-training objectives and find that our method pushes the Pareto frontier, outperforming T5- and/or GPT-like models across multiple diverse setups. Finally, by scaling our model up to 20B parameters, we achieve SOTA performance on 50 well-established supervised NLP tasks spanning language generation (with automated and human evaluation), language understanding, text classification, question answering, commonsense reasoning, long-text reasoning, structured knowledge grounding, and information retrieval. Our model also achieves strong in-context learning results, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization. We also show that UL2 20B works well with chain-of-thought prompting and reasoning. We release Flax-based T5X model checkpoints for the 20B model at https://github.com/google-research/google-research/tree/master/ul2.
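For readers wondering what Mixture-of-Denoisers actually looks like, here is a minimal, illustrative Python sketch of the idea: sample one of several denoising paradigms per example (regular span corruption, prefix-LM-style sequential denoising, or "extreme" corruption) and prepend a mode token so the model knows which paradigm it is in. The specific span lengths, corruption rates, sentinel format, and uniform mixing below are simplified stand-ins, not the paper's exact hyperparameters or implementation.

```python
# A minimal, illustrative sketch of the Mixture-of-Denoisers (MoD) idea:
# each training example is corrupted by one of several denoiser configs
# (R = regular span corruption, S = prefix-LM style, X = extreme corruption),
# and a mode token is prepended. The span lengths, corruption rates, and
# uniform mixing here are simplified stand-ins, not the paper's settings.
import random

DENOISERS = {
    "[R]": dict(mean_span=3,  corrupt_rate=0.15),   # T5-style span corruption
    "[X]": dict(mean_span=12, corrupt_rate=0.50),   # "extreme" denoising
    "[S]": None,                                    # sequential / prefix-LM
}

def span_corrupt(tokens, mean_span, corrupt_rate):
    """Mask random spans, returning (inputs with sentinels, span targets)."""
    n_to_mask = max(1, int(len(tokens) * corrupt_rate))
    inputs, targets, i, sentinel = [], [], 0, 0
    while i < len(tokens):
        if n_to_mask > 0 and random.random() < corrupt_rate:
            span = max(1, int(random.expovariate(1 / mean_span)))
            span = min(span, n_to_mask, len(tokens) - i)
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            targets.extend(tokens[i:i + span])
            i += span
            n_to_mask -= span
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

def mod_example(tokens):
    """Sample a denoiser, apply it, and prepend its mode token."""
    mode = random.choice(list(DENOISERS))
    if mode == "[S]":                      # prefix-LM: predict the suffix
        split = random.randint(1, len(tokens) - 1)
        inputs, targets = tokens[:split], tokens[split:]
    else:
        cfg = DENOISERS[mode]
        inputs, targets = span_corrupt(tokens, **cfg)
    return [mode] + inputs, targets

if __name__ == "__main__":
    toks = "the quick brown fox jumps over the lazy dog".split()
    print(mod_example(toks))
```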

190

Comments

rmsisme t1_isg3nmz wrote

I'd like to see the results of an amateur implementation of this.

18

hosjiu t1_isi4pyg wrote

I share the same point of view as you.

1

visarga t1_isij2xr wrote

I'm wondering what the minimum hardware is to run this model. Is this really a portable alternative to GPT-3?

10

cwhaley112 t1_ispnv6f wrote

If you mean GPU, then 20B parameters × 2 bytes per parameter (assuming fp16) = 40 GB of VRAM for the weights alone.

4
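A quick back-of-the-envelope sketch of the estimate above (an illustration, not an official requirement figure): it only counts the memory for holding the weights at inference time, so activations, the KV cache during decoding, and any optimizer state for fine-tuning come on top.

```python
# Rough VRAM estimate for holding UL2 20B's weights in memory at inference
# time. Activations, KV cache, and optimizer state (for fine-tuning) are
# NOT included, so treat these numbers as a lower bound.
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory in (decimal) gigabytes needed just to store the parameters."""
    return n_params * bytes_per_param / 1e9

N_PARAMS = 20e9  # UL2 20B

for dtype, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{weight_memory_gb(N_PARAMS, nbytes):.0f} GB")
# fp32: ~80 GB, fp16/bf16: ~40 GB, int8: ~20 GB (if quantization holds up)
```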

massimosclaw2 t1_ishdjbw wrote

I wonder how this will perform on out-of-distribution stuff and on remembering obscure references like "Alfred Korzybski" (as GPT-3 does) and what they're related to, or whether 20B parameters is too small to memorize enough.

6

EducationalCicada t1_isjcz0x wrote

Is there a website that keeps track of all the models being released by the major AI labs?

I guess this sub has them all, but I'm looking for a neater presentation.

4

SquareRootsi t1_isjsnk6 wrote

I haven't vetted this yet, but it looks pretty well done at first glance. It compares multiple models across multiple tasks, so you can home in on your specific needs.

https://gem-benchmark.com/results

I think Hugging Face has something similar, but I haven't found all the info on a single page that's easy to compare; you kind of have to bounce around between various model cards, tasks, and metrics pages to piece together similar info.

2

freezelikeastatue t1_isho5v9 wrote

Somebody must've listened to my comment about the originating data being all fucked up.

1