Submitted by juliensalinas t3_11tqryd in MachineLearning

I just released an instruct version of GPT-J fine-tuned on Stanford Alpaca's dataset. The result of this experiment is very cool and confirms that, when fine-tuned on the right data, GPT-J is a very powerful AI model! You can download the model from the Hugging Face Hub: https://huggingface.co/nlpcloud/instruct-gpt-j-fp16

Here is an example:

from transformers import pipeline
import torch

# Load the fp16 model on the first GPU
generator = pipeline(model="nlpcloud/instruct-gpt-j-fp16", torch_dtype=torch.float16, device=0)

# The typo ("wan") is intentional: it is the text the model should correct
prompt = "Correct spelling and grammar from the following text.\nI do not wan to go\n"
print(generator(prompt))
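Generation parameters can be passed directly to the pipeline call; for instance (illustrative values, standard transformers generation kwargs):

# Illustrative only: sample up to 256 new tokens with nucleus sampling
print(generator(prompt, max_new_tokens=256, do_sample=True, top_p=0.9))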

More details about this experiment here: https://nlpcloud.com/instruct-version-of-gpt-j-using-stanford-alpaca-dataset.html

I hope it will be useful! Please don't hesitate to share some feedback!

Julien

143

Comments


pitrucha t1_jckiv1q wrote

Any plans to quantize it? I saw that someone managed to do so with the 65B LLaMA and push it from 120 GB down to 30 GB.
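For reference, transformers can load a checkpoint in 8-bit via bitsandbytes, which roughly halves the fp16 footprint; a minimal sketch, assuming bitsandbytes and accelerate are installed (not the OP's setup):

from transformers import AutoModelForCausalLM

# Assumption: 8-bit weight loading via bitsandbytes
model = AutoModelForCausalLM.from_pretrained(
    "nlpcloud/instruct-gpt-j-fp16",
    load_in_8bit=True,
    device_map="auto",
)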

24

Necessary_Ad_9800 t1_jckvyg7 wrote

Does it remember previous prompts? And how long can its outputs be?

1

juliensalinas OP t1_jckwtdj wrote

No. If you want such a model to "remember" previous prompts, you will need to prepend them to each request you make (see the sketch below).

The output can be up to 2048 tokens. But on a Tesla T4 you might not have enough VRAM, so you may be limited to 1024 tokens because the GPU will run out of memory above that.
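A minimal sketch of that prompt-concatenation approach, reusing the generator from the example above (the helper and the history format are just illustrative):

history = []

def chat(user_prompt):
    # Build one prompt from the whole conversation so far
    full_prompt = "\n".join(history + [user_prompt])
    # return_full_text=False strips the prompt from the pipeline output
    reply = generator(full_prompt, return_full_text=False)[0]["generated_text"]
    history.extend([user_prompt, reply])
    return reply

Note that the concatenated history also counts toward the 2048-token limit, so older turns eventually have to be trimmed.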

9

juliensalinas OP t1_jcky8ok wrote

You're welcome.

A token is a distinct unit of text: a short word, part of a word, or a punctuation mark.
On average, 1 token is about 4 characters, and 100 tokens are roughly equivalent to 75 words.
Natural language processing models turn your text into tokens in order to process it.
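To see this concretely, you can count tokens with the model's own tokenizer (a small sketch using the standard transformers API):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nlpcloud/instruct-gpt-j-fp16")
text = "I do not want to go"
tokens = tokenizer.tokenize(text)
# Usually more tokens than words: roughly 100 tokens per 75 words
print(len(tokens), tokens)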

9

Franck_Dernoncourt t1_jclpll3 wrote

Thanks for sharing! How does it compare against other models (e.g., Alpaca or GPT-3.5/4)?

9

No_Combination_6429 t1_jd20q4w wrote

Could you please provide the source code for the fine-tuning? Also, did you use the LoRA approach?
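For reference, a generic LoRA setup for GPT-J with the peft library (not necessarily what was done here; hyperparameters are illustrative) looks roughly like:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=32, lora_dropout=0.1)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the low-rank adapter weights train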

1