Submitted by juliensalinas t3_11tqryd in MachineLearning

I just released an instruct version of GPT-J using Stanford Alpaca's dataset. The result of this experiment is very cool and confirms that, when fine-tuned on the right data, GPT-J is a very powerful AI model!

You can download the model from the Hugging Face Hub: https://huggingface.co/nlpcloud/instruct-gpt-j-fp16

Here is an example:

from transformers import pipeline
import torch

# Load the fp16 model on GPU 0 (roughly 16 GB of VRAM needed)
generator = pipeline(model="nlpcloud/instruct-gpt-j-fp16", torch_dtype=torch.float16, device=0)

prompt = "Correct spelling and grammar from the following text.\nI do not wan to go\n"
print(generator(prompt))

More details about this experiment here: https://nlpcloud.com/instruct-version-of-gpt-j-using-stanford-alpaca-dataset.html

I hope it will be useful! Please don't hesitate to share some feedback!

Julien

143

Comments

pitrucha t1_jckiv1q wrote

Any plans to quantize it? I saw that someone managed to do so with the 65B LLaMA and push it from 120 GB down to 30 GB.

24

juliensalinas OP t1_jcktk8o wrote

This is something I may focus on in the future. But for the moment I find this fp16 version well suited for small budgets, as it runs on a 16 GB GPU while the native fp32 version of GPT-J requires at least 24 GB of VRAM.

Also, with the bitsandbytes integration in HF Transformers you can use the model in 8 bits: https://huggingface.co/blog/hf-bitsandbytes-integration
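
For example, something like this should load it in 8-bit (you also need bitsandbytes and accelerate installed; I haven't benchmarked this exact setup myself):

from transformers import AutoModelForCausalLM

# device_map="auto" spreads the weights across the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    "nlpcloud/instruct-gpt-j-fp16",
    device_map="auto",
    load_in_8bit=True,
)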

12

Franck_Dernoncourt t1_jclpll3 wrote

Thanks for sharing! How does it compare against other models (e.g., Alpaca or GPT-3.5/4)?

9

juliensalinas OP t1_jclvlim wrote

Clearly it is below Alpaca (based on what I can see from their web demo) and GPT Davinci.

But it is still a very interesting improvement over the base GPT-J.

10

Necessary_Ad_9800 t1_jckvyg7 wrote

Does it remember previous prompts? And how long can its outputs be?

1

juliensalinas OP t1_jckwtdj wrote

No. If you want such a model to "remember" previous prompts, you will need to add them at the top of each request you make.

The output can be up to 2048 tokens. But on a Tesla T4 you might not have enough VRAM, so you may be limited to around 1024 tokens because the GPU will run out of memory above that.
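
Concretely, a simple way to do that is to keep a running history string and prepend it to each new instruction. A rough sketch (reusing the generator from the example above):

history = ""
instruction = "Correct spelling and grammar from the following text.\nI do not wan to go\n"
result = generator(history + instruction)[0]["generated_text"]
# generated_text includes the prompt, so this carries the whole exchange forward
history = result + "\n"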

9

Necessary_Ad_9800 t1_jckxh1h wrote

Thanks for the answer. Is 1 letter equal to 1 token?

1

juliensalinas OP t1_jcky8ok wrote

You're welcome.

A token is a unique entity that can either be a small word, part of a word, or punctuation.
On average, 1 token is made up of 4 characters, and 100 tokens are roughly equivalent to 75 words.
Natural Language Processing models need to turn your text into tokens in order to process it.
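
You can check this with the model's tokenizer, for example (the exact token count depends on the text):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nlpcloud/instruct-gpt-j-fp16")
tokens = tokenizer.tokenize("I do not want to go")
print(tokens)       # the sub-word tokens
print(len(tokens))  # number of tokens, usually fewer than the number of characters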

9

No_Combination_6429 t1_jd20q4w wrote

Could you please provide the source code for the fine-tuning? Also, did you use the LoRA approach?

1

juliensalinas OP t1_jd2owfz wrote

Sure. Here's the repo I used for the fine-tuning: https://github.com/kingoflolz/mesh-transformer-jax. I used 5 epochs, and apart from that I kept the default parameters in the repo.

I haven't tried the LoRA approach yet. Do you think it could improve quality?
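
If someone wants to experiment with LoRA on the Transformers version of GPT-J, the Hugging Face peft library should make it fairly easy to try. An untested sketch, not what I used for this model:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# low-rank adapters on the attention projections; the rank/alpha values are just illustrative
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()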

1