Submitted by juliensalinas t3_11tqryd in MachineLearning

I just released an instruct version of GPT-J fine-tuned on Stanford Alpaca's dataset. The result of this experiment is very cool and confirms that, when fine-tuned on the right data, GPT-J is a very powerful AI model! You can download the model from the Hugging Face Hub: https://huggingface.co/nlpcloud/instruct-gpt-j-fp16

Here is an example:

from transformers import pipeline
import torch

# Load the fp16 model on the first GPU
generator = pipeline(model="nlpcloud/instruct-gpt-j-fp16", torch_dtype=torch.float16, device=0)

# The typo ("wan") is intentional: it is the text the model should correct
prompt = "Correct spelling and grammar from the following text.\nI do not wan to go\n"
print(generator(prompt))
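Generation parameters can be passed directly to the pipeline call; for instance (illustrative values, standard transformers generation kwargs):

# Illustrative only: sample up to 256 new tokens with nucleus sampling
print(generator(prompt, max_new_tokens=256, do_sample=True, top_p=0.9))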

More details about this experiment here: https://nlpcloud.com/instruct-version-of-gpt-j-using-stanford-alpaca-dataset.html

I hope it will be useful! Please don't hesitate to share some feedback!

Julien

143

Comments


pitrucha t1_jckiv1q wrote

Any plans to quantize it? I saw that someone managed to do so with the 65B LLaMA and push it from 120 GB down to 30 GB.
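For reference, transformers can load a checkpoint in 8-bit via bitsandbytes, which roughly halves the fp16 footprint; a minimal sketch, assuming bitsandbytes and accelerate are installed (not the OP's setup):

from transformers import AutoModelForCausalLM

# Assumption: 8-bit weight loading via bitsandbytes
model = AutoModelForCausalLM.from_pretrained(
    "nlpcloud/instruct-gpt-j-fp16",
    load_in_8bit=True,
    device_map="auto",
)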

24

Necessary_Ad_9800 t1_jckvyg7 wrote

Does it remember previous prompts? And how long can its outputs be?

1

juliensalinas OP t1_jckwtdj wrote

No. If you want such a model to "remember" previous prompts, you will need to prepend them to each request you make (see the sketch below).

The output can be up to 2048 tokens. But on a Tesla T4 you might not have enough VRAM, so you may be limited to 1024 tokens because the GPU will run out of memory above that.
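A minimal sketch of that prompt-concatenation approach, reusing the generator from the example above (the helper and the history format are just illustrative):

history = []

def chat(user_prompt):
    # Build one prompt from the whole conversation so far
    full_prompt = "\n".join(history + [user_prompt])
    # return_full_text=False strips the prompt from the pipeline output
    reply = generator(full_prompt, return_full_text=False)[0]["generated_text"]
    history.extend([user_prompt, reply])
    return reply

Note that the concatenated history also counts toward the 2048-token limit, so older turns eventually have to be trimmed.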

9

juliensalinas OP t1_jcky8ok wrote

You're welcome.

A token is a distinct unit of text: a short word, part of a word, or a punctuation mark.
On average, 1 token is about 4 characters, and 100 tokens are roughly equivalent to 75 words.
Natural language processing models turn your text into tokens in order to process it.
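To see this concretely, you can count tokens with the model's own tokenizer (a small sketch using the standard transformers API):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nlpcloud/instruct-gpt-j-fp16")
text = "I do not want to go"
tokens = tokenizer.tokenize(text)
# Usually more tokens than words: roughly 100 tokens per 75 words
print(len(tokens), tokens)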

9

Franck_Dernoncourt t1_jclpll3 wrote

Thanks for sharing! How does it compare against other models (e.g., Alpaca or GPT-3.5/4)?

9

No_Combination_6429 t1_jd20q4w wrote

Could you please provide the source code for the fine-tuning? Also, did you use the LoRA approach?
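For reference, a generic LoRA setup for GPT-J with the peft library (not necessarily what was done here; hyperparameters are illustrative) looks roughly like:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=32, lora_dropout=0.1)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the low-rank adapter weights train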

1