juliensalinas OP t1_jd2owfz wrote
Reply to comment by No_Combination_6429 in [D] An Instruct Version Of GPT-J Using Stanford Alpaca's Dataset by juliensalinas
Sure. Here's the repo I used for the fine-tuning: https://github.com/kingoflolz/mesh-transformer-jax. I used 5 epochs, and apart from that I kept the default parameters in the repo.
I haven't tried the LoRA approach yet. Do you think it could improve quality?
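For what it's worth, here is a minimal sketch of what a LoRA setup for GPT-J could look like with Hugging Face PEFT (not what I used above; the checkpoint name, rank, and target modules are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "EleutherAI/gpt-j-6B"  # assumption: the base GPT-J checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # low-rank dimension (arbitrary choice)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # GPT-J attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```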
juliensalinas OP t1_jclvlim wrote
Reply to comment by Franck_Dernoncourt in [D] An Instruct Version Of GPT-J Using Stanford Alpaca's Dataset by juliensalinas
It is clearly below Alpaca (based on what I can see from their web demo) and GPT-3 Davinci.
But it is still a very interesting improvement over the base GPT-J.
juliensalinas OP t1_jcky8ok wrote
Reply to comment by Necessary_Ad_9800 in [D] An Instruct Version Of GPT-J Using Stanford Alpaca's Dataset by juliensalinas
You're welcome.
A token is a unit of text that can be a small word, part of a word, or a punctuation mark.
On average, 1 token is made up of 4 characters, and 100 tokens are roughly equivalent to 75 words.
Natural Language Processing models need to turn your text into tokens in order to process it.
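If you want to see this in practice, here is a quick way to count tokens with the GPT-J tokenizer from Hugging Face Transformers (just a sketch, assuming the EleutherAI/gpt-j-6B checkpoint):

```python
from transformers import AutoTokenizer

# GPT-J uses the same BPE tokenizer family as GPT-2.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

text = "Natural Language Processing models need to turn your text into tokens."
tokens = tokenizer.tokenize(text)

print(len(text), "characters ->", len(tokens), "tokens")
print(tokens)  # e.g. ['Natural', 'ĠLanguage', 'ĠProcessing', ...]
```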
juliensalinas OP t1_jckwtdj wrote
Reply to comment by Necessary_Ad_9800 in [D] An Instruct Version Of GPT-J Using Stanford Alpaca's Dataset by juliensalinas
No. If you want such a model to "remember" previous prompts, you will need to prepend them to each request you make.
The output can be up to 2,048 tokens. But on a Tesla T4 you might not have enough VRAM, so you may be limited to around 1,024 tokens before the GPU runs out of memory.
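As an illustration, here is roughly how you could carry a conversation by prepending the history to each request (a sketch only; the model name and generation settings are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # assumption: base GPT-J loaded in fp16
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

history = []  # previous prompts and completions, kept on the client side

def ask(prompt, max_new_tokens=200):
    # The model itself is stateless: we prepend the whole history to every request.
    context = "\n".join(history + [prompt])
    inputs = tokenizer(context, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    history.extend([prompt, completion])
    return completion
```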
juliensalinas OP t1_jcktk8o wrote
Reply to comment by pitrucha in [D] An Instruct Version Of GPT-J Using Stanford Alpaca's Dataset by juliensalinas
This may be something I'll focus on in the future. But for the moment I find this fp16 version well suited for small budgets, as it runs on a 16GB GPU while the native fp32 version of GPT-J requires at least 24GB of VRAM.
Also, with the bitsandbytes integration in HF Transformers you can use the model in 8 bits: https://huggingface.co/blog/hf-bitsandbytes-integration
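For example, loading GPT-J in 8 bits with Transformers + bitsandbytes looks roughly like this (a sketch; you need bitsandbytes and accelerate installed, and argument names may vary across versions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Weights are quantized to int8 on the fly, roughly halving VRAM usage vs fp16.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place the layers
    load_in_8bit=True,   # requires the bitsandbytes package
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```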
juliensalinas t1_ityl00x wrote
Reply to comment by pommedeterresautee in [P] Up to 12X faster GPU inference on Bert, T5 and other transformers with OpenAI Triton kernels by pommedeterresautee
Definitely. I will keep you posted, Michael. Thanks!
juliensalinas t1_ittw69q wrote
Reply to [P] Up to 12X faster GPU inference on Bert, T5 and other transformers with OpenAI Triton kernels by pommedeterresautee
Congrats and thanks a lot u/pommedeterresautee for this amazing project. As usual, your in-depth explanations about low-level machine learning are very insightful.
Transformer Deploy was already very exciting, and this new project seems even more promising!
Can't wait to try it for real and see if we can use it behind NLP Cloud somehow.
juliensalinas OP t1_jd6uju4 wrote
Reply to comment by No_Combination_6429 in [D] An Instruct Version Of GPT-J Using Stanford Alpaca's Dataset by juliensalinas
Thx!