Submitted by ahiddenmessi2 t3_11dzfvf in MachineLearning
[removed]
A ~10^7-10^8 parameter model should be possible.
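For a rough sense of where a model lands in that range, here is a back-of-the-envelope count for a BERT-style encoder (a sketch with hypothetical sizes; biases and LayerNorm are ignored as negligible):

```python
# Rough parameter count for a BERT-style encoder.
# Per layer: self-attention (4 * d^2 for the Q/K/V/output projections)
# plus the feed-forward block (2 * d * 4d = 8 * d^2), i.e. ~12 * d^2.
def approx_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    embeddings = vocab_size * d_model
    per_layer = 12 * d_model ** 2
    return embeddings + n_layers * per_layer

# DistilBERT-like sizes: 6 layers, d=768, ~30k vocab
print(approx_params(6, 768, 30522))  # ~66M, squarely in the 10^7-10^8 range
```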
I can recommend a free open-source lib to help you train on the cloud if you need more resources: https://skypilot.readthedocs.io/en/latest/
2060 has 6GB of VRAM, right?
It should be possible to train with that amount https://huggingface.co/docs/transformers/perf_train_gpu_one#optimizer
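One of the memory savers that guide covers is mixed-precision training. A minimal sketch in plain PyTorch (hypothetical tiny model and random data; falls back to full precision when no GPU is present):

```python
import torch
from torch import nn

# Tiny stand-in model and random batch, just to show the AMP pattern.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(32, 2).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 32, device=device)
y = torch.randint(0, 2, (4,), device=device)

opt.zero_grad()
# Forward pass in reduced precision where supported (no-op on CPU here).
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = loss_fn(model(x), y)
scaler.scale(loss).backward()  # scaled backward to avoid fp16 underflow
scaler.step(opt)
scaler.update()
```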
If you need to train from scratch (most people just fine-tune), it will take a while: the original training took 90 hours on 8xV100s, each of which should be faster than your GPU https://www.arxiv-vanity.com/papers/1910.01108/
Thank you. I will take a look at my number of parameters.
Thank you, I will look into it.
My dataset size can be varied because the data can be generated. I will also consider using gradient accumulation to improve performance. Thanks
Thanks for your reply. My goal is to train the transformer to read a specific programming language, so I guess there is no pre-trained model available. Seems I have to train it from scratch on my laptop GPU :(
Edit: and yes, it has 6 GB only
For reference, an RTX 3090 can be rented for as low as ~$0.25/hour at vast.ai with just a credit card if you are in a hurry (AWS and GCP require a quota increase to use GPUs), or you may be able to get free credits for research at major cloud providers.
ChatGPT uses GPT-3.5, which is a pre-trained model. Google uses pre-trained models. Facebook created a pre-trained model recently.
If these models satisfy their needs, they will definitely satisfy yours. Unless you are going after a kind of problem that hasn't been tackled before, a pre-trained model will save you a lot of training time and require far less data to converge and actually be useful.
Thank you. I am looking at CodeBERT, which might satisfy my needs.
KingsmanVince t1_jaboa8m wrote
Knowing the architecture isn't enough. How large is your training dataset? Do you use gradient accumulation?
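For what it's worth, gradient accumulation itself is only a few lines. A minimal PyTorch sketch (hypothetical tiny model and random data) that simulates an effective batch of 32 using micro-batches of 8:

```python
import torch
from torch import nn

model = nn.Linear(16, 2)            # stand-in for the real transformer
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
accum_steps = 4                     # effective batch = 8 * 4 = 32
updates = 0

opt.zero_grad()
for step in range(8):               # 8 micro-batches -> 2 optimizer updates
    x = torch.randn(8, 16)
    y = torch.randint(0, 2, (8,))
    loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()                 # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
        updates += 1
```

The only VRAM cost per step is the micro-batch, which is why it helps on a 6 GB card.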