Comments


KingsmanVince t1_jaboa8m wrote

Knowing the architecture isn't enough. How large is your training dataset? Do you use gradient accumulation?

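For anyone unfamiliar with the term: gradient accumulation lets a small GPU simulate a larger effective batch by summing gradients over several micro-batches before stepping the optimizer. A minimal PyTorch sketch (the tiny linear model and random data are just stand-ins for whatever OP is actually training):

```python
import torch
import torch.nn as nn

# Toy setup; the real model/data are whatever OP is training.
model = nn.Linear(32, 32)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
data = [(torch.randn(4, 32), torch.randn(4, 32)) for _ in range(16)]

accum_steps = 8  # effective batch = 4 * 8 = 32, with only 4 samples in memory at once
optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y) / accum_steps  # scale so accumulated grads average correctly
    loss.backward()                            # gradients add up across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```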

CKtalon t1_jabpgds wrote

A model with roughly 10^7 to 10^8 parameters should be possible.

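Rough memory math behind that estimate (my assumptions: fp32 weights and Adam, whose gradients plus optimizer states roughly quadruple the per-parameter cost; activations come on top):

```python
def rough_training_memory_gb(n_params, bytes_per_param=4, multiplier=4):
    # Weights + gradients + Adam moments ~= 4x raw parameter memory in fp32.
    # Activations and framework overhead come on top, so treat this as a floor.
    return n_params * bytes_per_param * multiplier / 1e9

for n in (1e7, 1e8, 1e9):
    print(f"{n:.0e} params -> ~{rough_training_memory_gb(n):.1f} GB before activations")
# 1e+07 -> ~0.2 GB, 1e+08 -> ~1.6 GB, 1e+09 -> ~16 GB (no longer fits in 6 GB)
```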

ahiddenmessi2 OP t1_jaciwqg wrote

Thanks for your reply. My goal is to train the transformer to read a specific programming language, so I guess there is no pre-trained model available. Seems I have to train it from scratch on my laptop GPU :(

Edit: and yes, it only has 6GB

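If training from scratch really is the only option, the Hugging Face transformers library makes it easy to size a small decoder-only model for a 6GB card. A sketch, assuming a tokenizer trained separately on the target language's source files; every size below is a guess to be tuned, not a recommendation:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Small decoder-only config that should fit comfortably in 6 GB of VRAM.
config = GPT2Config(
    vocab_size=16_000,   # size of a tokenizer trained on the target language
    n_positions=512,     # maximum sequence length
    n_embd=384,
    n_layer=6,
    n_head=6,
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # roughly 17M
```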

ggf31416 t1_jacq8pl wrote

For reference, an RTX 3090 can be rented for as little as ~$0.25/hour on vast.ai with just a credit card if you are in a hurry (AWS and GCP require a quota increase before you can use GPUs), or you may be able to get free research credits from the major cloud providers.


I_will_delete_myself t1_jad9amj wrote

ChatGPT uses GPT-3.5, which is a pre-trained model. Google uses pre-trained models. Facebook created a pre-trained model recently.

If these models satisfy their needs, they will definitely satisfy yours. Unless you are tackling a kind of problem that has never been attempted before, a pre-trained model will save you a great deal of training time and need far less data to converge and actually be useful.

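To make the pre-trained route concrete, here is a minimal fine-tuning sketch with the Hugging Face transformers library; the gpt2 checkpoint, learning rate, and toy corpus are placeholders, and a checkpoint already trained on source code would likely be a better starting point:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# "gpt2" is just a stand-in; swap in a checkpoint closer to source code.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy corpus standing in for OP's programming-language files.
corpus = ["def add(a, b):\n    return a + b", "for i in range(10):\n    print(i)"]

model.train()
for text in corpus:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # causal LM loss on the snippet
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```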