Submitted by wrsage t3_xtl6jm in MachineLearning

Gpu for machine translation

So, I want to build a machine translation rig to make my work easier. I work as a translator and currently use the Google API to reduce my workload. But my language has very few speakers, so Google Translate's support for it is extremely poor; I've had to fix even the easiest sentences like "Goodnight" because GT translates them wrong. That's why I decided to build my own translation system, using my own translations as the base. So what is the bare minimum GPU required for at least 10,000 pages of translations? Currently I'm considering a P106-100, an RX 580, or a 1060 6GB. I think these would be enough, but let me know if they're not.

2

Comments


JustOneAvailableName t1_iqqovjt wrote

You could make an impact with lots of data in a low resource language. You can't make an impact without experience in this area.

The 1060 is absolutely useless for any kind of training; it was a low-tier GPU six years ago. The older techniques are fine on a CPU.

2

suflaj t1_iqt876f wrote

To develop a translation system you will need loads of data. To process those loads of data, you will need loads of processing power. And to get good results, you will need large models. Beyond that, it depends on how much time you have. If you have a year, a 2080 Ti will be fine; it can pretrain a BERT model in several months. If you have a month, you should probably consider a few 3090s. If you have a week, then renting 8xA100 rigs might be your best bet.

Overall I'd first focus on collecting the data. 10,000 pages of data, assuming 50 sentences per page (and that is a generous guess), is only about 500,000 sentence pairs, which is nowhere near enough to develop a translation system. Aim for several tens of millions of sentences, ideally 100-200 million sentence pairs if you wish to outperform Google Translate, or consider building a handmade tool instead.
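To put those numbers side by side, here is a back-of-the-envelope check using the figures above (10,000 pages, ~50 sentences per page, and a 100-200M pair target); the per-page figure is the same rough guess as in the comment, not a measured value:

```python
# Rough corpus-size check, using ~50 sentences per page (a guess)
# and the 100-200M sentence-pair target mentioned above.
PAGES = 10_000
SENTENCES_PER_PAGE = 50

own_pairs = PAGES * SENTENCES_PER_PAGE
target_low, target_high = 100_000_000, 200_000_000

print(f"own data: ~{own_pairs:,} pairs")
print(f"shortfall vs. the low target: {target_low // own_pairs}x")
```

Even against the low end of the target, the OP's own corpus is short by a factor of 200, which is why collecting or mining more parallel data comes first.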

The GPUs you mentioned will only be able to run LSTM-CNN models, which will never compete with Google Translate (which is a hand-tuned system accompanied by a transformer model). You need at least a 1080 Ti or 2080 Ti, and even that is fairly bad; you'll need months to get anything out of it.

2

thevillagersid t1_iquosy6 wrote

Have you tried getting in touch with someone at Google? I wonder whether they have a procedure for folks with expertise in the local language to contribute materials (e.g. your translations, high quality translations from other sources) to improve the model for that language.

1

wrsage OP t1_isj9bko wrote

Thank you for the detailed information. I think I will go for a single 2080 Ti or a 3080; a 3090 is out of my budget for now. By 10,000 pages I actually meant tens of thousands, so I have quite a lot of material from my own work. I also worked with my colleagues to make a dictionary, so I have a lot of (questionable) raw data. I don't know if those word pairs would be of any help.
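Since the raw data is questionable, a cheap cleaning pass is usually worth doing before any training. A minimal sketch, assuming the pairs are stored as tab-separated `source<TAB>target` lines; the deduplication and the 3:1 length-ratio filter are illustrative heuristics, not tuned values:

```python
# Clean raw parallel data: drop malformed, empty, badly length-mismatched,
# and duplicate pairs. Input: lines of "source<TAB>target".
def clean_pairs(lines):
    seen = set()
    out = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # drop malformed rows
        src, tgt = (p.strip() for p in parts)
        if not src or not tgt:
            continue  # drop pairs with an empty side
        # drop pairs whose lengths differ wildly (likely misaligned);
        # the 3:1 ratio is an arbitrary illustrative threshold
        if max(len(src), len(tgt)) > 3 * max(1, min(len(src), len(tgt))):
            continue
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # deduplicate
        seen.add(key)
        out.append((src, tgt))
    return out
```

Dictionary word pairs can pass through the same function, though on their own they train a glossary rather than a translator; they are more useful mixed into a sentence-level corpus.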

1