Submitted by matthkamis t3_126kzb6 in MachineLearning

I’m just curious how well these models could be applied to translation. From my standpoint, a pretty good benchmark of how “intelligent” these things are would be how well they can translate between two languages. Anyone who is bilingual or has a partner who speaks another language knows that the current state of the art in translation is severely lacking.

0

Comments


ZestyData t1_je9ly2p wrote

.. Uh. I'm going to assume you're relatively new to the world of ML. Translation is one of the most common uses for SOTA LLMs.

It's how Google Translate works, to take just the most famous example.

What the SOTA translation tools don't yet use is instruct-tuning to give them conversational interfaces (i.e. the difference between GPT and ChatGPT), so they look different from ChatGPT. But it's largely the same Generative (technically Pre-trained) Transformers under the hood.

30

RobbinDeBank t1_je9uola wrote

The whole transformer architecture was invented for the purpose of doing translation, too.

15

ChuckSeven t1_jea2b99 wrote

Google Translate is certainly not an LLM. LLMs can do translation, but they are significantly worse than translation models trained on translation data. Translation models have an encoder-decoder architecture, since they are sequence-to-sequence models, not a decoder-only autoregressive architecture like LLMs.

They are also not pretrained, afaik, since language modelling is modelling p(x) but translation is p(y|x).
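The p(x) side of this distinction can be made concrete with a toy sketch: an autoregressive language model factorizes p(x) into a product of next-token conditionals. Here bigram counts over an invented three-sentence corpus stand in for the model; everything in the snippet is illustrative, not any production system.

```python
from collections import Counter, defaultdict

corpus = ["the cat sat", "the dog sat", "the cat ran"]

# Count bigrams to estimate p(next | prev): a toy stand-in for an
# autoregressive LM, which factorizes p(x) = prod_t p(x_t | x_<t).
bigrams = defaultdict(Counter)
for sent in corpus:
    tokens = ["<s>"] + sent.split() + ["</s>"]
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1

def p_next(prev, nxt):
    total = sum(bigrams[prev].values())
    return bigrams[prev][nxt] / total if total else 0.0

def p_sentence(sent):
    # p(x) as the product of next-token conditionals.
    tokens = ["<s>"] + sent.split() + ["</s>"]
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p *= p_next(prev, nxt)
    return p

p_sentence("the cat sat")  # 1 * 2/3 * 1/2 * 1 = 1/3
```

A translation model instead conditions the whole factorization on the source sentence, i.e. p(y|x), which is why it needs parallel data to estimate.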

−6

ZestyData t1_jea73i3 wrote

LLM simply means Large Language Model. A language model with a large number of parameters. LLMs have referred to all sorts of deep learning architectures over the past 20 years.

Google invented the Transformer architecture, and most importantly discovered how well transformers scale in power as they scale in size. This invention kickstarted the new arms race of LLMs to refer to transformer models with large numbers of parameters.

Google translate's current Prod architecture is a (large) transformer to encode, and an RNN to decode.[1] This falls into the category of LLMs - which weren't just invented when OpenAI invented RLHF at the end of 2022 and published ChatGPT. GPT is the same, but uses transformers for both the encoder & decoder.

The decoding RNN in Google Translate absolutely is an autoregressive model.

I re-read the original GPT paper [2] to try to get a better understanding of what the "pre-training" term actually means here, and I genuinely can't see a difference between that and what Google describes in their papers and blogs [3]; they just define X & Y differently, but both are predicting a token based on the context window. GPT calls it pretraining because it does an additional step after learning P(X | context), but both approaches perform this fundamental autoregressive training.

[1] - https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html

[2] - https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

[3] - https://arxiv.org/pdf/1609.08144.pdf
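The shared autoregressive loop described in this comment (predict the next token given everything generated so far, whether the context is a prompt or an encoded source sentence) can be sketched generically. `toy_step` and its lookup table are invented stand-ins for a real model:

```python
def greedy_decode(step_fn, context, max_len=10, eos="</s>"):
    """Generic autoregressive decoding loop. Both a GPT-style LM and a
    translation decoder generate one token at a time, each conditioned
    on the context plus everything generated so far."""
    out = []
    for _ in range(max_len):
        tok = step_fn(context, out)
        if tok == eos:
            break
        out.append(tok)
    return out

# Hypothetical step function: "translates" by table lookup, one token
# per step, emitting end-of-sequence when the source is exhausted.
table = {"chat": "cat", "noir": "black"}
def toy_step(source, generated):
    i = len(generated)
    return table[source[i]] if i < len(source) else "</s>"

greedy_decode(toy_step, ["chat", "noir"])  # ['cat', 'black']
```

In a real system `step_fn` would be a forward pass of the network; the point is only that the token-by-token loop is identical whichever architecture sits behind it.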

4

ChuckSeven t1_jeab590 wrote

It's funny how you mention unrelated stuff, like RLHF, which has nothing to do with the point of discussion. A bit like an LLM I reckon.

See, Google translate models are (as far as publicly known) trained on a parallel corpus. This is supervised data since it provides the same text in different languages. The model is trained to model, e.g. p(y=German|x=English). There is much less supervised data available which means that the models you train will be significantly smaller. Note that translation models are usually only auto-regressive in the decoding part. The encoder part, which usually makes up about 50% of the parameters, is not auto-regressive.

LLMs tend to be >>1B-parameter models trained on billions or trillions of tokens; that vast amount of data is believed to be necessary to train such large models. They model p(x), which in some cases is monolingual or virtually so. An LLM trained on a vast but English-only corpus will not be capable of translating at all. LLMs trained on a multilingual corpus can be prompted to translate, but they are far inferior to actual translation models.

Lastly, modelling p(y|x) is significantly easier and thus less general than modelling p(x).

−4

ZestyData t1_jeagwa8 wrote

Thank you for repeating half of what I said back to me, much like ChatGPT you catch on quick to new information:

So, let's be clear here then. Contrary to your incorrect first comment: Google Translate is an LLM, it is autoregressive, and it is pretrained, at least by the definition of pre-training given in the GPT paper, which was the parallel I first used in my own comment for OP, who came into this thread knowing the latest GPT-3+ and ChatGPT products.


>It's funny how you mention unrelated stuff, like RLHF

I did so because I had naively assumed you were also a newcomer to the field who knew nothing outside of ChatGPT, given how severely wrong your first comment was. I'll grant you that it wasn't related, except to lend an olive branch and reasonable exit-plan if that were the case for you. Alas.


>LLMs tend to be >>1B parameter models

Again, no. ELMo was 94 million, GPT was 120 million, GPT-2 was 1.5 billion, and BERT has ~300 million parameters. These are all Large Language Models and have been called so for years. There is no hard definition of what constitutes "large"; 2018's large is nearly today's consumer-hardware level. Google Translate (and Google Search) are a few of the most widely used LLMs actually out there.

Man. Why do you keep talking about things that you don't understand, even when corrected?


>Lastly, modelling p(y|x) is significantly easier and thus less general than modelling p(x).

Sure! It is easier! But that's not what you said. You initially brought up P(Y|X) as a justification that translation isn't pretrained; those are two unrelated concepts. The ultimate modelling goal is P(Y|X), but both GPT (Generative Pre-training) and Google Translate pretrain their ability to predict P(X | context) in the decoder, just like any hot new LLM of today, hence my correction. The application towards the ultimate P(Y|X) is not connected to the pretraining of their decoders.

1

MysteryInc152 t1_jeanb01 wrote

>LLMs trained on a multilingual corpus can be prompted to translate, but they are far inferior to actual translation models.

No lol. You would know this if you'd ever actually tried to translate with GPT-4 and the like. They're far superior to the current SOTA.

https://github.com/ogkalu2/Human-parity-on-machine-translations

0

ChuckSeven t1_jedsgz5 wrote

I know about this post. It is interesting, but the results are far from conclusive. The BLOOM paper also ran translation experiments, and it says "... In the one-shot setting, BLOOM can, with the right prompt, perform competent translation, although it is behind dedicated (supervised) models such as M2M-100".

So let's maybe use some quantifiable measures instead of looking at a few cherry-picked examples and claiming otherwise?
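A quantifiable measure in the spirit of this comment can be sketched in a few lines: clipped n-gram precision, the core of BLEU. This is a simplified stand-in with an invented sentence pair; real evaluation should use a standard implementation such as sacrebleu, which adds the geometric mean over n=1..4 and a brevity penalty.

```python
from collections import Counter

def ngram_precision(hyp, ref, n):
    """Clipped n-gram precision: the fraction of hypothesis n-grams
    that also appear in the reference, with counts clipped so a
    repeated n-gram can't be credited more times than it occurs
    in the reference."""
    hyp_ngrams = Counter(zip(*[hyp[i:] for i in range(n)]))
    ref_ngrams = Counter(zip(*[ref[i:] for i in range(n)]))
    overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
    total = sum(hyp_ngrams.values())
    return overlap / total if total else 0.0

hyp = "the cat is on the mat".split()
ref = "the cat sat on the mat".split()
ngram_precision(hyp, ref, 1)  # 5/6
ngram_precision(hyp, ref, 2)  # 3/5
```

Scoring a large test set this way (rather than eyeballing a handful of outputs) is exactly the kind of benchmark-based comparison being asked for.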

0

MysteryInc152 t1_jee5zba wrote

It's not cherry-picked lol.

Wild how everyone will just use that word even when they've clearly not tested the model themselves. I'm just showing you what anyone who's actually used these models for translation will tell you:

https://youtu.be/5KKDCp3OaMo

https://www.reddit.com/r/visualnovels/comments/11rty62/gpt4_ai_vs_human_translation_on_the_opening_scene/?utm_source=share&utm_medium=android_app&utm_name=androidcss&utm_term=1&utm_content=share_button

1

ChuckSeven t1_jeeae4o wrote

Look, it doesn't matter. You can't claim that LLMs are better if you don't demonstrate it on an established benchmark with a large variety of translations. How should I know if those Japanese anime translations are correct? For all I know, it might just be "prettier" text but a wrong translation.

It's sad to get downvoted on this subreddit for insisting on very basic academic principles.

2

MysteryInc152 t1_jeecbeq wrote

I didn't downvote you, but it's probably because you're being obtuse. Anyway, whatever. If you don't want to take evidence in plain sight, then don't. The baseline human comparisons are right there. Frankly, it's not my problem if you're so suspicious of the results and not bilingual enough to test it yourself. It's not really my business whether you believe me or not.

1

ChuckSeven t1_jeenkvs wrote

I'm happy to take evidence into account. Your results indicate that LLMs can be beneficial for translation. As I said previously, it looks interesting. But you claim, and I quote, "They're far superior to the current SOTA", solely based on your personal and human comparison. That is an over-generalisation and not scientific. A bit like a flat-earther claiming the earth is flat because... just look at it, "evidence in plain sight".

1

MysteryInc152 t1_jeanjj9 wrote

>LLM can do translation but they are significantly worse than translation models trained on translation data.

This is not true at all lol. They're better by a wide margin.

3

matthkamis OP t1_je9m25q wrote

And it doesn’t work well

−19

ZestyData t1_je9mf4n wrote

Indeed. But that's the answer to your question. Don't downvote me for it.

"Can large language models be applied to language translation?" Yes, they already are.

10

matthkamis OP t1_je9mpyd wrote

Don’t give me a passive-aggressive answer then: “.. Uh. I’m gonna assume you’re…”

−29

ZestyData t1_je9n2f0 wrote

¯\_(ツ)_/¯

not that aggressive to have assumed but ok

9

roybatty553 t1_je9ptxe wrote

matthkamis is right. You can (and did) pack a lot of condescension into a single ".. Uh.". The rest of your reply was well informed and helpful (for me too; thank you), but your opener helped no one.

−4

matthkamis OP t1_je9ojio wrote

The assuming isn’t the passive-aggressive part. It’s leading your response with “.. Uh.”; it reads kind of like “well, actually…”

−15

Dry_Bag_2485 t1_je9qo78 wrote

Don’t ask questions on Reddit if you can’t handle certain pixels depicting letters in an order you find offensive 😂 The ML subreddits are really going downhill.

7

Dry_Bag_2485 t1_je9qfyi wrote

Try DeepL, or OpenAI's translation endpoints; there are a lot of options other than Google.

1

harharveryfunny t1_je9t0b4 wrote

Just try it! Yes, they do very well.

You don't even need to ask them to translate: just give them a foreign-language source and ask questions about it, or ask for a summary!

1

FermiAnyon t1_je9ygm5 wrote

Yeah, I've had bilingual conversations with ChatGPT.

Also, ask it a question, and when it answers in a few paragraphs, say "can you rewrite that in Japanese?" or whatever language you speak.

Pretty cool.

1

IDefendWaffles t1_jea2403 wrote

You can talk to ChatGPT in English and then just ask it a question in any other language, and it will answer you in that language. Or you can just tell it "let's talk in Finnish from now on" and it will.

1