Vivavirtu t1_j9j4zsu wrote

"ChatGPT is surprisingly good at forming natural, albeit a bit formal, answers that seem to understand traditional and pop-cultural references in China."

Huh, neat. I didn't know ChatGPT training data included other languages.

With multiple countries rushing to create their own NLP chatbots, it would be cool if they could take it to the next step and create language-agnostic chatbots, somehow. I'm guessing that would take a much deeper level of "understanding" than ChatGPT currently possesses.

2

jackmountion t1_j9janho wrote

Well, it doesn't. ChatGPT's training data is largely English, with some other languages mixed in but in extremely limited amounts (not exactly sure about this). ChatGPT understanding Chinese could be part of a strange phenomenon which we don't completely understand yet. There has been a major paper about it, and it seems that these LLMs have an emergent capacity to generalize to languages they were not trained on. One of the theories on this is that during learning the model is actually learning a grammar structure, since that's the most efficient way to "understand" human language, and that structure can then be easily carried over to other languages. Sorta like if I really learn the ins and outs of calculus, I can sorta give you a general understanding of what the math in physics is doing without taking a physics class. What's amazing, if true, is that this would mean AI generalizes much more easily than anticipated. It might even give insight into how these statistical models seem to have theory-of-mind capabilities.

Here's a dude talking about the study. Hopefully you can use this to find it. It was done very recently. https://twitter.com/janleike/status/1625207251630960640?t=3z0NEYPFifguL2u8NOCWfA&s=19

3

jackmountion t1_j9jasgc wrote

It could also be pretraining, though. That's another theory: that some material from other languages leaks into the pretraining data. But I personally don't buy that; it's simply not enough data. Maybe both theories are slightly right: it's generalizing better than we thought, but it needs some language context at first?

1

brainshortcircuited t1_j9j5cgj wrote

My first thought is to use it for translating.

2

Vivavirtu t1_j9j69bn wrote

That would be pretty useful, a translator that understands slang, context, and cultural nuance.

I was thinking it would be great if it could learn from a body of text in a certain language, and apply what it learned to conversations in any other language. But I don't know how this stuff works, or how distant a goal that would be.

2

Mindaroth t1_j9mb01l wrote

I’ve used ChatGPT for simple translations to Japanese. I found it better than Google Translate, but I’m not fluent enough to try it on more complicated text.

1