Submitted by MrOfficialCandy t3_10uqnj3 in MachineLearning

While the greatest amount of training content is currently available in English, it seems unlikely to me that it's an efficient language for training AI. A more suitable language would reduce training time and model size.

It might, for example, be much more efficient to train AI on Chinese, Korean, or Japanese due to a reduced grammatical token set when constructing sentences/ideas.
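
One way to sanity-check the token-count part of this, as a minimal sketch: it assumes the Hugging Face transformers package, the "gpt2" tokenizer is just an illustrative choice, and the counts depend heavily on what data the tokenizer was trained on, which is itself part of the answer.

    # Compare how many tokens a BPE tokenizer spends on roughly
    # equivalent sentences in two languages. A tokenizer trained
    # mostly on English text will look inefficient on Chinese
    # regardless of any property of the language itself.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    sentences = {
        "English": "The cat is sleeping on the chair.",
        "Chinese": "猫在椅子上睡觉。",
    }

    for lang, text in sentences.items():
        ids = tokenizer.encode(text)
        print(f"{lang}: {len(ids)} tokens")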

But taking the idea further, I wonder if we should be using a human language at all. Perhaps it's more efficient to use something altogether new, both to communicate with AI more precisely and to reduce model size and training time.

What do y'all think?

0

Comments

noobgolang t1_j7dff9f wrote

What?

22

Striking-Travel-6649 t1_j7hx5tb wrote

"I wonder if we should be using a human language at all"

My response: 01101000 01100101 01101100 01101100 01101111 00101100 00100000 01101111 01110101 01101110 00100001

4

danielgafni t1_j7dkl6x wrote

English is a pretty simple language compared to other popular languages. Not sure why you think it's more complex than Chinese…

8

gunshoes t1_j7dg38g wrote

Depends on your problem space. If you're talking about NLP/speech applications, English is the most popular simply because it has the most resources available and a larger market application.

Even then, most models only show good performance on prestige dialects. Minority dialects such as AAVE notoriously suffer with modern models.

7

MadScientist-1214 t1_j7dh2rw wrote

From a linguistic perspective, no language is more efficient than another. Switching to an Asian language like Chinese would not necessarily give the neural network a better representation than English does. Mandarin Chinese is a highly analytic language with little inflectional morphology, but it is no less complex. For example, it has a large number of modal particles that have no equivalent in English (such as the sentence-final 吗, which marks yes/no questions, or 吧, which softens a suggestion).

In linguistics, there are also attempts to convert languages into other forms of representation. The natural semantic metalanguage (NSM), for example, reduces words to a set of semantic primitives.
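
To give a flavor of that, here's a toy sketch; the prime inventory and explications below are simplified for illustration, not the published NSM analyses:

    # Toy sketch of NSM-style decomposition: word meanings expressed
    # using only a small fixed inventory of semantic primes.
    PRIMES = {"I", "FEEL", "GOOD", "BAD", "SOMETHING", "HAPPEN", "WANT", "NOT"}

    explications = {
        "happy": ["I", "FEEL", "SOMETHING", "GOOD"],
        "sad":   ["I", "FEEL", "SOMETHING", "BAD"],
    }

    def uses_only_primes(expl):
        """Check that an explication stays inside the prime inventory."""
        return all(token in PRIMES for token in expl)

    assert all(uses_only_primes(e) for e in explications.values())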

Still, from what I have seen in both linguistics and NLP, I am a bit skeptical.

6

AngelKitty47 t1_j7dluel wrote

I think our brains don't think in language but use language to describe our thoughts, so how do you teach a machine to think thoughts?

4

uotsca t1_j7dhgjb wrote

No

2

FHIR_HL7_Integrator t1_j7gjfhs wrote

What then? In terms of available data and of being the lingua franca, I don't see a better option. Just going on logic here, but open-minded to an alternative. It's all moot though - all languages should be translated into a common language in order to build the data set, with results then translated into the language of choice. I suppose there could be an intermediate semantic language, but that seems like a lot of additional steps for an intermediary.

1

ai_master_central t1_j7euxkp wrote

What we need is a completely new language designed to be the bridge between human and machine; that would be ideal. Maybe we can train a multi-language model to create a perfect human language.

2

FHIR_HL7_Integrator t1_j7gj3c0 wrote

Way too difficult. Just translate all languages first and then incorporate them into the lingua franca, which is English at this point in history.

1

[deleted] t1_j7f9vm6 wrote

The perfect language would be Mentalese.

2

uoftsuxalot t1_j7qkcq7 wrote

No, information is not reduced by just using another code
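
In information-theoretic terms, H(f(X)) = H(X) for any invertible recoding f: a lossless change of code changes how the data is chunked into tokens, not how much information the model has to learn.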

1

MrOfficialCandy OP t1_j7r7xg3 wrote

It's not just about swapping tokens for other tokens. It's that the grammatical structure (of any language) can convey ambiguous meaning.

1

like_a_tensor t1_j7r4mdx wrote

There's been some work on getting models to work at the byte level. An example: https://arxiv.org/abs/2105.13626
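
That paper is ByT5. A minimal sketch of the core idea in plain Python (ByT5 itself additionally reserves a handful of IDs for special tokens, but the essence is that the model consumes raw UTF-8 bytes instead of a learned subword vocabulary):

    # Byte-level "tokenization": no language-specific vocabulary,
    # just one integer in 0..255 per UTF-8 byte.
    text = "hello, 世界"
    byte_ids = list(text.encode("utf-8"))
    print(byte_ids)       # 13 ids; each CJK character costs 3 bytes
    print(len(byte_ids))

    # Decoding is exactly the inverse, so nothing is lost:
    assert bytes(byte_ids).decode("utf-8") == text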

1

MrOfficialCandy OP t1_j7r85li wrote

That doesn't help at all. Reading tokens at the byte level does not stop the word "they" or "it" from being vague in the context of a sentence. In "the trophy didn't fit in the suitcase because it was too big," nothing at the byte level resolves what "it" refers to.

1

like_a_tensor t1_j7rbdno wrote

Sounds like you want something like a logical representation of sentences. Reducing sentences to first order logic might be what you're looking for. There's also AMRs (Abstract Meaning Representations). The problem with AMRs is that they need to be built, which is non-trivial for machines and time-consuming for humans.
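
For a concrete flavor, here's the stock AMR example sentence "The boy wants to go" as a toy neo-Davidsonian logical form in Python (illustrative only, not the output of a real semantic parser):

    # Events are variables and semantic roles are predicates over them.
    facts = [
        ("boy",   "b"),
        ("want",  "w"),
        ("go",    "g"),
        ("agent", "w", "b"),   # the boy does the wanting
        ("theme", "w", "g"),   # what is wanted: the going
        ("agent", "g", "b"),   # the boy is also the one going
    ]

    # The corresponding AMR (PENMAN notation) makes the coreference
    # explicit by reusing the variable b:
    # (w / want-01
    #    :ARG0 (b / boy)
    #    :ARG1 (g / go-01
    #             :ARG0 b))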

1