
IntelArtiGen t1_iziveld wrote

I don't know if I entirely got the question, but I can try to answer. With one number you can in theory represent an infinite amount of information. In practice, on computers, we don't have infinite precision for one number (fp16, fp32, etc.), and a DL algorithm can't interpret that number with infinite precision either. If -1 and +1 are two different pieces of information, that's fine. If 0.9999999 and 1.000001 are two different pieces of information, a DL algorithm will have trouble learning the distinction.
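
A minimal sketch of that precision limit (using NumPy; the two specific values are just illustrative):

```python
import numpy as np

# In fp32 the two neighbours are still distinct values:
a32, b32 = np.float32(0.9999999), np.float32(1.000001)
print(a32 == b32)   # False

# In fp16 both round to exactly 1.0, so whatever information
# distinguished them is lost before the model ever sees it:
a16, b16 = np.float16(0.9999999), np.float16(1.000001)
print(a16 == b16)   # True
```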

So there is a relationship, because for practical reasons we can't represent everything in one number. But there is also a limit: if you can fit all the information you need in 10 values, using 100000 values to represent it won't help. And if you want to know what the right number is in theory, I'm afraid you can't, because it depends on the dataset, the model and the training process.
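
One rough way to probe this empirically is to train the same autoencoder with different bottleneck widths and watch where the reconstruction error stops improving. A minimal sketch (PyTorch; the data, sizes and training loop are all invented for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data that genuinely lives in 10 dimensions,
# embedded in a 64-dimensional space.
latent_true = torch.randn(2048, 10)
mix = torch.randn(10, 64)
data = latent_true @ mix

for bottleneck in [3, 10, 100]:
    model = nn.Sequential(
        nn.Linear(64, bottleneck),  # encoder
        nn.Linear(bottleneck, 64),  # decoder
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(500):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(data), data)
        loss.backward()
        opt.step()
    print(f"bottleneck={bottleneck:4d}  reconstruction MSE={loss.item():.4f}")

# Expected pattern: the MSE drops sharply up to a width of 10, then
# plateaus; capacity beyond the data's intrinsic dimension doesn't help.
```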

Perhaps this has a bit to do with information theory. But I'm not aware of an information theory that focuses on DL; that part of the field is maybe under-investigated.

1

OutOfCharm OP t1_iziwmbt wrote

That's a good point! Would you agree that if 10 values are sufficient to fit all the information you need, decreasing the count to e.g. 3 must harm performance?

1

IntelArtiGen t1_izj2jc3 wrote

The number of values must be sufficient, and the model must be able to process them. We could imagine a model that doesn't perform well with 10 values because that's too much for it to process, but performs better with 3 values, even though a "perfect model" would need 10 values to give the best results.
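
A related (hypothetical) illustration: give a fixed model extra values that are only weakly useful and watch its test accuracy drop. Everything below (the data, the sizes, the choice of k-NN) is made up for the sake of the example:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)
# 3 strongly informative values per sample:
informative = y[:, None] + 0.5 * rng.standard_normal((n, 3))
# 97 extra values that carry only a weak signal buried in noise:
extra = 0.1 * y[:, None] + rng.standard_normal((n, 97))

for X, label in [(informative, "3 values"),
                 (np.hstack([informative, extra]), "100 values")]:
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    acc = KNeighborsClassifier().fit(Xtr, ytr).score(Xte, yte)
    print(f"{label}: test accuracy = {acc:.2f}")

# The 100-value input carries at least as much information, yet this
# particular model handles the compact 3-value input better.
```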

1