
jackmountion t1_j9jasgc wrote

Could also be pretraining, though. That's another theory: some stuff from other languages leaks into the pretraining data. But I personally don't buy it; there's simply not enough of that data. Maybe both theories are slightly right: it's generalizing better than we thought, but it needs some language context at first?