
znihilist t1_islbm58 wrote

Almost always this is an issue of sampling. Make sure every group you care about is well represented in both the training and the test split.

> And why testing accuracy shouldn’t be higher than training?

There is no law that says this shouldn't happen, but in 99.99% of cases it is a sampling issue. That said, it can also crop up when doing off-time testing, and in that specific context it doesn't necessarily mean your model is flawed.
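For the common case, a minimal sketch of what "well represented" means in practice: a stratified split keeps each class's proportion the same in train and test. This assumes scikit-learn and uses synthetic data, so it's just an illustration of the technique.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic imbalanced dataset (90/10 class ratio) for illustration.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    stratify=y,       # preserve the class ratio in both splits
    random_state=0,
)
```

Without `stratify`, a random split of imbalanced data can easily hand the test set an "easier" mix than the training set, which is exactly how test accuracy ends up above training accuracy.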

I had this issue with a model we were working on: we needed to prove that the model worked across different time periods, so we removed the last two months of data from training and left them for validation. It turned out that in those last months a specific subset of the data was overrepresented compared to the earlier months, and it happened to be the "good" data.
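The check that catches this is simple: compare how each segment is represented in the training window versus the held-out months. A hypothetical sketch with pandas (the `date` and `segment` column names are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-15", "2023-02-10", "2023-11-05", "2023-12-20"]),
    "segment": ["A", "B", "A", "A"],
})

# Hold out the last two months, as in the off-time validation above.
cutoff = df["date"].max() - pd.DateOffset(months=2)
train, holdout = df[df["date"] <= cutoff], df[df["date"] > cutoff]

# If these two distributions differ a lot, a train/test accuracy gap
# may just reflect the shift in the data mix, not model quality.
print(train["segment"].value_counts(normalize=True))
print(holdout["segment"].value_counts(normalize=True))
```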
