pornthrowaway42069l t1_isldxay wrote on October 16, 2022 at 9:11 PM

One reason, besides those mentioned in other comments, is that sometimes the test set is simply easier to solve than the train set. I'm not saying that's the crux of your problem, but it might be worth trying different splits.
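To make the "try different splits" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the synthetic 1-D data, the toy threshold "model", and the helper names are hypothetical and are not OP's setup. It only shows that the sign of the train/test gap can change from one random split to the next.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D, two-class data (purely illustrative, not OP's data)."""
    data = []
    for i in range(n):
        label = i % 2
        data.append((rng.gauss(1.5 * label, 1.0), label))
    return data

def fit_threshold(train):
    """Toy 'model': a threshold midway between the two class means."""
    means = []
    for c in (0, 1):
        xs = [x for x, y in train if y == c]
        means.append(sum(xs) / len(xs))
    return (means[0] + means[1]) / 2

def accuracy(threshold, rows):
    """Fraction of rows where 'x above threshold' matches the label."""
    hits = sum((x > threshold) == bool(y) for x, y in rows)
    return hits / len(rows)

def accuracies_for_seed(data, seed, test_size=0.2):
    """Shuffle with the given seed, split, fit on train, score both parts."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_size)
    test, train = rows[:n_test], rows[n_test:]
    threshold = fit_threshold(train)
    return accuracy(threshold, train), accuracy(threshold, test)

data = make_data(500, random.Random(0))
results = {seed: accuracies_for_seed(data, seed) for seed in range(10)}
for seed, (train_acc, test_acc) in results.items():
    print(f"seed={seed}  train={train_acc:.3f}  test={test_acc:.3f}  "
          f"test-train={test_acc - train_acc:+.3f}")
```

If test accuracy beats train accuracy on some seeds and not on others, the gap is probably just split noise rather than something wrong with the model.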

DrXaos t1_ismok0x wrote on October 17, 2022 at 3:01 AM

In this case the train and test sets probably weren't split with stratification by class, so there's an imbalance in the relative proportions of classes, and there is some bias in the predictor.

And it's probably measuring top-1 accuracy, which isn't the loss function being optimized directly.

- Do a stratified test/train split.
- Measure more statistics on train vs. test.
- Check dropout or other regularization differences.
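As one way to check the first two points, the sketch below compares class proportions between the train and test labels. The helper names and the example labels are hypothetical; with a properly stratified split the largest gap should be close to zero.

```python
from collections import Counter

def class_proportions(labels):
    """Return {class: fraction of samples} for a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}

def max_proportion_gap(train_labels, test_labels):
    """Largest absolute difference in class share between train and test."""
    p_train = class_proportions(train_labels)
    p_test = class_proportions(test_labels)
    classes = set(p_train) | set(p_test)
    return max(abs(p_train.get(c, 0.0) - p_test.get(c, 0.0)) for c in classes)

# Hypothetical labels: the first test set mirrors train, the second does not.
train_labels = ["a"] * 40 + ["b"] * 40 + ["c"] * 20
stratified_test = ["a"] * 10 + ["b"] * 10 + ["c"] * 5
skewed_test = ["a"] * 20 + ["b"] * 5

print(max_proportion_gap(train_labels, stratified_test))
print(max_proportion_gap(train_labels, skewed_test))
```

With pandas DataFrames like OP's, the same comparison could presumably be done on `train_df['Class']` and `test_df['Class']`.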

redditnit21 OP t1_isn2joq wrote on October 17, 2022 at 5:18 AM

I am using a stratified test/train split:

    train_df, test_df = model_selection.train_test_split(
        df, test_size=0.2, random_state=42, stratify=df['Class']
    )

All the classes are equally proportioned except one. I am using a dropout layer in the model during training. Is the dropout layer causing this issue?

DrXaos t1_isn3k9e wrote on October 17, 2022 at 5:29 AM

It certainly could be dropout. In its usual form in most packages, dropout is on during training, stochastically perturbing activations, and off during test.
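A minimal sketch of that train/test asymmetry, assuming the common "inverted dropout" formulation (kept activations are scaled by 1/(1-p) during training). The function below is hypothetical plain Python, not any particular library's API.

```python
import random

def dropout(activations, p, training, rng):
    """Inverted dropout: zero each unit with probability p while training,
    scale survivors by 1/(1-p); pass activations through unchanged at test."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
acts = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(acts, p=0.5, training=True, rng=rng)   # randomly perturbed
test_out = dropout(acts, p=0.5, training=False, rng=rng)   # identical to input
print(train_out)
print(test_out)
```

Because the training-time forward pass is noisy and the test-time pass is not, metrics computed during training epochs can look worse than metrics on the test set even when nothing is wrong.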

Take out dropout, use other regularization, and report your optimized loss function directly on both train and test. That is often the NLL if you're using a conventional softmax + CE loss function, which is the most common choice for multinomial outcomes.
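For illustration, here is a rough plain-Python sketch of reporting mean NLL alongside top-1 accuracy. The logits and labels are made up, and the helper names are hypothetical. It shows two sets of predictions with the same accuracy but very different NLL, which is why the loss is the more informative number to compare between train and test.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logits_batch, labels):
    """Mean negative log-likelihood (cross-entropy) of the true labels."""
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        total -= math.log(softmax(logits)[y])
    return total / len(labels)

def top1_accuracy(logits_batch, labels):
    """Fraction of examples whose highest-scoring class is the true label."""
    hits = 0
    for logits, y in zip(logits_batch, labels):
        predicted = max(range(len(logits)), key=lambda k: logits[k])
        hits += int(predicted == y)
    return hits / len(labels)

labels = [0, 1]
confident = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]  # made-up logits
hesitant = [[0.2, 0.0, 0.0], [0.0, 0.2, 0.0]]   # made-up logits

for name, batch in (("confident", confident), ("hesitant", hesitant)):
    print(name, top1_accuracy(batch, labels), round(mean_nll(batch, labels), 3))
```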

redditnit21 OP t1_isn465e wrote on October 17, 2022 at 5:37 AM

Yeah, I am using a conventional softmax + CE loss function, which is the most common for multinomial outcomes. Which regularization method would you suggest I use, and what's the main reason test accuracy should be lower than train accuracy?

DrXaos t1_isn4fs5 wrote on October 17, 2022 at 5:40 AM

Top-1 accuracy is a noisy measurement, particularly if it's a binary 0/1 measurement.

A continuous performance statistic is more likely to show the expected behavior of train performance being better than test. Note that for loss functions, lower is better.
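To get a feel for how noisy top-1 accuracy can be, one rough approach is the binomial standard error, treating each test example as an independent 0/1 outcome. The accuracy and test-set size below are hypothetical, since the thread doesn't give OP's numbers.

```python
import math

def accuracy_standard_error(acc, n):
    """Binomial standard error of an accuracy estimated from n examples."""
    return math.sqrt(acc * (1.0 - acc) / n)

# Hypothetical numbers: 90% accuracy measured on 200 test examples.
se = accuracy_standard_error(0.90, 200)
print(f"accuracy = 0.900 +/- {se:.3f} (one standard error)")
```

If the train/test accuracy difference is within a couple of standard errors, it may not mean much.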

There are lots of regularization options, but start with L2, weight decay, and/or limiting the size of your network.
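As a sketch of what L2 / weight decay does in a plain SGD update (the function and the numbers are hypothetical, not tied to any framework): the penalty adds a term proportional to each weight to its gradient, pulling weights toward zero. For plain SGD, L2 regularization and weight decay amount to the same update; with adaptive optimizers such as Adam they can behave differently.

```python
def sgd_step_l2(weights, grads, lr, weight_decay):
    """One SGD step on loss + (weight_decay / 2) * sum(w^2).
    The penalty contributes weight_decay * w to each gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

# With zero data gradient, only the decay term acts: weights shrink toward 0.
weights = [1.0, -2.0]
updated = sgd_step_l2(weights, grads=[0.0, 0.0], lr=0.1, weight_decay=0.5)
print(updated)
```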

redditnit21 OP t1_isn2ap3 wrote on October 17, 2022 at 5:15 AM

I am using this split:

    train_df, test_df = model_selection.train_test_split(
        df, test_size=0.2, random_state=42, stratify=df['Class']
    )