chatterbox272 t1_irifzyq wrote

Your model is a teeny-tiny MLP and your dataset is relatively small, so it's entirely possible that you simply can't extract rich enough information to do better than 70% on the val set.

You also haven't mentioned how much L2 or Dropout you're using, nor how they do on their own. Both of those methods come with their own hyperparameters which need to be tuned.
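
For what it's worth, here is a minimal sketch of how such a sweep could look. It assumes you already have some function that trains the model with a given L2 strength and dropout rate and returns validation accuracy; `train_and_eval` is a hypothetical name for that, and the grid values are just placeholders. Including 0.0 in each list means every regularizer also gets tried on its own.

```python
from itertools import product


def sweep(train_and_eval,
          l2_values=(0.0, 1e-4, 1e-3, 1e-2),
          dropout_values=(0.0, 0.1, 0.3, 0.5)):
    """Try every (L2, dropout) pair and return results sorted best-first.

    train_and_eval is a hypothetical callable supplied by you:
    train_and_eval(l2=..., dropout=...) -> validation accuracy (float).
    A value of 0.0 means that regularizer is switched off, so the grid
    covers "L2 only", "dropout only", "both", and "neither".
    """
    results = []
    for l2, dropout in product(l2_values, dropout_values):
        val_acc = train_and_eval(l2=l2, dropout=dropout)
        results.append({"l2": l2, "dropout": dropout, "val_acc": val_acc})
    # Highest validation accuracy first.
    results.sort(key=lambda r: r["val_acc"], reverse=True)
    return results
```

If the best row has both values at 0.0, or the scores barely move across the grid, that would suggest regularization isn't what's holding you back.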

perfopt OP t1_irig9zc wrote

I see. I’ll try increasing the amount of data used. My fear is that it may lead to some categories having much less data than others.

I'm using L2 of 0.001 and Dropout of 0.1.
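
On the worry about some categories ending up with far less data than others, a quick way to check is to count labels per class before and after adding data. This is only a rough sketch: `class_balance` is a made-up helper name, and it assumes the labels are available as a plain, non-empty list.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts, per-class shares, and the imbalance ratio.

    The imbalance ratio is the size of the largest class divided by the
    size of the smallest one (1.0 means perfectly balanced).
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return counts, shares, ratio
```

Running this on the current training labels and again on the enlarged set should show whether the extra data makes the imbalance worse.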