chatterbox272 t1_irifzyq wrote on October 8, 2022 at 1:06 PM

Your model is a teeny-tiny MLP and your dataset is relatively small, so it's entirely possible that you can't extract rich enough information to do better than 70% on the val set.

You also haven't mentioned how much L2 or Dropout you're using, nor how they do on their own. Both of those methods come with their own hyperparameters which need to be tuned.
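
To make that concrete, here is a minimal sketch of a sweep that also covers each regularizer on its own (a value of 0.0 switches it off). The candidate values and the `evaluate` callback are placeholders I made up, not anything from your setup; `evaluate` is assumed to train a model with the given config and return validation accuracy.

```python
from itertools import product

# Hypothetical candidate values; 0.0 disables that regularizer, so the
# grid includes "L2 only", "Dropout only", and "neither" as baselines.
L2_VALUES = [0.0, 1e-4, 1e-3, 1e-2]
DROPOUT_VALUES = [0.0, 0.1, 0.3, 0.5]


def build_grid(l2_values, dropout_values):
    """Return every (l2, dropout) combination as a config dict."""
    return [
        {"l2": l2, "dropout": dropout}
        for l2, dropout in product(l2_values, dropout_values)
    ]


def pick_best(grid, evaluate):
    """Score each config with evaluate(config) and return (score, config).

    `evaluate` is an assumed user-supplied function that trains with the
    config and returns validation accuracy (higher is better).
    """
    scored = [(evaluate(config), config) for config in grid]
    return max(scored, key=lambda pair: pair[0])
```

You would call `pick_best(build_grid(L2_VALUES, DROPOUT_VALUES), evaluate)` with your own training function, then compare the winner against the rows where one of the two values is 0.0.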

perfopt OP t1_irig9zc wrote on October 8, 2022 at 1:08 PM

I see. I’ll try increasing the data used. My fear is that it may lead to some categories having much less data than others.
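
One common way to offset that kind of imbalance is to weight each category inversely to its frequency in the loss. A minimal sketch, assuming the labels are available as a plain list; the function name is made up, and the formula is the usual "balanced" heuristic, n_samples / (n_classes * class_count):

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in counts.items()
    }
```

The resulting dict could then be handed to whatever per-class weighting option the training framework provides.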

I'm using L2 0.001 and Dropout 0.1.