
thebear96 t1_iqqsaoe wrote

Assuming the same hyperparameters, the second network should, in theory, converge to a solution quicker. So you'll need to adjust the hyperparameters and maybe add some dropout so that the model doesn't overfit.
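
Something like this, for example (a minimal PyTorch sketch; the layer sizes are placeholders, not OP's actual architecture):

```python
import torch.nn as nn

# Hypothetical network with dropout between hidden layers to curb overfitting.
# The sizes (30 inputs, 64 hidden units, 4 outputs) are made up for illustration.
model = nn.Sequential(
    nn.Linear(30, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),  # randomly zeroes 20% of activations during training
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(64, 4),
)
```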

3

PleaseKillMeNowOkay OP t1_iqqwpem wrote

That's what I thought, but I haven't been able to get the second model to even match the performance of the first one. I tried regularization methods without much success.

1

thebear96 t1_iqqwxur wrote

Is the loss decreasing enough after running for the specified number of epochs? Are you seeing a flat tail after convergence?

1

PleaseKillMeNowOkay OP t1_iqqxd7o wrote

Yes, I trained until the validation loss stopped improving, and then some more just to make sure.
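
Roughly what I did, as a sketch (`train_one_epoch` and `evaluate` stand in for my actual training and validation loops):

```python
# Early stopping with some extra patience: keep going until the validation
# loss hasn't improved for `patience` epochs in a row.
best_val, bad_epochs, patience = float("inf"), 0, 10
for epoch in range(500):  # upper bound; the stopping rule usually triggers first
    train_one_epoch(model, train_loader)    # placeholder for my training loop
    val_loss = evaluate(model, val_loader)  # placeholder for my validation pass
    if val_loss < best_val - 1e-4:          # small tolerance against noise
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```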

1

thebear96 t1_iqqxkb4 wrote

That's strange. It could be a data quantity issue; bigger networks typically need more data to perform well.

2

PleaseKillMeNowOkay OP t1_iqqxw6h wrote

I wouldn't necessarily call it a bigger network. The second network has two more output neurons than the first; the rest is the same. I don't know how much difference that makes.
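
Concretely, the difference is just this (the hidden sizes here are made up, not my exact model):

```python
import torch.nn as nn

def make_model(n_outputs):
    # Identical body; only the width of the final layer differs.
    return nn.Sequential(
        nn.Linear(30, 64),
        nn.ReLU(),
        nn.Linear(64, n_outputs),
    )

model_a = make_model(2)  # first network: 2 outputs
model_b = make_model(4)  # second network: 4 outputs, i.e. 2 more neurons
```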

1

thebear96 t1_iqqykoz wrote

That shouldn't make a big difference, but yes, in that case the second network's performance should be worse than the first's; it's far easier to predict two outputs than four. You can try adding more linear layers and using a lower learning rate to see if the model improves.
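
For example (the sizes and the learning rate are illustrative, not tuned values):

```python
import torch.nn as nn
import torch.optim as optim

# One extra hidden layer plus a lower learning rate, per the suggestion above.
model = nn.Sequential(
    nn.Linear(30, 64),
    nn.ReLU(),
    nn.Linear(64, 64),  # added hidden layer
    nn.ReLU(),
    nn.Linear(64, 4),
)
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # e.g. 1e-4 instead of 1e-3
```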

1

PleaseKillMeNowOkay OP t1_iqqz3lp wrote

I could add more linear layers, and based on my experiments it would probably help, but my intention is to compare my new model with the old one, for which I presume the architectures should be as similar as possible.

1

thebear96 t1_iqr04o9 wrote

Ideally, yes. In that case the second architecture will perform worse, and you'll have to note that in the comparison. But since it's pretty much expected that the second architecture won't perform as well as the first, I'm not sure there's much use in comparing. It's definitely doable, though.

2

sydjashim t1_iqtuqdt wrote

Can you explain why the model would converge quicker?

1

thebear96 t1_iqtxrnx wrote

Well, I assumed that the network had more layers and therefore more parameters. More parameters can represent the data better and quicker. For example, if you have a dataset with 30 features, a linear layer with 64 neurons should be able to represent each data point quicker and more easily than, say, a linear layer with 16 neurons. That's why I thought the model would converge quicker. But in OP's case the hidden layers are the same; only the output layer has more neurons. In that case we won't get quicker convergence.
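
To make the 30-feature example concrete, just counting parameters:

```python
import torch.nn as nn

wide = nn.Linear(30, 64)    # 30*64 weights + 64 biases = 1,984 parameters
narrow = nn.Linear(30, 16)  # 30*16 weights + 16 biases = 496 parameters
print(sum(p.numel() for p in wide.parameters()))    # 1984
print(sum(p.numel() for p in narrow.parameters()))  # 496
```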

1