Submitted by PleaseKillMeNowOkay t3_xtadfd in deeplearning

I have a neural network whose outputs are the parameters of a probability distribution. I have a second neural network whose outputs are the parameters of a distribution with a more general covariance structure than the first (it reduces to the first distribution as a special case). Is my second network going to perform at least as well as my first?

Apologies for the vague description; I'm not sure how much I'm allowed to say about it. Any help at all is appreciated.
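For concreteness, here is a minimal sketch of the kind of setup I mean, with Gaussian outputs standing in for the actual distributions (all names and sizes here are made up):

```python
import torch.nn as nn

class DiagGaussianNet(nn.Module):
    """First network: predicts a mean and a per-dimension log-variance
    (i.e. a diagonal covariance)."""
    def __init__(self, in_dim=10, out_dim=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mean = nn.Linear(64, out_dim)
        self.log_var = nn.Linear(64, out_dim)  # log keeps variances positive

    def forward(self, x):
        h = self.trunk(x)
        return self.mean(h), self.log_var(h)

class FullGaussianNet(nn.Module):
    """Second network: additionally predicts the off-diagonal entries of a
    Cholesky factor, so it can represent any covariance; with the off-diagonal
    outputs at zero it reduces to the diagonal model above."""
    def __init__(self, in_dim=10, out_dim=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mean = nn.Linear(64, out_dim)
        self.log_diag = nn.Linear(64, out_dim)
        self.off_diag = nn.Linear(64, out_dim * (out_dim - 1) // 2)

    def forward(self, x):
        h = self.trunk(x)
        return self.mean(h), self.log_diag(h), self.off_diag(h)
```

For the second model, the negative log-likelihood can be computed with `torch.distributions.MultivariateNormal(mean, scale_tril=L)` after assembling the lower-triangular `L` from the diagonal and off-diagonal outputs.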

5

Comments


WhizzleTeabags t1_iqq59i6 wrote

It can perform worse, the same, or better.

−2

UsernameRelevant t1_iqq5hp9 wrote

> Is my second network going to perform at least as well as my first network?

Impossible to say. In general, more parameters mean that you can get a better fit, but also that the model overfits more easily.

Why don’t you compare the models on a test set?
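Something like this, for instance (a sketch; `nll_loss`, `model_1`, and `model_2` stand in for whatever loss and models OP is actually using):

```python
import torch

@torch.no_grad()
def test_nll(model, loader, nll_loss):
    """Average held-out negative log-likelihood; lower is better."""
    model.eval()
    total, n = 0.0, 0
    for x, y in loader:
        total += nll_loss(model(x), y).item() * len(x)
        n += len(x)
    return total / n

# Compare both models on the same held-out data:
# test_nll(model_1, test_loader, nll_loss) vs. test_nll(model_2, test_loader, nll_loss)
```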

2

thebear96 t1_iqqsaoe wrote

Assuming the same hyperparameters, the second network should theoretically converge to a solution more quickly. So you'll need to tune the hyperparameters and maybe add some dropout so that the model doesn't overfit.
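For example (a sketch, assuming a simple feed-forward trunk):

```python
import torch.nn as nn

# The same trunk with dropout between hidden layers to curb overfitting.
trunk = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),  # randomly zeroes 20% of activations during training
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),
)
```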

3

thebear96 t1_iqqykoz wrote

That shouldn't make much of a difference, but yes, in that case the performance should be worse than the first network's. It's far easier to predict two outputs than four. You can try adding more linear layers and using a lower learning rate to see if the model improves.
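For instance (a sketch; the widths and learning rate are just illustrative):

```python
import torch
import torch.nn as nn

# A deeper stack of linear layers, trained with a lower learning rate.
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),  # extra hidden layer
    nn.Linear(64, 4),              # four distribution parameters out
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lower than the usual 1e-3
```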

1

PleaseKillMeNowOkay OP t1_iqqz3lp wrote

I could add more linear layers, and based on my experiments it would probably help, but my intention is to compare my new model with the old one, for which I presume the architectures should be as close as possible.

1

thebear96 t1_iqr04o9 wrote

Ideally it should be. In that case the second architecture will perform worse, and you'll have to note that when you compare them. But since it's expected that the second architecture won't perform as well as the first one, I'm not sure there's much use in comparing. It's definitely doable, though.

2

PleaseKillMeNowOkay OP t1_iqscxo9 wrote

The simpler model had lower training loss with the same number of epochs. I tried training the second model until it had the same training loss as the first model, which took much longer. The validation loss did not improve and had a slight upward trend, which I take to mean it's overfitting.
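One thing I may try next is early stopping on the validation loss, roughly like this (a sketch; `train_one_epoch` and `evaluate` stand in for my actual training and validation steps):

```python
import torch

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(1000):
    train_one_epoch(model, train_loader)  # placeholder for the training step
    val = evaluate(model, val_loader)     # placeholder for the validation loss
    if val < best_val:
        best_val, bad_epochs = val, 0
        torch.save(model.state_dict(), "best.pt")  # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # validation stopped improving; training further just overfits
```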

1

sydjashim t1_iqtu9hp wrote

Did you keep the same initial weights for both networks?

1

thebear96 t1_iqtxrnx wrote

Well, I assumed the network had more layers and therefore more parameters. More parameters can represent the data better and faster. For example, if you have a dataset with 30 features, a linear layer with 64 neurons should be able to represent each data point more quickly and easily than, say, a linear layer with 16 neurons. That's why I thought the model would converge faster. But in OP's case the hidden layers are the same; only the output layer has more neurons. In that case we won't get faster convergence.
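To illustrate (a sketch):

```python
import torch.nn as nn

wide = nn.Linear(30, 64)    # 30 features -> 64 neurons: more capacity per layer
narrow = nn.Linear(30, 16)  # 30 features -> 16 neurons: less capacity, slower fit
```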

1

sydjashim t1_ique162 wrote

I have a quick suggestion that might help: take the weights of the first n−1 layers of your trained first model, fine-tune with the four outputs, and see whether your validation loss improves.

If it does, you can then take the untrained initial weights of your first model (up to the (n−1)-th layer) and train with the four outputs from scratch. That way you have a model trained from scratch for four outputs, but both models start from the same initial weights.

Why am I suggesting this?

Because you want to keep as many parameters as possible, especially the model weights, the same between the two models while running the comparison. A sketch of what I mean is below.
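Assuming PyTorch, and with hypothetical names for the four-output model and the checkpoint files:

```python
import torch

model_2 = FourOutputNet()  # the four-output architecture (hypothetical class)

# Step 1: copy every trained weight from model 1 whose name and shape also
# exist in model 2 (i.e. everything up to the (n-1)-th layer), then fine-tune.
trained = torch.load("model1_trained.pt")
shared = {k: v for k, v in trained.items()
          if k in model_2.state_dict() and v.shape == model_2.state_dict()[k].shape}
model_2.load_state_dict(shared, strict=False)  # strict=False skips the new head
# ... fine-tune model_2 and watch the validation loss ...

# Step 2: repeat with model 1's *initial* (untrained) weights, saved before
# training, so the from-scratch run starts from the same point as model 1 did.
model_2_scratch = FourOutputNet()
initial = torch.load("model1_initial.pt")
shared0 = {k: v for k, v in initial.items()
           if k in model_2_scratch.state_dict() and v.shape == model_2_scratch.state_dict()[k].shape}
model_2_scratch.load_state_dict(shared0, strict=False)
```

Filtering by name and shape before loading avoids errors from the mismatched output head while still transferring all the shared layers.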

2