thebear96 t1_iqtxrnx wrote on October 3, 2022 at 1:41 AM

Reply to comment by sydjashim in Neural network that models a probability distribution by PleaseKillMeNowOkay

Well, I assumed the network had more layers and therefore more parameters. A model with more parameters has more capacity, so it can usually fit the data better and in fewer training steps. For example, if you had a dataset with 30 features, a Linear layer with 64 neurons should be able to represent each data point more easily than, say, a Linear layer with 16 neurons. That's why I thought the model would converge quicker. But in OP's case the hidden layers are the same; only the output layer has more neurons. In that case we shouldn't expect quicker convergence.
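
To put rough numbers on that, here is a minimal sketch in plain Python that counts the weights and biases of fully connected layers. The 30-feature input and the 64 vs. 16 neuron layers come from the example above; the hidden sizes (64, 64) and output sizes (2 vs. 4) in the second part are hypothetical stand-ins, since I don't know OP's actual architecture.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Parameters in one fully connected layer: a weight matrix plus optional biases."""
    return in_features * out_features + (out_features if bias else 0)


def mlp_param_count(layer_sizes):
    """Total parameters in a stack of fully connected layers.

    layer_sizes lists the width at each stage, e.g. [30, 64, 2] means
    30 inputs -> 64 hidden units -> 2 outputs.
    """
    return sum(
        linear_param_count(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# The example from the comment: 30 input features into 64 vs. 16 neurons.
wide = linear_param_count(30, 64)    # 30*64 weights + 64 biases = 1984
narrow = linear_param_count(30, 16)  # 30*16 weights + 16 biases = 496

# Hypothetical version of OP's situation: identical hidden layers,
# only the output layer differs (2 vs. 4 outputs are assumed sizes).
small_head = mlp_param_count([30, 64, 64, 2])
large_head = mlp_param_count([30, 64, 64, 4])

print(wide, narrow)
print(small_head, large_head, large_head - small_head)
```

Under these assumed sizes, the wider layer has four times the parameters of the narrow one, while the bigger output head only adds 130 parameters on top of roughly 6,000, so the capacity of the network barely changes.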