
i_likebrains t1_j2k46t1 wrote

What batch sizes, learning rates and number of epochs are suitable for smaller datasets?

2

v2thegreat t1_j2lpumb wrote

These come under hyperparameter optimization, so you will definitely need to play around with them, but here are my rules of thumb (take them with a grain of salt!):

Learning rate: start with a large learning rate (e.g. 10e-3), and if the model overfits, reduce it down toward 10e-6. There's a Stack Overflow post that explains this quite well.
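Roughly, a coarse sweep over that range could look like the sketch below (assuming PyTorch; `build_model()` and `train_one_epoch()` are hypothetical placeholders for your own model and training loop):

```python
# Minimal sketch of a coarse learning-rate sweep.
# build_model() and train_one_epoch() are placeholders, not real library calls.
import torch

for lr in [1e-2, 1e-3, 1e-4, 1e-5]:
    model = build_model()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    val_loss = train_one_epoch(model, optimizer)   # returns validation loss for this run
    print(f"lr={lr:.0e}  val_loss={val_loss:.4f}")
```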

Number of epochs: the right number is just before your training loss starts diverging from the validation loss. Plot both, and the point where they diverge is where the overfitting starts.
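A minimal sketch of that plot (the loss curves here are toy numbers just to show the divergence shape, not real training results):

```python
# Plot training vs. validation loss per epoch and eyeball where they diverge.
import matplotlib.pyplot as plt

epochs = range(1, 21)
train_losses = [1.0 / e for e in epochs]                          # toy: keeps decreasing
val_losses = [1.0 / e + 0.02 * max(0, e - 8) for e in epochs]     # toy: turns upward after ~epoch 8

plt.plot(epochs, train_losses, label="train loss")
plt.plot(epochs, val_losses, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```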

Batch size: as large as fits in memory, to speed things up in general.
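One rough way to find that limit on a GPU is to keep doubling the batch until you hit an out-of-memory error (a sketch only; `model` and `batch_of()` are placeholders, and a real probe should also run a backward pass since that uses extra memory):

```python
# Grow the batch size until the GPU runs out of memory, then back off one step.
import torch

batch_size = 32
while True:
    try:
        x = batch_of(batch_size)      # placeholder: returns a tensor with batch_size samples
        model(x.cuda())               # one forward pass as a memory probe
        batch_size *= 2
    except RuntimeError:              # CUDA out-of-memory surfaces as a RuntimeError
        batch_size //= 2
        torch.cuda.empty_cache()
        break

print(f"largest batch size that fit: {batch_size}")
```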

3

jakderrida t1_j2zxpxe wrote

The batch size, learning rate, and number of epochs can all affect the model's performance on a smaller dataset. Here are some general guidelines that you can use as a starting point:

Batch size: A smaller batch size can be more appropriate for smaller datasets because it allows the model to make updates based on more diverse data. For example, a batch size of 32 or 64 is a good starting point for a smaller dataset.

Learning rate: The learning rate determines how fast the model updates its weights. A higher learning rate can allow the model to make rapid progress at the beginning of training, but it can also make the model more prone to overfitting. A lower learning rate can make the model's progress slower, but it can also help the model to generalize better to new data. A learning rate in the range of 0.001 to 0.01 is a good starting point for a smaller dataset.

Number of epochs: The number of epochs is the number of times the model sees the entire dataset during training. A smaller dataset may require fewer epochs to prevent overfitting. For example, you may want to start with a small number of epochs (e.g., 10 or 20) and increase it if the model's performance on the validation set is still improving.

Keep in mind that these are just general guidelines, and the optimal batch size, learning rate, and number of epochs will depend on the specific characteristics of your dataset and model. It may be helpful to experiment with different combinations of these hyperparameters to find the best settings for your particular case.
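As a sketch of that kind of experiment, a small manual grid search over the starting values above could look like this (`build_model()` and `train_and_evaluate()` are hypothetical helpers standing in for your own training code):

```python
# Small grid search over the hyperparameter ranges discussed above.
import itertools

batch_sizes = [32, 64]
learning_rates = [0.001, 0.01]
epoch_counts = [10, 20]

results = {}
for bs, lr, n_epochs in itertools.product(batch_sizes, learning_rates, epoch_counts):
    model = build_model()                                            # placeholder
    val_score = train_and_evaluate(model, batch_size=bs, lr=lr,      # placeholder; returns
                                   epochs=n_epochs)                  # a validation score
    results[(bs, lr, n_epochs)] = val_score

best = max(results, key=results.get)   # assumes higher validation score is better
print("best (batch_size, lr, epochs):", best)
```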

3

No_Research5050 t1_j2o941c wrote

Honestly, you will need to play around and probably perform some sort of search over the hyperparameters.
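One way to automate that search (no tool is named above; Optuna is just one common choice, and `train_and_evaluate()` is a placeholder for your own training/validation code):

```python
# Hyperparameter search with Optuna: each trial samples a candidate
# configuration and returns the validation loss to minimize.
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64, 128])
    epochs = trial.suggest_int("epochs", 5, 30)
    return train_and_evaluate(lr=lr, batch_size=batch_size, epochs=epochs)  # placeholder

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```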

1

debrises t1_j3itq4j wrote

Larger batch sizes lead to better gradient estimates, meaning optimizer steps tend to point in the "right" direction, which leads to faster convergence.
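A quick numeric illustration of that claim (purely simulated numbers, not from a real model): the mini-batch gradient is an average of noisy per-sample gradients, so its spread around the true gradient shrinks as the batch grows.

```python
# Simulate noisy per-sample gradients and measure how the batch-mean
# gradient's spread shrinks with batch size (roughly as 1/sqrt(batch)).
import numpy as np

rng = np.random.default_rng(0)
true_grad = 1.0
per_sample_grads = true_grad + rng.normal(scale=2.0, size=100_000)

for batch_size in [8, 64, 512]:
    usable = (len(per_sample_grads) // batch_size) * batch_size
    batch_grads = per_sample_grads[:usable].reshape(-1, batch_size).mean(axis=1)
    print(f"batch_size={batch_size:4d}  std of batch gradient ~ {batch_grads.std():.3f}")
```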

Run a test training pass to see when your model converges, then train for slightly more epochs so the model can try to find different minima. And use a model checkpoint callback.
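In Keras, for example, the checkpoint-callback idea looks roughly like this (the comment doesn't name a framework; `model` and the `x_*`/`y_*` variables are placeholders for your own model and data):

```python
# Keep only the best weights seen on the validation set, and stop early
# once validation loss stops improving.
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.ModelCheckpoint("best_model.keras",
                                       monitor="val_loss", save_best_only=True),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5),
]

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100, callbacks=callbacks)
```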

As for the optimizer, just use one from the Adam family, like AdamW. It handles most of the problems that can come up pretty well.
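A minimal sketch, assuming PyTorch (the tiny linear layer is just a stand-in for your real model):

```python
# AdamW = Adam with decoupled weight decay.
import torch

model = torch.nn.Linear(10, 1)   # stand-in for your real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```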

The learning rate heavily depends on the range of values your loss takes. Think about it this way: if your loss equals 10, then an lr of 0.01 effectively scales it to 10 * 0.01 = 0.1. We then compute partial derivatives of this value with respect to each weight, backpropagate, and update the weights. Usually we want our weights to have small values centered around zero, and to be updated by even smaller values at every step. The point is that your model doesn't know what range of values your loss takes, so you have to tune the learning rate to find the value that connects your loss signal to your weights at the right scale.
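A tiny numeric version of that argument (a toy one-weight "loss", not a real model): the update size is lr times the gradient, and the gradient scales with the loss, so a loss around 10 needs a smaller lr than a loss around 0.1 to keep the weight updates small.

```python
# One weight, one toy loss: the gradient inherits the loss's scale,
# so the update lr * grad does too.
import torch

w = torch.tensor(0.5, requires_grad=True)
loss = 10.0 * w          # toy "loss" whose gradient w.r.t. w is 10
loss.backward()

lr = 0.01
update = lr * w.grad     # 0.01 * 10 = 0.1, already large relative to w itself
print(update.item())     # prints ~0.1
```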

1