BrohammerOK
BrohammerOK t1_j9wvrl7 wrote
Reply to comment by osedao in [D] Is validation set necessary for non-neural network models, too? by osedao
You can work with 2 splits, which is common practice. For a small dataset, run 5- or 10-fold cross-validation with shuffling on 75-80% of the data (the train split) for hyperparameter tuning / model selection, refit the best model on the entirety of that split, and then evaluate/test on the remaining 20-25% that you held out. You can repeat the process multiple times with different seeds to get a better estimate of the expected performance, assuming the data you see at inference time comes from the same distribution as your dataset.
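A minimal sketch of that 2-split workflow, assuming scikit-learn; the dataset, model, and parameter grid are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Hold out 20% once; it is only touched for the final evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 5-fold CV with shuffling on the 80% train split for hyperparameter tuning.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=cv)
search.fit(X_tr, y_tr)  # refit=True (default) retrains the best model on all of X_tr

test_score = search.score(X_te, y_te)  # single evaluation on the held-out 20%
```

Rerunning this with different `random_state` seeds and averaging `test_score` gives the repeated estimate described above.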
BrohammerOK t1_j7x21pg wrote
Reply to [D] Similarity b/w two vectors by TKMater
If, as you said, you care about both magnitude and direction, use L2 (Euclidean) distance rather than cosine similarity; cosine similarity only compares directions and completely ignores magnitude.
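A quick numpy illustration of the difference (vectors chosen to share a direction but differ in magnitude):

```python
import numpy as np

def l2_distance(a, b):
    return np.linalg.norm(a - b)

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
b = np.array([3.0, 0.0])  # same direction, 3x the magnitude

cos = cosine_similarity(a, b)  # 1.0 — the magnitude difference is invisible
l2 = l2_distance(a, b)         # 2.0 — the magnitude difference is captured
```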
BrohammerOK t1_j2ekoyj wrote
Reply to [D] Does it make sense to use dropout and layer normalization in the same model? by Beneficial_Law_5613
If you do use both in the same layer, dropout should never be applied right before batch or layer norm because the features set to 0 would affect the mean and variance calculations. As an example, it is common to use batch norm in CNNs, and then dropout after the global average pooling (before the final fc layer). Sometimes you even see dropout between conv blocks, take a look at EfficientNet by Google.
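A small numpy sketch of why that ordering matters: the zeros injected by (inverted) dropout distort the statistics a following batch/layer norm would compute. The numbers here are synthetic, just to show the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)  # pre-norm activations

# Inverted dropout with p=0.5: half the activations become exactly 0,
# survivors are scaled by 1/(1-p) to preserve the mean in expectation.
keep = rng.random(x.shape) < 0.5
dropped = np.where(keep, x / 0.5, 0.0)

# Normalizing `dropped` instead of `x` would use badly inflated variance:
true_var = x.var()        # ~1.0
dropped_var = dropped.var()  # far larger, because of the mass of zeros
```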
BrohammerOK t1_iy9ap9d wrote
My first approach would be to sample N key frames uniformly from each long video and see whether I get good validation performance training on that (tune the value of N as you wish). I wouldn't use a 3D transformer, because the sampled frames will be very far apart and the sequential nature of the data shouldn't matter that much unless your videos have some kind of general structure; you would know that, I guess. I would build a baseline with something like average pooling of single-frame embeddings and a classification head, then test whether adding the time dimension helps at all. By randomly sampling in this way you can create a lot of data to train your model. Always inspect the sets of key frames visually first to make sure the approach makes sense. It is a good idea to spend a good amount of time looking at the data before even thinking about models and hyperparameters, especially if it isn't a standard dataset.
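The sampling-plus-average-pooling baseline can be sketched like this; the embedding array is a stand-in for whatever per-frame backbone you use, and all shapes are illustrative:

```python
import numpy as np

def sample_key_frames(num_frames: int, n: int) -> np.ndarray:
    """Uniformly sample n frame indices from a video with num_frames frames."""
    return np.linspace(0, num_frames - 1, n).round().astype(int)

# Hypothetical per-frame embeddings for one long video
# (e.g. the output of a frozen CNN backbone): 10k frames, 512-d each.
rng = np.random.default_rng(0)
frame_embeddings = rng.normal(size=(10_000, 512))

idx = sample_key_frames(len(frame_embeddings), n=16)
clip_embedding = frame_embeddings[idx].mean(axis=0)  # average-pooling baseline
# clip_embedding (512-d) would then feed a small classification head.
```

Re-drawing `idx` with jitter (or random instead of uniform sampling) per epoch is what multiplies your effective training data.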
BrohammerOK t1_ivwkn3i wrote
Adam with lr=1e-4 for fine-tuning, combined with learning-rate decay or reduce-on-plateau, always works pretty well for me with convnets.
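A minimal reduce-on-plateau sketch in plain Python (the class and its parameters are illustrative, not a real framework API; most frameworks ship an equivalent scheduler):

```python
class ReduceOnPlateau:
    """Cut the learning rate when validation loss stops improving."""

    def __init__(self, lr=1e-4, factor=0.1, patience=3):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:          # improvement: reset the counter
            self.best = val_loss
            self.bad_epochs = 0
        else:                             # no improvement this epoch
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor    # decay and start counting again
                self.bad_epochs = 0
        return self.lr

sched = ReduceOnPlateau(lr=1e-4, factor=0.1, patience=3)
for loss in [1.0, 0.9, 0.95, 0.95, 0.95]:
    lr = sched.step(loss)
# Three non-improving epochs in a row -> lr dropped from 1e-4 to 1e-5.
```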
BrohammerOK t1_j9ww6yx wrote
Reply to comment by BrohammerOK in [D] Is validation set necessary for non-neural network models, too? by osedao
If you wanna use something like early stopping, though, you'll have no choice but to use 3 splits: the validation set drives the stopping decision, so it can no longer double as an unbiased test set.
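A tiny sketch of the early-stopping loop that makes the third split necessary; the loss values and `patience` are illustrative:

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the epoch with the best validation loss, stopping once the
    loss fails to improve for `patience` consecutive epochs."""
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_epoch

# Validation loss bottoms out at epoch 2, then drifts up -> stop there.
stop_at = early_stop_epoch([1.0, 0.8, 0.7, 0.75, 0.74, 0.9])
```

Because `val_losses` directly determined where training stopped, the final performance number has to come from a third, untouched test split.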