BrohammerOK
BrohammerOK t1_j9wvrl7 wrote
Reply to comment by osedao in [D] Is validation set necessary for non-neural network models, too? by osedao
You can work with 2 splits, which is common practice. For a small dataset, run 5- or 10-fold cross-validation with shuffling on 75-80% of the data (the train split) for hyperparameter tuning / model selection, refit the best model on the entirety of that split, and then evaluate/test on the remaining 20-25% that you held out. You can repeat the process multiple times with different seeds to get a better estimate of the expected performance, assuming the data you see at inference time comes from the same distribution as your dataset.
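A minimal sketch of that 2-split workflow, assuming scikit-learn; the dataset, model, and parameter grid are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Hold out 20% once; it is only touched for the final evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 5-fold CV with shuffling on the 80% train split for hyperparameter tuning.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=cv)
search.fit(X_tr, y_tr)  # refit=True (default) retrains the best model on all of X_tr

test_score = search.score(X_te, y_te)  # single evaluation on the held-out 20%
```

Rerunning this with different `random_state` seeds and averaging `test_score` gives the repeated estimate described above.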
BrohammerOK t1_j7x21pg wrote
Reply to [D] Similarity b/w two vectors by TKMater
If, as you said, you care about both magnitude and direction, use L2 (Euclidean) distance rather than cosine similarity; cosine similarity only compares directions and completely ignores magnitude.
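A quick numpy illustration of the difference (vectors chosen to share a direction but differ in magnitude):

```python
import numpy as np

def l2_distance(a, b):
    return np.linalg.norm(a - b)

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
b = np.array([3.0, 0.0])  # same direction, 3x the magnitude

cos = cosine_similarity(a, b)  # 1.0 — the magnitude difference is invisible
l2 = l2_distance(a, b)         # 2.0 — the magnitude difference is captured
```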
BrohammerOK t1_j2ekoyj wrote
Reply to [D] Does it make sense to use dropout and layer normalization in the same model? by Beneficial_Law_5613
If you do use both in the same layer, dropout should never be applied right before batch or layer norm because the features set to 0 would affect the mean and variance calculations. As an example, it is common to use batch norm in CNNs, and then dropout after the global average pooling (before the final fc layer). Sometimes you even see dropout between conv blocks, take a look at EfficientNet by Google.
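A small numpy sketch of why that ordering matters: the zeros injected by (inverted) dropout distort the statistics a following batch/layer norm would compute. The numbers here are synthetic, just to show the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)  # pre-norm activations

# Inverted dropout with p=0.5: half the activations become exactly 0,
# survivors are scaled by 1/(1-p) to preserve the mean in expectation.
keep = rng.random(x.shape) < 0.5
dropped = np.where(keep, x / 0.5, 0.0)

# Normalizing `dropped` instead of `x` would use badly inflated variance:
true_var = x.var()        # ~1.0
dropped_var = dropped.var()  # far larger, because of the mass of zeros
```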
BrohammerOK t1_iy9ap9d wrote
My first approach would be to sample N key frames uniformly from each long video and see whether I get good validation performance training on that (tune the value of N as you wish). I wouldn't use a 3D transformer, because the sampled frames will be very far apart and the sequential nature of the data shouldn't matter that much unless your videos have some kind of general structure; you would know that, I guess. I would build a baseline with something like average pooling of single-frame embeddings and a classification head, then test whether adding the time dimension helps at all. By randomly sampling in this way you can create a lot of data to train your model. Always inspect the sets of key frames visually first to make sure the approach makes sense. It is a good idea to spend a good amount of time looking at the data before even thinking about models and hyperparameters, especially if it isn't a standard dataset.
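The sampling-plus-average-pooling baseline can be sketched like this; the embedding array is a stand-in for whatever per-frame backbone you use, and all shapes are illustrative:

```python
import numpy as np

def sample_key_frames(num_frames: int, n: int) -> np.ndarray:
    """Uniformly sample n frame indices from a video with num_frames frames."""
    return np.linspace(0, num_frames - 1, n).round().astype(int)

# Hypothetical per-frame embeddings for one long video
# (e.g. the output of a frozen CNN backbone): 10k frames, 512-d each.
rng = np.random.default_rng(0)
frame_embeddings = rng.normal(size=(10_000, 512))

idx = sample_key_frames(len(frame_embeddings), n=16)
clip_embedding = frame_embeddings[idx].mean(axis=0)  # average-pooling baseline
# clip_embedding (512-d) would then feed a small classification head.
```

Re-drawing `idx` with jitter (or random instead of uniform sampling) per epoch is what multiplies your effective training data.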
BrohammerOK t1_ivwkn3i wrote
Adam with lr=1e-4 for fine-tuning, combined with learning-rate decay or reduce-on-plateau, always works pretty well for me with convnets.
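A minimal reduce-on-plateau sketch in plain Python (the class and its parameters are illustrative, not a real framework API; most frameworks ship an equivalent scheduler):

```python
class ReduceOnPlateau:
    """Cut the learning rate when validation loss stops improving."""

    def __init__(self, lr=1e-4, factor=0.1, patience=3):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:          # improvement: reset the counter
            self.best = val_loss
            self.bad_epochs = 0
        else:                             # no improvement this epoch
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor    # decay and start counting again
                self.bad_epochs = 0
        return self.lr

sched = ReduceOnPlateau(lr=1e-4, factor=0.1, patience=3)
for loss in [1.0, 0.9, 0.95, 0.95, 0.95]:
    lr = sched.step(loss)
# Three non-improving epochs in a row -> lr dropped from 1e-4 to 1e-5.
```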
BrohammerOK t1_j9ww6yx wrote
Reply to comment by BrohammerOK in [D] Is validation set necessary for non-neural network models, too? by osedao
If you wanna use something like early stopping, though, you'll have no choice but to use 3 splits: the validation set drives the stopping decision, so it can no longer double as an unbiased test set.
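A tiny sketch of the early-stopping loop that makes the third split necessary; the loss values and `patience` are illustrative:

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the epoch with the best validation loss, stopping once the
    loss fails to improve for `patience` consecutive epochs."""
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_epoch

# Validation loss bottoms out at epoch 2, then drifts up -> stop there.
stop_at = early_stop_epoch([1.0, 0.8, 0.7, 0.75, 0.74, 0.9])
```

Because `val_losses` directly determined where training stopped, the final performance number has to come from a third, untouched test split.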