MUSEy69
MUSEy69 t1_iz4gkn2 wrote
Reply to comment by Visual-Arm-7375 in [D] Model comparison (train/test vs cross-validation) by Visual-Arm-7375
Thank you for your question; it generated different points of view, from which I learned a lot.
MUSEy69 t1_iyzzbxr wrote
Hi, you should always hold out an independent test split and do whatever you want with the rest, e.g. cross-validation (see sklearn's visual cross-validation reference).
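A minimal sketch of that setup, assuming an sklearn classifier and AUC as the metric (the dataset and model here are illustrative, not from the thread):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Independent test split, never touched during model selection.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(random_state=42)

# Cross-validation only on the development portion.
cv_scores = cross_val_score(model, X_dev, y_dev, cv=5, scoring="roc_auc")
print("CV AUC:", cv_scores.mean())

# One final evaluation on the held-out test set.
model.fit(X_dev, y_dev)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print("Test AUC:", test_auc)
```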
Why would you be losing lots of data points to the test split? The idea is that the distributions of the splits match, and you can use a p-value criterion to check this.
If you want to test lots of models, try Optuna to find the best hyperparameters. There is no problem using the same metric; it's the one you care about in the end.
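A minimal Optuna sketch along those lines, reusing X_dev and y_dev from the snippet above and scoring trials with the same cross-validated AUC (the parameter ranges are made up for illustration):

```python
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def objective(trial):
    # Illustrative search space; tune ranges to your own problem.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "max_depth": trial.suggest_int("max_depth", 2, 16),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
    }
    model = RandomForestClassifier(random_state=42, **params)
    # Same metric as the final evaluation, computed only on the dev data.
    return cross_val_score(model, X_dev, y_dev, cv=5, scoring="roc_auc").mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```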
Depending on your domain, I would skip step 5, because over time you can test for distribution shifts and even compare new models against the old ones.
MUSEy69 t1_ixd12bw wrote
Great work! Why don't you try Stable Diffusion? I think the topic has enough momentum to boost your channel.
MUSEy69 t1_iu10tpw wrote
Reply to [D] [R] Large-scale clustering by jesusfbes
MUSEy69 t1_j43g8x4 wrote
Reply to comment by PassionatePossum in [R] Git is for Data (CIDR 2023) - Extending Git to Support Large-Scale Data by rajatarya
Not in the paper, but I found a table on their site: https://xetdata.com/why-xethub/