ResponsibilityNo7189

ResponsibilityNo7189 t1_j1ulsd1 wrote

That is why you have hundreds of millions of parameters in a network. There are so many directions the weights can move in that it's not a zero-sum game: some of those directions will not be detrimental to the other examples. It's precisely for this reason that self-supervised methods tend to work best on very deep networks; see "Scaling Vision Transformers".
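
To loosely illustrate the point about having so many directions available (purely an illustration, not part of the original comment): in high-dimensional weight spaces, two randomly chosen directions are nearly orthogonal, so a step that helps one example tends to interfere very little with another. The dimensions below are arbitrary placeholders.

```python
# Illustrative sketch only: random "gradient-like" directions become nearly
# orthogonal as dimensionality grows, so updates are not a zero-sum game.
import numpy as np

rng = np.random.default_rng(0)
for dim in (10, 1_000, 1_000_000):  # arbitrary sizes, from tiny to large
    g1, g2 = rng.standard_normal(dim), rng.standard_normal(dim)
    cos = g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2))
    print(f"dim={dim:>9,}  cosine similarity = {cos:+.4f}")
```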

7

ResponsibilityNo7189 t1_j02dzwf wrote

Getting your network's output probabilities to be calibrated is an open problem. First, you might want to read up on aleatoric vs. epistemic uncertainty: https://towardsdatascience.com/aleatoric-and-epistemic-uncertainty-in-deep-learning-77e5c51f9423

Monte Carlo sampling and training have been used to get a sense of uncertainty.
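
One common way to do this is Monte Carlo dropout: keep dropout active at inference time and average several stochastic forward passes. A minimal PyTorch sketch, assuming a toy model and a placeholder dropout rate:

```python
# Monte Carlo dropout sketch (PyTorch): the spread across stochastic forward
# passes gives a rough estimate of predictive uncertainty. The architecture
# and dropout rate here are placeholders, not recommendations.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

def mc_dropout_predict(model, x, n_samples=30):
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(dim=0), probs.std(dim=0)  # mean prediction, per-class spread

x = torch.randn(4, 128)           # dummy batch
mean_p, std_p = mc_dropout_predict(model, x)
print(mean_p.shape, std_p.shape)  # torch.Size([4, 10]) torch.Size([4, 10])
```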

Also, changing the softmax temperature to get less confident outputs might "help".
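
For concreteness, a quick sketch of what temperature does to the softmax; the temperature value here is illustrative, and in practice it is usually tuned on held-out data rather than picked by hand:

```python
# Dividing logits by a temperature T > 1 softens the output distribution
# (less confident); T < 1 sharpens it. T=2.0 below is arbitrary.
import torch

def softmax_with_temperature(logits, T=2.0):
    return torch.softmax(logits / T, dim=-1)

logits = torch.tensor([[4.0, 1.0, 0.5]])
print(torch.softmax(logits, dim=-1))            # sharp, overconfident-looking
print(softmax_with_temperature(logits, T=2.0))  # softer, less confident
```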

10

ResponsibilityNo7189 t1_iy6ksf8 wrote

It might be good to code one thing completely from scratch. Why? Because it might help you improve others' code, and give you the resilience and skill to open up others' code and tinker with it. I have seen too many students only wanting to download code from GitHub, and I feel that it severely hampers their creativity, and thus their research impact. At some point you will have to produce some genuine code of your own, and having coded from scratch will be helpful then.

2