pm_me_your_ensembles t1_j01xzcw wrote

The two are not comparable. In a multi-class, single-label problem you compute K distinct projections, one per class, and then combine them via softmax, which yields something that resembles a probability distribution. In the multi-label case no such coupling function is applied: each output depends only on its own logit, so the outputs don't influence each other in any way and the two setups can't be compared directly.
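A quick sketch of the difference, using numpy (function names here are just illustrative): softmax couples all K logits into one distribution, while independent sigmoids score each logit on its own.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 0.5, -1.0])  # K = 3 class projections

probs = softmax(logits)    # coupled: each entry depends on all logits
scores = sigmoid(logits)   # independent: each entry depends on its own logit

print(probs.sum())   # softmax output sums to 1
print(scores.sum())  # sigmoid outputs need not sum to 1
```

Changing one logit shifts every softmax entry, but only its own sigmoid score, which is why the two sets of numbers aren't comparable.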

However, you shouldn't treat whatever a NN outputs as a probability, even if it lies in [0, 1], since NNs are known to be overconfident.
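One common mitigation (not mentioned in the comment, but a standard post-hoc fix) is temperature scaling: dividing the logits by a fitted temperature T > 1 softens the softmax distribution and reduces overconfidence. A minimal sketch with a hand-picked T:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.0])
T = 2.0  # temperature > 1 flattens the distribution; in practice T is fit on a validation set

p = softmax(logits)
p_cal = softmax(logits / T)

print(p.max())      # raw max confidence
print(p_cal.max())  # lower after temperature scaling
```

The predicted class is unchanged (argmax is preserved), only the confidence shrinks.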


pm_me_your_ensembles t1_irt5y8h wrote

Phil Wang/lucidrains has phenomenal implementations of stuff; I'd recommend checking them out and reading the code.

Furthermore, I'd recommend simply reading more code and tackling complex problems, e.g. try building a DL framework from "scratch" on top of jax. Read the Haiku codebase and compare it to, say, Equinox (I am a big fan of this one). Go through the Hugging Face codebases, e.g. transformers. Choose a model, build it from scratch, and make it compatible with their API.
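To make the "framework from scratch" idea concrete, here is a toy sketch of the functional style jax libraries like Haiku and Equinox encourage: parameters live in a plain dict (a pytree) and the forward pass is a pure function of (params, inputs). All names are illustrative, and numpy stands in for jax so the snippet is self-contained.

```python
import numpy as np

def init_linear(rng, in_dim, out_dim):
    # Parameters as a plain dict -- the "pytree" a jax transform would traverse.
    return {
        "w": rng.standard_normal((in_dim, out_dim)) * 0.01,
        "b": np.zeros(out_dim),
    }

def linear_apply(params, x):
    # Pure function: output depends only on params and x, no hidden state.
    return x @ params["w"] + params["b"]

rng = np.random.default_rng(0)
params = init_linear(rng, in_dim=4, out_dim=2)
y = linear_apply(params, np.ones((3, 4)))
print(y.shape)  # (3, 2)
```

Separating init from apply like this is the core design choice to study when comparing Haiku's transform-based approach against Equinox's module-as-pytree approach.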