dumbmachines t1_j0ztjdq wrote
Reply to [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
Have you tried something like this?
You're not able to overfit on the hard examples alone? Why not?
dumbmachines t1_j0zsn0r wrote
Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
The alternative is writing your own CUDA or C++ code. Fortunately for you, PyTorch is pretty easily extensible. If you have something that needs to run fast, why not write a C++ extension?
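For example, a minimal sketch of a custom op as a C++ extension (the fused op here is just a stand-in; you'd compile it with `torch.utils.cpp_extension`):

```cpp
// fused_madd.cpp -- a toy custom op; build with torch.utils.cpp_extension.
#include <torch/extension.h>

// Stand-in op: x * y + z, fused into one C++ call.
torch::Tensor fused_madd(torch::Tensor x, torch::Tensor y, torch::Tensor z) {
  return x * y + z;
}

// Expose the function to Python; TORCH_EXTENSION_NAME is defined by the build system.
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("fused_madd", &fused_madd, "fused multiply-add (C++)");
}
```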
dumbmachines t1_j0zlr39 wrote
Reply to [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
If you're using PyTorch, what's stopping you from using the C++ API? It seems like exactly what you're asking for.
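For anyone curious, defining and running a model with the LibTorch C++ frontend looks roughly like this (a minimal sketch; the layer sizes are arbitrary):

```cpp
// A small network defined entirely with the LibTorch C++ frontend.
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::nn::Sequential net(
      torch::nn::Linear(4, 8),
      torch::nn::ReLU(),
      torch::nn::Linear(8, 1));

  auto x = torch::randn({2, 4});
  std::cout << net->forward(x) << std::endl;  // forward pass, no Python involved
}
```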
dumbmachines t1_iqx2g66 wrote
Reply to comment by MLNoober in [D] Why restrict to using a linear function to represent neurons? by MLNoober
>So if we ignore the implementation details to accomplish this for large networks, are there any inherent advantages to using higher-order neurons?
I don't know what that might be, but there is an inherent advantage to stacking layers of act(WX + b), where act is some non-linear function. Instead of guessing which higher-order function each neuron should use, you can learn the higher-order function by stacking many simpler non-linear ones. That way the solution is general and works across many different datasets and modalities.
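To make that concrete, here's a rough sketch (LibTorch; the target function, width, and hyperparameters are just for illustration) of stacked act(Wx + b) layers learning a higher-order function like y = x²:

```cpp
// Stacked affine-plus-nonlinearity layers fitting y = x^2 on [-1, 1].
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::manual_seed(0);

  // Two stacked affine layers with a tanh nonlinearity in between --
  // no quadratic neuron anywhere.
  torch::nn::Sequential net(
      torch::nn::Linear(1, 32),
      torch::nn::Tanh(),
      torch::nn::Linear(32, 1));

  auto x = torch::linspace(-1, 1, 256).unsqueeze(1);
  auto y = x.pow(2);  // the "higher-order" target

  torch::optim::Adam opt(net->parameters(), torch::optim::AdamOptions(1e-2));
  for (int step = 0; step < 2000; ++step) {
    opt.zero_grad();
    auto loss = torch::mse_loss(net->forward(x), y);
    loss.backward();
    opt.step();
    if (step % 500 == 0)
      std::cout << "step " << step << " loss " << loss.item<float>() << "\n";
  }
}
```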
dumbmachines t1_j133fcs wrote
Reply to comment by Dartagnjan in [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
If focal loss looks interesting, check out PolyLoss, which is a generalization of the focal loss idea.
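Roughly, the connection (following the PolyLoss paper's framing): cross-entropy expands as a polynomial in (1 − p_t), focal loss shifts every power by γ, and PolyLoss lets you perturb the leading coefficients directly:

```latex
% Cross-entropy as a polynomial in (1 - p_t):
\mathrm{CE} = -\log(p_t) = \sum_{j=1}^{\infty} \tfrac{1}{j}\,(1 - p_t)^{j}
% Focal loss shifts every power by \gamma:
\mathrm{FL} = -(1 - p_t)^{\gamma} \log(p_t) = \sum_{j=1}^{\infty} \tfrac{1}{j}\,(1 - p_t)^{j+\gamma}
% Poly-1 instead perturbs just the first coefficient:
\mathrm{Poly}\text{-}1 = -\log(p_t) + \epsilon_1 (1 - p_t)
```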