
seba07 t1_iqltogp wrote

I think the real answer for many ML problems is "because it works". Why are we using ReLU (= max(x, 0)) instead of sigmoid or tanh as layer activations nowadays? The math would discourage this, since the derivative at 0 is not defined, but it's fast and it works.
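In practice the undefined derivative never causes trouble, because frameworks just pick a convention for the (sub)gradient at 0. A minimal sketch, assuming PyTorch (which returns 0 there):

```python
import torch
import torch.nn.functional as F

# ReLU is just max(x, 0); its derivative is undefined exactly at x = 0
x = torch.tensor(0.0, requires_grad=True)
y = F.relu(x)
y.backward()

# Autograd simply uses 0 as the (sub)gradient at x = 0, so training proceeds fine
print(x.grad)  # tensor(0.)
```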

20

jesuslop t1_iqmbdpg wrote

Genuine interest: how frequently would you say new projects/libraries use ReLU activations nowadays (as opposed to other activations)?

EDIT: reformulated

3

cthorrez OP t1_iqmvcze wrote

Exactly, lots of people use GELU now (a more expensive activation built on the Gaussian CDF).
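GELU(x) = x·Φ(x), where Φ is the standard normal CDF, so it needs an erf (or a tanh approximation) rather than a single comparison. A minimal sketch, assuming PyTorch, checking the closed form against the library implementation:

```python
import math
import torch
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, steps=7)

# GELU(x) = x * Phi(x), with Phi the standard normal CDF written via erf
exact = x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

# Matches the library's default (exact, non-approximate) GELU
print(torch.allclose(exact, F.gelu(x)))  # True
```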

5