its_ean t1_iqlr342 wrote

hyperbolic tangent is convenient for backpropagation since its derivative is 1 - tanh²(x)
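For anyone who wants to check that identity, here is a minimal sketch using only the standard library. The helper names `tanh_grad` and `numeric_grad` are made up for illustration and do not come from any framework. It compares the closed-form derivative, which reuses the forward output, against a central finite difference.

```python
import math


def tanh_grad(x):
    """Closed-form derivative of tanh, reusing the forward value t = tanh(x)."""
    t = math.tanh(x)
    return 1.0 - t * t


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 0.5, 3.0):
        print(x, tanh_grad(x), numeric_grad(math.tanh, x))
```

The practical point is that the backward pass needs only the value already computed in the forward pass, with no extra call to `tanh`.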


cthorrez OP t1_iqlrf1v wrote

I'm not necessarily saying it should be replaced in every layer, but I think it would at least make sense to investigate other options for final probability generation. tanh is definitely good for intermediate layer activation.


chatterbox272 t1_iqm72tk wrote

Tanh is not a particularly good intermediate activation function at all. It's too linear around zero and it saturates at both ends.
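A rough illustration of both points, assuming nothing beyond the standard library (the helper name `tanh_grad` is hypothetical): near zero, tanh(x) is almost equal to x, and for large |x| the gradient 1 - tanh²(x) shrinks toward zero.

```python
import math


def tanh_grad(x):
    """Derivative of tanh at x, i.e. 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    # Small inputs: output tracks the input closely (near-linear regime).
    # Large inputs: output flattens toward 1 and the gradient vanishes.
    for x in (0.01, 0.1, 1.0, 3.0, 6.0):
        print(f"x={x:<5} tanh={math.tanh(x):.6f} grad={tanh_grad(x):.2e}")
```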


cthorrez OP t1_iqnk270 wrote

Well, it's an even worse final output activation for binary classification, because its outputs lie between -1 and 1 rather than between 0 and 1.

I've never seen it used as anything but an internal activation.
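As a side note on the range issue, here is a small sketch (the function names `sigmoid` and `tanh_to_prob` are illustrative, not from any library). Shifting and scaling tanh into the 0 to 1 range just reproduces the logistic sigmoid with a doubled input, since (tanh(x) + 1) / 2 = sigmoid(2x), so it would not give a genuinely different probability function.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_to_prob(x):
    """Rescale tanh from (-1, 1) to (0, 1)."""
    return 0.5 * (math.tanh(x) + 1.0)


if __name__ == "__main__":
    for x in (-5.0, -1.0, 0.0, 0.7, 4.0):
        print(x, tanh_to_prob(x), sigmoid(2.0 * x))
```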