A common theme in these threads is that people observe status-quo design decisions, such as linear layers connected by ReLU activations, and then try to rationalize them after the fact with relatively hand-wavy mathematical justifications, citing results such as the universal approximation theorem that are not particularly relevant.
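For concreteness, the design being rationalized is nothing exotic: each layer is just an affine map followed by an elementwise nonlinearity. A minimal NumPy sketch (layer sizes and initialization are arbitrary, chosen only for illustration):

```python
import numpy as np

def relu(x):
    # Elementwise nonlinearity: zeroes out negative entries.
    return np.maximum(0.0, x)

def linear_layer(x, W, b):
    # Affine map: the "linear function" part of the neuron.
    return W @ x + b

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                            # input vector
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)     # hidden layer params
W2, b2 = rng.standard_normal((1, 8)), np.zeros(1)     # output layer params

h = relu(linear_layer(x, W1, b1))   # hidden activations (all >= 0)
y = linear_layer(h, W2, b2)         # output; no nonlinearity on last layer
```

Stacking enough of these layers is what the universal approximation theorem speaks to, but the theorem says nothing about whether this particular parameterization trains well, which is the empirical question.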

The reality is that this field is heavily driven by empirical results, and I would be highly skeptical of anyone who claims that "xyz is the clear best way to do it".

— jackfakert1_iqzixrs, replying to "[D] Why restrict to using a linear function to represent neurons?" by MLNoober