A common theme in these threads is that people observe status-quo design decisions, such as linear layers connected by ReLU activations, and then try to rationalize them after the fact with relatively hand-wavy mathematical justifications, citing results such as the universal approximation theorem that are not particularly relevant.
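For concreteness, the design being rationalized is nothing exotic: each layer is just an affine map followed by an elementwise nonlinearity. A minimal NumPy sketch (layer sizes and initialization are arbitrary, chosen only for illustration):

```python
import numpy as np

def relu(x):
    # Elementwise nonlinearity: zeroes out negative entries.
    return np.maximum(0.0, x)

def linear_layer(x, W, b):
    # Affine map: the "linear function" part of the neuron.
    return W @ x + b

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                            # input vector
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)     # hidden layer params
W2, b2 = rng.standard_normal((1, 8)), np.zeros(1)     # output layer params

h = relu(linear_layer(x, W1, b1))   # hidden activations (all >= 0)
y = linear_layer(h, W2, b2)         # output; no nonlinearity on last layer
```

Stacking enough of these layers is what the universal approximation theorem speaks to, but the theorem says nothing about whether this particular parameterization trains well, which is the empirical question.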

The reality is that this field is heavily driven by empirical results, and I would be highly skeptical of anyone who claims that "xyz is the clear best way to do it".

— jackfakert1_iqzixrs, replying to "[D] Why restrict to using a linear function to represent neurons?" by MLNoober