
canbooo t1_iy3pylo wrote

Bad initialization can be a problem if you do it yourself (i.e. badly scaled weights) and you are not using batch norm or another kind of normalization, since it can make your neurons die. E.g. a tanh neuron with a too-large input scale will output only -1 or 1 for all data, which makes it effectively dead, i.e. it learns nothing because its gradient is near zero over the entire data set.
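
For illustration (my own sketch, not from the comment above), here's roughly what that looks like in PyTorch; the layer size and the 10x weight scale are arbitrary choices just to force saturation:

```python
import torch

torch.manual_seed(0)
x = torch.randn(1000, 100)  # standardized inputs

# Badly scaled init: std 10 instead of ~1/sqrt(fan_in)
w_bad = (torch.randn(100, 1) * 10.0).requires_grad_(True)
w_good = (torch.randn(100, 1) / 100 ** 0.5).requires_grad_(True)

for name, w in [("bad", w_bad), ("good", w_good)]:
    out = torch.tanh(x @ w)  # pre-activations are huge for "bad"
    out.sum().backward()
    saturated = (out.abs() > 0.999).float().mean().item()
    print(f"{name}: {saturated:.0%} saturated, "
          f"mean |grad| = {w.grad.abs().mean().item():.2e}")
```

With the badly scaled weights almost every output sits at +/-1 and the weight gradients are vanishingly small, so the neuron never moves; with ~1/sqrt(fan_in) scaling the outputs stay in tanh's near-linear region and the gradients stay healthy.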
