
Own-Archer7158 t1_iy3mec9 wrote

Note that the loss is minimal when the parameters make the neural network's predictions as close as possible to the true labels

Before reaching that point, the gradient is generally nonzero (except in the very unlucky case of getting stuck in a local minimum)

To understand the underlying optimization problem better, you could look at linear regression with the least-squares error as the loss: in one dimension the loss is a quadratic function of the parameter, so it has a single global minimum and no spurious local minima (see the sketch below)
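
For intuition, here's a minimal sketch (my own, not from the parent comment) of gradient descent on that one-dimensional least-squares problem; the data, learning rate, and step count are arbitrary choices:

```python
import numpy as np

# Synthetic 1D data: y = 3*x plus a little noise, so the true slope is 3
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

# The loss L(w) = mean((w*x - y)^2) is a parabola in w,
# so its gradient vanishes only at the global minimum.
w = 0.0    # initial parameter
lr = 0.1   # learning rate (hypothetical choice)
for _ in range(200):
    residual = w * x - y
    grad = 2.0 * np.mean(residual * x)  # dL/dw for the mean squared error
    w -= lr * grad

print(w)  # converges near 3, where the gradient is ~0 and the loss is minimal
```

The same picture (gradient pointing downhill until it vanishes at a minimum) is what gradient descent relies on in neural networks, except there the loss surface is non-convex and can have those unlucky local minima.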
