
Pleasant-Resident-53 t1_j26cuzg wrote

I'm learning about simple gradient descent in regression models. In my example's 3-D graph, the algorithm starts from one point and eventually reaches a local minimum (the lowest value of the cost function). But isn't the whole point to reach values of w and b that produce a J(w, b) of 0? On the graph, the local minimum corresponds to values of w and b with a negative J(w, b), but isn't that counterintuitive to the whole point of the algorithm? Is having a negative J(w, b) good, or have I just misunderstood this?
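
For reference, here is a minimal sketch of what I mean (the data is made up, and I'm assuming the usual squared-error cost; with MSE, J(w, b) is an average of squares, so it can't go below 0 and only hits 0 on a perfect fit):

```python
import numpy as np

# Made-up toy data, roughly y = 2x plus noise
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def J(w, b):
    # Mean squared error: an average of squared residuals, so J(w, b) >= 0
    return np.mean((w * x + b - y) ** 2) / 2

# Plain gradient descent on J, starting from w = b = 0
w, b, lr = 0.0, 0.0, 0.01
for _ in range(1000):
    residual = w * x + b - y
    grad_w = np.mean(residual * x)   # dJ/dw
    grad_b = np.mean(residual)       # dJ/db
    w -= lr * grad_w
    b -= lr * grad_b

# J converges toward a small positive value (the noise floor), not 0 or below
print(w, b, J(w, b))
```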

1