Submitted by Severe-Improvement32 t3_10ohqyw in deeplearning

So, I have been learning what DL is and how a NN learns to do stuff. From what I understand, the repeated iterations will take random weights and at some point those weights will be kinda perfect for the given task (plz correct me if I'm wrong).

Ok, so let's take an example of a task like a pathfinding AI: we make a NN and train it to go from point A to point B. Now it is trained, doing nicely, and goes to point B perfectly. So here the weights are set to go from point A to point B, right?

What if we put point B somewhere else? How will the AI get perfect weights, since the current weights are only perfect for the current point B?

What if we put an obstacle between point A and B? How will the NN set its weights then? Or is it something like a range of weights that is perfect for any given task?

IDK if I explained it right; plz comment if you have questions about my question, and answers too 💕

Comments

FastestLearner t1_j6exhli wrote

Corrections:

  1. The weights are set to random only at the beginning (i.e. before iter=0). From every iteration onwards, the optimization algorithm (some form of gradient descent) kicks in and nudges the weights slightly, in a way that makes the whole network perform incrementally better at the task it’s being trained for (see the small sketch after this list). After hundreds of thousands of iterations, it is hoped that the weights reach an optimal state, where more nudging does not optimize the weights any further (and by extension does not make the neural network learn any better). This is called convergence.

  2. Coming to your example of path finding, first of all this is a reinforcement learning (RL) problem. RL is different from DL. DL or deep learning is a subset of machine learning algorithms which is mostly concerned with the training of deep neural networks (hence the name). RL is a particular method of training ‘any’ learning algorithm (doesn’t always have to be neural networks) using what are called reward functions. Think of it like training a dog (an agent) to perform tricks (a task) using biscuits (as rewards). Every time your dog does what you ask him to do and then you follow up by giving him a biscuit, you basically ‘reinforce’ his behavior, so he will do more of it when you ask him to do it again.

  3. Now, the example of the path finding agent that you gave is silly. No RL agent is trained on one single scenario. If you do train an RL agent on just a single scenario, you get a condition called overfitting, meaning that your agent learns perfectly well how to navigate that one scenario but it doesn’t generalize to any unseen scenario. In practice, we train an RL agent on hundreds of thousands of different scenarios, with each scenario being slightly different from the rest. Many of these scenarios can have different conditions, like different lighting, differently structured environments, different geometries, different obstacles, etc. What we hope to achieve is that after training, the RL agent has learned a generalized navigation function that is adaptive to any scenario.
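
To make the "nudging" in point 1 concrete, here is a minimal sketch in plain NumPy on a made-up toy problem (fitting y = 2x + 1, nothing to do with the pathfinding example): the weights start random, gradient descent repeatedly adjusts them, and training stops once the updates become negligible, i.e. convergence.

```python
import numpy as np

# Toy data: learn y = 2x + 1 from noisy samples.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.05, size=100)

# Weights are random only at the very beginning (before iter=0).
w, b = rng.normal(), rng.normal()
lr = 0.1  # learning rate: how big each "nudge" is

for step in range(10_000):
    pred = w * x + b
    err = pred - y
    loss = np.mean(err ** 2)         # mean squared error
    grad_w = 2.0 * np.mean(err * x)  # gradient of the loss w.r.t. w
    grad_b = 2.0 * np.mean(err)      # gradient of the loss w.r.t. b
    # Nudge the weights slightly downhill on the loss surface.
    w -= lr * grad_w
    b -= lr * grad_b
    # Convergence: the nudges have become tiny, so stop.
    if abs(grad_w) < 1e-8 and abs(grad_b) < 1e-8:
        break

print(f"w={w:.3f}, b={b:.3f}, loss={loss:.6f}")  # w and b end up close to 2 and 1
```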

I suggest you watch some TwoMinutePapers videos on YT covering some of OpenAI’s RL papers. There are videos in which RL agents learn to fight in a boxing match, and another in which several agents collaborate to play hide and seek. You’d get a feel for how RL works.

suflaj t1_j6eqh0b wrote

It depends. If it only learned one specific A-to-B, we say it is overfit. If you give it enough different A-to-B pairs, it might learn to generalize, and then it will be able to find the path for any A-to-B pair.

If it learned on paths without obstacles, it will not be able to deal with obstacles, which means it will go right through them, or run into them if your environment does not allow an agent to go through them.

Severe-Improvement32 OP t1_j6ev9gf wrote

Got your point, and I have another question. Let's continue with the pathfinding example: if we do not give it enough A-B pairs then, as you said, it will fail. But then what about unsupervised learning, since there won't be any data given, right?

suflaj t1_j6evpg4 wrote

Well, you will presumably not be labeling this with humans but probably with A*, so it's all unsupervised learning anyway.
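
To illustrate the idea of letting a planner, rather than a human, produce the labels, here is a hypothetical sketch (the grid, the astar helper, and the (current_cell, next_cell) pairs are all invented for this example, not something from the thread): run classic A* on a grid and turn each found path into supervision pairs for a network.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; grid[r][c] == 1 marks an obstacle."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]               # (f, g, node, path)
    visited = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(frontier, (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]))
    return None  # no path exists

# One scenario; in practice you would generate many of these with random layouts.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))

# Each (current_cell, next_cell) pair is a "label" produced by A*, not by a human.
training_pairs = list(zip(path[:-1], path[1:]))
print(training_pairs)
```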

Autogazer t1_j6h0bi5 wrote

That’s not how unsupervised training works. All training requires data, unsupervised just means that the data isn’t labeled.

the_Wallie t1_j6espo7 wrote

"From what I understand is the repeated iteration will take random weights and at some point those weights will be kinda perfect for the given task (plz correct me if i'm wrong)"

You're at least somewhat wrong - it's not all random. The weights are indeed initialized randomly, but they are then adjusted to fit batches of training data, so that the network's outputs more closely match the data. This is usually done through stochastic gradient descent, which leverages the difference between your network's current predictions and the known ground truth, as calculated by the chosen loss function (e.g. mean squared error or binary cross-entropy).
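
As a rough sketch of that update loop (the tiny network, the batch of data and the hyperparameters below are invented purely for illustration), a typical training step with stochastic gradient descent and a mean squared error loss looks like this in PyTorch:

```python
import torch
import torch.nn as nn

# A tiny network; its weights are initialized randomly by default.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()                                     # the chosen loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # stochastic gradient descent

# One made-up batch of training data (inputs and known ground truth).
inputs = torch.randn(32, 4)
targets = torch.randn(32, 1)

for step in range(100):
    preds = model(inputs)             # the network's current predictions
    loss = loss_fn(preds, targets)    # difference from the ground truth
    optimizer.zero_grad()
    loss.backward()                   # gradient of the loss w.r.t. every weight
    optimizer.step()                  # adjust the weights to better fit the batch
```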
