
FastestLearner t1_j6exhli wrote

Corrections:

  1. The weights are random only at the very beginning (i.e. before iter=0). From the first iteration onwards, the optimization algorithm (some form of gradient descent) kicks in and nudges the weights slightly, in a direction that makes the whole network perform incrementally better at the task it’s being trained for (see the first sketch after this list). After hundreds of thousands of iterations, the hope is that the weights reach an optimal state, where further nudging does not improve them any more (and, by extension, does not make the neural network learn any better). This is called convergence.

  2. Coming to your example of path finding: first of all, this is a reinforcement learning (RL) problem. RL is different from DL. DL, or deep learning, is the subset of machine learning mostly concerned with training deep neural networks (hence the name). RL is a method of training ‘any’ learning algorithm (it doesn’t always have to be a neural network) using what are called reward functions. Think of it like training a dog (an agent) to perform tricks (a task) using biscuits (as rewards). Every time your dog does what you ask and you follow up by giving him a biscuit, you ‘reinforce’ his behavior, so he will do more of it the next time you ask.

  3. Now, the path-finding example you gave is silly: no RL agent is trained on one single scenario. If you do train an RL agent on just a single scenario, you get a condition called overfitting, meaning the agent learns to navigate that one scenario perfectly well but doesn’t generalize to any unseen scenario. In practice, we train an RL agent on hundreds of thousands of different scenarios, each slightly different from the rest: different lighting, differently structured environments, different geometries, different obstacles, and so on. The hope is that after training, the agent has learned a generalized navigation function that adapts to any scenario (see the second sketch after this list).
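
To make point 1 concrete, here is a minimal sketch of what “nudging the weights every iteration” looks like. It uses plain NumPy, a made-up toy regression task, and a hand-rolled gradient purely for illustration; a real deep network would use a framework’s autograd instead:

```python
import numpy as np

# Toy setup (hypothetical data): learn weights w so that x @ w approximates y.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = x @ true_w

w = rng.normal(size=3)   # weights start out random (before iter=0)
lr = 0.1                 # learning rate: how big each "nudge" is

for it in range(1000):
    pred = x @ w
    error = pred - y
    loss = np.mean(error ** 2)          # how badly the model is doing
    grad = 2 * x.T @ error / len(y)     # direction that increases the loss
    w -= lr * grad                      # nudge the weights the opposite way
    if np.linalg.norm(grad) < 1e-6:     # gradient ~0: further nudges don't help
        print(f"converged at iteration {it}, loss={loss:.6f}")
        break
```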

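For points 2 and 3, here is a minimal sketch of the shape of an RL training loop: the environment is re-randomized every episode, the agent only ever sees a scalar reward (the “biscuit”), and the hope is that it learns behavior that generalizes instead of memorizing one scenario. Everything here (the tabular Q-learning agent, make_random_world, all the numbers) is made up for illustration and is not any particular library’s API:

```python
import random

def make_random_world(rng):
    """A hypothetical 1-D grid world: random length, goal at the far end."""
    length = rng.randint(5, 10)
    return {"length": length, "goal": length - 1}

Q = {}                      # tabular Q-values: Q[(state, action)] -> expected reward
actions = [-1, +1]          # step left or step right
alpha, gamma, eps = 0.1, 0.9, 0.2
rng = random.Random(0)

for episode in range(5000):
    world = make_random_world(rng)   # a new, slightly different scenario each episode
    state = 0
    for step in range(50):
        # epsilon-greedy: mostly pick the best known action, sometimes explore
        if rng.random() < eps:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: Q.get((state, a), 0.0))
        nxt = min(max(state + action, 0), world["length"] - 1)
        # reward function: the "biscuit" for reaching the goal, a tiny penalty otherwise
        reward = 1.0 if nxt == world["goal"] else -0.01
        best_next = max(Q.get((nxt, a), 0.0) for a in actions)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = nxt
        if state == world["goal"]:
            break
```
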
I suggest you watch some TwoMinutePapers videos on YT covering some of OpenAI’s RL papers. In one of them, RL agents learn to fight in a boxing match; in another, several agents collaborate to play hide and seek. You’d get a feel for how RL works.
