Best-Neat-9439 t1_j2yn1nx wrote

>There are also AI that can improve themselves more than the human given data. The AlphaGo project started off with human Go matches as training data, and evolved into tabula-rasa training by self play. By the end, the AI beats the best human.

Neither AlphaGo Zero nor AlphaZero was trained with supervised learning. They were both trained with reinforcement learning (and MCTS, so it's not purely RL, but more like RL + planning). It's then not surprising that they can beat humans - their "ground truth" doesn't come from humans anyway.

16

horselover_f4t t1_j31e8tm wrote

>The system's neural networks were initially bootstrapped from human gameplay expertise. AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves.[21] Once it had reached a certain degree of proficiency, it was trained further by being set to play large numbers of games against other instances of itself, using reinforcement learning to improve its play.

https://en.wikipedia.org/wiki/AlphaGo#Algorithm

1

MustachedSpud t1_j31glqv wrote

The "Zero" in AlphaZero means it starts with no human knowledge. They figured out that this approach is eventually stronger than the base AlphaGo strategy.

2

horselover_f4t t1_j32ebo0 wrote

But the person you responded to didn't talk about the zero variant. Maybe I misread the point of your post?

1

MustachedSpud t1_j34qzvu wrote

The person two comments up was talking about the Zero version. The thread is about how AI can surpass humans, and the point is that they already can if they have a way to improve without human data.

1

horselover_f4t t1_j355j8z wrote

That still does not make sense to me, as the person before was specifically talking about vanilla. But there's no point in arguing about any of that, I guess.

1

MustachedSpud t1_j35gn03 wrote

Are you trolling me or something? YOU are the person I responded to. YOU brought up the vanilla version, in a response to someone else who was talking about the zero version. The zero version is most relevant here because it learns from scratch, without human knowledge.

1

horselover_f4t t1_j368cx1 wrote

>There are also AI that can improve themselves more than the human given data. The AlphaGo project started off with human Go matches as training data, and evolved into tabula-rasa training by self play. By the end, the AI beats the best human.

https://www.reddit.com/r/MachineLearning/comments/103694n/comment/j2ycihi/?utm_source=share&utm_medium=web2x&context=3

>YOU brought up the vanilla version, in a response to someone else who was talking about the zero version.

... who responded to someone who talked about the vanilla version. In my first response to you, I did not realize you were not actually the person I responded to in the first place. Apparently you have not read what they responded to, which seems to be the reason you're missing the context.

I assume they must be laughing if they see us still talking about this.

1