
BeatLeJuce t1_j6mlxjc wrote

your question is answered in the abstract itself ("using only pixels and game points as input"), and repeated multiple times in the text ("In our formulation, the agent’s policy π uses the same interface available to human players. It receives raw RGB pixel input x_t from the agent’s first-person perspective at timestep t, produces control actions a_t ∼ π simulating a gamepad, and receives game points ρ_t attained"). Did you even attempt to read the paper? The concrete architecture showing the CNN is also in Figure S10.
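To make that quoted interface concrete, here's a minimal sketch of the observe-act-reward loop it describes. Everything here is illustrative (the frame size, the action set, the `policy` and `fake_env_step` stand-ins are made up, not from the paper); the point is only that the agent sees raw pixels x_t and points ρ_t, and emits gamepad-style actions a_t.

```python
import random

# Hypothetical action set standing in for gamepad controls.
ACTIONS = ["noop", "forward", "back", "turn_left", "turn_right", "fire"]

def policy(pixels):
    # Stand-in for the policy pi: a real agent runs a CNN over the pixels.
    return random.choice(ACTIONS)

def fake_env_step(action):
    # Stand-in environment: returns the next RGB frame and points earned.
    frame = [[[0, 0, 0] for _ in range(96)] for _ in range(72)]  # H x W x 3
    points = 1 if action == "fire" else 0
    return frame, points

frame = [[[0, 0, 0] for _ in range(96)] for _ in range(72)]  # initial x_t
total_points = 0
for t in range(10):
    a_t = policy(frame)              # a_t ~ pi(x_t), simulating a gamepad
    frame, rho_t = fake_env_step(a_t)  # next pixels x_{t+1} and points rho_t
    total_points += rho_t
```

Nothing outside the loop body touches game state: pixels in, actions out, points as the only feedback, which is exactly the "same interface available to human players" claim.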

3

pfm11231 t1_j6n3emy wrote

Right, my confusion is about how it views the RGB pixel input. Would you summarize it as looking at a screen, a whole image like a human player would, like the little AI is in its own VR headset? Or is it more just looking at numbers and finding a pattern?

−1

cruddybanana1102 t1_j6n46op wrote

I don't really understand the question. What do you mean by "looking at a screen" or "looking at numbers and finding a pattern"?

The model takes in a multidimensional array as input. That array holds all the RGB values on screen at a given instant. Take that to mean whatever suits you.
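A rough NumPy sketch of what that means, with a made-up frame size (the paper's actual input resolution and network may differ): "looking at the screen" is just arithmetic over an H×W×3 array of numbers, here a single hand-written 3×3 filter slid over one channel, which is the basic operation a CNN's first layer performs.

```python
import numpy as np

H, W = 72, 96  # hypothetical frame size
frame = np.random.randint(0, 256, size=(H, W, 3), dtype=np.uint8)  # raw RGB pixels

# Normalize to [0, 1] floats, as a CNN's first layer typically expects.
x = frame.astype(np.float32) / 255.0

# One 3x3 edge-detecting filter applied to the red channel.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=np.float32)

out = np.zeros((H - 2, W - 2), dtype=np.float32)
for i in range(H - 2):
    for j in range(W - 2):
        # Each output value is a weighted sum of a 3x3 patch of pixel numbers.
        out[i, j] = np.sum(x[i:i+3, j:j+3, 0] * kernel)

print(frame.shape, out.shape)  # (72, 96, 3) (70, 94)
```

So both descriptions are true at once: the input is the whole image a human would see, and processing it is pattern-finding over numbers.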

3

BeatLeJuce t1_j6n6x9b wrote

It looks at the screen. Your questions indicate you're not well versed in AI. I'd advise you to read up more on fundamental deep learning techniques if you don't know what a CNN does.

1

bacon_boat t1_j6n82xv wrote

Am I looking at your comment right now, or is it just some number of voltages over the neurons of my visual cortex?

1