Submitted by verbigratia t3_zsvsic in MachineLearning

Every lunar lander tutorial or example I've found so far uses deep RL. Is classical Q learning such an obviously bad idea that no-one bothers with it? I've had some success recently applying Q learning to lunar lander (converting the continuous observations into discrete values) and am surprised there aren't more tutorials about this approach. Am I missing something?

12

Comments


blimpyway t1_j1aqaza wrote

I don't know, what were your results?

5

deepestdescent t1_j1b720n wrote

Out of interest, how many discrete states do you end up with? Surely this blows out as you explore the state space?

1

fnbr t1_j1bc8i3 wrote

The main problem with tabular Q-learning (I'm assuming that by classical, you mean tabular) is that for most interesting environments, the state space is massive, so we can't actually store all states in memory.

In particular, for lunar lander, you have a continuous observation space, so you need to apply some sort of discretization; at that point, you might as well just use tile coding or some other kind of function approximator.
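To make that concrete, here is a minimal tile-coding sketch (a toy example, not code from the thread; the bounds, bin counts, and absence of index hashing are all simplifying assumptions):

```python
import numpy as np

class TileCoder:
    """Toy tile coder: several offset grids over a continuous observation;
    Q(s, a) is the sum of the weights of the active tiles (linear function approximation)."""
    def __init__(self, lows, highs, bins, n_tilings, n_actions):
        self.lows = np.asarray(lows, dtype=float)
        self.highs = np.asarray(highs, dtype=float)
        self.bins = np.asarray(bins)
        self.n_tilings = n_tilings
        self.tiles_per_tiling = int(np.prod(self.bins))
        self.weights = np.zeros((n_tilings * self.tiles_per_tiling, n_actions))

    def active_tiles(self, obs):
        scaled = (np.asarray(obs) - self.lows) / (self.highs - self.lows)
        tiles = []
        for t in range(self.n_tilings):
            offset = t / self.n_tilings / self.bins  # shift each tiling by a fraction of a bin
            coords = np.clip(((scaled + offset) * self.bins).astype(int), 0, self.bins - 1)
            flat = int(np.ravel_multi_index(tuple(coords), tuple(self.bins)))
            tiles.append(t * self.tiles_per_tiling + flat)
        return tiles

    def q(self, obs, action):
        return self.weights[self.active_tiles(obs), action].sum()

    def update(self, obs, action, target, alpha):
        tiles = self.active_tiles(obs)
        error = target - self.weights[tiles, action].sum()
        self.weights[tiles, action] += (alpha / len(tiles)) * error

# e.g. 8 tilings of a coarse 4^8 grid over LunarLander's 8-dim observation (bounds are guesses)
coder = TileCoder(lows=[-1.5, -1.5, -2, -2, -3.14, -5, 0, 0],
                  highs=[1.5, 1.5, 2, 2, 3.14, 5, 1, 1],
                  bins=[4] * 8, n_tilings=8, n_actions=4)
```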

10

abhisheknaik96 t1_j1blgcx wrote

I am sure the classic (linear) Q-learning algorithm can solve Lunar Lander when the state space is discretized using tile coding.

Are you saying you solved it using tabular Q-learning, that is, by learning about each discretized state independently? I would be curious to know what kind of discretization you used and how many training steps were required.

3

leocus4 t1_j1cigff wrote

In a paper, I used a decision tree with Q learning to solve LunarLander. While it's not exactly what you asked, you can see a DT as a way to discretize the state space for the Q table, so basically that decision tree corresponds to a Q table with 5 discretized states.

If you're interested, I can expand on this explanation, just let me know!
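For a rough idea of what that looks like (not the actual tree from the paper; the splits and thresholds below are made up purely for illustration):

```python
import numpy as np

N_LEAVES, N_ACTIONS = 5, 4
q_table = np.zeros((N_LEAVES, N_ACTIONS))

def leaf_index(obs):
    # Hypothetical splits over LunarLander's 8-dim observation; not the paper's tree.
    x, y, vx, vy, angle, ang_vel, left_contact, right_contact = obs
    if left_contact or right_contact:
        return 0                        # touching the ground
    if y > 1.0:
        return 1                        # still high above the pad
    if abs(angle) > 0.2:
        return 2                        # tilted, needs correcting
    return 3 if vy < -0.5 else 4        # falling fast vs. descending gently

def act(obs, eps=0.1):
    # epsilon-greedy over the leaf's row of the Q-table
    if np.random.rand() < eps:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(q_table[leaf_index(obs)]))
```

The update is then just the usual tabular Q-learning rule applied to q_table[leaf, action].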

2

verbigratia OP t1_j1cxswc wrote

Thanks all.

I've only just started experimenting, but my approach so far has been to discretise the observation into Q table sizes varying between [8, 8, 8, 8, 8, 8, 2, 2, action_space.n] and [20, 20, 20, 20, 20, 20, 2, 2, action_space.n], training for 5k-10k episodes with a learning rate of 0.1 and a discount of 0.99.
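Roughly, that kind of setup looks something like the sketch below (Gymnasium API assumed; the clipping bounds and the exploration rate are illustrative guesses, not necessarily the values actually used):

```python
import numpy as np
import gymnasium as gym

env = gym.make("LunarLander-v2")          # assumes gymnasium[box2d] is installed
n_actions = env.action_space.n

BINS = np.array([8, 8, 8, 8, 8, 8, 2, 2])                          # per-dimension bin counts
LOW  = np.array([-1.5, -1.5, -2.0, -2.0, -3.14, -5.0, 0.0, 0.0])   # guessed clipping bounds
HIGH = np.array([ 1.5,  1.5,  2.0,  2.0,  3.14,  5.0, 1.0, 1.0])
q = np.zeros(tuple(BINS) + (n_actions,))

def discretize(obs):
    scaled = (np.asarray(obs) - LOW) / (HIGH - LOW)
    return tuple(np.clip((scaled * BINS).astype(int), 0, BINS - 1))

alpha, gamma, eps = 0.1, 0.99, 0.1        # eps is a guess; alpha and gamma as described above
for episode in range(5000):
    obs, _ = env.reset()
    s = discretize(obs)
    terminated = truncated = False
    while not (terminated or truncated):
        a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(q[s]))
        obs, r, terminated, truncated, _ = env.step(a)
        s2 = discretize(obs)
        # standard tabular Q-learning update
        q[s + (a,)] += alpha * (r + gamma * np.max(q[s2]) * (not terminated) - q[s + (a,)])
        s = s2
```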

The results will not win any SpaceX contracts just yet but they do result in soft-ish landings between the flags more often than not.

I found hovering to be a problem, so I added some handling to end the episode after around 500 steps.
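One option for that kind of cutoff, assuming a Gymnasium environment, is the TimeLimit wrapper rather than a hand-rolled step counter:

```python
import gymnasium as gym
from gymnasium.wrappers import TimeLimit

# Truncate every episode after 500 steps so the lander can't hover indefinitely.
env = TimeLimit(gym.make("LunarLander-v2"), max_episode_steps=500)
```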

At this point, I normally start looking at what others have done, and was surprised not to see more examples demonstrating tabular Q learning in this scenario (despite the issues with the continuous observation space).

Will look at deep RL next but found it interesting to try the tabular approach first.

Edit: grammar

1