Submitted by nicku_a t3_120h120 in MachineLearning

Hey! We're creating an open-source training framework focused on evolutionary hyperparameter optimization for RL. It offers a 10x speedup over other HPO methods!

Check it out, and please get involved if you're interested in working on this - any contributions are super valuable.

We believe this can change the way we train our models, and democratise access to RL for people and businesses who don't currently have the resources for it!

GitHub: https://github.com/AgileRL/AgileRL

120

Comments

paramkumar1992 t1_jdh8abi wrote

This looks incredible. This is going to save hours of training. Amazing!

20

Puzzleheaded_Acadia1 t1_jdhlpj7 wrote

Can someone please explain this to me? I'm still new to this.

8

nicku_a OP t1_jdhra2m wrote

Sure! Traditionally, hyperparameter optimization (HPO) for reinforcement learning (RL) is particularly difficult when compared to other types of machine learning. This is for several reasons, including the relative sample inefficiency of RL and its sensitivity to hyperparameters.
AgileRL is initially focused on improving HPO for RL in order to allow faster development with robust training. Evolutionary algorithms have been shown to allow faster, automatic convergence to optimal hyperparameters than other HPO methods by taking advantage of shared memory between a population of agents acting in identical environments.
At regular intervals, after learning from shared experiences, a population of agents can be evaluated in an environment. Through tournament selection, the best agents are selected to survive until the next generation, and their offspring are mutated to further explore the hyperparameter space. Eventually, the optimal hyperparameters for learning in a given environment can be reached in significantly fewer steps than are required using other HPO methods.
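To make the tournament-selection/mutation loop described above concrete, here is a minimal, self-contained sketch. The agents are plain dictionaries of hyperparameters scored by a toy fitness function; none of these names are AgileRL's actual API, and a real setup would evaluate RL agents in a gym environment after they learn from the population's shared experience.

```python
import random
import copy

def evaluate(agent):
    # Stand-in fitness: in practice this would be the mean episodic return an
    # agent achieves after learning from the population's shared experiences.
    return -abs(agent["lr"] - 1e-3) - abs(agent["batch_size"] - 64) / 1000

def mutate(agent):
    # Perturb hyperparameters to explore the space around a surviving parent.
    agent["lr"] *= random.choice([0.5, 1.0, 2.0])
    agent["batch_size"] = max(8, agent["batch_size"] + random.choice([-16, 0, 16]))
    return agent

# Initial population with randomly sampled hyperparameters.
population = [
    {"lr": 10 ** random.uniform(-5, -2), "batch_size": random.choice([16, 32, 64, 128])}
    for _ in range(6)
]

for generation in range(20):
    fitnesses = [evaluate(agent) for agent in population]
    elite = population[fitnesses.index(max(fitnesses))]             # best agent survives unchanged
    next_gen = [copy.deepcopy(elite)]
    while len(next_gen) < len(population):
        i, j = random.sample(range(len(population)), 2)             # tournament of two
        parent = population[i] if fitnesses[i] >= fitnesses[j] else population[j]
        next_gen.append(mutate(copy.deepcopy(parent)))              # offspring explore the space
    population = next_gen

print("best hyperparameters found:", max(population, key=evaluate))
```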

26

boyetosekuji t1_jdhyeok wrote

ChatGPT: Okay, let me try to explain this using gaming terminology!

Imagine you're playing a game where you have to learn how to do something new, like defeat a tough boss. You have different settings or options (hyperparameters) to choose from, like which weapons or abilities to use, how aggressive or defensive to play, etc.

Now, imagine that this boss is really tough to beat and you don't have many chances to practice. So, you want to find the best combination of options as quickly as possible, without wasting too much time on trial and error. This is where hyperparameter optimization (HPO) comes in.

HPO is like trying out different settings or options until you find the best ones for your playstyle and the boss's behavior. However, in some games (like Dark Souls), it's harder to do this because you don't have many chances to try out different combinations before you die and have to start over. This is similar to reinforcement learning (RL), which is a type of machine learning that learns by trial and error, but it's not very sample efficient.

AgileRL is like having a bunch of other players (agents) who are also trying to defeat the same boss as you. After a while, the best players (agents) are chosen to continue playing, and their "offspring" (new combinations of settings or options) are mutated and tested to see if they work better. This keeps going until the best possible combination of settings or options is found to beat the boss in the fewest possible attempts. Because a whole team of players is searching at once, using AgileRL is much faster than other ways of doing HPO for RL.

20

Peantoo t1_jdhuqsq wrote

Love it. I tried to come up with something like this myself but never found the time or extra help I'd need to implement it. Glad to see someone has done all the hard work!

8

nicku_a OP t1_jdhyqr6 wrote

You can help! Please join the discord and get involved, we’d love to have you

8

LifeScientist123 t1_jdiis55 wrote

I'm also new to this, so forgive me if this is a dumb question. My understanding was that RL is superior to evolutionary algorithms because in evolutionary algos "mutation" is random, so you evaluate a lot of dud "offspring". In RL algos, e.g. MCTS, you also do the tree search randomly, but you iteratively pick the best set of actions without evaluating many dud options. Am I wrong? Somehow mixing RL with evolutionary algorithms seems like a step backwards.

2

nicku_a OP t1_jdkdxy8 wrote

Good question! So what we're doing here is not applying evolutionary algorithms instead of RL. We're applying evolutionary algorithms as a method of HPO, while still using RL to learn and keeping its advantages. Take a look at my other comments explaining how this works, and check out the docs for more information.

1

Riboflavius t1_jdhg0ed wrote

That sounds fantastic, kudos to you! Great effort.

7

nicku_a OP t1_jdhr0vk wrote

Thanks!! Plenty more work to be done. Please share with anyone you think would be interested!

3

bushrod t1_jdjvtbp wrote

As an evolutionary learning guy, I'll say it's crazy this didn't already exist! Thanks for sharing. Is it based on any publications, or are you considering writing one?

5

Modruc t1_jdk903e wrote

Great project! One question though: is there any reason why you're not using existing RL implementations, such as Stable Baselines, instead of creating your own?

5

nicku_a OP t1_jdkd2ax wrote

Libraries like Stable Baselines/RL Zoo are actually quite inflexible and hard to fit to your own problem. We're introducing RL algorithms (with plans to add many more!) that you can use, edit and tune to your specific needs, faster and more flexibly.

5

nicku_a OP t1_jdkd7qf wrote

We’ve also shown that using these libraries out of the box is far slower on real problems than what we can offer!

3

jomobro117 t1_jdljb97 wrote

Thanks for sharing! Just a couple of questions. Is the evolutionary algorithm you use similar to PBT or fundamentally different in some way? And is there a plan to implement distributed training and HPO (similar to Ray RLlib with PBT from Tune)?

2

nicku_a OP t1_jdllrfv wrote

Hey! Yes, there are similarities to PBT, but there are a few differences here. Firstly, the mutations implemented with AgileRL are much more dynamic. Rather than only mutating hyperparameters, we’re allowing any part of the algorithm/model to mutate: HPs, network architecture (layers and nodes), activation functions, and the network weights themselves. We also train the population in one go, and offer efficient learning by sharing experience within the population.
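To illustrate how this differs from mutating hyperparameters alone, here is a rough sketch of a mutation operator that can also touch the architecture, activation function, or weights. The dict keys and choices are purely illustrative assumptions, not AgileRL's actual mutation code.

```python
import random

def mutate(agent):
    # `agent` is a plain dict standing in for a full RL agent; the keys and
    # mutation choices below are illustrative only, not AgileRL's API.
    kind = random.choice(["none", "hyperparameter", "architecture", "activation", "weights"])
    if kind == "hyperparameter":
        agent["lr"] *= random.choice([0.5, 2.0])                         # e.g. learning rate
    elif kind == "architecture":
        if random.random() < 0.5 and len(agent["hidden_layers"]) > 1:
            agent["hidden_layers"].pop()                                 # remove a hidden layer
        else:
            agent["hidden_layers"].append(random.choice([32, 64, 128]))  # add a hidden layer
    elif kind == "activation":
        agent["activation"] = random.choice(["relu", "tanh", "elu"])
    elif kind == "weights":
        # Stand-in for adding small Gaussian noise to the network weights.
        agent["weight_noise_std"] = 0.01
    return agent

agent = {"lr": 1e-3, "hidden_layers": [64, 64], "activation": "relu"}
print(mutate(agent))
```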

1

nicku_a OP t1_jdlluja wrote

And yes, the plan is to offer distributed training! As you can imagine, there are about a million things we want/need to add! If you would like to get involved in the project and help out, please do!

1

[deleted] t1_jdi71ly wrote

[deleted]

1

nicku_a OP t1_jdi7ks3 wrote

So this is a training framework that can be used to train agents in gym environments (and other custom environments that use the gym-style format).

You can select a gym environment by its name, e.g. 'LunarLander-v2', when creating the environment, and then train on it. See the docs for more info.
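For reference, a minimal sketch of the environment-creation step being described; this is standard Gym usage (LunarLander-v2 additionally needs the box2d extra), and the AgileRL training call itself is intentionally left out here rather than guessed at (see the project docs).

```python
import gym

# Create a gym environment by name, as described above.
env = gym.make("LunarLander-v2")
state_dim = env.observation_space.shape[0]   # 8-dimensional observation
action_dim = env.action_space.n              # 4 discrete actions
print(state_dim, action_dim)
```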

3

sytelus t1_jdk0y38 wrote

Thank you for this, but can you make it easier to use? I think there should be clear APIs so one doesn't have to deal with RL and other complexity. For example: you are given a function f and a dictionary of arguments with ranges for each. Your algorithm takes this and spits out the optimal params within each range.

Is such an interface and tutorial available anywhere?
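For clarity, here is a sketch of the kind of black-box interface being asked for: pass a function plus a dictionary of parameter ranges and get back the best parameters found. This is plain random search, written only to illustrate the requested API shape; it is not AgileRL code.

```python
import random

def optimize(f, param_ranges, n_trials=100):
    # Try random parameter settings within the given ranges and keep the best.
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: random.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}
        score = f(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params

# Example: maximize a toy objective over two parameters.
best = optimize(lambda x, y: -(x - 2) ** 2 - (y + 1) ** 2, {"x": (-5, 5), "y": (-5, 5)})
print(best)
```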

0