This guy never misses, best tutorials in the game.
@ashu- 2 years ago
I have been using Stable Baselines 2 for the last year or so for my work and it's super convenient: the docs are great, with great examples for custom envs, etc. It's a great library.
@AakashKumar-gt9ip 2 years ago
By the way, when you were comparing the models you were still using env.step(env.action_space.sample()), which is why they all looked almost the same and didn't look like they were learning.
@AakashKumar-gt9ip 2 years ago
For anyone wondering how to get the predicted action, the text-based tutorial has the correct code; it is: action, _states = model.predict(obs)
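For reference, a minimal sketch of the corrected test loop, assuming the classic Gym API used in the video (reset() returning only the observation and step() returning four values; newer Gym/Gymnasium versions differ):

```python
import gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

episodes = 5
for ep in range(episodes):
    obs = env.reset()
    done = False
    while not done:
        env.render()
        # use the trained policy instead of env.action_space.sample()
        action, _states = model.predict(obs)
        obs, reward, done, info = env.step(action)
env.close()
```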
@SimonEliasen123 2 years ago
@@AakashKumar-gt9ip Hahaha, this is hilarious, but also so close to the reality of developing with reinforcement learning ;-)
@fus3n 2 years ago
dude yea I was like whattt why isn't he using predicted actions
@mikehoops 2 years ago
Yeah, I was puzzled while watching the video as to why he didn't correct it.
@jindy94 2 years ago
I was wondering the same thing!! Thank you for clarifying :)
@Mutual_Information 2 years ago
This is very useful. I'm working on an RL video series myself (the theory side, so no overlap here) and I was just looking for prebuilt RL algos. Stable Baselines 3 is by far the most complete/well-tested suite I've come across. This really makes a big difference - thanks! Also, it's nice to see that super technical coverage like this can yield 1M+ followers. Awesome.
@arminneashrafi2846 a year ago
Hi, I love your work! Keep up the amazing videos. Love from Iran.
@vernonvkayhypetuttz 2 years ago
SentDex, you're a legend, brother. The thought of implementing these using deep learning libraries alone: instant grey hair! Thank you
@OhItsAnthony 2 years ago
If you're following along using a Conda environment and the Lunar Lander environment gives you an error (namely "module 'gym.envs.box2d' has no attribute 'LunarLander'") then I found that you need to also install two other packages; swig and box2d-py: conda install -c conda-forge swig box2d-py
@djbroake9810 2 years ago
conda install swig then pip install gym[box2d] worked for me.
@hendrixkid2362 2 years ago
Your videos always inspire me to continue working on my own projects!!!
@thetiesenvy4859 2 years ago
Even without watching it, thanks for your good work and content, sentdex.
@pfrivik 2 years ago
LETS GOOOOOO THIS IS EXACTLY WHAT I WANTED THANK YOU SO MUCH
@amogh3275 2 years ago
Honestly loving this series, I hope you make an in-depth tutorial series on this. Thanks
@alishbakhan1084 2 years ago
I had so much fun learning with you.... can't wait to follow you again after completing my web project
@tytobieola2766 2 years ago
Happy New Year sentdex. I was learning machine learning during the lockdown and I had no idea about the field. You teach so well.
@Shaunmcdonogh-shaunsurfing 2 years ago
Awesome. Can’t wait for the next one
@enriquesnetwork 2 years ago
wow, great video! really can't wait for the rest to come out and learn more. Thanks for all the info you provide us!
@Djellowman 2 years ago
Looking forward to the next one!
@0OTheIDaveO0 2 years ago
I think you were still getting random results because you still had the .sample method call in the rendered tests for A2C and PPO. They learned, but you did not use the trained model for testing.
@amaressa1924 10 months ago
I was just going to point out the same!!
@arthurflores4585 2 years ago
Thank you, these video tutorials will be a big help for my thesis. I'm going to support you. I have many doubts; I hope this series can resolve them.
@VaibhavSingh-lf6ps 2 years ago
Thanks for introducing Stable Baselines 3, and yeah, sometimes we forget to use the model!
@KennTollens 2 years ago
Thank you for this tutorial. I am just getting into AI. It is over my head immediately, but your overview of the parts such as observation and agent were helpful for the bigger picture.
@ahmarhussain8720 2 years ago
awesome video, learned a lot, keep up the good work
@AIdreamer_AIdreamer 8 months ago
Can you please talk about how we can use RL to model and optimize satellite networks and HAPs (high-altitude platforms)? How do we control the direction and angle of a projector embedded in a HAP or UAV so that it directs its light beams toward a special area of interest on Earth?
@markd964 2 years ago
Great series as always... needs the next step: developing asynchronous (multiprocessing) models, e.g. PPO into Asynchronous PPO (APPO), on custom environments... Thx
@ebrahimpichka 2 years ago
Looking forward to the next episodes. BTW, at the end you were still using random actions after training the model.
@DaZMan772 2 years ago
This is really interesting and new to me! You mentioned going over creating custom environments in future videos, which sounds like exactly what I'm eager to know next, so I'm really looking forward to that video! Is there anything I should educate myself on in the meantime?
@DasJonski 2 years ago
Am I the only one who tried to wipe dust off the screen, looking like a fool, during the term explanations? Anyway, great video Harrison, really enjoy your videos!
@bluedade2100 a year ago
What does the variable episodes represent here?
@coolkaran1234 2 years ago
By when do you think you will have the whole series out? It might be very helpful for my research and master's.
@sentdex 2 years ago
next 3 parts will come pretty quick, just need to review them and release pretty much, probably ~ close to daily
@martinsosmucnieks8515 2 years ago
@@sentdex Whaaaat? That is so cool! I wanted to get into Stable Baselines earlier but had a hard time and didn't know what to try. Loving this series!!!🥳 Thank you very much for making them!
@coolkaran1234 2 years ago
@@sentdex That's awesome, man. My research group focuses on using deep RL to control drones and underwater vehicles, and we use Stable Baselines for that. Since I am new to the group, I need to catch up, so this will be incredibly helpful!! Thanks!!
@randywelt8210 2 years ago
10:40 I don't get the reward calculation. Also, what is a step? Just the next frame?
@EctoMorpheus 2 years ago
A step is indeed one frame. The reward is defined by the environment, and in the case of LunarLander it's some function of the fuel spent and the distance to the landing area. You typically get a reward every frame, and then maybe a large (negative) one once the episode ends.
@randywelt8210 2 years ago
@@EctoMorpheus So why not use an accelerometer+gyro reward? The fuel reward does not make much sense to me. Anyway, thanks for the clarification.
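To see how those per-step rewards add up, you can sum them over an episode; a small sketch assuming the same LunarLander-v2 setup and the classic Gym API (the shaping details are those EctoMorpheus described):

```python
import gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO("MlpPolicy", env).learn(total_timesteps=10_000)  # short demo run

obs = env.reset()
done = False
episode_return = 0.0
while not done:
    action, _ = model.predict(obs)
    # one env.step() advances the simulation by one frame and returns that frame's reward
    obs, reward, done, info = env.step(action)
    episode_return += reward  # the large terminal bonus/penalty arrives on the last step
print("episode return:", episode_return)
```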
@luisbarba9532 11 months ago
Can SB3 be extended to PettingZoo and used for MARL?
@yashwanth9549 2 years ago
Please add more videos about reinforcement learning
@michpo1445 a year ago
Can you address this error: "Your environment must inherit from the gymnasium.Env class cf. ..."?
@davidcristobal7152 2 years ago
Don't you have to define a neural model? I mean, what if you have an image as input? Does Stable Baselines automagically assume a neural network to pass the observation values through?
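SB3 picks a default network from the policy string: for image observations you pass "CnnPolicy" and it builds its default CNN feature extractor, while "MlpPolicy" gives a plain MLP for flat vectors. A minimal sketch, assuming the Atari extras are installed:

```python
import gym
from stable_baselines3 import PPO

# With image observations, "CnnPolicy" makes SB3 build its default CNN
# feature extractor (NatureCNN) instead of a plain MLP.
env = gym.make("BreakoutNoFrameskip-v4")  # assumes gym's Atari extras are installed
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```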
@pfrivik 2 years ago
How often will these videos be released?? I'm so excited to start watching and keep watching the series!!
@sentdex 2 years ago
Close to daily, if not daily, for 4 parts. Haven't written a part 5 yet so no idea there, but everything up to custom envs should come pretty quick.
@walterwang5996 a year ago
I have a small question: why does A2C only use one "MlpPolicy" in stable_baselines3? Actually, it has two networks, am I right? Thanks.
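The single "MlpPolicy" object bundles both heads, pi (the actor) and vf (the critic), and they can be sized separately through policy_kwargs. A sketch assuming a recent SB3 version (older releases expect net_arch=[dict(pi=..., vf=...)] instead):

```python
from stable_baselines3 import A2C

# one ActorCriticPolicy, two heads: pi = policy/actor network, vf = value/critic network
policy_kwargs = dict(net_arch=dict(pi=[64, 64], vf=[64, 64]))
model = A2C("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=10_000)
```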
@andreamaiellaro6581 a year ago
I followed all the instructions, but when I try to run the notebook I get an error on the step function; it raises NotImplementedError.
@furkank5614 2 years ago
It seems garage has finally turned into a studio =)
@oguzhanoguz8890 2 years ago
A little heads-up for the next video, if you can explore it: the saving and loading of an SB3 model depends on the "deterministic" flag. Sometimes, when using the eval procedure given in SB3, even if you saved the model in a deterministic manner, you get unstable results. Can you explore that too? Thanks, great video.
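For reference, the flag being discussed is an argument of predict(); a minimal sketch of evaluating a loaded model deterministically (the saved-model path is hypothetical):

```python
import gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO.load("ppo_lunarlander")  # hypothetical path to a previously saved model

obs = env.reset()
# deterministic=True always picks the most likely action;
# deterministic=False samples from the action distribution, which can look unstable in eval
action, _ = model.predict(obs, deterministic=True)
```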
@ddos87 2 years ago
You're such a beauty, man.
@noorwertheim2515 2 years ago
Could this algorithm also be used for multi-agent multi-objective environments?
@rgel3762 2 years ago
Have you considered Unity + ML-Agents? Why not go that way?
@ahmedyamany5065 2 years ago
Thanks in advance. My issue with Stable Baselines 3 is the installation: I got many errors last month, whether installing the package on Windows or Ubuntu.
@ReOp14 a year ago
I'm at the start of the tutorial, after adding the env.render()... why is it not rendering anything when I run the code? I'm running python=3.9 on a Windows machine with conda.
@ReOp14 a year ago
Alright, I found a fix: restarting my PC and downgrading to gym==0.25.0.
@connorvaughan356 2 years ago
Very excited for this series. I'm following along and when the lunar lander game displays, it plays incredibly quickly. Probably 4-5 times faster than in the video. Does anyone know how to adjust the speed at which the game plays?
@nikhilvarmadandu1589 2 years ago
Did you find any method to do so?
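One common workaround (not from the video) is to throttle the loop yourself; a sketch assuming the classic Gym render loop, noting that newer Gym versions created with render_mode="human" already cap the frame rate from the env's metadata:

```python
import time
import gym

env = gym.make("LunarLander-v2")
obs = env.reset()
done = False
while not done:
    env.render()
    time.sleep(1 / 60)  # assumed target of roughly 60 FPS; adjust to taste
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```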
@EnglishRain 2 years ago
What does one use this for IRL?
@shreeshaaithal- 2 years ago
Then can you say how I can make Gym play Valorant? 😅 Can we do this with Gym, or can it play Call of Duty: Cold War?
@poomchantarapornrat5685 2 years ago
What operating system do you use to run these on?
@criscanto7040 2 years ago
Awesome
@sanjaydokula5140 2 years ago
I see that yours is using the CUDA device; how do I make mine use CUDA instead of the CPU?
@bluedade2100 a year ago
Is anyone else having problems installing/running Stable Baselines on a MacBook? I can't run it on either my MacBook or Linux.
@Veptis 2 years ago
I have watched a bunch of videos about what reinforcement learning can do, but I gave up on the Steve Brunton series. Perhaps I'll watch this series instead and understand how the learning is done; everything I have done so far has been just gradient-based learning. And I don't know if reinforcement learning applies to language, maybe in a conversational setting. I have a game from my childhood, Mirror's Edge mobile edition, which you can no longer buy since EA removed it from the store instead of updating it. As it essentially just has 6 discrete inputs, I could see how it can be learned. But the levels are limited, so it might overfit easily. And rewards can't just be time, as that requires success in the first place.
@PerfectNight123 2 years ago
Does anybody know how to train the model using the GPU? I tried changing the model parameter to device='cuda', but it's still using the CPU when learning.
@adomet2123 a year ago
Did you find a way?
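For reference, SB3 models accept a device argument (note the spelling "cuda"); whether the GPU is actually used also depends on the installed PyTorch build having CUDA support. A minimal sketch:

```python
import torch
from stable_baselines3 import PPO

print(torch.cuda.is_available())  # must print True, otherwise this PyTorch build has no CUDA
model = PPO("MlpPolicy", "LunarLander-v2", device="cuda", verbose=1)
model.learn(total_timesteps=10_000)
```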
@wwooo62023jk 2 years ago
So expect your class!
@user-yw5jc1fi2l 2 years ago
You still used the random sample for testing.
@sharmakartikeya 2 years ago
Hi Harrison sir, I live in India and conversion from USD to INR is quite expensive. Is there any way to get a discount?
@sentdex 2 years ago
Send me an email at harrison@pythonprogramming.net
@narayanbandodker5482 2 years ago
At first I thought it was part 3 of a series and I had missed something. Then I read the description and found out that's the package's name. Big dumb moment.
@pythonocean7879 2 years ago
❤️ for ❤️
@vaibhavkumar642 2 years ago
💥💥💥
@rverm1000 a year ago
Coding along, it doesn't work. At least not in Google Colab.
@whoisabishag3433 2 years ago
Timestamps: [00:01:22] ... just pip install
@migarsormrapophis2755 2 years ago
YouTube: 2 comments. Meanwhile, I count five.
@sentdex 2 years ago
math and programming hard.
@Stinosko 2 years ago
Hello 👋👋👋
@monlewi1976 2 years ago
wow
@piyushjaininventor 2 years ago
You are still taking random actions.
@karthikbharadhwaj9488 2 years ago
Hey sentdex, actually in the env.step() method you have passed env.action_space.sample() instead of model.predict()!!!!! @sentdex