This guy never misses, best tutorials in the game.
@ashu- 2 years ago
I have been using Stable Baselines 2 for the last year or so for my work and it's super convenient: the docs are great, with great examples for custom envs, etc. It's a great library.
@AakashKumar-gt9ip 2 years ago
By the way, when you were comparing the models you were still using env.step(env.action_space.sample()), which is why they all looked almost the same and didn't look like they were learning.
@AakashKumar-gt9ip 2 years ago
For anyone wondering how to get the predicted action, the text-based tutorial has the correct code; it is: action, _states = model.predict(obs)
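For reference, a minimal sketch of the corrected test loop, assuming the classic Gym API used in the video (reset() returning only the observation and step() returning four values; newer Gym/Gymnasium versions differ):

```python
import gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

episodes = 5
for ep in range(episodes):
    obs = env.reset()
    done = False
    while not done:
        env.render()
        # use the trained policy instead of env.action_space.sample()
        action, _states = model.predict(obs)
        obs, reward, done, info = env.step(action)
env.close()
```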
@SimonEliasen123 2 years ago
@@AakashKumar-gt9ip Hahaha, this is hilarious, but also so close to the reality of developing with reinforcement learning ;-)
@fus3n 2 years ago
dude yea I was like whattt why isn't he using predicted actions
@mikehoops 2 years ago
Yeah, I was puzzled while watching the video as to why he didn't correct it.
@jindy94 2 years ago
I was wondering the same thing!! Thank you for clarifying :)
@Mutual_Information 2 years ago
This is very useful. I'm working on an RL video series myself (the theory side, so no overlap here) and I was just looking for prebuilt RL algos. Stable Baselines 3 is by far the most complete/well-tested suite I've come across. This really makes a big difference - thanks! Also, it's nice to see that super technical coverage like this can yield 1M+ followers. Awesome.
@arminneashrafi2846 a year ago
Hi, I love your work! Keep up the amazing videos. Love from Iran.
@vernonvkayhypetuttz 2 years ago
SentDex, you're a legend, brother. The thought of implementing these using deep learning libraries alone: instant grey hair! Thank you
@OhItsAnthony 2 years ago
If you're following along using a Conda environment and the Lunar Lander environment gives you an error (namely "module 'gym.envs.box2d' has no attribute 'LunarLander'") then I found that you need to also install two other packages; swig and box2d-py: conda install -c conda-forge swig box2d-py
@djbroake9810 2 years ago
conda install swig then pip install gym[box2d] worked for me.
@hendrixkid2362 2 years ago
Your videos always inspire me to continue working on my own projects!!!
@thetiesenvy4859 2 years ago
Even without watching it, thanks for your good work and content, sentdex.
@pfrivik 2 years ago
LETS GOOOOOO THIS IS EXACTLY WHAT I WANTED THANK YOU SO MUCH
@amogh3275 2 years ago
Honestly loving this series, I hope you make an in-depth tutorial series on this. Thanks
@alishbakhan1084 2 years ago
I had so much fun learning with you.... can't wait to follow you again after completing my web project
@tytobieola2766 2 years ago
Happy New Year sentdex. I was learning machine learning during the lockdown and I had no idea about the field. You teach so well.
@Shaunmcdonogh-shaunsurfing 2 years ago
Awesome. Can’t wait for the next one
@enriquesnetwork 2 years ago
wow, great video! really can't wait for the rest to come out and learn more. Thanks for all the info you provide us!
@Djellowman 2 years ago
Looking forward to the next one!
@0OTheIDaveO0 2 years ago
I think you were still getting random results because you still had the .sample method call in the rendered tests for A2C and PPO. They learned, but you did not use the trained model for testing.
@amaressa1924 10 months ago
I was just going to point out the same!!
@arthurflores4585 2 years ago
Thank you, these video tutorials will be a big help for my thesis. I'm going to support you. I have many doubts; I hope this series can resolve them.
@VaibhavSingh-lf6ps 2 years ago
Thanks for introducing Stable Baselines 3, and yeah, sometimes we forget to use the model!
@KennTollens 2 years ago
Thank you for this tutorial. I am just getting into AI. It is over my head immediately, but your overview of the parts such as observation and agent were helpful for the bigger picture.
@ahmarhussain8720 2 years ago
awesome video, learned a lot, keep up the good work
@AIdreamer_AIdreamer 8 months ago
Can you please talk about how we can use RL to model and optimize satellite networks and HAPs (high-altitude platforms)? How do we control the direction and angle of a projector embedded in a HAP or UAV so that it directs its light beams toward a special area of interest on Earth?
@markd964 2 years ago
Great series as always... needs the next step: developing asynchronous (multiprocessing) models, e.g. PPO into Asynchronous PPO (APPO), on custom environments... Thx
@ebrahimpichka 2 years ago
Looking forward to the next episodes. BTW, at the end you were still using random actions after training the model.
@DaZMan772 2 years ago
This is really interesting and new to me! You mentioned going over creating custom environments in future videos, which sounds like exactly what I'm eager to know next, so I'm really looking forward to that video! Is there anything I should educate myself on in the meantime?
@DasJonski 2 years ago
Am I the only one who tried to wipe dust off the screen, looking like a fool, during the term explanations? Anyway, great video Harrison, really enjoy your videos!
@bluedade2100 a year ago
What does the variable episodes represent here?
@coolkaran1234 2 years ago
By when do you think you will have the whole series out? It might be very helpful for my research and master's.
@sentdex 2 years ago
next 3 parts will come pretty quick, just need to review them and release pretty much, probably ~ close to daily
@martinsosmucnieks8515 2 years ago
@@sentdex Whaaaat? That is so cool! I wanted to get into Stable Baselines earlier but had a hard time and didn't know what to try. Loving this series!!!🥳 Thank you very much for making them!
@coolkaran1234 2 years ago
@@sentdex That's awesome, man. My research group focuses on using deep RL to control drones and underwater vehicles, and we use Stable Baselines for that. Since I am new to the group, I need to catch up, so this will be incredibly helpful!! Thanks!!
@randywelt8210 2 years ago
10:40 I don't get the reward calculation. Also, what is a step? Just the next frame?
@EctoMorpheus 2 years ago
A step is indeed one frame. The reward is defined by the environment, and in the case of LunarLander it's some function of the fuel spent and the distance to the landing area. You typically get a reward every frame, and then maybe a large (negative) one once the episode ends.
@randywelt8210 2 years ago
@@EctoMorpheus So why not use an accelerometer+gyro reward? The fuel reward does not make much sense to me. Anyway, thanks for the clarification.
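To see how those per-step rewards add up, you can sum them over an episode; a small sketch assuming the same LunarLander-v2 setup and the classic Gym API (the shaping details are those EctoMorpheus described):

```python
import gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO("MlpPolicy", env).learn(total_timesteps=10_000)  # short demo run

obs = env.reset()
done = False
episode_return = 0.0
while not done:
    action, _ = model.predict(obs)
    # one env.step() advances the simulation by one frame and returns that frame's reward
    obs, reward, done, info = env.step(action)
    episode_return += reward  # the large terminal bonus/penalty arrives on the last step
print("episode return:", episode_return)
```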
@luisbarba9532 11 months ago
Can SB3 be extended to PettingZoo and used for MARL?
@yashwanth9549 2 years ago
Please add more videos about reinforcement learning
@michpo1445 a year ago
Can you address this error: "Your environment must inherit from the gymnasium.Env class cf. ..."?
@davidcristobal7152 2 years ago
Don't you have to define a neural model? I mean, what if you have an image as input? Does Stable Baselines automagically assume a neural network to pass the observation values through?
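SB3 picks a default network from the policy string: for image observations you pass "CnnPolicy" and it builds its default CNN feature extractor, while "MlpPolicy" gives a plain MLP for flat vectors. A minimal sketch, assuming the Atari extras are installed:

```python
import gym
from stable_baselines3 import PPO

# With image observations, "CnnPolicy" makes SB3 build its default CNN
# feature extractor (NatureCNN) instead of a plain MLP.
env = gym.make("BreakoutNoFrameskip-v4")  # assumes gym's Atari extras are installed
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```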
@pfrivik 2 years ago
How often will these videos be released?? I'm so excited to start watching and keep watching the series!!
@sentdex 2 years ago
Close to daily, if not daily, for 4 parts. Haven't written a part 5 yet so no idea there, but everything up to custom envs should come pretty quick.
@walterwang5996 a year ago
I have a small question: why does A2C only use one "MlpPolicy" in stable_baselines3? Actually, it has two networks, am I right? Thanks.
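The single "MlpPolicy" object bundles both heads, pi (the actor) and vf (the critic), and they can be sized separately through policy_kwargs. A sketch assuming a recent SB3 version (older releases expect net_arch=[dict(pi=..., vf=...)] instead):

```python
from stable_baselines3 import A2C

# one ActorCriticPolicy, two heads: pi = policy/actor network, vf = value/critic network
policy_kwargs = dict(net_arch=dict(pi=[64, 64], vf=[64, 64]))
model = A2C("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=10_000)
```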
@andreamaiellaro6581 a year ago
I followed all the instructions, but when I try to run the notebook I get an error on the step function; it raises NotImplementedError.
@furkank5614 2 years ago
It seems garage has finally turned into a studio =)
@oguzhanoguz8890 2 years ago
A little heads-up for the next video, if you can explore it: the saving and loading of an SB3 model depends on the "deterministic" flag. Sometimes, when using the eval procedure given in SB3, even if you saved the model in a deterministic manner, you get unstable results. Can you explore that too? Thanks, great video.
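For reference, the flag being discussed is an argument of predict(); a minimal sketch of evaluating a loaded model deterministically (the saved-model path is hypothetical):

```python
import gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO.load("ppo_lunarlander")  # hypothetical path to a previously saved model

obs = env.reset()
# deterministic=True always picks the most likely action;
# deterministic=False samples from the action distribution, which can look unstable in eval
action, _ = model.predict(obs, deterministic=True)
```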
@ddos87 2 years ago
You're such a beauty, man.
@noorwertheim2515 2 years ago
Could this algorithm also be used for multi-agent multi-objective environments?
@rgel3762 2 years ago
Have you considered Unity + ML-Agents? Why not go that way?
@ahmedyamany5065 2 years ago
Thanks in advance. My issue with Stable Baselines 3 is the installation: I got many errors last month, whether installing the package on Windows or Ubuntu.
@ReOp14 a year ago
I'm at the start of the tutorial, after adding the env.render()... why is it not rendering anything when I run the code? I'm running python=3.9 on a Windows machine with conda.
@ReOp14 a year ago
Alright, I found a fix: restarting my PC and downgrading to gym==0.25.0.
@connorvaughan356 2 years ago
Very excited for this series. I'm following along and when the lunar lander game displays, it plays incredibly quickly. Probably 4-5 times faster than in the video. Does anyone know how to adjust the speed at which the game plays?
@nikhilvarmadandu1589 2 years ago
Did you find any method to do so?
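One common workaround (not from the video) is to throttle the loop yourself; a sketch assuming the classic Gym render loop, noting that newer Gym versions created with render_mode="human" already cap the frame rate from the env's metadata:

```python
import time
import gym

env = gym.make("LunarLander-v2")
obs = env.reset()
done = False
while not done:
    env.render()
    time.sleep(1 / 60)  # assumed target of roughly 60 FPS; adjust to taste
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```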
@EnglishRain 2 years ago
What does one use this for IRL?
@shreeshaaithal- 2 years ago
Then can you say how I can make Gym play Valorant? 😅 Can we do this with Gym, or can it play Call of Duty: Cold War?
@poomchantarapornrat5685 2 years ago
What operating system do you use to run these on?
@criscanto7040 2 years ago
Awesome
@sanjaydokula5140 2 years ago
I see that yours is using the CUDA device; how do I make mine use CUDA instead of the CPU?
@bluedade2100 a year ago
Is anyone else having problems installing/running Stable Baselines on a MacBook? I can't run it on either my MacBook or Linux.
@Veptis 2 years ago
I have watched a bunch of videos about what reinforcement learning can do, but I gave up on the Steve Brunton series. Perhaps I'll watch this series instead and understand how the learning is done; everything I have done so far has been just gradient-based learning. And I don't know if reinforcement learning applies to language, maybe in a conversational setting. I have a game from my childhood, Mirror's Edge mobile edition, which you can no longer buy since EA removed it from the store instead of updating it. As it essentially just has 6 discrete inputs, I could see how it can be learned. But the levels are limited, so it might overfit easily. And rewards can't just be time, as that requires success in the first place.
@PerfectNight123 2 years ago
Does anybody know how to train the model using the GPU? I tried changing the model parameter to device='cuda', but it's still using the CPU when learning.
@adomet2123 a year ago
Did you find a way?
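For reference, SB3 models accept a device argument (note the spelling "cuda"); whether the GPU is actually used also depends on the installed PyTorch build having CUDA support. A minimal sketch:

```python
import torch
from stable_baselines3 import PPO

print(torch.cuda.is_available())  # must print True, otherwise this PyTorch build has no CUDA
model = PPO("MlpPolicy", "LunarLander-v2", device="cuda", verbose=1)
model.learn(total_timesteps=10_000)
```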
@wwooo62023jk 2 years ago
So expect your class!
@user-yw5jc1fi2l 2 years ago
You still used the random sample for testing.
@sharmakartikeya 2 years ago
Hi Harrison sir, I live in India and conversion from USD to INR is quite expensive. Is there any way to get a discount?
@sentdex 2 years ago
Send me an email at harrison@pythonprogramming.net
@narayanbandodker5482 2 years ago
At first I thought it was part 3 of a series and I had missed something. Then I read the description and found out that's the package's name. Big dumb moment.
@pythonocean7879 2 years ago
❤️ for ❤️
@vaibhavkumar642 2 years ago
💥💥💥
@rverm1000 a year ago
Coding along, it doesn't work. At least not in Google Colab.
@whoisabishag3433 2 years ago
Timestamps: [00:01:22] ... just pip install
@migarsormrapophis2755 2 years ago
YouTube: 2 comments. Meanwhile, I count five.
@sentdex 2 years ago
math and programming hard.
@Stinosko 2 years ago
Hello 👋👋👋
@monlewi1976 2 years ago
wow
@piyushjaininventor 2 years ago
You are still taking random actions.
@karthikbharadhwaj9488 2 years ago
Hey sentdex, actually in the env.step() method you have passed env.action_space.sample() instead of model.predict()!!!!! @sentdex