Hello again, Stable-Baselines3 (SB3) maintainer here ;) Nice video showing the variability of results in RL. For what you are doing, you should take a look at SB3 callbacks, especially CheckpointCallback and EvalCallback, to save the best model automatically. And for a fairer comparison, you can always check the tuned hyperparameters from the RL Zoo.
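For reference, a minimal sketch of the CheckpointCallback mentioned above, which saves a snapshot of the model at a fixed step interval (the environment, paths and frequencies here are just example values, not anything from the video). EvalCallback, shown further down in this thread, is the one that tracks the best model against a separate evaluation environment.

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Save a snapshot every 10,000 steps into ./checkpoints/
checkpoint_callback = CheckpointCallback(save_freq=10_000,
                                         save_path="./checkpoints/",
                                         name_prefix="ppo_lunar")

model = PPO("MlpPolicy", "LunarLander-v2", verbose=1)
model.learn(total_timesteps=100_000, callback=checkpoint_callback)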
@ILoveMattBellamy 2 years ago
Hi Antonin, any chance you can share an example of exporting an SB3 model to C++? I found the PPO-in-C++ example, but I am currently working mostly with a DQN setup. Thanks!
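One possible route, sketched here under assumptions (not an official SB3 recipe, model path is hypothetical, MLP policy assumed): trace the DQN Q-network with TorchScript and load the resulting file from libtorch in C++, taking the argmax over Q-values as the action. Image observations would additionally need the same preprocessing SB3 applies internally.

import torch as th
from stable_baselines3 import DQN

model = DQN.load("dqn_model")  # hypothetical path to a saved model

# model.policy.q_net maps observations to Q-values (one per discrete action)
dummy_obs = th.zeros(1, *model.observation_space.shape)
traced = th.jit.trace(model.policy.q_net, dummy_obs)
traced.save("dqn_qnet.pt")  # load in C++ with torch::jit::load and take the argmax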
@robmarks6800 2 years ago
Hello Antonin, when I'm running SB3's DQN/A2C algorithms I see very little GPU utilization; with nvidia-smi I get around 15%. I have seen many others with the same problem, but it has all gone unanswered. Is this some problem on our end, is it something inherent in SB3/PyTorch, or is the incremental online aspect of RL the problem? I just got access to a Tesla V100, so I'm kind of sad that I get literally zero speedup using it. I don't really have any experience in profiling my applications, but maybe that's something I have to learn. What do you reckon?
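Low GPU utilization is common here: with the default small MLP policies, most wall-clock time goes into stepping the environment and collecting data on the CPU, so the GPU sits mostly idle. You can at least confirm which device SB3 picked (a small sketch; the environment name is just an example):

from stable_baselines3 import A2C

# Request the GPU explicitly; SB3 falls back to CPU if CUDA is unavailable
model = A2C("MlpPolicy", "CartPole-v1", device="cuda", verbose=1)
print(model.device)  # e.g. cuda:0 if PyTorch can actually see the GPU

Bigger networks (e.g. CNN policies on image observations) and more parallel environments are what usually move the bottleneck onto the GPU.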
@ApexArtistX 1 year ago
Can you do a tutorial on the CheckpointCallback thing?
@bluebox6307 1 year ago
For some reason, this is not only helpful but actually entertaining. Usually, I barely comment, but this is some good stuff :)
@HT79 2 years ago
Side note: to create those folders, instead of an if clause you can simply use makedirs with exist_ok=True.
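That is, something like this (directory names are just the kind used in the series; adjust as needed):

import os

models_dir = "models/PPO"
logdir = "logs"

# No error is raised if the directories already exist
os.makedirs(models_dir, exist_ok=True)
os.makedirs(logdir, exist_ok=True)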
@renkeludwig7405 2 years ago
Only P.2 and I already love this series so much! Thanks for all your great content man!
@fuba44 2 years ago
This is cool, looking forward to this series.
@satyamedh 10 months ago
SB3 makes it so easy that a video about saving the model is longer than the initial intro and first steps.
@PerfectNight123 2 years ago
Quick question: how are you training on the CUDA device? I have a GPU installed, but I'm training on the CPU device.
@anotherKyle 5 months ago
I would really like to know this!
@harrivayrynen 2 years ago
I like this series a lot. I have tried to learn to use ML-Agents with Unity, but getting started is quite hard. Yes, there are official examples in the ML-Agents repo, but getting something new to work is hard for me. Hopefully there will be a new book for this scene; there are some, but they are quite old.
@jyothishmohan5613 2 years ago
Wow...one video per day ...that's super cool 😎 👌 👍
@giacomocarfi5015 2 years ago
Great video sentdex!! Will you also talk about multi-agent reinforcement learning?
@yawar58 2 years ago
Can't wait for the custom environment video. If I may suggest, please do a trading environment as an example. Thanks.
@sentdex 2 years ago
Your wait is already over ;) kzbin.info/www/bejne/q3zRm3qkbct5bZI
@carlanjackson1490 2 years ago
Say you partially train a model for, say, 50,000 steps. Is it possible, once it's finished, to reload that same trained model and continue training it for an additional 20,000 steps? I have a partially trained DQN, but it's not performing as well as it should, and I would like to continue the training. I'm not sure if that is possible or if I'll just have to train an entirely new model.
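Continuing from a saved model is supported; a minimal sketch (paths, env and step counts are hypothetical). Note that for off-policy algorithms like DQN the replay buffer is not stored inside the saved .zip, so for a truly seamless continuation you would also use model.save_replay_buffer()/load_replay_buffer():

import gym
from stable_baselines3 import DQN

env = gym.make("LunarLander-v2")  # whatever env the model was trained on

model = DQN.load("models/dqn_50000", env=env)  # hypothetical checkpoint path
# Continue for another 20,000 steps; keep the existing step counter and logs
model.learn(total_timesteps=20_000, reset_num_timesteps=False)
model.save("models/dqn_70000")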
@ytsks 2 years ago
What you said about the jumping and rolling robot dog got me thinking: do you have a way to force a minimum-exertion policy? This is a guiding principle in most living organisms: only use the minimal effort (energy) that will produce the desired result. While you likely don't care about energy conservation in that way, it could filter out behavior patterns like the one you mentioned.
@anasalrifai2217 2 years ago
Thanks for the content. Can you show us how to customize the actor and critic network architectures?
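For the built-in policies this is usually done through policy_kwargs rather than a full custom policy class; a sketch with example layer sizes (in SB3 versions before roughly 1.8 the format was a list containing that dict, i.e. net_arch=[dict(pi=[...], vf=[...])]):

from stable_baselines3 import PPO

# Two 256-unit layers for the actor (pi) and two for the critic (vf)
policy_kwargs = dict(net_arch=dict(pi=[256, 256], vf=[256, 256]))

model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)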
@serta5727 2 years ago
Stable Baselines 3 is very helpful!
@alessandrocoppelli3056 6 months ago
Hello, I'm trying to use PPO and A2C for my discrete-box environment. I have set negative rewards in order to teach the agent to avoid impossible operations in my environment, and most of the training time is spent learning to avoid those operations. Is there a method to directly "tell" the agent (inside the agent itself) to avoid those operations, instead of spending training time on it? Thanks in advance.
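One common approach is invalid-action masking instead of reward penalties: sb3-contrib ships MaskablePPO plus an ActionMasker wrapper, where you supply a function returning a boolean mask of the currently valid discrete actions. A sketch only; your_env and its valid_actions() method are placeholders for whatever your own environment provides:

from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

def mask_fn(env):
    # Return one boolean per discrete action: True = allowed right now.
    # valid_actions() is a hypothetical helper on your own environment.
    return env.valid_actions()

env = ActionMasker(your_env, mask_fn)  # your_env: your custom gym environment
model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)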
@georgebassemfouad 2 years ago
Looking forward to the custom environment.
@sarc007 2 years ago
How do you save the most optimized version of the model? I understood that "ep_rew_mean" (tag: rollout/ep_rew_mean) should be as high as possible and "value_loss" (tag: train/value_loss) should be as low as possible, so how do you get or save the best model when that happens? Any idea?
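This is what EvalCallback is for: it periodically runs the current policy on a separate evaluation environment and keeps the weights with the highest mean evaluation reward. A sketch (environment, paths and frequencies are just example values):

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

train_env = gym.make("LunarLander-v2")
eval_env = gym.make("LunarLander-v2")  # separate copy used only for evaluation

# Every eval_freq steps, run evaluation episodes and save the best weights so far
eval_callback = EvalCallback(eval_env,
                             best_model_save_path="./best_model/",
                             log_path="./eval_logs/",
                             eval_freq=10_000)

model = PPO("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=200_000, callback=eval_callback)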
@Nerdimo 2 years ago
I probably should have commented on the last video, but I keep getting this error on macOS: "2022-05-23 16:22:46.979 python[10741:91952] Warning: Expected min height of view: () to be less than or equal to 30 but got a height of 32.000000. This error will be logged once per view in violation." I tried to resolve it using some stuff on Stack Overflow, but the same thing happens. This error, I believe, prevents me from running more than 1 episode: it will run one episode in the environment and then crash :/
@walterwang5996 1 year ago
I used save code similar to this, but it gave me several curves in TensorBoard, and the reward curves are not even close; they have big gaps. I am wondering why. Also, about training time: can I train it and save the model after some episodes? Thanks.
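A likely cause of the several curves: each model.learn() call starts a new TensorBoard run (PPO_1, PPO_2, ...) unless you tell it to continue the previous one. A sketch of keeping one continuous curve across chunked training (names and values are just examples):

from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log="logs")

# First chunk: creates logs/PPO_run_1
model.learn(total_timesteps=10_000, tb_log_name="PPO_run")

# Later chunks: reset_num_timesteps=False keeps the step counter and
# logs into the same run, giving one continuous reward curve
model.learn(total_timesteps=10_000, tb_log_name="PPO_run", reset_num_timesteps=False)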
@nnpy 2 years ago
Awesome move forward ⚡
@datajake2742 6 days ago
If model.learn returns a trained model (per the docs), what is the for loop doing?
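model.learn() does return the (same) trained model, but the loop isn't about the return value: it trains in chunks so the model can be saved to disk at regular intervals. Roughly this pattern (an illustrative sketch, not the video's exact code):

import os
from stable_baselines3 import PPO

models_dir = "models/PPO"
os.makedirs(models_dir, exist_ok=True)

model = PPO("MlpPolicy", "LunarLander-v2", verbose=1, tensorboard_log="logs")

TIMESTEPS = 10_000
for i in range(1, 31):
    # Each call keeps training the same model object for another chunk
    model.learn(total_timesteps=TIMESTEPS, reset_num_timesteps=False, tb_log_name="PPO")
    model.save(f"{models_dir}/{TIMESTEPS * i}")  # snapshot after every chunk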
@scudice 2 years ago
I have an issue after loading a trained agent: model.predict(obs) always outputs the same action, even though the agent was not doing that at all during learning.
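Two things worth checking (suggestions only; this may not be the cause here): that the observation passed to predict() comes from the same wrapped/normalized env the agent was trained with, and whether you want stochastic or deterministic actions. A sketch using the classic gym API and a hypothetical checkpoint path:

import gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO.load("models/ppo_lunar", env=env)  # hypothetical path

obs = env.reset()
for _ in range(1000):
    # deterministic=False samples from the policy distribution (as during training);
    # deterministic=True always returns the most likely action
    action, _states = model.predict(obs, deterministic=False)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()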
@sudhansubaladas2322 2 years ago
Well explained... please make some videos on machine translation from scratch, like loading huge datasets, training, testing, etc.
@ApexArtistX 1 year ago
Can you load and continue training instead of starting from scratch again?
@MegaBd23 9 months ago
When I do this, I don't get the rollout/ep_len_mean or rollout/ep_rew_mean graphs, but instead a bunch of time graphs...
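rollout/ep_rew_mean and ep_len_mean come from episode statistics recorded by a Monitor wrapper; one common cause of seeing only the time/ metrics is that no Monitor is attached (e.g. when passing your own vectorized env). Assuming that is what is happening, a sketch of wrapping a hand-made env:

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor

# Monitor records per-episode reward and length so the logger can report them
env = Monitor(gym.make("LunarLander-v2"))

model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="logs")
model.learn(total_timesteps=10_000)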
@arjunkrishna5790 2 years ago
Great video!
@KaikeCastroCarvalho 9 months ago
Hello, is it possible to use a DQN model with TensorBoard?
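Yes, DQN takes the same tensorboard_log argument as the other algorithms; a minimal sketch (paths and environment are just examples):

from stable_baselines3 import DQN

model = DQN("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log="logs")
model.learn(total_timesteps=50_000, tb_log_name="DQN")
# Then run: tensorboard --logdir logs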
@MuazRazaq 2 years ago
Hey, thank you so much for such a good explanation. I wanted to ask how you are using the GPU? Whenever I run the same code it says "Using cpu device", and I have an NVIDIA GeForce GTX 1650 card.
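"Using cpu device" usually means PyTorch itself cannot see the GPU, most often because a CPU-only build of torch is installed; installing the CUDA-enabled build from pytorch.org normally fixes it. A quick check:

import torch

print(torch.__version__)          # a "+cpu" suffix indicates a CPU-only build
print(torch.cuda.is_available())  # must be True for SB3 to report "Using cuda device"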
@rohitchan007 2 years ago
Thank you so much for this
@sarvagyagupta1744 2 years ago
Hey, thanks for the video. I was wondering if I can ask some questions around loading a model here or would you prefer somewhere else?
@marcin.sobocinski 2 years ago
🤚 Can I ask a question: is there going to be an ML-Agents RL tutorial as well❓ It could be a nice sequel to the SB3 series 😀
@enriquesnetwork 2 years ago
Thank you!
@konkistadorr 2 years ago
Hey, great videos as always :) Shouldn't you use predict_step instead of predict for faster execution?
@sentdex 2 years ago
Possibly; it's the first I'm hearing about it, but I'm certainly no SB3 expert. Try it and let us know the results!
@karthikbharadhwaj9488 2 years ago
Hey Sentdex, actually in the env.step() call you have passed env.action_space.sample() instead of model.predict()!
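For reference, an inference loop that actually uses the trained policy would look roughly like this (classic gym API; the checkpoint path is hypothetical):

import gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO.load("models/PPO/100000", env=env)  # hypothetical saved checkpoint

for ep in range(10):
    obs = env.reset()
    done = False
    while not done:
        action, _states = model.predict(obs)  # policy action, not action_space.sample()
        obs, reward, done, info = env.step(action)
        env.render()
env.close()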
@vaizerdgrey 2 years ago
Can we have a tutorial on custom policies?
@unknown3.3.34 2 years ago
Bro, please help me. I'm good at machine learning (supervised and unsupervised) and deep learning, and I would like to learn reinforcement learning, but I don't know where to start. Please guide me through this journey, bro. Where should I start?
@niklasdamm6900 2 years ago
22.12.22 15:00
@Stinosko 2 years ago
Hello again
@raven9057 2 years ago
"nvidia-smi" nice flex :P
@sentdex 2 years ago
What, those lil guys? ;) I like to check while recording sometimes, because if GPU 0 hits 100% it causes a lot of jitter/lag in the recording. There's a real reason, I promise :D
@raven9057 2 years ago
I'm having really good results with TRPO from sb3-contrib.
@raven9057 2 years ago
Managed to hit a reward mean (ep_rew_mean) of 209 with only 600k steps.
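For anyone curious, TRPO lives in the separate sb3-contrib package (pip install sb3-contrib) and is used like any other SB3 algorithm; a minimal sketch (the environment and step count here just mirror the comment above, not a recommendation):

from sb3_contrib import TRPO

model = TRPO("MlpPolicy", "LunarLander-v2", verbose=1)
model.learn(total_timesteps=600_000)
model.save("trpo_lunar")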
@ruidian8157 2 years ago
Another side note: when dealing with files and directories, basically anything path-related, it is recommended to use pathlib instead of os.
@iceman1125 2 years ago
why?
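Mostly ergonomics: paths become objects, joining uses the / operator instead of os.path.join, and mkdir covers the makedirs/exist_ok case. A sketch of the same folder setup with pathlib (directory names are just examples):

from pathlib import Path

models_dir = Path("models/PPO")
logdir = Path("logs")

# Equivalent to os.makedirs(..., exist_ok=True)
models_dir.mkdir(parents=True, exist_ok=True)
logdir.mkdir(parents=True, exist_ok=True)

model_path = models_dir / "100000.zip"  # the / operator joins path segments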
@mehranzand2873 2 years ago
thanks
@RafaParkoureiro 1 year ago
I wonder how they can play Atari games so well if they can't land on the moon properly.
@andhikaindra5427 1 year ago
Hi, can I ask you something:

import os
import gymnasium as gym
import gym.envs.registration
import pybullet_envs
import rl_zoo3.gym_patches
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines3 import PPO

apply_api_compatibility=True

# Note: pybullet is not compatible yet with Gymnasium
# you might need to use `import rl_zoo3.gym_patches`
# and use gym (not Gymnasium) to instantiate the env
# Alternatively, you can use the MuJoCo equivalent "HalfCheetah-v4"
vec_env = DummyVecEnv([lambda: gym.make("HalfCheetah-v4")])

# Automatically normalize the input features and reward
vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=True, clip_obs=10.)

model = PPO("MlpPolicy", vec_env)
model.learn(total_timesteps=2000)

# Don't forget to save the VecNormalize statistics when saving the agent
log_dir = "/tmp/"
model.save(log_dir + "ppo_halfcheetah")
stats_path = os.path.join(log_dir, "vec_normalize.pkl")
env.save(stats_path)

# To demonstrate loading
del model, vec_env

# Load the saved statistics
vec_env = DummyVecEnv([lambda: gym.make("HalfCheetah-v4")])
vec_env = VecNormalize.load(stats_path, vec_env)
# do not update them at test time
vec_env.training = False
# reward normalization is not needed at test time
vec_env.norm_reward = False

# Load the agent
model = PPO.load(log_dir + "ppo_halfcheetah", env=vec_env)

And I got this:

Traceback (most recent call last):
  File "d:\download baru\import gymnasium as gym 2.py", line 27, in
    env.save(stats_path)
    ^^^
NameError: name 'env' is not defined

What should I do? Thanks
@cashmoney5202 1 year ago
You never initialized env?
@andhikaindra5427 1 year ago
@cashmoney5202 It's from Stable-Baselines3. So what should I do?
@Berserker_WS 1 year ago
@andhikaindra5427 You are using a variable named "vec_env" and not "env" (which is the name normally used). To fix this error, it is enough to change "env.save(stats_path)" to "vec_env.save(stats_path)".