Hello again, Stable-Baselines3 (SB3) maintainer here ;) Nice video showing the variability of results in RL. For what you are doing, you should take a look at SB3 callbacks, especially CheckpointCallback and EvalCallback, to save the best model automatically. And for a fairer comparison, you can always check the tuned hyperparameters from the RL Zoo.
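For reference, a minimal sketch of the CheckpointCallback mentioned above, which saves a snapshot of the model at a fixed step interval (the environment, paths and frequencies here are just example values, not anything from the video). EvalCallback, shown further down in this thread, is the one that tracks the best model against a separate evaluation environment.

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Save a snapshot every 10,000 steps into ./checkpoints/
checkpoint_callback = CheckpointCallback(save_freq=10_000,
                                         save_path="./checkpoints/",
                                         name_prefix="ppo_lunar")

model = PPO("MlpPolicy", "LunarLander-v2", verbose=1)
model.learn(total_timesteps=100_000, callback=checkpoint_callback)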
@ILoveMattBellamy 2 years ago
Hi Antonin, any chance you can share an example of exporting an SB3 model to C++? I found the PPO-in-C++ example, but I am currently working mostly with a DQN setup. Thanks!
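One possible route, sketched here under assumptions (not an official SB3 recipe, model path is hypothetical, MLP policy assumed): trace the DQN Q-network with TorchScript and load the resulting file from libtorch in C++, taking the argmax over Q-values as the action. Image observations would additionally need the same preprocessing SB3 applies internally.

import torch as th
from stable_baselines3 import DQN

model = DQN.load("dqn_model")  # hypothetical path to a saved model

# model.policy.q_net maps observations to Q-values (one per discrete action)
dummy_obs = th.zeros(1, *model.observation_space.shape)
traced = th.jit.trace(model.policy.q_net, dummy_obs)
traced.save("dqn_qnet.pt")  # load in C++ with torch::jit::load and take the argmax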
@robmarks6800 2 years ago
Hello Antonin, when I'm running SB3's DQN/A2C algorithms I see very little GPU utilization; with nvidia-smi I get around 15%. I have seen many others with the same problem, but it has all gone unanswered. Is this some problem on our end, is it something inherent in SB3/PyTorch, or is the incremental online aspect of RL the problem? I just got access to a Tesla V100, so I'm kind of sad that I get literally zero speedup using it. I don't really have any experience in profiling my applications, but maybe that's something I have to learn. What do you reckon?
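Low GPU utilization is common here: with the default small MLP policies, most wall-clock time goes into stepping the environment and collecting data on the CPU, so the GPU sits mostly idle. You can at least confirm which device SB3 picked (a small sketch; the environment name is just an example):

from stable_baselines3 import A2C

# Request the GPU explicitly; SB3 falls back to CPU if CUDA is unavailable
model = A2C("MlpPolicy", "CartPole-v1", device="cuda", verbose=1)
print(model.device)  # e.g. cuda:0 if PyTorch can actually see the GPU

Bigger networks (e.g. CNN policies on image observations) and more parallel environments are what usually move the bottleneck onto the GPU.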
@ApexArtistX 1 year ago
Can you do a tutorial on the CheckpointCallback thing?
@bluebox6307 1 year ago
For some reason, this is not only helpful but actually entertaining. Usually, I barely comment, but this is some good stuff :)
@HT79 2 years ago
Side note: to create those folders, instead of an if clause you can simply use makedirs with exist_ok=True.
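That is, something like this (directory names are just the kind used in the series; adjust as needed):

import os

models_dir = "models/PPO"
logdir = "logs"

# No error is raised if the directories already exist
os.makedirs(models_dir, exist_ok=True)
os.makedirs(logdir, exist_ok=True)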
@renkeludwig7405 2 years ago
Only P.2 and I already love this series so much! Thanks for all your great content man!
@fuba44 2 years ago
This is cool, looking forward to this series.
@satyamedh 10 months ago
SB3 makes it so easy that a video about saving the model is longer than the initial intro and first steps.
@PerfectNight123 2 years ago
Quick question: how are you training on the CUDA device? I have a GPU installed, but I'm training on the CPU device.
@anotherKyle 5 months ago
I would really like to know this!
@harrivayrynen 2 years ago
I like this series a lot. I have tried to learn to use ML-Agents with Unity, but getting started is quite hard. Yes, there are official examples in the ML-Agents repo, but getting something new to work is hard for me. Hopefully there will be a new book for this scene; there are some, but they are quite old.
@jyothishmohan5613 2 years ago
Wow...one video per day ...that's super cool 😎 👌 👍
@giacomocarfi5015 2 years ago
Great video sentdex!! Will you also talk about multi-agent reinforcement learning?
@yawar58 2 years ago
Can't wait for the custom environment video. If I may suggest, please do a trading environment as an example. Thanks.
@sentdex 2 years ago
Your wait is already over ;) kzbin.info/www/bejne/q3zRm3qkbct5bZI
@carlanjackson1490 2 years ago
Say you partially train a model for, say, 50,000 steps. Is it possible, once it's finished, to reload that same trained model and continue training it for an additional 20,000 steps? I have a partially trained DQN, but it's not performing as well as it should, and I would like to continue the training. I'm not sure if that is possible or if I'll just have to train an entirely new model.
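Continuing from a saved model is supported; a minimal sketch (paths, env and step counts are hypothetical). Note that for off-policy algorithms like DQN the replay buffer is not stored inside the saved .zip, so for a truly seamless continuation you would also use model.save_replay_buffer()/load_replay_buffer():

import gym
from stable_baselines3 import DQN

env = gym.make("LunarLander-v2")  # whatever env the model was trained on

model = DQN.load("models/dqn_50000", env=env)  # hypothetical checkpoint path
# Continue for another 20,000 steps; keep the existing step counter and logs
model.learn(total_timesteps=20_000, reset_num_timesteps=False)
model.save("models/dqn_70000")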
@ytsks 2 years ago
What you said about the jumping and rolling robot dog got me thinking: do you have a way to force a minimum-exertion policy? This is a guiding principle in most living organisms: only use the minimal effort (energy) that will produce the desired result. While you likely don't care about energy conservation in that way, it could filter out behavior patterns like the one you mentioned.
@anasalrifai2217 2 years ago
Thanks for the content. Can you show us how to customize the actor and critic network architectures?
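For the built-in policies this is usually done through policy_kwargs rather than a full custom policy class; a sketch with example layer sizes (in SB3 versions before roughly 1.8 the format was a list containing that dict, i.e. net_arch=[dict(pi=[...], vf=[...])]):

from stable_baselines3 import PPO

# Two 256-unit layers for the actor (pi) and two for the critic (vf)
policy_kwargs = dict(net_arch=dict(pi=[256, 256], vf=[256, 256]))

model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)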
@serta5727 2 years ago
Stable Baselines 3 is very helpful!
@alessandrocoppelli3056 6 months ago
Hello, I'm trying to use PPO and A2C for my discrete-box environment. I have set negative rewards in order to teach the agent to avoid impossible operations in my environment, and most of the training time is spent learning to avoid those operations. Is there a method to directly "tell" the agent (inside the agent itself) to avoid those operations, instead of spending training time on it? Thanks in advance.
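One common approach is invalid-action masking instead of reward penalties: sb3-contrib ships MaskablePPO plus an ActionMasker wrapper, where you supply a function returning a boolean mask of the currently valid discrete actions. A sketch only; your_env and its valid_actions() method are placeholders for whatever your own environment provides:

from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

def mask_fn(env):
    # Return one boolean per discrete action: True = allowed right now.
    # valid_actions() is a hypothetical helper on your own environment.
    return env.valid_actions()

env = ActionMasker(your_env, mask_fn)  # your_env: your custom gym environment
model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)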
@georgebassemfouad 2 years ago
Looking forward to the custom environment.
@sarc007 2 years ago
How do you save the most optimized version of the model? I understood that "ep_rew_mean" (tag: rollout/ep_rew_mean) should be as high as possible and "value_loss" (tag: train/value_loss) should be as low as possible, so how do you get or save the best model when that happens? Any idea?
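This is what EvalCallback is for: it periodically runs the current policy on a separate evaluation environment and keeps the weights with the highest mean evaluation reward. A sketch (environment, paths and frequencies are just example values):

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

train_env = gym.make("LunarLander-v2")
eval_env = gym.make("LunarLander-v2")  # separate copy used only for evaluation

# Every eval_freq steps, run evaluation episodes and save the best weights so far
eval_callback = EvalCallback(eval_env,
                             best_model_save_path="./best_model/",
                             log_path="./eval_logs/",
                             eval_freq=10_000)

model = PPO("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=200_000, callback=eval_callback)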
@Nerdimo 2 years ago
I probably should have commented on the last video, but I keep getting this error on macOS: "2022-05-23 16:22:46.979 python[10741:91952] Warning: Expected min height of view: () to be less than or equal to 30 but got a height of 32.000000. This error will be logged once per view in violation." I tried to resolve it using some stuff on Stack Overflow, but the same thing happens. This error, I believe, prevents me from running more than 1 episode: it will run one episode in the environment and then crash :/
@walterwang5996 1 year ago
I used save code similar to this, but it gave me several curves in TensorBoard, and the reward curves are not even close; they have big gaps. I am wondering why. Also, about training time: can I train it and save the model after some episodes? Thanks.
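A likely cause of the several curves: each model.learn() call starts a new TensorBoard run (PPO_1, PPO_2, ...) unless you tell it to continue the previous one. A sketch of keeping one continuous curve across chunked training (names and values are just examples):

from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log="logs")

# First chunk: creates logs/PPO_run_1
model.learn(total_timesteps=10_000, tb_log_name="PPO_run")

# Later chunks: reset_num_timesteps=False keeps the step counter and
# logs into the same run, giving one continuous reward curve
model.learn(total_timesteps=10_000, tb_log_name="PPO_run", reset_num_timesteps=False)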
@nnpy 2 years ago
Awesome move forward ⚡
@datajake2742 6 days ago
If model.learn returns a trained model (per the docs), what is the for loop doing?
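model.learn() does return the (same) trained model, but the loop isn't about the return value: it trains in chunks so the model can be saved to disk at regular intervals. Roughly this pattern (an illustrative sketch, not the video's exact code):

import os
from stable_baselines3 import PPO

models_dir = "models/PPO"
os.makedirs(models_dir, exist_ok=True)

model = PPO("MlpPolicy", "LunarLander-v2", verbose=1, tensorboard_log="logs")

TIMESTEPS = 10_000
for i in range(1, 31):
    # Each call keeps training the same model object for another chunk
    model.learn(total_timesteps=TIMESTEPS, reset_num_timesteps=False, tb_log_name="PPO")
    model.save(f"{models_dir}/{TIMESTEPS * i}")  # snapshot after every chunk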
@scudice 2 years ago
I have an issue after loading a trained agent: model.predict(obs) always outputs the same action, even though the agent was not doing that at all during learning.
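Two things worth checking (suggestions only; this may not be the cause here): that the observation passed to predict() comes from the same wrapped/normalized env the agent was trained with, and whether you want stochastic or deterministic actions. A sketch using the classic gym API and a hypothetical checkpoint path:

import gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO.load("models/ppo_lunar", env=env)  # hypothetical path

obs = env.reset()
for _ in range(1000):
    # deterministic=False samples from the policy distribution (as during training);
    # deterministic=True always returns the most likely action
    action, _states = model.predict(obs, deterministic=False)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()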
@sudhansubaladas2322 2 years ago
Well explained... please make some videos on machine translation from scratch, like loading huge datasets, training, testing, etc.
@ApexArtistX 1 year ago
Can you load and continue training instead of starting from scratch again?
@MegaBd23 9 months ago
When I do this, I don't get the rollout/ep_len_mean or rollout/ep_rew_mean graphs, but instead a bunch of time graphs...
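rollout/ep_rew_mean and ep_len_mean come from episode statistics recorded by a Monitor wrapper; one common cause of seeing only the time/ metrics is that no Monitor is attached (e.g. when passing your own vectorized env). Assuming that is what is happening, a sketch of wrapping a hand-made env:

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor

# Monitor records per-episode reward and length so the logger can report them
env = Monitor(gym.make("LunarLander-v2"))

model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="logs")
model.learn(total_timesteps=10_000)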
@arjunkrishna5790 2 years ago
Great video!
@KaikeCastroCarvalho 9 months ago
Hello, is it possible to use a DQN model with TensorBoard?
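Yes, DQN takes the same tensorboard_log argument as the other algorithms; a minimal sketch (paths and environment are just examples):

from stable_baselines3 import DQN

model = DQN("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log="logs")
model.learn(total_timesteps=50_000, tb_log_name="DQN")
# Then run: tensorboard --logdir logs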
@MuazRazaq 2 years ago
Hey, thank you so much for such a good explanation. I wanted to ask how you are using the GPU? Whenever I run the same code it says "Using cpu device", and I have an NVIDIA GeForce GTX 1650 card.
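"Using cpu device" usually means PyTorch itself cannot see the GPU, most often because a CPU-only build of torch is installed; installing the CUDA-enabled build from pytorch.org normally fixes it. A quick check:

import torch

print(torch.__version__)          # a "+cpu" suffix indicates a CPU-only build
print(torch.cuda.is_available())  # must be True for SB3 to report "Using cuda device"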
@rohitchan007 2 years ago
Thank you so much for this
@sarvagyagupta1744 2 years ago
Hey, thanks for the video. I was wondering if I can ask some questions around loading a model here or would you prefer somewhere else?
@marcin.sobocinski 2 years ago
🤚 Can I ask a question: is there going to be an ML-Agents RL tutorial as well❓ It could be a nice sequel to the SB3 series 😀
@enriquesnetwork 2 years ago
Thank you!
@konkistadorr 2 years ago
Hey, great videos as always :) Shouldn't you use predict_step instead of predict for faster execution?
@sentdex 2 years ago
Possibly; it's the first I'm hearing about it, but I'm certainly no SB3 expert. Try it and let us know the results!
@karthikbharadhwaj9488 2 years ago
Hey Sentdex, actually in the env.step() call you have passed env.action_space.sample() instead of model.predict()!
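For reference, an inference loop that actually uses the trained policy would look roughly like this (classic gym API; the checkpoint path is hypothetical):

import gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO.load("models/PPO/100000", env=env)  # hypothetical saved checkpoint

for ep in range(10):
    obs = env.reset()
    done = False
    while not done:
        action, _states = model.predict(obs)  # policy action, not action_space.sample()
        obs, reward, done, info = env.step(action)
        env.render()
env.close()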
@vaizerdgrey 2 years ago
Can we have a tutorial on custom policies?
@unknown3.3.34 2 years ago
Bro, please help me. I'm good at machine learning (supervised and unsupervised) and deep learning, and I would like to learn reinforcement learning, but I don't know where to start. Please guide me through this journey, bro. Where should I start?
@niklasdamm6900 2 years ago
22.12.22 15:00
@Stinosko 2 years ago
Hello again
@raven9057 2 years ago
"nvidia-smi" nice flex :P
@sentdex 2 years ago
What, those lil guys? ;) I like to check while recording sometimes, because if GPU 0 hits 100% it causes a lot of jitter/lag in the recording. There's a real reason, I promise :D
@raven9057 2 years ago
I'm having really good results with TRPO from sb3-contrib.
@raven9057 2 years ago
Managed to hit a reward mean (ep_rew_mean) of 209 with only 600k steps.
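For anyone curious, TRPO lives in the separate sb3-contrib package (pip install sb3-contrib) and is used like any other SB3 algorithm; a minimal sketch (the environment and step count here just mirror the comment above, not a recommendation):

from sb3_contrib import TRPO

model = TRPO("MlpPolicy", "LunarLander-v2", verbose=1)
model.learn(total_timesteps=600_000)
model.save("trpo_lunar")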
@ruidian8157 2 years ago
Another side note: when dealing with files and directories, basically anything path-related, it is recommended to use pathlib instead of os.
@iceman1125 2 years ago
why?
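Mostly ergonomics: paths become objects, joining uses the / operator instead of os.path.join, and mkdir covers the makedirs/exist_ok case. A sketch of the same folder setup with pathlib (directory names are just examples):

from pathlib import Path

models_dir = Path("models/PPO")
logdir = Path("logs")

# Equivalent to os.makedirs(..., exist_ok=True)
models_dir.mkdir(parents=True, exist_ok=True)
logdir.mkdir(parents=True, exist_ok=True)

model_path = models_dir / "100000.zip"  # the / operator joins path segments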
@mehranzand2873 2 years ago
thanks
@RafaParkoureiro 1 year ago
I wonder how they can play Atari games so well if they can't land on the moon properly.
@andhikaindra5427 1 year ago
Hi, can I ask you something:

import os
import gymnasium as gym
import gym.envs.registration
import pybullet_envs
import rl_zoo3.gym_patches
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines3 import PPO

apply_api_compatibility=True

# Note: pybullet is not compatible yet with Gymnasium
# you might need to use `import rl_zoo3.gym_patches`
# and use gym (not Gymnasium) to instantiate the env
# Alternatively, you can use the MuJoCo equivalent "HalfCheetah-v4"
vec_env = DummyVecEnv([lambda: gym.make("HalfCheetah-v4")])

# Automatically normalize the input features and reward
vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=True, clip_obs=10.)

model = PPO("MlpPolicy", vec_env)
model.learn(total_timesteps=2000)

# Don't forget to save the VecNormalize statistics when saving the agent
log_dir = "/tmp/"
model.save(log_dir + "ppo_halfcheetah")
stats_path = os.path.join(log_dir, "vec_normalize.pkl")
env.save(stats_path)

# To demonstrate loading
del model, vec_env

# Load the saved statistics
vec_env = DummyVecEnv([lambda: gym.make("HalfCheetah-v4")])
vec_env = VecNormalize.load(stats_path, vec_env)
# do not update them at test time
vec_env.training = False
# reward normalization is not needed at test time
vec_env.norm_reward = False

# Load the agent
model = PPO.load(log_dir + "ppo_halfcheetah", env=vec_env)

And I got this:

Traceback (most recent call last):
  File "d:\download baru\import gymnasium as gym 2.py", line 27, in
    env.save(stats_path)
    ^^^
NameError: name 'env' is not defined

What should I do? Thanks
@cashmoney5202 1 year ago
You never initialized env?
@andhikaindra5427 1 year ago
@cashmoney5202 It's from Stable-Baselines3. So what should I do?
@Berserker_WS 1 year ago
@andhikaindra5427 You are using a variable named "vec_env" and not "env" (which is the name normally used). To fix this error, it is enough to change "env.save(stats_path)" to "vec_env.save(stats_path)".