Building a Custom Environment for Deep Reinforcement Learning with OpenAI Gym and Python

143,586 views

Nicholas Renotte

Days ago

Comments: 309
@Paul_Jeong96
@Paul_Jeong96 3 жыл бұрын
Thank you for your tutorial, I hope to see how you can visualize the environment in the upcoming tutorial!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Me too! Keen to do a ton more stuff with RL and possibly PyGame!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
@Eliseo Raylan awesome! Let me know how you go with it!
@tomtalkscars9494
@tomtalkscars9494 3 жыл бұрын
Love these. Building custom environments is one of the biggest areas missing with the OpenAI stuff imo. Would be cool to see one bringing in external data. Like predicting the direction of the next step of a Sine Wave or something simple like that.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Definitely, got way more stuff on RL planned once the Python course is out!
@laser7861
@laser7861 2 жыл бұрын
Great tutorial. Simple and to the point, especially for someone who is familiar with RL concepts and just wants to get the nuts and bolts of an OpenAI gym env.
@tawsifkamal88
@tawsifkamal88 3 жыл бұрын
Really informative video! As a high schooler self-learning RL, tutorials such as these are really helpful for showing applicability in RL.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Ohhh man, going to have something awesome for you in a few days time then!
@Techyisle
@Techyisle 3 жыл бұрын
Your tutorials are awesome, and I just finished your 3-hour RL tutorial. I would like to see a Pygame implementation as soon as possible :) If possible, try to create a different set of advanced videos where you explain the math and intuition behind RL, along with code implementations (to cater to a different audience). Something I like about you is that you respond to each and every comment, a characteristic which I don't see often from others. Kudos to you! Thanks again mate! Stay safe!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Thanks @Techy Isle, I'm definitely going to be going into more detail. Been studying some hardcore DL stuff like crazy while producing the Python basics course!
@baronvonbeandip
@baronvonbeandip Жыл бұрын
This is way more useful than the last one. The more you can modify OpenAI's envs, it seems, the more that you can get out of the reinforcement learning schema.
@user___01
@user___01 3 жыл бұрын
Man you can't stop giving us this gold of a tutorial!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Definitely! Two a week that's the goal man!
@MuazRazaq
@MuazRazaq 3 жыл бұрын
I can't stop myself from commenting on this exceptionally good tutorial. Sir, really amazing job. I must say you should continue this good work; the way you explain each and every line is something that is very rare in the material available so far. Much love from a Pakistani student currently in South Korea 😍
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Ohhhh thanks so much @Muaz! Soo glad you enjoyed it.
@pratyushpatnaik4617
@pratyushpatnaik4617 3 жыл бұрын
Sir that was exceptionally good!!! 🔥 I would really love to see the render function in play using pygame. Waiting eagerly for it!!!!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Definitely, can't wait to finally do something with Pygame!
@viswanathansankar3789
@viswanathansankar3789 3 жыл бұрын
@@NicholasRenotte Yeah Please do it as soon as possible...
@albertsalgueda1036
@albertsalgueda1036 2 жыл бұрын
Yes! there is a need for Environment viz.
@prakhars962
@prakhars962 3 жыл бұрын
Just recommended this video to one of my coursemates. Your videos are worth sharing.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Thanks soo much!
@charlesewing9772
@charlesewing9772 2 жыл бұрын
Hi, great video! I was just wondering what happens if, say, the temperature is at 100 and the model tries to add 1 to the temperature (so it is now outside the limits). Does it then resample automatically, or would you have to handle this in the code yourself?
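For what it's worth, Gym's Box space only describes the valid range; it doesn't resample or clamp anything at runtime, so keeping the temperature inside [0, 100] is something the step() method has to do itself. A minimal sketch of one way to do that (the clipping line is an assumption, not something shown in the video):

import numpy as np

state = 100
action = 2                            # "increase temperature by 1" in the video's action scheme
state += action - 1                   # 101: now outside the declared Box(0, 100)
state = int(np.clip(state, 0, 100))   # clamp it back inside step()
print(state)                          # prints 100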
@yxzhou5402
@yxzhou5402 2 жыл бұрын
As a beginner of RL, all your videos really help me a lot so thank u!!! And I just wonder if there is any chance to see the tutorial on how to build the env with multi-dim action?
@SatoBois
@SatoBois 3 жыл бұрын
Hello Nick! I love your tutorial and it's actually helping so much in university, especially considering the lack of documentation for OpenAI. I was actually building a custom environment for tic-tac-toe to practice, but for some reason when I run dqn.fit() like you did, with the same everything for the keras-rl training part, I get this: "ValueError: Error when checking input: expected dense_16_input to have 2 dimensions, but got array with shape (1, 1, 3, 3)". I don't quite understand why it got that shape, because my tic-tac-toe game's observation space is a np.array([Discrete(3)]*9) to represent the nine tiles and the three possibilities of what could be in them. Again, thank you for the helpful tutorials!
@myceliumbrick1409
@myceliumbrick1409 Жыл бұрын
Yep, I have the same error. Did you manage to solve the issue?
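One common cause of that exact shape mismatch (an assumption, not confirmed in this thread): keras-rl's SequentialMemory with window_length=1 feeds observations as (batch, window, *obs_shape), so a model that starts with a plain Dense layer sees one dimension too many. A sketch of the usual workaround, flattening first:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

def build_model(obs_shape, actions):
    model = Sequential()
    # window_length=1 prepends a dimension, so flatten (1, *obs_shape) first
    model.add(Flatten(input_shape=(1,) + obs_shape))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

model = build_model((3, 3), 9)   # hypothetical tic-tac-toe: 3x3 board, 9 possible moves
model.summary()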
@ProfSoft
@ProfSoft 3 жыл бұрын
Great job, thanks. I have a question: why in the model building did you set the last layer's activation function to 'linear'? I think we should make it softmax, because it seems like a classification problem?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Hmmm, could definitely change the activation function there!
@ProfSoft
@ProfSoft 3 жыл бұрын
@@NicholasRenotte Thank you very much, I just wanted to make sure there was no specific reason for choosing the linear activation function. God bless you for this great effort.
@oliverprislan3940
@oliverprislan3940 2 жыл бұрын
Thank you Nicholas, this is a very good example to give it a kick start.
@stevecoxiscool
@stevecoxiscool 2 жыл бұрын
I guess what irks me the most about all the universe/retro/baselines Gym examples is that it's not straightforward to get your bright, shiny, newly trained model to run in other environments. These Gym examples have so many interdependencies and one does not really know what is going on inside the box. This is why I am glad you are doing the video on getting other environments to work with RL algos. Unreal is my choice since Unity already has ML examples.
@NicholasRenotte
@NicholasRenotte 2 жыл бұрын
100%. I took a look into the Unity environment over the Christmas break and was gobsmacked. Well documented, and logging and training were clear. I love OpenAI Gym, but seriously, Unity ML-Agents appears to be so much easier to deal with.
@stevecoxiscool
@stevecoxiscool 2 жыл бұрын
@@NicholasRenotte I really wish Unreal was on par with Unity on the ML technology. I am using UnrealPythonPlugin to send images to a remote Python client running OpenCV DNN. The video doing this on my YouTube is a few years old. Your custom Gym environment linked to Unreal is doable. Thanks for your videos!!!!
@sommojames
@sommojames 2 жыл бұрын
Great video, but what's the point with observation space? Looks like your agent is not using it
@markusbuchholz3518
@markusbuchholz3518 3 жыл бұрын
Nicholas, as I mentioned some time ago, your YT channel is outstanding and your effort impressive. RL is my favourite branch of ML, so I especially enjoyed watching this one. Exceptionally, you also built a customised environment; the idea can easily be extended and applied to other specific tasks. It is a great pleasure to watch your channel and I will recommend everyone to subscribe. Have a nice day!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Thank you so much @Markus! Glad you enjoyed the RL videos, I think it's a super interesting field with a ton of interesting applications. I'm hoping later on this year we might be able to apply some of it into hardware applications with Raspberry Pi or ROS!
@markusbuchholz3518
@markusbuchholz3518 3 жыл бұрын
@@NicholasRenotte Thank you for the wonderful feedback! Yes, ROS/ROS2 is a great robotics framework. Now I am more inspired by the Nvidia Jetson Xavier since it is "slightly" more powerful. Good luck!!!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
@@markusbuchholz3518 oooh yeah, I took a look at that yesterday. Looks awesome! The OAK camera looks promising as well!
@islam6916
@islam6916 3 жыл бұрын
Thank you for the video ⚡⚡⚡ I hope you can make a Custom Agent Next time ✅ Looking forward to see that ✨
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Heya! Definitely, code is 80% of the way there, I should have it up in the coming weeks!
@islam6916
@islam6916 3 жыл бұрын
@@NicholasRenotte Great !!!
@jiajun898
@jiajun898 3 жыл бұрын
Great tutorial. A question though. What would be the benefit of transferring your reinforcement learning from the keras implementation to the openai gym environment implementation?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
This is Gym, it's more the rl agents that I've started migrating (better stability, control and exporting).
@TheNativeTwo
@TheNativeTwo Жыл бұрын
Great video, I like how you explain each line of code. My one complaint is not your fault... Getting the right environment and versions of the packages. I got right to the end... And couldn't get it working. A bit frustrating lol.
@davidowusu1184
@davidowusu1184 3 жыл бұрын
Great video. I was able to use this as a basis to create an environment for my specific needs. I have one question though: once you've trained your model and saved your weights, how do you use it? I mean, actually pass values to the model to get an action as a response.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
You can pass the new state to the model and actions are returned as the output. You can then go and plug it into the real model/IoT suite, etc.
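A rough sketch of what that can look like outside keras-rl, using the same network architecture as the video (the 39-degree observation and the weights file name are just placeholders):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Rebuild the same architecture as in the video, then restore the trained weights,
# e.g. with dqn.load_weights('dqn_weights.h5f') if you saved them via keras-rl.
model = Sequential([
    Dense(24, activation='relu', input_shape=(1,)),
    Dense(24, activation='relu'),
    Dense(3, activation='linear'),
])

obs = np.array([[39.0]])             # one observation: the current temperature
q_values = model.predict(obs)        # shape (1, 3): one Q-value per action
action = int(np.argmax(q_values))    # 0 = temperature down, 1 = hold, 2 = up
print(action)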
@davidowusu1184
@davidowusu1184 3 жыл бұрын
@@NicholasRenotte Thanks so much for the wonderful content and thanks even more these replies. You're awesome.
@sebatinoco
@sebatinoco 3 жыл бұрын
Hi Nicholas, amazing video! Quick question, how can I access the current action that the AI is taking? Thanks!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Heya @Sebastian, I don't believe it's easily accessible through keras-rl. If you're using StableBaselines, you can access it through DQN.predict(obs) e.g. github.com/nicknochnack/StableBaselinesRL/blob/main/Stable%20Baselines%20Tutorial.ipynb shown towards the end.
@padisalashanthan98
@padisalashanthan98 Жыл бұрын
Great video! I am a little confused on how to solve a multi-states problem. Can you please give some pointers on that?
@KEFASYUNANA
@KEFASYUNANA Жыл бұрын
Great videos. Any idea how to handle 2 or 3 states/observations in the code, say temperature and pressure or humidity?
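A small sketch of one way to do it (the bounds below are made up): give the Box a low/high array with one entry per quantity and return the whole vector as the state.

import numpy as np
from gym.spaces import Box

# e.g. temperature 0-100 degrees, pressure 0-10 bar, humidity 0-100 %
observation_space = Box(low=np.array([0, 0, 0], dtype=np.float32),
                        high=np.array([100, 10, 100], dtype=np.float32))

state = np.array([38.0, 1.2, 55.0], dtype=np.float32)   # what reset()/step() would return
print(observation_space.contains(state))                # True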
@melikad2768
@melikad2768 2 жыл бұрын
Hi Nick. Thank you very much, I learned a lot. But I have a question: how can I see which action the shower should take? I mean, how can I understand which action the agent takes based on the reward?
@Spruhawahane
@Spruhawahane 2 жыл бұрын
On my mac the kernel keeps dying when I run the basic cartpole example. Don't know how to troubleshoot. Pls help.
@OmarAlolayan
@OmarAlolayan 3 жыл бұрын
Thank you Nicholas! Can you please advise me on how to use the step function if I have a MultiDiscrete action space?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Heya @Omar, what does your output look like if you run env.action_space.sample()
@OmarAlolayan
@OmarAlolayan 3 жыл бұрын
@@NicholasRenotte Hi Nicholas, Thank you for your reply it is big action space it is a 3D action space, (3, 10, 10). array([[[2, 0, 2, 2, 1, 1, 0, 0, 1, 1], [1, 1, 2, 2, 2, 0, 1, 0, 2, 0], [1, 2, 0, 1, 0, 2, 0, 1, 1, 1], [1, 0, 1, 0, 0, 1, 2, 0, 1, 1], [1, 2, 0, 2, 2, 0, 1, 0, 0, 2], [2, 2, 0, 2, 0, 1, 1, 0, 2, 2], [2, 0, 1, 1, 0, 0, 1, 1, 1, 1], [0, 2, 2, 2, 2, 1, 0, 0, 0, 2], [1, 2, 2, 0, 1, 1, 1, 2, 2, 2], [2, 0, 0, 1, 1, 2, 1, 1, 0, 2]], [[1, 2, 1, 0, 1, 1, 1, 2, 0, 1], [0, 1, 0, 0, 1, 1, 2, 2, 1, 2], [0, 2, 1, 0, 2, 1, 2, 2, 2, 1], [1, 2, 2, 0, 0, 2, 0, 2, 2, 0], [0, 2, 0, 0, 0, 0, 1, 2, 1, 2], [1, 2, 1, 1, 1, 2, 0, 1, 2, 1], [1, 1, 1, 2, 2, 1, 2, 0, 0, 2], [2, 1, 0, 1, 1, 2, 0, 0, 0, 2], [0, 0, 1, 1, 1, 0, 1, 2, 2, 1], [2, 0, 2, 1, 1, 0, 0, 2, 1, 0]], [[2, 1, 1, 2, 1, 1, 2, 1, 0, 2], [0, 1, 2, 1, 0, 0, 1, 1, 0, 0], [0, 0, 0, 1, 1, 2, 1, 2, 0, 1], [2, 1, 0, 0, 0, 1, 2, 0, 1, 2], [2, 0, 2, 1, 0, 0, 2, 0, 2, 1], [0, 1, 0, 1, 1, 0, 2, 0, 0, 2], [1, 2, 0, 1, 0, 2, 2, 2, 2, 0], [0, 0, 0, 1, 1, 2, 2, 2, 0, 0], [1, 2, 2, 2, 1, 0, 2, 0, 1, 1], [2, 1, 0, 0, 1, 0, 2, 1, 2, 1]]])
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
@@OmarAlolayan oh wow, can you try using this with stable-baselines instead? It might be easier to model as the algorithm will pick up the observation space without the need to define the neural network.
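For anyone else landing here, a toy sketch of how step() consumes a multi-dimensional action: the sampled action simply arrives as an array, so you index into it. (This is a made-up three-component example, much smaller than the (3, 10, 10) space above.)

import numpy as np
from gym import Env
from gym.spaces import MultiDiscrete, Box

class MultiActionEnv(Env):
    def __init__(self):
        super().__init__()
        self.action_space = MultiDiscrete([3, 3, 3])   # three sub-actions, each 0, 1 or 2
        self.observation_space = Box(low=0, high=100, shape=(1,), dtype=np.float32)
        self.state = np.array([38.0], dtype=np.float32)

    def step(self, action):
        # action is an array like [2, 0, 1]; use each component however your problem needs
        self.state = self.state + (action[0] - 1)
        reward = 1 if 37 <= self.state[0] <= 39 else -1
        return self.state, reward, False, {}

    def reset(self):
        self.state = np.array([38.0], dtype=np.float32)
        return self.state

env = MultiActionEnv()
print(env.step(env.action_space.sample()))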
@AldorCrap
@AldorCrap Жыл бұрын
Very useful tutorial, although I would need some help with mine. I'm working on training a model for optimal path routing. I'm struggling with defining the observation_space: does it have to be the whole road graph (as a spaces.Graph), or a Box with the parameters (like current_coords, dest_coords, edges_max_car_speed), or maybe both? How should I approach this?
@idrisbima5369
@idrisbima5369 2 жыл бұрын
Hello Nick, wonderful video. I am having the same error message you pointed out in the video and tried resolving it as shown, but it is giving me a different error message stating that the name 'model' is not defined. Please help.
@julian.estevez
@julian.estevez 8 ай бұрын
Thanks a lot for the clarity of explanation.
@Sam-iy1kv
@Sam-iy1kv Жыл бұрын
Hi, very nice video! May I ask one question: what if I need a continuous model for the training task? A discrete action space won't be feasible; what can I do?
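For continuous control, the action space itself can be a Box instead of Discrete; the DQN agent used in this video only handles discrete actions, though, so you would pair it with a continuous-action algorithm such as DDPG, SAC or PPO. A one-line sketch of the space:

import numpy as np
from gym.spaces import Box

action_space = Box(low=np.array([-1.0]), high=np.array([1.0]), dtype=np.float32)
print(action_space.sample())   # e.g. [0.37] -- a float rather than 0/1/2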
@khaileng3020
@khaileng3020 Жыл бұрын
To fix the Sequential error, just rearrange the order of the library imports:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

states = env.observation_space.shape
actions = env.action_space.n
print(actions)

def build_model(states, actions):
    model = tf.keras.models.Sequential()
    model.add(Dense(24, activation='relu', input_shape=states))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

model = build_model(states, actions)
print(model.summary())

def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn

dqn = build_agent(model, actions)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)
@traze78
@traze78 Жыл бұрын
thanksssssss!! it helped a lot
@nilau8463
@nilau8463 2 жыл бұрын
Thank you for the guide, really gave me a good idea how to implement my own models!
@NuHoaNgonLa
@NuHoaNgonLa 3 жыл бұрын
what does the Env argument inside the ShowerEnv() class do?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Should be the parent class, I may have forgotten to run super().__init__() inside of the __init__ function.
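For reference, a minimal skeleton of that inheritance, with the super().__init__() call added as mentioned (a simplified sketch of the video's env, not a line-for-line copy):

import numpy as np
from gym import Env
from gym.spaces import Discrete, Box

class ShowerEnv(Env):                 # Env is the parent class being inherited from
    def __init__(self):
        super().__init__()            # initialise the gym.Env base class
        self.action_space = Discrete(3)
        self.observation_space = Box(low=np.array([0]), high=np.array([100]))
        self.state = 38 + np.random.randint(-3, 3)

    def step(self, action):
        self.state += action - 1
        reward = 1 if 37 <= self.state <= 39 else -1
        return self.state, reward, False, {}

    def reset(self):
        self.state = 38 + np.random.randint(-3, 3)
        return self.state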
@erfankhordad9403
@erfankhordad9403 3 жыл бұрын
Thanks Nicholas for the great explanation. I have tested this custom environment with PPO and MlpPolicy and got very low rewards around -40 (even with 200000 time steps for model.learn). Any idea why I get poor results? thanks
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Same env as this one or custom one? Might need a little HPO or possibly an alternate algorithm, I think I did it with a slightly different model in the full RL course with better results!
@kayleechu931
@kayleechu931 2 жыл бұрын
Hi Nicholas, thanks a lot for your video! I wonder how can I know about the objective function of the agent? Is there a way that I can change the objective function myself? Thanks a lot!
@candychebet896
@candychebet896 11 ай бұрын
Hello. Did you get to doing the visualization?
@DreamRobotics
@DreamRobotics 3 жыл бұрын
Very simple and very nice. Good work.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Thanks so much @Dr. Abdul-Mannan Khan!
@talitaaraujo1327
@talitaaraujo1327 3 жыл бұрын
Man, your videos are so great! Congrats!!!!! I have one question: i can't install a keras-r12. Maybe you can help me.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Try keras-rl2, it should be an l instead of 1
@talitaaraujo1327
@talitaaraujo1327 3 жыл бұрын
@@NicholasRenotte Thank you!!!
@ehrotraabhishekm4824
@ehrotraabhishekm4824 Жыл бұрын
Fantastic video, thank you so much for it. I have one doubt regarding DQN in Gym: can you please share some details on how to proceed with DQN for a multi-dimensional state space (4D), which was 1D in your case (temp)?
@travelthetropics6190
@travelthetropics6190 3 жыл бұрын
Thanks for the informative series on reinforcement learning. Are you running this on CPU or GPU? At [23:23] I noticed that on your PC it is about 47-55 sec per 10,000 steps. I am getting 118-120 sec with my GPU and 59-63 sec with my CPU only. It seems like this small model works better with CPU only, maybe due to the extensive copying time to the GPU :D
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Yeah, I noticed that as well, with RL oftentimes the model won't benefit as much from GPU acceleration.
@svh02
@svh02 3 жыл бұрын
Hey @Nicholas, awesome as usual!! Any reason why you chose to build your agent with Keras-RL and not with the ones provided by Stable-Baselines? Hope you keep making videos about custom environments; I think that's what's most useful. KZbin is already crowded with videos about the common environments for games and stuff like that.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Was a little early on when I did this, I've since transitioned most of my rl projects to sb! Got plenty planned on custom environments, stay tuned!
@samuelebolotta8007
@samuelebolotta8007 3 жыл бұрын
Hi Nicholas, great great work! It would be interesting to see a parallel with ML agents from Unity, to see the differences with OpenAI Gym. Thanks!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
YESS! I've been waiting for someone to ask for it, I've started testing it out already, should have a tutorial on it kinda soonish!
@jugalyadav7110
@jugalyadav7110 2 жыл бұрын
Hello Nicholas, firstly this video helped me a lot to get my basics cleared up regarding RL. Currently I am working with my own custom env and building a SAC model over it. I wanted to plot the actor and critic losses, and from your video I get that it should be done within the render function. It would be great if you could post some video summarizing the plots in render function. Cheers !
@SuperHockeygirl98
@SuperHockeygirl98 Жыл бұрын
Hey , thank you so much for this video. It really helped me. I have a question: can you define your observation space using CSV files and then iterate over it, so the agent needs to deal with differing environments?
@RafalSwiatkowski
@RafalSwiatkowski 3 жыл бұрын
Greetings from Poland. Extra tutorial, it will be great if you show how to combine pygame with reinforcement learning
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Woah Poland, what's happening! Definitely, I'll get cracking on it. Much love from Sydney!
@RafalSwiatkowski
@RafalSwiatkowski 3 жыл бұрын
@@NicholasRenotte Thank u master ;)
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
@@RafalSwiatkowski anytime!! 🙏
@monirimmi8616
@monirimmi8616 2 жыл бұрын
Hi Nick, Thank you very much for your nice explanation with outstanding implementation. To this end, I have a question, How can I check the model parameter update, for example, the weight of each layer? When each training episode is done? Is there any way to check those parameters?
@NicholasRenotte
@NicholasRenotte 2 жыл бұрын
I think you can export the final keras model, this should allow you to see the model weights etc
@vincentroye
@vincentroye 3 жыл бұрын
Excellent tutorial, thanks! Is it possible for a RL model to output a pair of ints or floats ( like [1.5, 2.8] ) instead of a discrete value? What would the output layer look like?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
I believe so, you would need your final layers to have a linear activation function and need a box state space! What's the use case if you don't mind me asking @Vincent?
@vincentroye
@vincentroye 3 жыл бұрын
@@NicholasRenotte thanks for answering. I'd be interested to see how a model could output the best geometrical coordinates for a given state. It could be a 2D game where the player would have to avoid bombs that pseudo-randomly hit a finite surface for example.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
@@vincentroye oh got it! Might I suggest you approach it slightly differently. You would ideally store the state of the objects coordinates and just output the actions for your agent in response to those coordinates. It would be akin to your agent walking around using something like sonar.
@vincentroye
@vincentroye 3 жыл бұрын
@@NicholasRenotte could the actions in that case be to move x (left or right) and y (up or down) at the same time? That would be the reason for having 2 outputs. I'd be interested to see how a DQN agent would train the model in that case. In your video it takes a discrete value as nb_actions; how would that be done with 2 continuous outputs? That's where I'm a bit confused, as that would give rise to a huge number of possible actions.
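If both axes really are continuous, the usual pattern is a two-component Box action space trained with a continuous-action algorithm (DDPG/TD3/SAC) rather than the DQN shown here, since DQN needs a finite action set; the alternative is to discretise the moves (e.g. 9 combinations of left/stay/right x up/stay/down) and keep DQN. A purely illustrative sketch of the continuous version:

import numpy as np
from gym.spaces import Box

# action = (dx, dy), each in [-1, 1]
action_space = Box(low=np.array([-1.0, -1.0]), high=np.array([1.0, 1.0]), dtype=np.float32)

position = np.array([0.0, 0.0])
action = action_space.sample()    # e.g. [ 0.42, -0.77 ]
position = position + action      # inside step(): apply both continuous moves at once
print(position)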
@andreamaiellaro6581
@andreamaiellaro6581 Жыл бұрын
Hi Nicholas, I found this tutorial of great help! Thank you. However, I'd like to ask you if what I have in mind is correct or not: within the step function, can I update what's inside the observation state?
@sandeepagarwal8566
@sandeepagarwal8566 2 жыл бұрын
Thank you for the tutorial. Like sklearn has hyperparameter tuning, please let us know how we can tune hyperparameters in the case of DQN; any package, library, or other kind of reference would be helpful. Thanks.
@kushangpatel983
@kushangpatel983 3 жыл бұрын
Really useful tutorial, Nick! Keep it up, mate!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Thanks @Kushang!
@Oriol.FernandezPena
@Oriol.FernandezPena 3 жыл бұрын
Your content is the best!! 🔥🔥
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Thanks so much!!! 🙏 🙏
@master231090
@master231090 3 жыл бұрын
Amazing video! I had a question about the activation layer and performing the final action. Your activation layer is a linear function; how does that link to picking the action?
@aayusheegupta
@aayusheegupta 2 жыл бұрын
Hello Nicholas! Great tutorial on building customized environment with Gym. Could you please share any pointers on how to load our own dataset while building an environment? I want to load and train RL agent with natural language sentence embeddings and create a proof tree.
@PedroAcacio1000
@PedroAcacio1000 2 жыл бұрын
Great video! Thank you very much, sir! What about if my problem only allows me to determine the reward on the next step, after we have taken some action? Do you have any video talking about such problems? Thanks again, your content is helping a lot.
@boonkhao
@boonkhao 8 ай бұрын
Hi, it is a great video tutorial for customizing an environment. However, when I copy your code and run it in a Jupyter notebook, I get stuck on a problem where rl.agents cannot find a compatible version of Keras. I have tried many ways to solve this but still cannot. So, please help me.
@montraydavis
@montraydavis 2 жыл бұрын
Fantastic tutorial! I have a question though regarding the DQNAgent test. I noticed that at the test function, the only two actions being sent to step are the low and high values. Why is that? How would I go about this because I need action to be equal to 0, 1 or 2 for my application. Thanks a lot for this resource!
@dhiyamdumur6245
@dhiyamdumur6245 2 жыл бұрын
Hi Nicholas! Very informative video! I would like to know if we can implement DDPG in the context of routing in a simulated networking environment to assess its performance in terms of network delay. Thank you
@NicholasRenotte
@NicholasRenotte 2 жыл бұрын
Yeah probably! I would think you would have different routes or paths for network load then reward based on latency or something of the like!
@ameerazam3269
@ameerazam3269 3 жыл бұрын
Again Best ever explanation Sir appreciate your work keep it up for us
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Thanks so much @Ameer!
@jordan6921
@jordan6921 3 жыл бұрын
Ooo pygame would be such a cool thing to see. I wonder if Retro Gym environments work too!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
IKR, Pygame is definitely on the list! I've tested with some of the Atari envs and they seem to work; they take a while to train, but they work!
@sahilahammed7386
@sahilahammed7386 2 жыл бұрын
Hi, are the observation space and the state the same thing? Here the observation space isn't used while training the model, right?
@mzadeh
@mzadeh 3 жыл бұрын
Thank you very much, very clear and clean.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Thank you so much @Mostafa!
@saaddurrani8930
@saaddurrani8930 3 жыл бұрын
I am doing a project: RL for a smart car (prototype) using DQN or any other RL algorithm. I am thinking of feeding in images as the state (from the camera mounted on the car), and my car is able to take 3 actions (forward, right and left). I am keeping it quite simple, i.e. by keeping the car in front of our goal, and as the car sees the goal I want to reward it and take the next action; now if it takes such a random action that the goal is no longer in the vision of the camera, it gets a penalty (state, action, reward/penalty, next state and so on). The episode time is limited to 2 mins. My aim is that the car moves towards its goal (and the more it moves towards the goal, the larger that feature appears, and hence it will get another reward because it's moving towards its goal). The goal would be an image ("triangle") at the end of the room in front of the car's initial position. Now, before implementing my DQN in the real-life prototype, I need to train it in OpenAI Gym (3D). I have no idea how I can build such an environment where I can train my DQN by simulation. Any help and suggestions are appreciated.
@NicholasRenotte
@NicholasRenotte 2 жыл бұрын
Take a look at how some of the video game driving environments are built! Should be a good start for how to kick it off!
@fernandomelo8460
@fernandomelo8460 3 жыл бұрын
First, your channel is amazing. Second, I tried to adapt your custom env to a trading env, but when I use deep learning/Keras-RL2 it doesn't look good: my reward is always the same (and the maximum). I think the problem is the NN architecture and/or the RL pieces (BoltzmannQPolicy/DQNAgent), because the loop before the deep learning part looks OK. Do you have any tips?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Check this out: kzbin.info/www/bejne/emrWhmSegbljh7s
@mohamadalifahim344
@mohamadalifahim344 3 жыл бұрын
With newer TensorFlow, the Stable Baselines import of A2C does not work. How do I solve this?
@anamericanprofessor
@anamericanprofessor 2 жыл бұрын
Any good links to actually overriding the render function for showing our own custom visualization?
@hanswurst9667
@hanswurst9667 3 жыл бұрын
Cheers Nicholas! How would one go about adding another dimension to the state? For example, the velocity of the water?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Heya @hans, you could add another Box which represents the velocity, by grouping them into a Dict space, e.g.:

from gym.spaces import Discrete, Box, Dict

spaces = {
    'temperature': Box(low=np.array([0]), high=np.array([100])),  # degrees
    'velocity': Box(low=np.array([0]), high=np.array([30]))       # litres/sec
}
dict_space = gym.spaces.Dict(spaces)
self.observation_space = dict_space
@hanswurst9667
@hanswurst9667 3 жыл бұрын
@@NicholasRenotte Thanks so much for the great reply! You wouldn't know, by chance, any great resources/tutorials concerning multi-agent RL?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
@@hanswurst9667 nothing that I've tested out already unfortunately :( Will let you know when I find something that works seamlessly!
@vigneshpadmanabhan
@vigneshpadmanabhan 2 жыл бұрын
Is there a deep reinforcement learning algorithm we can experiment with on regression-based tabular data or sensor data, etc.? If so, it would be much appreciated if you could make a video on it. Thanks!
@christiansiemering8129
@christiansiemering8129 3 жыл бұрын
Hi Nicholas, thanks for your great video! Is there an easy way to create customized multi-agent environments with OpenAI Gym? I want to create an AI that competes against another agent in a multiplayer "game".
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
AFAIK it's not super straightforward with pre-built RL packages that are out there atm. Will probably get to it in future vids!
@zahrarezazadeh293
@zahrarezazadeh293 2 жыл бұрын
Thanks Nicholas for the nice tutorial! I have two questions. 1. I'm trying to implement this on PyCharm with Python 3.10, on a MacOS Monterey, Core i3, but with built-in Python 2 something. I can't install and import tensorflow. It says it can't find a satisfying version. Any idea where the problem comes from? different Python versions? any solutions? 2. I'm starting to build my own environment, which is not like any of the ones available. It's a 2D path an agent should try to stay close to, by going left and right, with some gravity. Any suggestions where to start coding it? or any environments you know similar to this? THANK YOU!
@NicholasRenotte
@NicholasRenotte 2 жыл бұрын
Woah fair few there, take a look at some of the existing Gym envs, I think there might be some path focused ones I might have seen a while ago.
@alirezaghavidel4594
@alirezaghavidel4594 3 жыл бұрын
Thank you for your amazing work. I have a question regarding the defined environment. I defined self.state as a vector in the __init__ function (self.state = np.zeros(shape=(5,), dtype=np.int64)), but when I want to recall self.state in the step function, it is an integer. How can I have the vector state in the step function as well?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
I just took a look at my code and I think it's not perfect tbh. Try setting initial state inside of the reset method.
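A sketch of what that can look like for a vector state (a hypothetical 5-slot example, not the shower env): create and return the array in reset(), and update it element-wise in step() so it stays an array rather than being overwritten with a scalar.

import numpy as np
from gym import Env
from gym.spaces import Discrete, Box

class VectorStateEnv(Env):
    def __init__(self):
        super().__init__()
        self.action_space = Discrete(5)
        self.observation_space = Box(low=0, high=10, shape=(5,), dtype=np.int64)
        self.state = np.zeros(shape=(5,), dtype=np.int64)

    def reset(self):
        self.state = np.zeros(shape=(5,), dtype=np.int64)   # a vector, not a scalar
        return self.state

    def step(self, action):
        self.state[action] += 1          # index into the vector instead of replacing it
        reward = int(self.state.sum())
        done = bool(self.state.max() >= 10)
        return self.state, reward, done, {}

env = VectorStateEnv()
print(env.reset(), env.step(2))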
@alirezaghavidel4594
@alirezaghavidel4594 3 жыл бұрын
@@NicholasRenotte Thank you. I did
@alirezaghavidel4594
@alirezaghavidel4594 3 жыл бұрын
@@NicholasRenotte I have another question and I would appreciate your help. I defined the environment for a multi-component example, with the action space and observation space defined as vectors, and now I want to recall them for RL in Keras (you used states = env.observation_space.shape and actions = env.action_space.n as the input parameters of the build_model function). How can I recall them for my multi-component example? Do you have any example of a multi-component setup for RL in Keras? Thank you
@frankkreher4832
@frankkreher4832 2 жыл бұрын
Thank you, once again, for the very educational video. Great work!
@kheangngov8005
@kheangngov8005 2 жыл бұрын
Hello Nicholas, your video is very helpful. I have some questions: I wonder if it is possible to customize the action space for each state, with the reward only given at the terminal state. For example, state 1 with 3 actions, state 2 with 5 actions, state 3 with 10 actions, and the reward calculated based on that action sequence, depending on whether it is a win or a loss. Thank you.
@lukejames3570
@lukejames3570 3 жыл бұрын
Two questions: does the action that comes out of the network go into the next step of the env? And how can you be sure the output of the network is 0, 1 or 2 instead of some other random number?
@hariprasad1168
@hariprasad1168 3 жыл бұрын
Thanks! This is really helping me a lot.
@bananabatsy3708
@bananabatsy3708 3 жыл бұрын
I am just getting a NotImplementedError in the for-loop cell. I looked it up; it has to do with inheritance, but I cannot find how.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Heya @BananaBatsy, whereabouts is the error being triggered?
@TheOfficialArcVortex
@TheOfficialArcVortex 2 жыл бұрын
Any chance you could do a tutorial on how to use this for physical computing? For example, how would you implement two LEDs, say using GPIO on a Raspberry Pi, that respond when the temp goes up or down, with an input sensor for temperature? Or say an accelerometer and a motor for balancing.
@PhilippWillms
@PhilippWillms 2 жыл бұрын
Where does the 24 come from in defining the neural network layers?
@NicholasRenotte
@NicholasRenotte 2 жыл бұрын
Completely subjective, could change it to a larger or smaller value depending on the complexity of the problem you're trying to solve Philipp!
@fidelesteves6393
@fidelesteves6393 3 жыл бұрын
What an amazing tutorial! Thanks
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Thanks so much @Fidel!
@zitongstudio
@zitongstudio 3 жыл бұрын
Hi, I don't understand why during training the reward is around -0.5, while during testing the reward is around -60. Is it because the numbers of steps used for training and testing are different? For training it is over 10,000 steps, for testing only 60 steps.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Different starting points without enough steps to get to the final result. Increasing testing steps would allow the agent to iterate closer.
@caio.cortada
@caio.cortada Жыл бұрын
Have you done any of those using MuJoCo environments? Do you have any bibliography on that?
@Antonio-om4sg
@Antonio-om4sg 2 жыл бұрын
Would it be possible to see the evolution (a plot) of the temperature of the water when the agent is run on the scenario? For each episode we would see, for each step, the evolution of the water temperature
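That doesn't need the render() method at all: you can record the state at every step while running an episode and plot it afterwards. A small sketch (assumes env is the ShowerEnv from the video and uses a random agent; substitute the trained agent's action choice to see the learned behaviour):

import matplotlib.pyplot as plt

temps = []
obs = env.reset()                       # env = ShowerEnv() from the video
done = False
while not done:
    action = env.action_space.sample()  # or the trained agent's chosen action
    obs, reward, done, info = env.step(action)
    temps.append(obs)                   # water temperature after this step

plt.plot(temps)
plt.xlabel('step')
plt.ylabel('water temperature')
plt.show()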
@edzme
@edzme 3 жыл бұрын
Yep this is exactly what I was looking for. Could you make an example with a Dict space? Please thx!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
You got it, added to the list @Ed!
@edzme
@edzme 3 жыл бұрын
@@NicholasRenotte yessssssss!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
@@edzme yeahhhhyaaaa!!
@samiul2009
@samiul2009 3 жыл бұрын
Awesome!! I was wondering how would it work with path finding with some obstacles.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Normally the RL agent takes a while to find its way but eventually works around it. I'm actually working on RL for Super Mario atm, it's definitely taking a while though @Samiul!
@samiul2009
@samiul2009 3 жыл бұрын
@@NicholasRenotte wow! Waiting for rendering work of the environment
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
@@samiul2009 awesome, yup got pygame in the pipeline!
@ihebbibani7122
@ihebbibani7122 3 жыл бұрын
As usual , excellent content. Thank you so much :)
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Thanks so much @Iheb!
@raihankhanphotography6041
@raihankhanphotography6041 3 жыл бұрын
Thank you for the tutorial. This was super helpful!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Anytime @RaihanKhan!
@jossgm7480
@jossgm7480 Жыл бұрын
How can I access the Q-Values? I see that in the training the "mean_q" variable is displayed. How can I access these values (mean q values)?
@alessandroceccarelli6889
@alessandroceccarelli6889 3 жыл бұрын
Shouldn’t you use a softmax output function instead?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
In retrospect, I think I could've done some tweaking to support an alternate activation.
@apreceptorswanhindi
@apreceptorswanhindi 2 жыл бұрын
Hey man, Thanks for the wonderful video. I made a custom environment, and Keras-rl2 is taking a lot of time, not utilizing the GPUs. How can I optimize the training of this or similar codes using TensorFlow 2 on remote GPU with Ubuntu 20.0? 20.04.4 LTS (GNU/Linux 5.13.0-52-generic x86_64) NVIDIA-SMI 515.48.07 Driver Version: 515.48.07 with four NVIDIA GeForce RTX 3080 GPUs 10 GB each
@JJGhostHunters
@JJGhostHunters Жыл бұрын
I love these tutorials; however, I have spent hours and hours trying to even get the simplest environment to "render". I have tried two computers with Spyder, Jupyter notebooks and even the command line, and have never been able to even get a window to pop up with a rendered environment. I am continuing to learn the theory of RL, however it is very frustrating not to be able to follow along with these tutorials.
@juanguitarte5480
@juanguitarte5480 3 жыл бұрын
How do I customize my environment to make it visual? Using Pygame, or how? Thank you
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Yep, can definitely do it with PyGame!
@Mesenqe
@Mesenqe 2 жыл бұрын
Nice tutorial, could you please make one tutorial on how to use RL in image classification.
@summanthreddemulkkalla6786
@summanthreddemulkkalla6786 2 жыл бұрын
Sir, can you implement optimal placement of electric vehicle charging stations using DQN? Please, thank you.
@tareklimem
@tareklimem 2 жыл бұрын
Hei Nick, i need a class for reinforcement learning. How can we get in contact please?
@davidromens9541
@davidromens9541 3 жыл бұрын
Have the libraries for keras-rl or keras-rl2 been updated recently? I have been building a custom environment and training a NAF agent to solve it. Last week it was working, but when I came back this week, the code throws a KeyError: 0 when my NAF.fit line is run. Any suggestions or help would be greatly appreciated.
@davidromens9541
@davidromens9541 3 жыл бұрын
FYI I am using google colab which requires me to reinstall all libraries every session. I know its not ideal, but unfortunately I am on Windows.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Heya @David, not too sure I haven't been using keras-rl2 lately, I've been working with stable baselines in it's place. Did you have errors that you can share?
@davidromens9541
@davidromens9541 3 жыл бұрын
@@NicholasRenotte Turned out to be a weird bug with google colab. Restarted my computer and is working fine now. Thanks for the reply though!
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
@@davidromens9541 anytime, you're welcome. Weird though. Building anything interesting in the RL space?
@fufufukakaka
@fufufukakaka 3 жыл бұрын
keras-rl2 is already archived.
@vts_22
@vts_22 2 жыл бұрын
Hi Nicholas, I watched all of your reinforcement learning videos and others on the internet (and I have been trying for 10 hours). What if my state is [10, 20, 30, 40] and my action_space is Discrete(4)? I am getting "DQN expects a model that has one dimension for each action, in this case 4". My shape is (None, 1, 4) and I can't fix it.
@vts_22
@vts_22 2 жыл бұрын
I should probably change DQN and ADAM
@NicholasRenotte
@NicholasRenotte 2 жыл бұрын
Discrete(4) will return actions 0,1,2,3 as integer values. That sounds like your observation space is incorrect, looks like that would be (None, 4) if you've got [10,20,30,40]
@vts_22
@vts_22 2 жыл бұрын
@@NicholasRenotte I edited the Sequential layers by looking at your videos; my observation_space was the same as your observation_space in one video. I watched all your videos, thanks for them. I built a basic snake game, but I'm aiming to design a 2D space-travel game with gravitation and orbits. My problem was creating the wrong layers; I understand it better now. Thanks again for your excellent videos.
@sarash5061
@sarash5061 3 жыл бұрын
This was a very good video and I've learned a lot. Thanks. I'm wondering if you could make a custom environment, or basically a custom agent, for price optimisation? I want to train an agent for price optimisation; however, I don't know what the reward should be for the agent. I do have the distribution of past sales but don't know how to define the reward for my agent. Should it be the probability of buying, or should it be the price? Can you please help me here or create some content on it, please?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Depends what you're optimizing for, for price presumably it would be max profit or sales?
@sarash5061
@sarash5061 3 жыл бұрын
@@NicholasRenotte in both cases my agent would be a customer, right? Logically increasing sale or profit would be in favor of the company not the customer. Then how do I train my agent while the reward is in favor of the company and not necessarily in favor of customer (agent)?
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
@@sarash5061 the agent is more likely to be the dynamic price adjuster, so it will make dynamic adjustments to the price in response to sales or profit. E.g.:
1. Agent sets the initial price
2. Environment (aka customers) responds by buying X quantities
3. Calculate reward as sales or profit
4. Agent learns to adjust the price to maximize sales or profit
5. Customers buy more... and so on
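A toy sketch of that loop written as a Gym env, just to make the pieces concrete (the demand curve and every number here are invented for illustration; a real one would be fitted to the historical sales distribution):

import numpy as np
from gym import Env
from gym.spaces import Discrete, Box

class PricingEnv(Env):
    def __init__(self):
        super().__init__()
        self.action_space = Discrete(3)        # 0 = lower price, 1 = keep, 2 = raise
        self.observation_space = Box(low=np.array([1.0]), high=np.array([50.0]))
        self.price = 10.0
        self.days_left = 30

    def step(self, action):
        self.price = float(np.clip(self.price + (action - 1), 1.0, 50.0))
        demand = max(0.0, 100 - 4 * self.price + np.random.normal(0, 5))  # toy demand curve
        reward = self.price * demand           # revenue for the day
        self.days_left -= 1
        done = self.days_left <= 0
        return np.array([self.price]), reward, done, {}

    def reset(self):
        self.price, self.days_left = 10.0, 30
        return np.array([self.price])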
@sarash5061
@sarash5061 3 жыл бұрын
@@NicholasRenotte thank you very much. I will give it a go.
@perryfisch2003
@perryfisch2003 3 жыл бұрын
Your tutorials are all amazing! Thank you for taking the time to make all of these videos! I do have a question regarding making a custom Gym environment in which there's an OpenGL rendered environment that can be seen/viewed, similar to the race car environment. Say a 50 x 50 cell grid with a robot contained within it and several obstacles to navigate around as well as a target/goal. I've searched high and low and cannot find any good examples. Could you please point me in the right direction? Many thanks in advance.
@NicholasRenotte
@NicholasRenotte 3 жыл бұрын
Not sure about OpenGL, but I've started doing it with PyGame and integrating that way!
@perryfisch2003
@perryfisch2003 3 жыл бұрын
@@NicholasRenotte Great, I have no attachment to OpenGL, I just thought that was the way its done. Do you have an example with PyGame?