Train AI to Beat Super Mario Bros! || Reinforcement Learning Completely from Scratch

12,153 views

Sourish Kundu

1 day ago

Today we'll be implementing a Reinforcement Learning algorithm called the Double Deep Q-Network (DDQN). A lot of other videos use a library like Stable Baselines; however, today we'll be building this completely from scratch. We'll use it to train the computer to play Super Mario Bros on the NES! This tutorial is aimed at people who have a base-level understanding of ML, but not necessarily reinforcement learning. It's also perfect if you're looking for a personal project to add to your resume that can be completed in a weekend.
Additionally, if you don't have the resources to train this locally, I highly recommend checking out Google Colab Notebooks!
This is my first ever YouTube video and I've been really excited to share this with you guys! If there are any questions or if anyone has any tips/advice, please don't hesitate to comment down below!
00:00 Demo & Intro
03:02 Key Reinforcement Learning Vocabulary
07:47 Epsilon-Greedy Approach
09:32 Replay Buffer
10:20 Action-Value Function Intuition
15:19 The DDQN Algorithm
18:39 DDQN Pseudocode
19:39 Implementation in Code
30:21 The AI Beats the Level!
30:56 Conclusion
SOURCE CODE
github.com/Sourish07/Super-Ma...
PAPERS USED AS REFERENCE
Human-level control through deep reinforcement learning
www.nature.com/articles/natur...
Deep Reinforcement Learning with Double Q-learning
arxiv.org/pdf/1509.06461.pdf
DOCUMENTATION
PyTorch Documentation
pytorch.org/tutorials/interme...
pytorch.org/rl/reference/data...
Gymnasium Documentation
gymnasium.farama.org/index.html
gymnasium.farama.org/api/wrap...
TEXTBOOKS
Learning Deep Learning by Magnus Ekman
www.nvidia.com/en-us/training...
Reinforcement Learning: An Introduction, Second Edition by Sutton & Barto
mitpress.mit.edu/978026203924...
OTHER
CNN Explainer
poloclub.github.io/cnn-explai...
Introducing ChatGPT
openai.com/blog/chatgpt
All content on this channel is produced by and is the intellectual property of Sourish Kundu LLC.

108 comments
@RaunakPandey · 7 months ago
Great video! I like how Sourish made complex topics seem so simple. Can’t wait for more videos to come
@sourishk07 · 7 months ago
Thanks Raunak! More to come, I promise!
@akshaynaik4197 · 7 months ago
This is awesome, Sourish! I love the animations! Explanations were incredibly thorough and well put. I can't wait to see what you do next!
@sourishk07 · 7 months ago
Thanks Akshay! I really appreciate it
@bordplate. · 5 months ago
I've watched a bunch of videos on DQN implementation the last days, and only after having gone through a bunch of trouble to implement mine do I see your video. This is hands down the best video on this topic I've seen. Great work!
@sourishk07 · 5 months ago
Thank you for those kind words! I'm glad it was helpful. Feel free to share this with any friends that you think may be interested as well!
@Chak29 · 7 months ago
This is awesome, could not have ever guessed it's your first video! Can't wait for the next video!!
@sourishk07 · 7 months ago
Thank you so much! Next video is in the works right now :)
@archansen8084 · 6 months ago
Super cool video! Can’t wait to see what comes next from the channel!
@sourishk07 · 6 months ago
Thank you Archan! I promise you won't be disappointed :)
@saanvim1788 · 7 months ago
This is so cool and helpful Sourish ! Super interesting watch
@sourishk07 · 7 months ago
I appreciate it Saanvi! I'm glad you enjoyed it
@bhaskarmondal7461 · 1 month ago
Such an awesome tutorial. I have always loved video games, and I've loved ML since I first learned about it. This project helped me a lot with reinforcement learning concepts, and at last I got to combine my two favorite things together!
@sourishk07 · 1 month ago
I'm really glad to hear that! Don't worry, I have more RL content in the pipeline!
@bhaskarmondal7461 · 1 month ago
@sourishk07 That's great, we are all waiting!
@SybilGrace · 3 months ago
Great job. Looking forward to watching more videos from you. Machine learning is so cool.
@sourishk07 · 3 months ago
Thank you so much! And yes I agree!
@biancaturman8917 · 7 months ago
This is amazing!!!
@sourishk07 · 7 months ago
Thank you Bianca! Stay tuned for more videos coming soon
@mayankgarg9728 · 4 months ago
Very smooth and hierarchical way of explaining. You should create more such videos.
@sourishk07 · 4 months ago
Thank you! Don't worry, we got a couple more in the pipeline!
@Spideyy2099 · 4 months ago
Please make more videos like this! Everyone else, like and share this!! It's simple and clear with lots of education. Plus you can see all the code at once and don't have to jump around the video! Gem of a coding video!
@sourishk07 · 4 months ago
Thank you so much for the kind words! Don’t worry we have some more in the pipeline :)
@_kumar06 · 2 months ago
Best YouTube channel for learning deep learning. Please make more videos like this.
@sourishk07 · 2 months ago
Thank you so much! Don't worry, I have many more machine learning topics in the works which I'm really excited to share with you guys!
@suryaraghavendran3627 · 7 months ago
Fantastic video!
@sourishk07 · 7 months ago
Thanks Surya! Excited to produce more videos for you
@ArtOfTheProblem · 3 months ago
really nice work
@user-wm6fn5bb3k · 2 months ago
The best channel. It helped me a lot!
@sourishk07 · 2 months ago
Thank you for the kind words, and I'm glad I was able to help! Looking forward to sharing my next RL video with you!
@hakan6449 · 1 month ago
Most simple yet excellent video on RL. Great work! Maybe in the next video you can implement it without a ready-made environment like gym.
@sourishk07 · 1 month ago
Hi! I'm glad you liked the video! Don't worry, I do plan on making a video about custom environments soon! Stay tuned
1 month ago
@sourishk07 That would be nice to see.
@mariorodriguez8854 · 28 days ago
keep this way to teach, thanks!
@sourishk07 · 15 days ago
You bet! Thanks for watching!
@mridinithippisetty8867 · 6 months ago
Awesome vid!
@sourishk07 · 6 months ago
Thanks for the visit Mridini!
@Trails_in_the_Sky · 7 months ago
What version of python did you use? I'm trying to run the repo code but I can't install the requirements at all.
@sourishk07 · 6 months ago
Thank you for asking! I apologize for not providing clearer instructions. I use Python 3.10.12 and I've updated the repo with some more details about my installation process.
@oleander1956 · 4 months ago
Hey Sourish, I've watched a couple of other videos where some of the versions were outdated, so a lot of things were broken and I couldn't fix them or understand the errors, because I truly don't know what I'm doing. I just started your video and you mentioned how you understood every line of code as a sophomore. I want that. What resources should I follow in order to fully understand what I'm doing in this game? Thank you. Great content.
@sourishk07 · 4 months ago
Hi! Thanks for the comment! My goal for this video was to serve as the introduction to the DDQN algorithm such that you can understand the code after watching it. If there are any confusing parts in the video or code that I can clarify, please let me know! But if you're talking more generally about ML, then I highly recommend Andrew Ng's "Deep Learning Specialization" course on Coursera. It is literally the one-stop shop for Machine Learning and has been the foundation for all of my exploration in the ML space. I also recommend Learning Deep Learning by Magnus Ekman! It's a great book to get more familiar with ML.
@evangill6484 · 9 days ago
Have you by any chance done or found any follow-up types of projects that incorporate multiprocessing? I was curious if there's a way to speed up the training by doing a few simulations at the same time and communicating results to the agent. Any thoughts or ideas?
@sourishk07 · 8 days ago
Hi! That’s a great question. I’m actually working on running parallel environments right now so it’s a crazy coincidence that you asked. Stay tuned :)
@sohamkundu9685 · 4 months ago
Great video!
@sourishk07 · 4 months ago
Thanks for the visit
@gamermanv · 5 months ago
Dope video! If I had to make a graph showing the rewards vs. episodes, how would I go about doing it?
@sourishk07 · 5 months ago
Great question! I would import matplotlib at the top. And then before training starts, I would create an empty list. For each episode, set a reward counter to 0 and accumulate the rewards gained while playing the episode. After the episode ends, append the total reward to the list. Let me know if something doesn't make sense! Pseudocode:
import matplotlib.pyplot as plt

rewards = []
for i in range(NUM_EPISODES):
    reward_counter = 0
    while not done:
        reward = env.step(...)
        reward_counter += reward
    rewards.append(reward_counter)
plt.plot(rewards)
@gamermanv · 5 months ago
@@sourishk07 that makes perfect sense! Thank you for the explanation!
@U_Lambda · 2 months ago
Question, in your setup, where are you defining/tweaking your reward function? Btw great video!
@sourishk07 · 2 months ago
Thanks so much for watching! And that's a good question. Maybe I should've specified it more clearly in the video, but the reward function is already handled by the gym_super_mario_bros package. Check out the 'Reward Function' section in the gym_super_mario_bros documentation! pypi.org/project/gym-super-mario-bros/
@joyantamitra8186 · 4 months ago
Please publish more. I have learnt a lot
@sourishk07 · 3 months ago
Thank you! I'm glad you enjoyed the video. And don't worry there are more videos like this one in the pipeline! Feel free to check out my other videos on my channel in the meantime!
@junyehu2315 · 4 months ago
Does torchrl have to be installed with torch 2.1.0+? When I use pip install torchrl, it uninstalls my torch 1.13 and installs 2.1.1.
@sourishk07 · 4 months ago
I actually updated the requirements.txt today! It uses PyTorch v2.1.1 and torchrl v0.2.1. If you have to stick to PyTorch v1.13 then I recommend installing a specific torchrl version that is compatible with v1.13 (pip install torchrl==x.xx) You may need to consult the PyTorch or torchrl documentation for the right version numbers
@fruitpnchsmuraiG · 5 months ago
Would I be able to run the repo on my local system even if it lacks the hardware requirements, like a very solid GPU?
@sourishk07 · 5 months ago
To be honest, it really depends. I trained this code on an RTX 3080 and it took me ~48 hours, but I didn't dive too deep into hyperparameter optimization. If you feel like your GPU isn't powerful enough, I highly recommend checking out Google Colab Notebooks, where you can run a Jupyter Notebook in the cloud with access to GPUs!
@ArtOfTheProblem · 3 months ago
Would be cool to make a super simple interactive environment of this, something that kids or non-experts could play with to see playing results. Have you seen that?
@sourishk07 · 3 months ago
I'm glad you enjoyed the video! When you say interactive environment that someone could play with to see playing results, what exactly do you mean? The gym_super_mario_bros library already supports direct keyboard inputs. Also, if you meant a way someone could see the latest model weights without training, then I would recommend looking at how to save & load checkpoints. In the repo, main.py has some code to save checkpoints during training. Let me know if you meant something else though!
@ArtOfTheProblem · 3 months ago
@sourishk07 Good question. I was thinking of a demo where a non-expert could "see the magic in action", so something like: A. showing an initially unlearned (random) behavior as well as time unfolding; B. allowing users to adjust key learning parameters (what would you say is most important: gamma/discount factor, learning rate, what else?); and maybe C. showing some visual of how the network weights are updating (perhaps just a pattern/visual), mainly to show 'change' taking place. Curious what you think would be most useful; I'm thinking of it as a demo for non-experts.
@brianferrell9454 · 4 months ago
Not sure why people dislike this video, it was awesome!
@sourishk07 · 4 months ago
Thank you for those kind words! I'm glad you enjoyed it
@kag46 · 3 months ago
Hey Sourish, thanks for a great video! I can see that my 7-year-old machine isn't being utilized at 100%. Is it possible to increase the game engine speed? It looks like it plays faster than real time, but still slower than the machine could, and I believe it could do better in the learning phase! xD Initially I thought setting DISPLAY = False would speed it up, but it seems not to.
@sourishk07 · 3 months ago
Don't worry, that is also a concern of mine! That's why I'm working on a follow up video where I try some hyperparameter tuning and parallelize multiple instances of the environment. Running multiple environments at the same time should especially help in maximizing CPU usage, while the GPU can continue handling the neural networks.
@theashbot4097 · 4 months ago
This is so cool! You put a lot of effort into this video! I have used reinforcement learning before in the Unity game engine, and I have made a tutorial on how to use it in Unity, but I have never seen it used outside of Unity, and it looks very cool! I want to take my reinforcement learning knowledge out of Unity and start to train a robot in real life. The robot is not coded in Python; it is coded in Java. I have not started researching it yet, so I do not know how hard it will be, but I was wondering if you have any knowledge on how to connect a neural network to Java. If you do not, that is just fine. I do not think it should be too hard to research.
@sourishk07 · 4 months ago
Thanks for watching and I’m glad you enjoyed it! With regards to your robot, that sounds like a really cool project! I’m not exactly sure how best to connect the two. Maybe you can create a simple API in Python that your Java code can call?
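One way to sketch that "simple API in Python" idea, assuming the Java robot can make HTTP calls. Everything here (the port, the JSON shape, the choose_action placeholder policy) is a hypothetical illustration using only the Python standard library, not anything from the video's code:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def choose_action(observation):
    # Placeholder policy: a real server would run the trained network here.
    return {"action": len(observation) % 2}

class PolicyHandler(BaseHTTPRequestHandler):
    """Answers a POSTed JSON observation with a JSON action."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        observation = json.loads(self.rfile.read(length))
        body = json.dumps(choose_action(observation)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("localhost", 8000), PolicyHandler).serve_forever()
```

The Java side would then POST its sensor readings to localhost:8000 and read the action back from the response body.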
@theashbot4097 · 4 months ago
@sourishk07 Thank you for making this! Yeah, that is what I was thinking. I am first going to try it in a C# project because I am more familiar with it, then later move on to Java.
@ApexArtistX · 5 months ago
This is what I'm looking for: an external game bot.
@pgiralt · 5 months ago
The video says it uses Gymnasium, but as far as I can tell, you're using the older Gym and not the newer (maintained) Gymnasium, correct?
@sourishk07 · 5 months ago
Yes, you're correct. I apologize for the oversight in the video. gym_super_mario_bros is a pretty old library (last updated in June of 2022) and it hasn't been updated to use the new Gymnasium library. However, the latest version of the older gym library is v0.26.2, so it does include the new, breaking changes introduced in the v0.26 API. This is why in the code I have to set the apply_api_compatibility flag to True when making the environment.
@NeuralGamerAI · 4 months ago
I have a problem with the box2d library. I've tried everything but couldn't get it to work.
@sourishk07 · 4 months ago
The box2d library shouldn't be a dependency for this project! Is your pip somehow trying to install it or were you asking just in general?
@66a2.vijayendarreddy3 · 1 month ago
I have a doubt about what we should use as controllers. Should those be images named 0-4.png?
@sourishk07 · 1 month ago
Regardless of what you name the PNG files for each controller, I recommend using a dictionary that maps each action to its corresponding image. The index value of each action that is available to your agent depends on your chosen action space. In this video, we chose RIGHT_ONLY, but you can select others from gym_super_mario_bros.actions.
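As a sketch of that suggestion (the RIGHT_ONLY list below mirrors the action space named in the reply, written out inline so the snippet stands alone; the PNG filenames are hypothetical and entirely up to you):

```python
# RIGHT_ONLY, as exposed by gym_super_mario_bros.actions: 5 actions, indices 0-4.
RIGHT_ONLY = [
    ["NOOP"],
    ["right"],
    ["right", "A"],
    ["right", "B"],
    ["right", "A", "B"],
]

# Map each action index to its controller-overlay image file.
action_to_image = {i: f"{i}.png" for i in range(len(RIGHT_ONLY))}
```

With this mapping, the image shown for an action no longer depends on the filenames themselves; renaming the PNGs only means updating the dictionary in one place.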
@ApexArtistX · 5 months ago
can you do a custom environment that plays external web game
@sourishk07 · 5 months ago
I've actually received another request for a video on custom environments as well! It's a pretty hard topic to tackle, but it's definitely on my list. I'll keep you posted on when I'm able to create it! Thank you for the request!
@PriyankaJain-dg8rm · 3 months ago
Approximately how many episodes did it take for your mario to learn and reach the flag?
@sourishk07 · 3 months ago
So I trained for 50,000 episodes, but I would see the level being completed at around 40,000, albeit pretty inconsistently.
@user-kl1yh4ub1o · 4 months ago
That sounds great!
@sourishk07 · 4 months ago
Thank you for watching!
@PriyankaJain-dg8rm · 3 months ago
how to end the training process?
@sourishk07 · 3 months ago
So I typically train for a set number of episodes while checkpointing every so often. In this case, I would checkpoint every 5000 episodes. Then once the training finishes, I can load the weights from any of the checkpoints to see how the model performs with that many episodes. For evaluation, I would also lower epsilon to a small number, maybe like 0.1.
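A minimal sketch of that schedule, with a hypothetical train_episode stand-in and pickle-based checkpoints. The episode counts are shrunk for illustration (the video trains for 50,000 episodes, checkpointing every 5,000, and the real code saves PyTorch model weights in main.py rather than a plain list):

```python
import pickle
import random

NUM_EPISODES = 10        # the video uses 50,000
CHECKPOINT_EVERY = 5     # the video checkpoints every 5,000
EVAL_EPSILON = 0.1       # lower epsilon for evaluation, per the reply

def train_episode(weights):
    # Hypothetical stand-in for one episode of DDQN training:
    # nudge the "weights" so checkpoints differ over time.
    return [w + random.random() for w in weights]

weights = [0.0, 0.0]
for episode in range(1, NUM_EPISODES + 1):
    weights = train_episode(weights)
    if episode % CHECKPOINT_EVERY == 0:
        with open(f"checkpoint_{episode}.pkl", "wb") as f:
            pickle.dump(weights, f)

# After training ends: load any checkpoint and act mostly greedily
# (epsilon = EVAL_EPSILON) to see how that snapshot performs.
with open("checkpoint_10.pkl", "rb") as f:
    eval_weights = pickle.load(f)
```

For the real agent you would save and load the network's state_dict instead of a list, but the checkpoint cadence and the lowered evaluation epsilon work the same way.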
@vivekpadman5248 · 5 months ago
These algorithms are too old now; we need some foundational model in RL soon. But yeah, great work bro ❤
@sourishk07 · 5 months ago
Yeah you're probably right haha. Gotta start from the basics first I suppose. And thank you for the view!
@vivekpadman5248 · 5 months ago
@@sourishk07 yup that's right man, all the best 👍 rl is a world of itself
@kushagrasingh6361 · 16 days ago
Can someone provide the trained model?
@sourishk07 · 15 days ago
Hey! Thanks for the question. I actually have a follow up video planned where I'll be optimizing the training process for this agent. I'll make sure to upload the model then!
@CouchPotator · 3 months ago
Huh, Kush Gupta did this and achieved a much better result with far fewer episodes. I wonder why.
@sourishk07 · 3 months ago
Hey thanks for pointing that out! I checked out his video and he was using a different algorithm, PPO. That might be one reason. A second is that my hyperparameters might not be the most optimized, which is something I leave to the viewer to experiment with on their own!
@KushGupta1 · 3 months ago
The funny thing is I tried using DDQN first but I wasn't able to beat a lot of levels using this approach & it was taking way too long, so I eventually switched to PPO
@sourishk07 · 3 months ago
@@KushGupta1 Haha well it's reassuring that long training times with DDQN aren't only a problem for me! Btw, I loved your video Kush
@agenticmark · 2 months ago
302 subs is crazy low for this content.
@sourishk07 · 2 months ago
Haha thank you for those kind words! I'm glad you enjoyed the video!
@curtisnewton895 · 2 months ago
Try explaining that again to AI beginners and non-mathematicians. What are you trying to prove here?
@sourishk07 · 2 months ago
Hi! Thanks for sharing your concern! I completely understand that this video might be a tad overwhelming for complete beginners and people that aren't too comfortable with math. That's why at the beginning of the video, I specified that my target audience was people that have some basic understanding of the ML fundamentals, which includes the prerequisite calculus. It's my fault if that wasn't clear enough. However, if you have any specific questions about the video or about machine learning in general, please do let me know! I'll either answer them myself or point you to resources that answer the question better. Don't hesitate to ask!
@JarppaGuru · 2 months ago
Oh no, not intelligence. Python and screen recognition would do better the first time, no need to train. Why train? We just say: do this. If there were any AI, it would do a lot more on the first try. The objective is to go right; if there's an enemy, jump over it or onto it; if there's an obstacle, jump. This is ground, this is brick, everything else is lava, don't touch the lava.
@sourishk07 · 2 months ago
Yes, you make a good point, but imagine having to code up those specific rules for each separate game! The goal with RL is to have a generalized algorithm that can work with any environment! Thank you for watching though!
@mahdi_c200 · 3 months ago
This tutorial does not help beginners; it has a lot of useless theory!! I know you spent a lot of time recording, editing, and uploading this video, but please consider that a beginner like me cannot figure out a lot of theory outside of a development environment. I need to know how to handle my code, and I would prefer to watch a better tutorial with more hands-on coding instead of watching a 31-minute useless video.
@prabhmeet6842 · 3 months ago
This is one of the best tutorial It has maths+code so one dosent needs to watch another video for the basics
@sourishk07 · 3 months ago
Thanks, I'm glad you think so!
@citra5431 · 2 months ago
He doesn't even go that deep into the theory; he gives the perfect combination of both. If you didn't want to watch the theory, you could have just gone to the implementation section. But if you don't even know the absolute basics, why would you try to code it? You won't understand anything and it will just be a copying exercise.
@mahdi_c200 · 3 months ago
This tutorial does not help beginners; it has a lot of useless theory!! I know you spent a lot of time recording, editing, and uploading this video, but please consider that a beginner like me cannot figure out a lot of theory outside of a development environment. I need to know how to handle my code, and I would prefer to watch a better tutorial with more hands-on coding instead of watching a 31-minute useless video.
@dennisestenson7820 · 3 months ago
Perhaps this video isn't for you right now.
@revimfadli4666 · 3 months ago
Tbf, coding something this complex from scratch isn't exactly beginner stuff either. But XOR or Fashion-MNIST, now _that_ is a beginner-level from-scratch project.
@sourishk07 · 3 months ago
Hi, I'm sorry to hear that! So if I understand correctly, the difficult part of this video was getting PyTorch and the python environment set up on your local machine? If so, let me know if I can see if I can make a video that talks about that topic!
@GamingAmbienceLive · 2 months ago
This video is for nobody; it's for entertainment. There's no value in it beyond that.
@sourishk07 · 2 months ago
@@GamingAmbienceLive I appreciate the comment and I'm sorry you believe that. If you have any specific problems or concerns about the video, please feel free to let me know!