Q Learning Explained (tutorial)

346,556 views

Siraj Raval

Comments: 273

@Sluppie 5 years ago

Always love it when a programming-related video starts with "Hello, World." Not even kidding.

@ImtithalSaeed 3 years ago

exactly

@amidhmi5243 7 years ago

It's a delicate balance between efficiency and curiosity

@KunwarPratapSingh41951 7 years ago

Hey Siraj, I want to thank you for bringing such epic videos on AI and mathematics to inherit hunger for learning. *A good teacher is one which ignites a spark in student* ...Love from India.😂😢🤓❤

@SirajRaval 7 years ago

sending hugs

@adityapatil325 7 years ago

I think I should unsubscribe until I learn enough Deep Learning, as these videos are giving me existential crisis.

@stefan-ls7yd 7 years ago

Aditya Patil 😂

@greenbillugaming2781 7 years ago

Aditya Patil same here bro :(

@SirajRaval 7 years ago

nah dont worry im just going to get better at explaining

@stefan-ls7yd 7 years ago

Siraj Raval +100 IOTA

@hdef6602 7 years ago

yup

@manuelkarner8746 5 years ago

love your videos great Work buddy ! i am starting to study AI next week and I am freaking excited as well as confident because your channel (and brilliant and other stuff) is incredibly helpful :D

@waltzofthestars2078 3 years ago

omgod besides the video being really well made and offering quality explanation(kudos!) you are the first guy from India with a totally nice pronunciation!

@philrowlands1087 2 years ago

Except for the use of ‘less’ rather than ‘fewer’ but I still think you’re amazing

@CodeEmporium 7 years ago

I have an AI test next week. It's like you uploaded this for me. Thanks!

@DriftyG 7 years ago

Great video Siraj, thanks for bringing your knowledge into the video game world! :)

@SirajRaval 7 years ago

no problem driftwood this is fun!

@smtabatabaie 7 years ago

Man these reinforcement learning series are amazing !

@AhmadM-on-Google 7 years ago

Okay Siraj these explanations were amazing... intuitive and easily absorbed ! Nice effects too xD

@AhmadM-on-Google 7 years ago

Damn why u getting dislikes tho. Anyways what you think about creating a general game bot to rekk all online scores

@aaronsilver-pell411 7 years ago

This is helping me to understand life strategies, lmao. Great video Siraj.

@slugfiller 7 years ago

This video misses an important issue with Q learning. The Q function is based on possibly getting a reward in the future, even if a reward is not available right away. If the algorithm keeps getting into new states, it might posit that the lack of rewards is simply a case of it getting closer to a large reward in the far future. It won't know to "correct the value down" until it loops back to (and finds itself unable to escape) a previous state. It can become "stuck" with that bias, so long as it's not sure it's in a closed state list. The more states there are in the system, the larger this bias can become.

@WesKeppy 7 years ago

"Of course you are, you beautiful wizard."

@SirajRaval 7 years ago

haha its true

@siriwessberg7460 5 years ago

Me and my study partner literally refer to you as "our friend Siraj" when we are studying ML because you never fail to help us understand all the concepts that were unclear before

@fish1968-utf8 1 year ago

I think the key to Q-learning is that it mimics human learning: we always learn under some motivation, otherwise we couldn't get very good at anything. The algorithm introduces the result of an action as feedback that shapes decision making.

@jinxblaze 7 years ago

i did this for my college project :) thanks siraj

@rhejamphi 7 years ago

Someone has a supply of NZT.

@alberjumper 7 years ago

Great video! Q Learning FTW! And thanks for the shoutout :D

@SirajRaval 7 years ago

np alberto

@UnboxingSve 7 years ago

Amazing work Siraj!

@zakarie 7 years ago

Great siraj, keep them coming

@dsuryas 7 years ago

Man, how do you gather so much information so quickly???😱

@dan-garden 7 years ago

Deepak Surya AI

@dan-garden 7 years ago

Def Tank Not sure but probably :P

@mooe20 7 years ago

Deep learning man...

@dsuryas 7 years ago

AJ J that'll be crazy fast 😂😂

@g0d182 7 years ago

Internet

@tallwaters9708 7 years ago

With all this talk about agents you should consider perhaps doing an extensive video on Multi-Agent Systems. Jason etc...

@GameCasters 6 years ago

where can i find a simpler video? most of the terms he uses, i don't understand

@vitulus_ 6 years ago

4:05 "it's called the fuck you function" - that's what I heard

@daesoolee1083 7 years ago

Wow. You're really really good at explaining things in a super easy and fun way :) Amazing video! I love it!

@trevorgustavgreen8148 5 years ago

Awesome, I found a hidden gem on youtube

@talkohavy 4 years ago

I watched this video TWICE !!! My first time: With zero background, I understood nothing! I Rage quit in the middle. *A full semester passes by in which I took Deep Reinforcement Learning class * My second time: Oh...! So that's what he was talking about! I guess there's a special lingo that needs to be learnt, and making a 10 minute youtube video about it is absolute pointless, cause the ones with zero background won't understand jack shit, and the ones with background already know this shit. So... you know.

@saveerjain8168 4 years ago

I have background in other types of ai, yet I want to learn q learning. I know the lingo but don’t know this. Boom, it’s helpful.

@kalebbruwer 6 years ago

I get the concepts of quite a few different models now (RNN, CNN, normal feed-forward), but I have trouble putting it into code. Can you please point me to a good resource to learn TensorFlow itself from?

@LawrenceDCodes. 1 year ago

Here in 2023 because ... reasons

@Nick_With_A_Stick 9 months ago

Just so happens we all have the same reason 😉

@beepbop_p 3 months ago

We do be having these reasons

@lucutes2936 2 months ago

None

@RendallRen 7 years ago

I didn't get the 'liberal arts major' reference. Who is the bearded man in the inset at 0:52?

@khiljichand 7 years ago

Love your videos! Thanks so much for making Machine Learning interesting. :D

@SirajRaval 7 years ago

thanks!

@aksjhdbaksjhdbNotASpam 1 year ago

Great easily understandable video!

@CTimmerman 7 years ago

TD is nice for Mario, because jumping a Goomba has little influence on the rest of a level, but MC is great for Go, where early moves are important later.

@cash4laughs71 7 years ago

Best teacher ever.

@nfcopier1 7 years ago

Siraj, you understand computer science far more deeply than I do. But I think you need to review clean coding practices. For developing an algorithm, it might not be a big deal. But if you want to share your code with others - for distribution, review, or learning - readable code will make the process much smoother.

@zachwhelpley661 5 years ago

Dude, you're allowed to blink haha

@alljiang 7 years ago

27 liberal arts majors watched this video

@debayondharchowdhury2680 5 years ago

163 now.

@nizamuddinahmed8913 5 years ago

204 man

@JohnDoe-uq2qd 5 years ago

237

@brianmvukwe5506 9 months ago

This is top tier content man. Thank you so much!

@bauwndule 7 years ago

Man, the AI community can't thank you enough!

@SirajRaval 7 years ago

thanks for watching!

@Ludens93 1 year ago

I know Q learning is a type of reinforcement learning. But I'm wondering if adding human feedback in the loop makes the model more accurate and less prone to mistakes.

@StickgeneralArmy 2 months ago

oh that makes so much sense. huh. cause then you just make an n+1 dimension matrix with all but the last axis being the different input values from sensory data and the final axis being its control over a single given output. Want more outputs, just increase the depth of the final dimension to however many outputs you need. Add in a random chance to fail and a small dilapidation function such that it slowly forgets irrelevant data and it should be good to go
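
The table layout described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: each sensor input is taken to be already discretized into a fixed number of bins, the "random chance to fail" is read as epsilon-greedy exploration, and the slow-forgetting idea is not modeled. The names `make_q_table` and `choose_action` are hypothetical, not from the video's code.

```python
import numpy as np

def make_q_table(bins_per_input, n_actions):
    """One axis per discretized input, plus a final axis over the actions."""
    return np.zeros(tuple(bins_per_input) + (n_actions,))

def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: a random action with probability epsilon, else the best known one."""
    n_actions = q_table.shape[-1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    # Indexing with the state tuple leaves a 1-D vector of action values.
    return int(np.argmax(q_table[state]))

rng = np.random.default_rng(0)
q = make_q_table([40, 40], n_actions=3)   # two inputs, three possible outputs
q[5, 7, 2] = 1.0                          # pretend action 2 looked good in state (5, 7)
greedy = choose_action(q, (5, 7), epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the choice is purely greedy; raising `epsilon` trades that for more exploration.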

@prinkle12 7 years ago

Hi Siraj, Thanks for introducing and teaching such great stuff every week. These are of great help. Recently I was studying actor-critic reinforcement learning and I found its methodology quite similar to a GAN, with the actor performing the role of the generator and the critic that of the discriminator. Can you please share your thoughts on this?

@Zohbie 7 years ago

0:50 For that joke you really deserve a subscription! :D

@precogtyrant 7 years ago

much better than your earlier ones. Pace is good and there're less memes and gimmicks. Good job!

@dmarsub 7 years ago

4:20 Q functions seem great for speedrunning, but I wonder whether, with only limited computing power, a TD algorithm could learn quicker; if you then finish the learning process with a Q algorithm, it might have some cornerstones to find the best way in a better manner

@aliazizi129 7 years ago

U do really great job Siraj,but ur videos look like just a lecture and no effective learning just some informations that flooded all over the internet.it would be better to explain exactly what u do on a example code and explain the entire code step by step.i hope U really do it i have so many unsolved questions that no one can answer it if u explain Ur code for this video and other videos like this U will help us so much. thx alot man.

@keanzoe 7 years ago

the title said "Q Learning Explained" not how to make Q Learning..you have to know the difference

@bobsmithy3103 6 years ago

I've found that for the 50 or so videos of his I've watched over the past year and a half, I have learnt absolutely nothing. I've probably wasted 50-100 hours on his channel, just following along, rewatching stuff, trying to understand stuff, but never actually understanding. Most of the time, watching his videos gave me the impression that I was learning when in actuality I wasn't learning crap. I can't even recall anything that I learnt from watching any of his videos, this video included. Most of what he says is just lost in the jargon and at times I even feel like he's not even trying to teach but just to give the illusion of teaching others. His videos will probably be helpful for those that already know what he's talking about or are already deep in the field, but for individuals that are just starting off on this ML journey, I doubt they'll learn much from him.

@subjord 6 years ago

@@bobsmithy3103 You need to do the coding. You won't learn programming by watching stuff.

@yelenayu8522 4 months ago

Hi Siraj, thanks for sharing. It seems the code is no longer working.

@prateek6502-y4p 5 years ago

How do u expect one to grasp everything if u explain with the light speed

@BillBaxter 5 years ago

Playback speed 0.75x. I’m serious.

@saurabhiim 6 years ago

Hi Siraj, I request that you make your videos in such a manner where any layman can understand the logic behind the same ...I know its sometimes very tough by knowledge is that only ...to make the complex things easy ....cheers

@EickSternhagen 4 years ago

Quality of a certain action in a certain state. Bellman equation. Algorithm.
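
For readers who want the Bellman-based update spelled out, here is a minimal sketch of the standard tabular Q-learning step. It assumes states and actions are plain integer indices; the function name `q_update` and the default values for `alpha` (learning rate) and `gamma` (discount factor) are illustrative choices, not taken from the video.

```python
import numpy as np

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Move Q(state, action) toward reward + gamma * max over a' of Q(next_state, a')."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table

q = np.zeros((2, 2))      # 2 states, 2 actions
q[1] = [0.0, 1.0]         # pretend state 1 already has a known good action
q_update(q, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)
# The target is 1.0 + 0.9 * 1.0 = 1.9, so Q(0, 1) moves halfway from 0 to 1.9.
```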

@boffo25 7 years ago

What were you trying to cook with q learning?

@SirajRaval 7 years ago

haha needed my green screen

@JamonAllan 1 month ago

Just so you know I have CIA papers dating back to 63 mentioning A-Q machine learning!

@st101k 7 years ago

As an always very informative and perfect ☺

@enobil 7 years ago

For mario, think about the state space. Then think about markov chain approach that needs exact match of the state. I hope you get my point that no one is going to have enough ram and time for it to train for mario. You can downsample as you can but still the approach is not allowing a reasonably sized state space.

@BigDvsRL 5 years ago

Nice :) Hope this will help me create an AI which solves "Plague Inc" xDDD

@Sohlstyce 4 years ago

Nah teach it to infect Greenland first

@AlexeyKravets 5 years ago

Why are there no subtitles, at least in English? Are you not interested in foreign listeners?

@crazyoldhippieguy 4 years ago

So is Python the language for deep learning?

@JamonAllan 1 month ago

This is what Qanon think is their saviour!! Q is AI I told them!

@fish1968-utf8 1 year ago

What's 'Q'? The 'q' in q-learning stands for quality. Quality in this case represents how useful a given action is in gaining some future reward.

@Mirandorl 5 years ago

No one ever seems to talk about how the agent knows which actions are available to it. Where are the options "left and right" defined?

@Yannoux2000 7 years ago

that helped me out understanding my issue with exploration thx.

@nuadathesilverhand3563 6 years ago

Dude, can you slow down a little so that I don't have to hear you gasping for air? Or so that I can figure out what you're saying?

@sadeghshaikhi5950 4 years ago

i wasn't able to get anything from this

@get_downed_boi6270 3 months ago

When dumb= 1

@adamwespiser9209 7 years ago

Surprisingly good....hmmmmm. Great job!

@AfdalWahyu 7 years ago

Hi siraj, great video. btw you should change your mic to more high quality mic. as headphone user i'm not comfortably enough watching your videos i still can hear noise in the background. anyway keep the good work

@otonanoC 5 years ago

Why is Siraj in the kitchen? Is he about to show us how to cook something?

@cheungtyrone3615 5 years ago

This explains the algorithm neatly and vividly. I happened to encounter Q learning when reading a paper and I had been consulting blogs and posts on this for quite a while, but I always felt like something was missing. Actually, the Bellman equation is the only ingredient that matters in this recipe, but with "rigorously" formatted text only, it can be hard to figure out what it is doing.

@grantstenger6182 7 years ago

Why is the q_table initialized as np.zeros((n_states, n_states, 3))? 3 is the number of actions, right (i.e. drive left, drive right, do nothing)? Why would we need two dimensions for the number of states?
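
One plausible reading of that shape, sketched below: if the environment is the Gym mountain-car task mentioned elsewhere in this thread, each observation has two continuous components (position and velocity), so each gets its own axis of `n_states` bins, and the last axis holds the 3 actions. The bounds used here (position in [-1.2, 0.6], velocity in [-0.07, 0.07]) and the helper name `obs_to_state` are assumptions for illustration, not confirmed from the video's code.

```python
import numpy as np

N_STATES = 40                      # bins per observation component (assumed)
LOW = np.array([-1.2, -0.07])      # assumed lower bounds: position, velocity
HIGH = np.array([0.6, 0.07])       # assumed upper bounds: position, velocity

def obs_to_state(obs, low=LOW, high=HIGH, n_states=N_STATES):
    """Map a continuous (position, velocity) pair to a pair of bin indices."""
    ratio = (np.asarray(obs, dtype=float) - low) / (high - low)
    idx = (ratio * n_states).astype(int)
    # Clip so that an observation exactly at the upper bound lands in the last bin.
    return tuple(int(i) for i in np.clip(idx, 0, n_states - 1))

# Axis 0: position bin, axis 1: velocity bin, axis 2: one value per action.
q_table = np.zeros((N_STATES, N_STATES, 3))
pos_bin, vel_bin = obs_to_state([-0.5, 0.0])
action_values = q_table[pos_bin, vel_bin]   # vector of 3 action values
```

Under this reading the table has 40 x 40 = 1600 discrete states rather than 40, which is why the state appears as two dimensions.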

@paviad 7 years ago

Hey Siraj, great video (and your other videos are also great). I got a question, what happens if it takes a long time to complete an "episode"? How do you efficiently train the network in that case?

@darshanshah9623 4 years ago

Which Software you use for Animation?

@recklesflam1ngo968 4 years ago

Don't watch this person since he is plagiarizing others works

@marcel2711 4 years ago

why anyone talking about reinforcement learning does this only in python. I wanna see examples in c++, in c#, in java.. why only python?

@beypazariofficial 3 years ago

dude just get the point, you can implement it in any programming language. and converting python code to c# is easy. c# has all the things python has. maybe lots of lines but, it's easy to do.

@sarangs8441 7 years ago

Can you make a video on setting up python with all its libraries needed for your videos. I am having a hard time knowing what all libs I need for your older videos. Which version python do you use: 32bit or 64bit.

@VictorGallagherCarvings 7 years ago

You need to check out the youtube channel 'sendex'. Also I am 99% confident that he is using the 64bit version.

@sarangs8441 7 years ago

Victor Gallagher thanks a lot

@bauwndule 7 years ago

senTdex right?

@VictorGallagherCarvings 7 years ago

Sorry, yes sentex is right.

@VictorGallagherCarvings 7 years ago

got it wrong again, 'sentdex'

@Madlion 7 years ago

Is model here referred to model of the environment/world?

@Cyphlix 6 years ago

0:49 this is why I subbed

@vikramb183 6 years ago

I'm confused about how this is machine learning. It seems to me that the computer just creates a lookup table. Could you please clarify, for I'm sure I am missing something?

@mahirgulzar5403 6 years ago

Well he only talked about finding the optimal policy.. But not to forget you have to generalize on the optimal policy i.e you drop that agent in an environment which it doesn't know about or let me put it this way.. Suppose your agent has learned the policy that whenever the distance between the agent and an obstacle is precisely 5 meter it should apply brakes but heres the catch.. The state space is not discrete always it can be continuous. In-fact in real life it is continuous. So machine learning comes to play at this point.. It takes the state vector and applies function approximation (can be a neural network) to spit out an action on it.. Hope that helps.. :) PS: ML is nothing but function approximation or curve fitting..

@vikramb183 6 years ago

Thanks! I think I understand now.

@matthewdaly8879 7 years ago

Does this mean that the player has to have already been in a state and taken some actions to make an optimal decision, or is there a technique to use past results to estimate future rewards, i.e. a neural network? With the state consisting of two different variables in this case, it seems like it would take a while for the car to find the best actions to take for each occurring state. I'm a little confused.

@dustinandrews89019 7 years ago

"a technique to use past results to estimate future rewards i.e. a neural network", yes. Q learning is exactly that. Start with an untrained agent that knows nothing of the environment. Also strongly bias that agent to random actions at first in order to gather data. Next, allow the agent to take some number of actions in the environment, while recording the entire session. At the end take the reward, in this case it could be "units to the right of the start." Now go back over the recording and apply a share of that score to every move (perhaps with a decay for the older actions.) Finally, feed each instance of the replay, one at a time, into the network. You have to provide the state of the world + action and train it towards the score. If you keep doing this your network will start to converge on an understanding of what moves will create what score. Once you gain some data stop purely randomly sampling actions. Start using predictions from your model to inform the next move (the rate at which you go from pure random to pure agent is an important hyper-parameter.) If all goes well your agent learns better and better actions to take in each situation until it knows how to get very good scores. At least, that's how it should work! I'm struggling to get my model to converge on a similar toy example.
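
Two pieces of the procedure described in this comment can be sketched in a few lines: spreading a score back over a recorded episode with decay, and moving gradually from random to model-driven actions. Note that propagating the final score back over the whole recording is closer to a Monte Carlo return than to the one-step bootstrapped target of textbook Q-learning. The function names, the linear schedule, and all default values are hypothetical choices for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """For each step, the reward from that step onward, with later rewards decayed by gamma."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def epsilon_at(step, start=1.0, end=0.05, decay_steps=1000):
    """Linearly anneal the exploration rate from start to end over decay_steps."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

# An episode that only pays off at the end: earlier moves get a smaller share.
targets = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
```

The `targets` list is what each recorded (state, action) pair would be trained toward in this scheme, and `epsilon_at` corresponds to the "rate at which you go from pure random to pure agent" hyper-parameter.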

@SirajRaval 7 years ago

what dustin said

@matthewdaly8879 7 years ago

Thanks

@dustinandrews89019 7 years ago

Well that made my day. Thanks Siraj!

@shantomathew-fh3hv 7 years ago

Thanks for doing this Siraj. But I am running this in Ubuntu. I am not able to see anything though the code runs fine as i can see the iterations. Any idea how to fix it.

@PranshuTople 5 years ago

how do i get the gui of mountain and car?

@Metalwrath2 5 years ago

open ai gym

@PranshuTople 5 years ago

@@Metalwrath2 thanks

@ronstubed 6 years ago

How do we check the convergence of Q matrix?

@diegoosorio7752 3 years ago

Great video!

@aey2579 4 years ago

This guy is too smart for me. I need someone who is on my low IQ to explain this.

@Foxhood 4 years ago

I would suggest to look up Code Bullet. He does fun AI stuff in an easier to comprehend, sillier manner. A.I Learns to DRIVE does Q-Learning and its bigger brother Deep Q network. Not as in-depth, but fun to see and gives a good starting idea of what the AI does.

@somtoachu5704 7 years ago

this and genetic algorithm which one is better?

@aniseedus 6 years ago

Must the reward be only discrete 1 or 0? Can it be an intermediate fraction or decimal?

@Palamdrone 7 years ago

Hey Siraj, thanks for the video. I have a probably naive question. With Q learning, an observation and score are given to an agent by the environment. Does this mean that Q learning requires an environment that is perfectly informed? For gym, goals are clearly defined to the world, such as getting to a destination. What if the goal is not so well defined? Example: What if I want to use Q learning for exploration of a fixed geometry environment with the goal of finding resources that are not immediately known to the environment. Now it's unclear to me how to define the score for each frame, as the environment would not know where the resources are in the first place until the agent is close enough to spot them. Sorry for the lengthy comment!!! I understand you are very busy and my comment might be extremely dumb so any help is greatly appreciated! Thanks again!

@eagleswildcard 7 years ago

Great work man

@daggawagga 7 years ago

What kinds of approaches can you take when there isn't an obvious reward metric to feed to your algorithm? Let's say you wanted to make an AI to begin and finish a game that doesn't seem very linear such as Zelda or Metroid, or an analogous but unknown game. Do you just cram as many item counters as you can for measuring rewards?

@BosakMaw 7 years ago

What is the difference between Q-function and Value-function? The formulas look very similar to me

@neilslater8223 7 years ago

The Q function is also called the action value function. The main difference is that V(s) is the value of "being in state s", while Q(s, a) is the value of "being in state s, and taking action a". Both have similar-looking Bellman equations and Bellman optimality equations, and they can be related to each other based on how the environment works and what the current policy is (no equation-writing in YouTube comments, so cannot show you :-( ). Many model-free RL algorithms use Q because it gives you a way to select the next action (just pick a so that Q(s, a) is maximum out of all possible Q(s, ?)), whilst if you have V(s), the only way to maximise it is to look ahead to see what the next state will be, which you can only do if you have some way to predict the next state (i.e. a model).
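
For reference, the standard textbook relations between the two functions can be written out as below. The notation is an assumption on my part rather than the commenter's: \pi(a \mid s) is the policy, P(s' \mid s, a) the transition probability, R(s, a, s') the reward, and \gamma the discount factor.

```latex
% State value as the policy-weighted average of action values
V^{\pi}(s) = \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a)

% Action value as expected reward plus discounted value of the next state
Q^{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{\pi}(s') \bigr]

% Optimality versions: the greedy choice over Q needs no model,
% whereas acting on V alone requires P to look one step ahead
V^{*}(s) = \max_{a} Q^{*}(s, a)
Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \bigr]
```

The second line is where the model enters: turning V into a decision needs P(s' \mid s, a), while picking the action that maximizes Q(s, a) does not.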

@souravjamwal77 7 years ago

From where should I start learning AI and Machine Learning Pls help me guys I am a beginner and I know Python programming

@davidrey6126 6 years ago

First learn calculus 1 and 2, if possible learn calculus 3. After calculus 2 you can start learning probability and statistics. Make sure to learn the full details for statistics, not just basic stuff like normal distributions. Learn many different well known distributions such as chi square, gamma, etc. Learn how to do inferential statistics such as point estimation, interval estimation. Once you spend 1 to 2 years learning the Math and also at the same time brush up your python and R skills for data science packages. Finally you may start looking into machine learning techniques. But first start with good ol linear regression and logistic regression (more math + linear algebra). And then learn statistical learning methods that have been rebranded into machine learning methods from 50 to 100 years ago. Once you're done with that you can start learning some more modern methods like reinforcement learning and neural networks. There is no ONE, BEST, Tool in machine learning. Every method is suitable for different cases. But yes, neural networks are cool so everyone talks about it.

@Raj_Patel21 7 years ago

I am new here and dont know from where to start learning this stuff. Any suggestions

@samacumen 6 years ago

Hi Siraj. Thanks for the video. Can you tell me what tools do you use to edit those awesome videos? Thanks.

@andriibogomazov7863 7 years ago

Nice kitchen background, but the plain background with memes are easier to watch and focus... btw did you slow down the video by 10-20% ?)

@chicken6180 7 years ago

can anyone explain what the value of n_states represents in the program? does this mean there are only 40 possible positions in the environment or what? thanks in advance edit - i'm thinking that the "3" in "q_table = np.zeros((n_states, n_states, 3))" represents the fact that there are 3 possible actions for the car? i'm confused.

@philrowlands1087 2 years ago

You are brilliant. I only dream of having your ease of understanding of these processes!🎉

@Perryman1138 7 years ago

One thing I’ve wondered is how to tackle AI for tasks that can’t be parallelized or sped up (I.e. model-free, real time task based AI)

@Yannoux2000 7 years ago

Perryman1138 record some example data so you can pre-train your model. if i remember correctly it's called imitation learning: you give the agent some interesting action paths with already computed rewards. the more the better.

@Perryman1138 7 years ago

Ah I see. In particular, I was thinking of Dungeon Crawl Stone Soup, a color-graphics terminal roguelike for which recordings of thousands of games exist in text format, but I believe it might only record the game outputs, not the player inputs. Still, a fascinating concept! Thanks!