Q-Learning Explained - A Reinforcement Learning Technique

223,137 views

deeplizard

1 day ago

💡Enroll to gain access to the full course:
deeplizard.com/course/rlcpailzrd
Welcome back to this series on reinforcement learning! In this video, we'll be introducing the idea of Q-learning with value iteration, which is a reinforcement learning technique used for learning the optimal policy in a Markov Decision Process.
We'll illustrate how this technique works by introducing a game where a reinforcement learning agent tries to maximize points, and through this, we'll also learn about Q-tables and the trade-off between exploration and exploitation.
Sources:
Reinforcement Learning: An Introduction, Second Edition by Richard S. Sutton and Andrew G. Barto
incompleteideas.net/book/RLboo...
Playing Atari with Deep Reinforcement Learning by DeepMind Technologies
www.cs.toronto.edu/~vmnih/doc...
🕒🦎 VIDEO SECTIONS 🦎🕒
00:00 Welcome to DEEPLIZARD - Go to deeplizard.com for learning resources
00:30 Help deeplizard add video timestamps - See example in the description
08:08 Collective Intelligence and the DEEPLIZARD HIVEMIND
💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥
👋 Hey, we're Chris and Mandy, the creators of deeplizard!
👉 Check out the website for more learning material:
🔗 deeplizard.com
💻 ENROLL TO GET DOWNLOAD ACCESS TO CODE FILES
🔗 deeplizard.com/resources
🧠 Support collective intelligence, join the deeplizard hivemind:
🔗 deeplizard.com/hivemind
🧠 Use code DEEPLIZARD at checkout to receive 15% off your first Neurohacker order
👉 Use your receipt from Neurohacker to get a discount on deeplizard courses
🔗 neurohacker.com/shop?rfsn=648...
👀 CHECK OUT OUR VLOG:
🔗 / deeplizardvlog
❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
Tammy
Mano Prime
Ling Li
🎓 Deep Learning with deeplizard:
Deep Learning Dictionary - deeplizard.com/course/ddcpailzrd
Deep Learning Fundamentals - deeplizard.com/course/dlcpailzrd
Learn TensorFlow - deeplizard.com/course/tfcpailzrd
Learn PyTorch - deeplizard.com/course/ptcpailzrd
Natural Language Processing - deeplizard.com/course/txtcpai...
Reinforcement Learning - deeplizard.com/course/rlcpailzrd
Generative Adversarial Networks - deeplizard.com/course/gacpailzrd
🎓 Other Courses:
DL Fundamentals Classic - deeplizard.com/learn/video/gZ...
Deep Learning Deployment - deeplizard.com/learn/video/SI...
Data Science - deeplizard.com/learn/video/d1...
Trading - deeplizard.com/learn/video/Zp...
🛒 Check out products deeplizard recommends on Amazon:
🔗 amazon.com/shop/deeplizard
🎵 deeplizard uses music by Kevin MacLeod
🔗 / @incompetech_kmac
❤️ Please use the knowledge gained from deeplizard content for good, not evil.

Comments: 84
@absimaldata 3 years ago
Why are you so clear in explaining? I mean, why do others fail to deliver tutorials with such clarity like you do? I don't know what's wrong with everyone. Omg, you are impressive.
@richardkessler2171 5 years ago
One of the best series I've viewed on RL. Really great job teaching the content without boring the audience. Also...really enjoy the closing snippets that keep me excited to see the end. Excellent!
@deeplizard 5 years ago
Thank you, Richard! Really happy to hear that!
@pawelczar 5 years ago
This whole series is great! I love the way you explain all the math concepts and questions, and I'm more than happy that you didn't stop there but introduced a practical example. Can't wait for the next episodes :D
@deeplizard 5 years ago
Thanks, pawelczar! Glad you're liking it!
@davidkhassias4876 3 years ago
Can't wait for coming episodes, because this series is amazing! And they/you helped me a lot. Thank you so much!
@abcqer555 5 years ago
Hi Lizard People, I feel so fortunate to have come across your channel. Your lessons/videos are very clear, concise, well produced, entertaining, and I am excited for all the videos that will be coming out. Keep up the fantastic work!
@deeplizard 5 years ago
Hey Paul - Thank you! We're glad you're here!
@justinheehaw 3 years ago
I gave up when I saw 1:29 for the first time (because I'm not so good at math and English). But when I came back again today and watched the entire video, I found this video the most well explained one, especially the Q-table section.
@Asmutiwari 4 years ago
This series is so informative! I wish you could make videos on dynamic navigation techniques using DRL.
@Ayushsingh-zw3yk 2 months ago
nice explanation deeplizard
@deeplizard 5 years ago
Check out the corresponding blog and other resources for this video at: deeplizard.com/learn/video/qhRNvCVVJaA
@shoaibalyaan 4 years ago
AMAZING SERIES! Absolutely loved it!
@obensustam3574 4 months ago
Very good content, I watched the videos in this playlist to prepare for my exam. Thank you 😊
@tingnews7273 5 years ago
What I learned:
1. Q-learning: learning the optimal policy in an MDP.
2. How Q-learning works: learning the Q-values for each state-action pair.
3. Value iteration: Q-learning iteratively updates the Q-values (this will become clearer later, I think).
4. Q-table: stores the Q-values for all state-action pairs.
5. Exploration: exploring the environment to find out information about it.
6. Exploitation: exploiting the information that is already known (tip: epsilon greedy).
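The exploration/exploitation trade-off mentioned in this summary can be sketched with an epsilon-greedy rule. This is a minimal illustration, not the video's code: the function name `choose_action`, the 9-state by 4-action table shape, and the tie-breaking choice (first maximum) are all assumptions made for the example.

```python
import random

def choose_action(q_row, epsilon, rng=random):
    """Pick an action index for one state with an epsilon-greedy rule.

    q_row   -- list of Q-values for the current state, one per action
    epsilon -- probability of exploring (choosing a random action)
    """
    if rng.random() < epsilon:
        # Explore: ignore the Q-values and try a random action.
        return rng.randrange(len(q_row))
    # Exploit: take the action with the largest known Q-value
    # (ties are broken by taking the first maximum; an assumption).
    return q_row.index(max(q_row))

# Hypothetical Q-table: 9 states (a 3x3 board) x 4 actions, all zeros at first.
q_table = [[0.0, 0.0, 0.0, 0.0] for _ in range(9)]
q_table[0][1] = 1.0  # pretend the agent has learned that action 1 is good in state 0

greedy_action = choose_action(q_table[0], epsilon=0.0)   # always exploits
random_action = choose_action(q_table[0], epsilon=1.0)   # always explores
```

With `epsilon=0.0` the agent only exploits, and with `epsilon=1.0` it only explores; values in between mix the two behaviors.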
@guineteherve9751 1 year ago
Your work is simply incredible. Thank you!
@asdfasdfuhf 3 years ago
This was an exciting video, finally, we are getting to the good stuff.
@adamhendry945 3 years ago
PHENOMENAL! Your videos are THE BEST! Can you PLEASE PLEASE PLEASE do a series on Actor-Critic methods!!
@NC700xLover 2 years ago
Really nice video, thanks for the clear explanations!
@mohammadmohi8561 3 years ago
u r an AI, so nicely explained all these hard concepts so easily. thank u so much
@arkadipbasu828 1 year ago
Super explanation. Thanks from India
@DreadFox_official 8 months ago
Hey, I loved your video. Thank you so much
@neogarciagarcia443 4 years ago
exploration of reinforcement learning is going fine !
@shashankdhananjaya9923 3 years ago
Awesome explanation. I like this
@mauriziovassallo5499 4 years ago
Very clear :)
@xiaojiang2610 4 years ago
Better than my engineering teacher.
@rosameliacarioni1022 2 years ago
Thanks so muuuuch !
@mateusbalotin7247 2 years ago
Thank you!
@arefeshghi 3 years ago
Good balance of exploration and exploitation will bring good results in life too! We are all lizards! :)
@namitaa 4 years ago
you saved my life bro
@NoNTr1v1aL 2 years ago
Amazing video!
@michaelscott8572 4 years ago
Thanks for the good explanation and all your work. A little hint if I may: Don't explain the words using the same words: Exploitation and Exploration
@madhesh18 4 years ago
Really good work
@cedrichung6820 3 years ago
How are you so good at explaining😍😍😍😍
@hazzaldo 5 years ago
Brilliant video. One of the best RL teaching series/materials I've come across anywhere on the internet (if not the best). Look forward to watching the rest of the series. On this video, I have 3 questions:
1. Just to clarify, is there a difference between the Q-function and the optimal Q-function? If so, is the difference that when a Q-function performs Q-value iteration and eventually converges on the optimal Q-values, it is called the optimal Q-function?
2. What does the capital `E` signify in the Bellman optimality equation?
3. So far I have only learnt the definition of a "policy". Putting it into practice, given the scenario in this video (the lizard navigating an environment), where does the policy come into play? Rephrasing the question, what part of this scenario is the policy?
Many thanks
@deeplizard 5 years ago
Thanks, hazzaldo!
1. Your assumption is correct.
2. E is the notation for "expected value."
3. Recall that a policy is a function that maps a given state to the corresponding probabilities of selecting each possible action from that state. The goal is for the lizard to navigate the environment in such a way that will yield the most return. Once it learns this "optimal navigation," it will have learned the optimal policy.
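For reference, the Bellman optimality equation discussed in this exchange is commonly written as below, following the notation of Sutton and Barto. It is given here in standard form and is assumed, not confirmed from the comments, to match what the video shows; $s'$ and $a'$ denote the next state and a candidate next action, and $\gamma$ is the discount rate.

```latex
q_*(s, a) = \mathbb{E}\left[ R_{t+1} + \gamma \max_{a'} q_*(s', a') \right]
```

Here $\mathbb{E}$ is the expected value over the next reward and next state, and a policy that always picks the action maximizing $q_*(s, a)$ in each state is an optimal policy.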
@hazzaldo 5 years ago
@deeplizard TY very much. I do have another question that I left on the Exploration vs Exploitation video in this series. If you ever get the time, I would really appreciate any clarification on it. Many thanks again for the answer to my question and this great series.
@sontapaa11jokulainen94 3 years ago
Is the exploration vs. exploitation part only a part of training, or does it also happen when actually using the learned Q-table? Also, can the policy be "take the action which has the largest Q-value and sometimes explore" (e.g., can that be an example of a policy in this case)? The policy is just the probability of taking some action in a state, so can the policy simply be written as "take the action which has the largest Q-value" (this is just an example for exploitation)?
@MohsinKhan-ve1hn 4 years ago
Your voice is great
@davidli9872 6 months ago
Are you here after the Reuters article on OpenAI's Q*?
@ayrmendina8314 6 months ago
Yeap 😂 hi 👋
@louerleseigneur4532 4 years ago
Thank you, thank you, hats off
@yashas9974 2 years ago
Link to the talk that appeared at the end of the video?
@patite3103 3 years ago
Your videos are awesome! Please correct the corresponding quiz, since the answer looks incorrect to me. Could you do a video explaining the first three steps and how the Q-table updates? This would really help to understand how the update works. Thank you!
@krajkumar6 3 years ago
Hey @deeplizard, many thanks for this video. I'm reading 'Reinforcement Learning: An Introduction, Second Edition by Richard S. Sutton and Andrew G. Barto' and I'd like to know whether the Q-learning technique described here is the same as the dynamic programming explained in the book?
@krajkumar6 2 years ago
It is a temporal-difference learning technique.
@tallwaters9708 2 years ago
I'll tell you what I really don't get, it seems the equation only updates the q table based on the current and next state. But the Bellman equation seems to imply that all future states are considered, is there some recursion thing going on?
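One way to look at the question above is a one-step update sketch. It assumes the commonly used Q-learning update with a learning rate `alpha` and discount rate `gamma` (the function name `q_update` and the tiny three-state chain are hypothetical, chosen only for illustration). Each update looks only one step ahead, but the next state's Q-value is itself an estimate of everything that follows, so rewards propagate backwards over repeated updates.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update and return the new Q-value.

    The target uses only the immediate reward plus the best Q-value of the
    next state; that next-state value stands in for all later rewards.
    """
    best_next = max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] = (1 - alpha) * q_table[state][action] + alpha * target
    return q_table[state][action]

# Hypothetical chain of three states with a single action each:
# state 0 -> state 1 (reward 0), state 1 -> state 2 (reward 1).
q = [[0.0], [0.0], [0.0]]

q_update(q, state=1, action=0, reward=1.0, next_state=2)  # state 1 learns about the reward
q_update(q, state=0, action=0, reward=0.0, next_state=1)  # state 0 inherits it via state 1
```

After the second call, state 0 has a nonzero Q-value even though its own transition paid no reward, which is the recursion the Bellman equation expresses.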
@iAndrewMontanai 4 years ago
What should I do in the case of continuous tasks? Like in Flappy Bird (if it's continuous, but anyway), I guess the Q-table would be infinite here or would just have a big fixed size to save memory. Can you give some recommendations or explain, please? I want to start an implementation, but I don't know what the Q-table should look like in this case and how to interact with it correctly (and I hope there will be no other surprises lol).
@deeplizard 4 years ago
Yes, a Q-table would not be feasible for this task. Keep going in the series, and you will see how you can use Deep Q-Learning for these tasks. Essentially, you replace the Q-table with a neural network.
@TheOfficialJeppezon 4 years ago
You say that Q-learning tries to find the best policy. However, I thought Q-learning is an off-policy algorithm. I also have trouble understanding the on/off-policy concept.
@SugamMaheshwari 4 years ago
Your voice is just amazing 😍😍😍😍😍
@ashabrar2435 3 years ago
{ "question": "Q table is defined as _______________ and _______________", "choices": [ "action and state", "action and agent", "state and environment", "environment and action" ], "answer": "action and state", "creator": "Hivemind", "creationDate": "2021-01-03T20:01:41.577Z" }
@deeplizard 3 years ago
Thanks, ash! Just added your question to deeplizard.com/learn/video/qhRNvCVVJaA :)
@nodstradamus 5 years ago
Thanks for the video, it was useful. But for me it would have been even more useful if you'd explained the Gamma Value (i.e. Discount Factor) in the formula as well..
@deeplizard 5 years ago
Hey Aleistar - You're welcome! We first introduce the discount rate (gamma) a couple of episodes back where we learned about expected return. Check out the video/blog where it is introduced and defined here: deeplizard.com/learn/video/a-SnJtmBtyA Let me know if this helps!
@MrGenbu 4 years ago
Hi, I wanted to ask a question: when the agent is trained on this lizard example on a 3*3 board, if we placed it on a 6*6 board, can it still perform, or is this another kind of reinforcement learning?
@deeplizard 4 years ago
This technique would still work with a 6x6 board.
@MrGenbu 4 years ago
@deeplizard But we need to train it first to generate the Q-table, right? I mean, it cannot be trained on one board and run on a different one; even using a 3*3 board with different reward locations will not work? With regression, you fit a line and then use it as you like, but here you cannot, because the states should be the same. Isn't that so?
@deeplizard 4 years ago
Yes, I thought you were asking in general if the Q-learning with value iteration technique would work on a 6x6 board. If you changed the board, then you would need to change and initialize the Q-table as well before training starts.
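The point about re-initializing the Q-table for a different board can be sketched as follows. This is only an illustration under stated assumptions: the helper name `make_q_table`, the four action names, and the choice of zero initialization with a dictionary keyed by `(row, col)` are all hypothetical, not taken from the video's code.

```python
# Assumed action set for a grid world; the names are illustrative.
ACTIONS = ("left", "right", "up", "down")

def make_q_table(rows, cols, actions=ACTIONS):
    """Build a zero-initialized Q-table with one entry per (tile, action) pair."""
    return {
        (r, c): {a: 0.0 for a in actions}
        for r in range(rows)
        for c in range(cols)
    }

small = make_q_table(3, 3)  # 9 states x 4 actions
large = make_q_table(6, 6)  # 36 states x 4 actions; must be trained from scratch
```

Because the table is indexed by state, values learned on the 3x3 board have no entries for the extra tiles of a 6x6 board, which is why the table is rebuilt and retrained when the environment changes.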
@MrGenbu 4 years ago
@deeplizard So this kind of agent is environment-specific. Did you watch OpenAI's hide-and-seek agent? It seems to adapt to a new environment without training. I see this kind of agent as limited in its use, since it cannot be used in the real world because it needs to be trained on every new environment. I am a newbie, so I am really just asking to get a clearer answer. If you have seen the OpenAI video, I would like to know which type of reinforcement learning can adapt to new environments.
@rursus8354 2 years ago
Won't a square become empty when the cricket(s) is(are) eaten?
@saumyachaturvedi9065 4 months ago
I guess crickets make sound, so lizard can take that as input as well to take the path
@deeplizard 4 months ago
🦗🎶🦎
@arnabjana2620 2 years ago
{ "question": "What is optimal Q-value for a policy?", "choices": [ "Expected return for the reward at time (t+1) and maximum discounted reward thereafter for a state-action pair.", "It gives the optimal policy for the optimal expected return for an agent for each state-action pair.", "It is the reward for the action 'a' taken in state 's' at time 't'.", "Maximum accumulated reward by following the policy from time (t+1)." ], "answer": "Expected return for the reward at time (t+1) and maximum discounted reward thereafter for a state-action pair.", "creator": "Arnab", "creationDate": "2021-08-03T08:12:26.884Z" }
@XxGabberlordxX 5 years ago
Hello, can someone explain me pls why there are 6 empty states?
@deeplizard 5 years ago
The six empty states are arbitrary. Think about a video game where some actions will cause you to gain points, some actions will cause you to lose points or lose the game, and some actions will have no immediate effect on your score. With the lizard game example, we have a similar set up where moving to a tile with crickets will gain points, moving to a tile with a bird will lose points/lose the game, and moving to an empty tile has no immediate effect on our score.
@XxGabberlordxX 5 years ago
@deeplizard Hey, ty for the answer! When I take a look at the picture I still don't get why there are 6 empty tiles. I don't count 6 empty tiles 🤔
@deeplizard 5 years ago
In the photo, the lizard is on one of the empty tiles. The lizard is the agent, and she is free to move to any tile.
@XxGabberlordxX 5 years ago
@deeplizard Wow, that was so obvious. Ty for the help :) Now I get it. Nice video and have a nice day :)
@himanshu-negi 2 years ago
The sound at 00:30 that I hear in every video is quite disturbing 😅
@shreyasrajanna7361 5 years ago
Where is the next video
@deeplizard 5 years ago
It is being developed! Aiming to add a new video to this series every 3-4 days.
@shreyasrajanna7361 5 years ago
@deeplizard Your videos are really good.
@shreyasrajanna7361 5 years ago
Can you make videos on Kaggle projects or make a project which may make learning even more interesting.
@deeplizard 5 years ago
We may do some Kaggle videos in the future. We do have the following two series that show practical deep learning projects in both Keras and TensorFlow.js. deeplizard.com/learn/playlist/PLZbbT5o_s2xr83l8w44N_g3pygvajLrJ- deeplizard.com/learn/playlist/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
@EarlWallaceNYC 2 years ago
O' the puns, ... exploit vs explore
@adwaitnaik4003 4 years ago
channel name is creepy but explanation is amazing...
@deeplizard 4 years ago
👻
@adwaitnaik4003 4 years ago
@deeplizard :)
@muhammadsohailnisar6600 4 years ago
Please remove the sound played with the logo at the start of the video. The sound is very bad, especially when one listens to it on headphones.
@MohdDanish-bh1ok 4 years ago
Luv u babe.
@pututp 3 years ago
I am too stupid to understand the video.. My bad..