I gave up when I saw 1:29 for the first time (because I'm not so good at math and English). But when I came back again today and watched the entire video, I found this video to be the best-explained one. Especially the Q-table section.
@absimaldata 3 years ago
Why are you so, so clear in explaining? I mean, why do others fail to deliver tutorials with such clarity like you do? I don't know what's wrong with everyone. Omg, you are impressive.
@richardkessler2171 5 years ago
One of the best series I've viewed on RL. Really great job teaching the content without boring the audience. Also...really enjoy the closing snippets that keep me excited to see the end. Excellent!
@deeplizard 5 years ago
Thank you, Richard! Really happy to hear that!
@obensustam3574 11 months ago
Very good content, I watched the videos in this playlist to prepare for my exam. Thank you 😊
@pawelczar 6 years ago
This whole series is great! I love the way you explain all the math concepts and questions, and I'm more than happy that you didn't stop there but introduced a practical example. Can't wait for the next episodes :D
@deeplizard 6 years ago
Thanks, pawelczar! Glad you're liking it!
@abcqer555 6 years ago
Hi Lizard People, I feel so fortunate to have come across your channel. Your lessons/videos are very clear, concise, well produced, entertaining, and I am excited for all the videos that will be coming out. Keep up the fantastic work!
@deeplizard 6 years ago
Hey Paul - Thank you! We're glad you're here!
@tingnews7273 6 years ago
What I learned:
1. Q-learning: learning the optimal policy in an MDP.
2. How Q-learning works: learning the Q-values for each state-action pair.
3. Value iteration: Q-learning iteratively updates the Q-values (this will become clearer later, I think).
4. Q-table: stores the Q-values for all state-action pairs.
5. Exploration: exploring the environment to find out information about it.
6. Exploitation: exploiting the information that is already known (tip: epsilon greedy).
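The Q-table and the epsilon-greedy tip in the summary above can be sketched in a few lines of Python. This is only a minimal illustration, not code from the video: the 9 states and 4 actions are assumed from the 3x3 lizard board, and the names `make_q_table` and `epsilon_greedy` are hypothetical.

```python
import random

N_STATES = 9    # assumed: one state per tile of a 3x3 board
N_ACTIONS = 4   # assumed: left, right, up, down


def make_q_table(n_states, n_actions):
    """Return a Q-table with every state-action value initialised to zero."""
    return [[0.0] * n_actions for _ in range(n_states)]


def epsilon_greedy(q_table, state, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best known action."""
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return rng.randrange(len(q_table[state]))
    # Exploitation: pick the action with the largest Q-value in this state.
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)


q_table = make_q_table(N_STATES, N_ACTIONS)
q_table[0][2] = 5.0                            # pretend action 2 looked good in state 0
greedy_action = epsilon_greedy(q_table, 0, epsilon=0.0)
```

With `epsilon=0.0` the agent always exploits, and with `epsilon=1.0` it always explores; values in between trade the two off.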
@mohammadmohi8561 3 years ago
u r an AI, so nicely explained all these hard concepts so easily. thank u so much
@Asmutiwari 4 years ago
These series are so so informative !! I wish you could make videos on dynamic navigation techniques using DRL
@arkadipbasu828 2 years ago
Super explanation. Thanks from India
@adamhendry945 4 years ago
PHENOMENAL! Your videos are THE BEST! Can you PLEASE PLEASE PLEASE do a series on Actor-Critic methods!!
@guineteherve9751 1 year ago
Your work is simply incredible. Thank you!
@asdfasdfuhf 4 years ago
This was an exciting video, finally, we are getting to the good stuff.
@Ayushsingh-zw3yk 9 months ago
nice explanation deeplizard
@davidkhassias4876 4 years ago
Can't wait for coming episodes, because this series is amazing! And they/you helped me a lot. Thank you so much!
@DreadFox_official 1 year ago
Hey, I loved your video. Thank you so much
@yelircaasi 3 years ago
Really nice video, thanks for the clear explanations!
@cedrichung6820 3 years ago
How are you so good at explaining😍😍😍😍
@hazzaldo 5 years ago
Brilliant video. One of the best RL teaching series/materials I've come across anywhere on the internet (if not the best). Look forward to watching the rest of the series. On this video, I have 3 questions: 1- Just to clarify, is there a difference between the Q-function and the optimal Q-function? If so, is the difference that when a Q-function performs Q-value iteration and eventually converges on the optimal Q-values, it is then called the optimal Q-function? 2- What does the capital `E` signify in the Bellman optimality equation? 3- So far I have only learnt the definition of a "policy". Putting it into practice, given the scenario in this video (the lizard navigating an environment), where does the policy come into play? Rephrasing the question, what part of this scenario is the policy? Many thanks
@deeplizard 5 years ago
Thanks, hazzaldo! 1. Your assumption is correct. 2. E is the notation for "expected value." 3. Recall that a policy is a function that maps a given state to the corresponding probabilities of selecting each possible action from that state. The goal is for the lizard to navigate the environment in such a way that will yield the most return. Once it learns this "optimal navigation," it will have learned the optimal policy.
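For reference, the Bellman optimality equation discussed in this exchange is commonly written as follows. The notation follows the usual textbook convention and is an assumption here, not a quote from the video: $q_*$ is the optimal Q-function, $\gamma$ the discount rate, and $s'$, $a'$ the next state and next action.

```latex
q_*(s, a) = \mathbb{E}\left[ R_{t+1} + \gamma \max_{a'} q_*(s', a') \right]
```

The $\mathbb{E}$ is the expected value mentioned in the reply: the average over the possible rewards and next states that can follow taking action $a$ in state $s$.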
@hazzaldo 5 years ago
@@deeplizard TY very much. I do have another question, that I left in the Exploration vs Exploitation video part of this series. If you ever get the time, would really appreciate any clarification on it. Many thanks again for the answer to my question and this great series.
@NoNTr1v1aL 3 years ago
Amazing video!
@arnabjana2620 3 years ago
{ "question": "What is optimal Q-value for a policy?", "choices": [ "Expected return for the reward at time (t+1) and maximum discounted reward thereafter for a state-action pair.", "It gives the optimal policy for the optimal expected return for an agent for each state-action pair.", "It is the reward for the action 'a' taken in state 's' at time 't'.", "Maximum accumulated reward by following the policy from time (t+1)." ], "answer": "Expected return for the reward at time (t+1) and maximum discounted reward thereafter for a state-action pair.", "creator": "Arnab", "creationDate": "2021-08-03T08:12:26.884Z" }
@ashabrar2435 4 years ago
{ "question": "Q table is defined as _______________ and _______________", "choices": [ "action and state", "action and agent", "state and environment", "environment and action" ], "answer": "action and state", "creator": "Hivemind", "creationDate": "2021-01-03T20:01:41.577Z" }
@deeplizard 4 years ago
Thanks, ash! Just added your question to deeplizard.com/learn/video/qhRNvCVVJaA :)
@shashankdhananjaya9923 4 years ago
Awesome explanation. I like this
@shoaibalyaan 4 years ago
AMAZING SERIES! Absolutely loved it!
@xiaojiang2610 4 years ago
Better than my engineering teacher.
@yashas9974 3 years ago
Link to the talk that appeared at the end of the video?
@rosameliacarioni1022 3 years ago
Thanks so muuuuch !
@tallwaters9708 2 years ago
I'll tell you what I really don't get: it seems the equation only updates the Q-table based on the current and next state. But the Bellman equation seems to imply that all future states are considered. Is there some recursion thing going on?
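One way to look at the recursion asked about here: each update only reads the next state's Q-values, but those values were themselves built from their own next states, so information about later rewards flows backwards over repeated updates. The sketch below assumes the standard Q-learning update rule with learning rate `alpha` and discount rate `gamma`; the three-state chain and the name `q_update` are made up for illustration.

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-learning update to q_table[state][action] in place."""
    # Bootstrapped estimate of the return obtainable from the next state onwards.
    best_next = max(q_table[next_state])
    target = reward + gamma * best_next
    old_value = q_table[state][action]
    q_table[state][action] = (1 - alpha) * old_value + alpha * target
    return q_table[state][action]


# Hypothetical chain 0 -> 1 -> 2 with a single action; reward 1.0 on reaching state 2.
q = [[0.0], [0.0], [0.0]]
transitions = [(0, 0, 0.0, 1), (1, 0, 1.0, 2)]   # (state, action, reward, next_state)

for sweep in range(2):
    for s, a, r, s_next in transitions:
        q_update(q, s, a, r, s_next, alpha=1.0, gamma=0.9)
```

After the first sweep only state 1 knows about the reward. After the second sweep state 0 has picked up the discounted value through state 1, even though its own update never looked further than one step ahead.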
@deeplizard 6 years ago
Check out the corresponding blog and other resources for this video at: deeplizard.com/learn/video/qhRNvCVVJaA
@MohsinKhan-ve1hn 5 years ago
Your voice is great
@SugamMaheshwari 4 years ago
Your voice is just amazing 😍😍😍😍😍
@davidli9872 1 year ago
Are you here after the Reuters article on OpenAI's Q*?
@ayrmendina8314 1 year ago
Yeap 😂 hi 👋
@tomdexter5029 3 months ago
No, what's that OpenAI shit?
@sontapaa11jokulainen94 4 years ago
Is the exploration vs. exploitation part only a part of training, or does it also happen when actually using the learned Q-table? Also, can the policy be "take the action which has the largest Q-value and sometimes explore" (e.g., can that be an example of a policy in this case)? The policy is just the probability of taking some action in a state, so can the policy simply be written as "take the action which has the largest Q-value" (this is just an example for exploitation)?
@michaelscott8572 4 years ago
Thanks for the good explanation and all your work. A little hint if I may: Don't explain the words using the same words: Exploitation and Exploration
@mateusbalotin7247 3 years ago
Thank you!
@patite3103 3 years ago
Your videos are awesome! Please correct the corresponding quiz, since the answer seems incorrect to me. Could you do a video explaining the first three steps and how the Q-table updates? This would really help in understanding how the update works. Thank you!
@neogarciagarcia443 5 years ago
exploration of reinforcement learning is going fine !
@madhesh18 4 years ago
Really good work
@arefeshghi 4 years ago
Good balance of exploration and exploitation will bring good results in life too! We are all lizards! :)
@namitaa 4 years ago
you saved my life bro
@rursus8354 3 years ago
Won't a square become empty when the cricket(s) is(are) eaten?
@TheOfficialJeppezon 4 years ago
You say that Q-learning tries to find the best policy. However, I thought Q-learning is an off-policy algorithm. I also have trouble understanding the on/off-policy concept.
@iAndrewMontanai 5 years ago
What should I do in the case of continuous tasks? Like in Flappy Bird (if it's continuous, but anyway), I guess the Q-table would be infinite here, or would just have a big fixed size to save memory. Can you give some recommendations or explain, please? I want to start the implementation, but I don't know what the Q-table should look like in this case or how to interact with it correctly (and I hope there will be no other surprises lol)
@deeplizard 5 years ago
Yes, a Q-table would not be feasible for this task. Keep going in the series, and you will see how you can use Deep Q-Learning for these tasks. Essentially, you replace the Q-table with a neural network.
@megasage 3 years ago
The sound at 00:30 that I hear in every video is quite disturbing 😅
@mauriziovassallo5499 5 years ago
Very clear :)
@MrGenbu 5 years ago
Hi, I wanted to ask a question: when the agent is trained on this lizard example on the 3x3 board, if we placed it on a 6x6 board, could it still perform, or is that another kind of reinforcement learning?
@deeplizard 5 years ago
This technique would still work with a 6x6 board.
@MrGenbu 5 years ago
@@deeplizard But we need to train it first to generate the Q-table, right? I mean, it cannot be trained on one board and run on a different one; even a 3x3 board with the rewards in different places will not work? With regression you fit a line and then use it as you like, but here you cannot because the states have to be the same, is that right?
@deeplizard 5 years ago
Yes, I thought you were asking in general if the Q-learning with value iteration technique would work on a 6x6 board. If you changed the board, then you would need to change and initialize the Q-table as well before training starts.
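The point about re-initializing the Q-table for a different board can be made concrete with a small sketch. It assumes one state per tile and a four-action move set; `new_q_table` and `state_index` are hypothetical helper names, not anything from the video.

```python
ACTIONS = ("left", "right", "up", "down")  # assumed action set


def new_q_table(n_rows, n_cols, n_actions=len(ACTIONS)):
    """Zero-initialised Q-table with one row per tile of an n_rows x n_cols board."""
    return [[0.0] * n_actions for _ in range(n_rows * n_cols)]


def state_index(row, col, n_cols):
    """Map a (row, col) tile to its row in the Q-table (row-major order)."""
    return row * n_cols + col


small = new_q_table(3, 3)   # 9 states for the 3x3 board
large = new_q_table(6, 6)   # 36 states for a 6x6 board

# The bottom-right tile of the 6x6 board has no entry in the 3x3 table.
corner = state_index(5, 5, n_cols=6)
```

Because the table is indexed by state, values learned on the 3x3 board have nowhere to go on the 6x6 board, which is why training has to start again from a fresh table.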
@MrGenbu 5 years ago
@@deeplizard So this kind of agent is environment-specific. Did you watch OpenAI's hide-and-seek agent? It seems to adapt to a new environment without training. It seems this kind of agent is limited in its use, as it cannot be used in the real world since it needs to be trained on every new environment. I am a newbie, so I am really just asking to get a clearer answer. If you have seen the OpenAI video, I would like to know which type of reinforcement learning can adapt to new environments.
@krajkumar6 3 years ago
Hey @deeplizard, many thanks for this video. I'm reading 'Reinforcement Learning: An Introduction, Second Edition' by Richard S. Sutton and Andrew G. Barto, and I'd like to know whether the Q-learning technique described here is the same as the dynamic programming explained in the book?
@krajkumar6 3 years ago
It is a temporal-difference learning technique.
@louerleseigneur4532 4 years ago
Thank you, thank you, hats off
@XxGabberlordxX 5 years ago
Hello, can someone please explain to me why there are 6 empty states?
@deeplizard 5 years ago
The six empty states are arbitrary. Think about a video game where some actions will cause you to gain points, some actions will cause you to lose points or lose the game, and some actions will have no immediate effect on your score. With the lizard game example, we have a similar set up where moving to a tile with crickets will gain points, moving to a tile with a bird will lose points/lose the game, and moving to an empty tile has no immediate effect on our score.
@XxGabberlordxX 5 years ago
@@deeplizard Hey ty for the answer! When I take a look at the picture I still don't get why there are 6 empty tiles. I don't count 6 empty tiles 🤔
@deeplizard 5 years ago
In the photo, the lizard is on one of the empty tiles. The lizard is the agent, and she is free to move to any tile.
@XxGabberlordxX 5 years ago
@@deeplizard Wow that was so obvious. Ty for the help :) Now i get it. Nice video and have a nice day :)
@shreyasrajanna7361 6 years ago
Where is the next video?
@deeplizard 6 years ago
It is being developed! Aiming to add a new video to this series every 3-4 days.
@shreyasrajanna7361 6 years ago
@@deeplizard Your videos are really good.
@shreyasrajanna7361 6 years ago
Can you make videos on Kaggle projects, or make a project that would make learning even more interesting?
@deeplizard 6 years ago
We may do some Kaggle videos in the future. We do have the following two series that show practical deep learning projects in both Keras and TensorFlow.js. deeplizard.com/learn/playlist/PLZbbT5o_s2xr83l8w44N_g3pygvajLrJ- deeplizard.com/learn/playlist/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
@saumyachaturvedi9065 11 months ago
I guess crickets make sound, so lizard can take that as input as well to take the path
@deeplizard 11 months ago
🦗🎶🦎
@nodstradamus 5 years ago
Thanks for the video, it was useful. But for me it would have been even more useful if you'd explained the Gamma Value (i.e. Discount Factor) in the formula as well..
@deeplizard 5 years ago
Hey Aleistar - You're welcome! We first introduce the discount rate (gamma) a couple of episodes back where we learned about expected return. Check out the video/blog where it is introduced and defined here: deeplizard.com/learn/video/a-SnJtmBtyA Let me know if this helps!
@davidak_de 5 months ago
Q-Star Lizard Gang 2024
@adwaitnaik4003 4 years ago
channel name is creepy but explanation is amazing...
@deeplizard 4 years ago
👻
@adwaitnaik4003 4 years ago
@@deeplizard :)
@EarlWallaceNYC 3 years ago
O' the puns, ... exploit vs explore
@muhammadsohailnisar6600 4 years ago
Please remove the sound played with the logo at the start of the video. The sound is very bad, especially when one listens to it on headphones.
@pututp 3 years ago
I am too stupid to understand the video.. My bad..