RL in Games Intro · 1:34 · 2 years ago
Successes and Limits · 5:53 · 3 years ago
Intro · 0:54 · 3 years ago
Discrete vs Continuous · 6:22 · 3 years ago
Conclusion · 2:49 · 3 years ago
Self Supervision · 6:33 · 3 years ago
Correlation · 4:01 · 3 years ago
Invariances · 5:14 · 3 years ago
Model-Free RL Examples · 1:42 · 3 years ago
REINFORCE Algorithm · 5:17 · 3 years ago
Model-Based RL Examples · 4:35 · 3 years ago
Policy Gradient Intro · 4:20 · 3 years ago
Actor-Critic Training · 2:31 · 3 years ago
Model Learning and Usage · 4:16 · 3 years ago
Model-Based RL Intro · 2:00 · 3 years ago
Imitation Learning · 13:57 · 3 years ago
Direct Policy Search and Actor-Critic · 13:53
Model-Based RL · 10:53 · 3 years ago
Experience Replay · 8:40 · 3 years ago
Target Networks · 6:58 · 3 years ago
Double DQN · 9:15 · 3 years ago
Deep Deterministic Policy Gradients · 8:36
Transformers · 5:58 · 3 years ago
The Bellman Equation · 9:48 · 3 years ago
Temporal Difference and Q Learning · 14:16
Solving MDPs · 15:32 · 3 years ago
Intro to RL · 16:38 · 3 years ago
Markov Decision Processes · 14:05 · 3 years ago
Comments
@jackman2532 · 11 days ago
How did we compute the values of the probabilities in value iteration?
@mrunalwaghmare · 14 days ago
Professional yapper
@sayakbanerjee7214 · 20 days ago
Indians make understanding so easy for everyone. This is a much easier explanation to follow than what they taught me here at CMU 😅
@sanchitagarwal8764 · 1 month ago
Excellent explanation, sir.
@amn1981 · 1 month ago
One of the best RL videos!! Thank you for sharing!
@imreezan · 1 month ago
Why is V2 0.72 and not 0.8? The reward for moving right from (3,3) is supposed to be +1, right? And V(s') is supposed to be 0, since there is no further value once we are in that state, because it is terminal. So V2 should be 0.8, right?
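A worked version of that step, assuming the video follows the common convention (as in Berkeley-style gridworld examples) where the +1 is collected only on exiting the terminal square, so the terminal square itself carries value 1 and the move into it earns no immediate reward:

\[
V_2(3,3) = 0.8\,[\,0 + 0.9 \cdot 1\,] + 0.1\,[\,0 + 0.9 \cdot 0\,] + 0.1\,[\,0 + 0.9 \cdot 0\,] = 0.72.
\]

Under that convention it is the discount factor 0.9, not a missing reward, that turns the 0.8 into 0.72; if the +1 were instead paid on the transition into the terminal, the 0.8 in the comment would be correct.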
@Hareo8891 · 1 month ago
Great video! But I think P(s' | s, a) is 0.21?
@Joseph-kd9tx · 1 month ago
For the value update equation, wouldn't it be simpler to take R(s,a,s') out of both the sum and the argmax, given that R(s,a,s') equals just R(s) in this case? It would become R(s) + argmax(sum(P * gamma * V(s'))). The sum of P over all possible next states always equals 1, so the R(s)/R(s,a,s') term is the same whether it sits inside or outside.
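A quick check of that algebra (assuming, as the comment does, that the reward depends only on the current state, so R(s,a,s') = R(s)):

\[
\max_a \sum_{s'} P(s' \mid s,a)\big[R(s) + \gamma V(s')\big]
= \max_a \Big[R(s)\sum_{s'} P(s' \mid s,a) + \gamma \sum_{s'} P(s' \mid s,a)\,V(s')\Big]
= R(s) + \gamma \max_a \sum_{s'} P(s' \mid s,a)\,V(s'),
\]

since the transition probabilities sum to 1 and adding the constant R(s) does not change which action attains the maximum.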
@Joseph-kd9tx · 1 month ago
3:53 agent is in so much constant pain that it just decides to end itself, how interesting
@whilewecan · 2 months ago
The square with values (-0.51 North, -0.43 East, 0.15 South, and 0.42 West) ... I do not understand why it is -0.51 to go North. The prospect of going North and then East looks bright.
@tahmidkhan8132 · 2 months ago
Like this if you're Todd Neller.
@jackdoughty6457 · 3 months ago
For V2, why would (2,3) = 0 if there is still a small chance we go right towards -1? Wouldn't (2,3) = -0.09 in that case?
@tirth8309 · 3 months ago
Suyog sir, learn something from this.
@quentinquarantino8261 · 3 months ago
Doesn't R(s,a,s') actually mean the reward for ending up in s' by choosing action a while being in s? So why is this not the same as being in state s'?
@relaxo883 · 4 months ago
Nobody explains why we get a square for L1 and a circle for L2.
@Saumillakra · 2 months ago
We get a square for L1 because, with two weights here, the penalty takes the form |w1| + |w2|, which gives a square if you try to plot it. L2 gives the equation of a circle. Hope that clears your doubt!
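A compact way to state the shapes (the standard textbook picture, with c an arbitrary constraint level rather than anything specific to the video):

\[
\{(w_1, w_2) : |w_1| + |w_2| \le c\} \;\text{ is a square (diamond) with corners on the axes},
\qquad
\{(w_1, w_2) : w_1^2 + w_2^2 \le c^2\} \;\text{ is a disk of radius } c.
\]

Those axis-aligned corners are the usual geometric explanation of why L1 regularization tends to drive individual weights exactly to zero, while L2 only shrinks them.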
@keyoorabhyankar5863 · 5 months ago
Is it just me or are the subscripts in the wrong directions?
@akinolalanre7144 · 6 months ago
Great work, but the red ink on the screen doesn't align with the flow of the explanation, and it creates a kind of imbalance.
@SO5RQ · 6 months ago
Good day! I was wondering if it's possible to update the GitHub course content with the materials related to this section (the "week 9" folder in the repository is called "Text and Embeddings" and contains very little about RNNs). The course is great! I've been following it for the past few weeks and have learned a lot from it. I would love to also learn more about RNNs.
@Daaninator · 7 months ago
thank u
@jaiberjohn · 7 months ago
Excellent points to think about! This lecture questions today's overhyped AI and how it falls short compared to biological intelligence.
@myfolder4561 · 7 months ago
Very well explained. Much better than most other videos on the same topic.
@myfolder4561 · 7 months ago
Great explanation!
@royvivat113 · 8 months ago
Great video, thank you.
@jlopezll · 8 months ago
9:06 Why, when iterating V2, are the values of all the other squares 0? Shouldn't the squares near the terminal states have non-zero values?
@alexwasdreaming9440 · 3 months ago
I believe it's because not moving is a valid move; otherwise, I feel you are right.
@Joseph-kd9tx · 1 month ago
I know why. When evaluating any state adjacent to the -1 terminal state, the argmax will always prefer the action that yields 0 rather than -1, so the value stays at 0. The argmax chooses the action that points directly away from the -1 state, so there is no chance at all of landing there, even on a slip. However, there is an interesting case where a state adjacent to a -1 would update: when the state is sandwiched between two -1 terminal states. In that case, no matter which action you take, there is some chance of slipping into one of the negative states, so the value updates negatively.
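A minimal value-iteration sketch that makes this easy to check numerically. The layout and constants below are assumptions modelled on the standard 4x3 gridworld (a +1 exit at the top right, a -1 exit directly below it, one wall, 0.8/0.1/0.1 noise, gamma = 0.9, living reward 0), not something taken verbatim from the video:

# Value iteration on an assumed 4x3 gridworld, zero-indexed (col, row).
GAMMA = 0.9
COLS, ROWS = 4, 3
WALL = {(1, 1)}
TERMINALS = {(3, 2): 1.0, (3, 1): -1.0}          # exit square -> exit reward
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
SLIPS = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}

def step(state, direction):
    """Move one cell; bounce off the wall and the grid edges."""
    c, r = state
    dc, dr = ACTIONS[direction]
    nxt = (c + dc, r + dr)
    if nxt in WALL or not (0 <= nxt[0] < COLS and 0 <= nxt[1] < ROWS):
        return state
    return nxt

def q_value(V, state, direction):
    """Expected discounted value of trying to move in `direction` (living reward 0)."""
    left, right = SLIPS[direction]
    return sum(p * GAMMA * V[step(state, d)]
               for p, d in [(0.8, direction), (0.1, left), (0.1, right)])

def value_iteration(sweeps):
    states = [(c, r) for c in range(COLS) for r in range(ROWS) if (c, r) not in WALL]
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        # Synchronous sweep: the comprehension reads the old V throughout.
        V = {s: TERMINALS[s] if s in TERMINALS
                else max(q_value(V, s, a) for a in ACTIONS)
             for s in states}
    return V

V2 = value_iteration(2)
print(round(V2[(2, 2)], 2))   # ~0.72: the square left of the +1 exit
print(round(V2[(2, 1)], 2))   # 0.0: adjacent to -1, the best action points away from it

After two sweeps the square next to the +1 exit comes out at about 0.72, while the square next to the -1 exit stays at 0, exactly as described above; adding a second -1 terminal on the other side of such a square would make its value go negative.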
@NehaKariya-d1f · 1 month ago
@Joseph-kd9tx Thanks for clarifying!
@VIJAYALAKSHMIJ-h2b · 8 months ago
Can't understand how it is 0.52.
@QIANWAN-wi3ty · 8 months ago
Starts at 4:45.
@manishkumarkn · 8 months ago
Can you please suggest some good books on reinforcement learning?
@gunjanshinde396 · 8 months ago
Well explained, thank you!
@harbaapkabaap2040 · 8 months ago
Best video on the topic I have seen so far, to the point and well explained! Kudos to you, brother!
@abdullah.montasheri · 9 months ago
The state-value-function Bellman equation includes the policy's action probability at the start of the equation, which you did not include in your equation. Any reason why?
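For reference, the two standard forms (textbook definitions, not taken from this particular video): the Bellman equation for a fixed policy π weights each action by π(a|s), while the Bellman optimality equation used by value iteration replaces that weighting with a max over actions:

\[
V^{\pi}(s) = \sum_a \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\big[R(s,a,s') + \gamma V^{\pi}(s')\big],
\qquad
V^{*}(s) = \max_a \sum_{s'} P(s' \mid s, a)\big[R(s,a,s') + \gamma V^{*}(s')\big].
\]

If the lecture is working with V*, the policy probabilities are absorbed into the max, which may be why they do not appear in its equation.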
@negrito360 · 9 months ago
Amazing
@user-canon031 · 9 months ago
Good!
@SphereofTime · 9 months ago
0:10
@meobliganaponerunnom · 10 months ago
I don't understand why this works.
@bean217 · 10 months ago
This is probably one of the best videos I've found describing the transition from tabular Q-learning to deep Q-networks. Other videos seem to sugarcoat the topic with terminology that is far too simple, to the point where it almost obscures what is actually being discussed.
@anishreddyanam8617 · 10 months ago
Thank you so much! My professor explained this part a bit too fast so I got confused, but this makes a lot of sense!
@Advikchaudhry28 · 11 months ago
VIMAL Sir, your explanation on KZbin is better than what you teach in class lectures.
@sgn_sabir · 11 months ago
GANs in Bengali Knowledge 😊
@don-ju8ck · 11 months ago
🙏🙏🏿
@cuongnguyenuc1776 · 11 months ago
Great video!! I understand your lecture, and I know that the expectation can be replaced by sampling trajectories. However, in many documents the gradient is reduced to the term inside the expectation bracket, which means they don't take the average over many trajectories; they update the parameters using one trajectory only. Do you know why that is the case? (I see the algorithm in the book by Sutton & Barto.)
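A standard observation that may be what those texts rely on (general REINFORCE background, not specific to this video): the policy-gradient expectation can be estimated from any number N of sampled trajectories, and even N = 1 gives an unbiased, if noisy, estimate, i.e. an ordinary stochastic-gradient update:

\[
\nabla_\theta J(\theta)
= \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\Big]
\approx \frac{1}{N} \sum_{i=1}^{N} \sum_t \nabla_\theta \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big)\, G_t^{(i)}.
\]

With N = 1 this is essentially the per-episode update in Sutton and Barto's REINFORCE pseudocode; averaging more trajectories only reduces the variance of the estimate, not its expected value.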
@tarkemregunduz6872 · 11 months ago
perfect
@jimmylaihk · 1 year ago
Excellent explanation.
@kyrohrs · 1 year ago
Great video, but how can we use policy iteration for an MDP when the state space grows considerably with each action? I know there are various methods of approximation for policy iteration, but I just haven't been able to find anything; do you have any resources on this?
@SaloniHitendraPatadia · 1 year ago
According to the Bellman equation, I got the value 0.8 * (0.72 + 0.9 * 1) + 0.1 * (0.72 + 0.9 * 0) + 0.1 * (0.72 + 0.9 * 0) = 1.62. Please correct me where I went wrong.
@mghaynes24 · 1 year ago
The living reward is 0, not 0.72. 0.72 is the V at time 2 for grid square (3,3). Use the 0.72 value to update grid squares (2,3) and (3,2) at time step 3.
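A worked version of that time-step-3 update, under the usual assumptions for this example (living reward 0, gamma = 0.9, 0.8/0.1/0.1 noise, the +1 terminal holding value 1, a sideways slip from (3,3) either bumping the grid edge and staying put or dropping to a square whose value is still 0):

\[
V_3(3,3) = 0.8\,[\,0 + 0.9 \cdot 1\,] + 0.1\,[\,0 + 0.9 \cdot 0.72\,] + 0.1\,[\,0 + 0.9 \cdot 0\,] = 0.72 + 0.0648 \approx 0.78.
\]

The 0.72 only ever enters through a discounted next-state value term, never as a reward; treating it as the reward in every bracket is what inflates the result in the comment above.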
@newtondurden · 1 year ago
Fantastic video, man. I was so confused for some reason when my lecturer talked about it; it's not supposed to be hard, I guess, I just couldn't see exactly how it worked. This video helped fill in the details.
@TheClockmister · 1 year ago
My bald teacher will talk about this for 2 hours and I won't understand anything. This helps a lot.
@Moch117 · 10 months ago
lmfaooo
@fa7234 · 1 month ago
Does your bald teacher's name happen to be Charles Isbell?
@abhisheksa9552 · 1 year ago
Does "phi" here refer to the "vector of features" that represents the q-state (s,a)?
@subrahmanyaswamyperuru2675 · 7 months ago
No. It represents the weight vector of the neural network.
@tower1990 · 1 year ago
There shouldn’t be any value for the terminal state… my god…
@rouzbehh1705 · 1 year ago
Thank you so much for your helpful teaching!
@knowledgelover2736 · 1 year ago
Great explanation. For time series data, should the value selected for the positional encoding match the number of features or the number of features * time steps? Thanks!