SARSA vs Q Learning

  11,303 views

Marcus Fong

2 years ago

4701 is so fun!

Comments: 18
@ianlee1279 10 months ago
The only video where I can fully understand what's going on.
@KP-fy5bf 1 month ago
You sir explained this absolutely perfectly, every single detail was covered.
@joannewang9558 2 years ago
Great video! Can you make a linguistics one too?
@harshk2489 2 years ago
Please make more videos on reinforcement learning. This is too good.
@jasminwilson9029 1 year ago
Thank you!! This helped me understand it correctly.
@cusematt23 2 years ago
High-value summary, and please correct me if I'm wrong: SARSA - use eps-greedy twice in each SARS'A' generation, assign A' to A, loop. If exploring, you choose actions uniformly at random (probability 1/number of actions), so it is still possible to randomly pick the optimal action. Q-learning - use eps-greedy once to generate SARS', then choose the arg-max over A' of all Q(S',A'). I was a little confused by the implementation in the HW, but now that I've had a night to sleep on it, it's clear we are arg-maxing over Q(S',A'), right? Early on, the algorithm won't have much information to go on (in blackjack, for example, I was trying to choose the best A based on V's we had previously solved). But since this is a learning algorithm, there is no reference to V*, so we simply use the Q(S',A')'s we have generated so far, or, if we haven't updated them yet, whatever we initialized them as. Therefore the A' selection would simply look like MAX[Q(S',A'1), Q(S',A'2), ...]. So for this step we need only the Q values and no "calculation" is needed. Am I on the right path?
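The two update rules being compared in this thread can be sketched in a few lines of tabular code. This is an illustrative sketch, not code from the video: the names (`eps_greedy`, `ALPHA`, the two-action `ACTIONS` list) and the dict-based Q-table are all assumptions.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1   # hypothetical hyperparameters
ACTIONS = [0, 1]                     # hypothetical two-action environment
Q = defaultdict(float)               # Q[(state, action)] -> value, init 0

def eps_greedy(state):
    """Behavior policy used by both algorithms: explore with prob. EPS."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sarsa_update(s, a, r, s2, a2):
    """SARSA: the target uses the action a2 actually chosen by eps_greedy."""
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(s, a, r, s2):
    """Q-learning: the target maxes over next actions, whatever is taken."""
    best_next = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
```

As the comment says, the max in `q_learning_update` reads only the Q values accumulated so far (or their initial values), with no reference to V*.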
@oliverhniu 1 year ago
Thank you!😊
@shehz351 10 months ago
Great explanation.
@jiaqint961 10 months ago
Thanks!
@ahmedaj2000 1 year ago
thanks!
@felipe_marra 1 year ago
thanks
@nikhilbalwani5556 1 year ago
I came here before you posted this to Ed!
@ruqu5794 3 months ago
But then does Q-learning not use eps-greedy to generate a'? How is a' generated in Q-learning? I know SARSA uses eps-greedy, but what does Q-learning use to generate a'?
@chiboubamine5970 2 months ago
It uses an epsilon-greedy policy to choose actions, just like SARSA. The difference is that SARSA commits to the next action a' before its update and uses that a' in the target, whereas Q-learning's update target takes the max over next actions regardless of which action is actually taken next.
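This distinction can be made concrete by sketching one episode loop. Everything here is a hypothetical illustration, not the video's code: the `env` interface (`reset()` and `step(a)` returning `(next_state, reward, done)`) and all parameter names are assumptions, and the only point is *where* each algorithm chooses its next action.

```python
import random

def run_episode(env, Q, actions, alpha=0.1, gamma=0.99, eps=0.1,
                algorithm="sarsa"):
    """Run one episode, updating the Q-table in place.

    Both algorithms behave eps-greedily; they differ only in the
    update target and in when the next action is committed to.
    """
    def eps_greedy(s):
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    s = env.reset()
    a = eps_greedy(s)                     # both start with eps-greedy
    done = False
    while not done:
        s2, r, done = env.step(a)
        if algorithm == "sarsa":
            a2 = eps_greedy(s2)           # commit to a2 now: it is both
            target = Q[(s2, a2)]          # the target and the next action
        else:                             # q-learning
            target = max(Q[(s2, b)] for b in actions)
            a2 = eps_greedy(s2)           # behavior action picked separately
        Q[(s, a)] += alpha * (r + gamma * (0.0 if done else target)
                              - Q[(s, a)])
        s, a = s2, a2
```

In the SARSA branch, the "promised" a' is reused as the next step's action (on-policy); in the Q-learning branch, the target ignores which action the behavior policy actually takes (off-policy).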
@effortlessjapanese123 6 months ago
Q-learning is off-policy; SARSA is on-policy.
@MinhazCanada 4 months ago
Don't scroll up and down so much. It was otherwise good, but the constant scrolling within the same page was distracting.
@it3sy55 2 months ago
guess less pls