Reinforcement Learning, by the Book

  108,912 views

Mutual Information

A day ago

Comments: 185
@mCoding 2 years ago
Let's take a quick break -- immediate cut and continues to next section. Got me good :)
@grahamjoss4643 1 year ago
Mmmmmm coding
@gnorts_mr_alien 7 months ago
how are you so clear with your word choice, expressions and tone? it's like you're uploading into my brain directly. I'm spreading the word, you deserve to be an educational youtube superstar.
@pluviophilexing2580 2 years ago
You deserve a million followers
@grahamjoss4643 1 year ago
Agreed. Help spread the word.
@akshaygulabrao4423 1 year ago
the content is too specialized, this is like grad school level cs/stats, i doubt there are like a million of those
@terjeoseberg990 1 year ago
@@akshaygulabrao4423, Have you been to Google around lunchtime? I believe there might be a million of them right there.
@umairm.5662 7 months ago
He has some great intuitions. I am lucky to have found him.
@amanmm 1 month ago
Agreed
@matveyshishov 9 months ago
I'm doing a quick refresher, and I am loving your videos. Thank you so much; you've clearly done a lot of hard work, both understanding and explaining in a structured and logical way, and it is beautiful!

Also, if I may, I want to leave a note for those who are just starting, while I still remember the parts that caused me trouble years ago when I myself was learning this. Despite my great respect for Sutton, neither the book nor the historical development of RL has been straightforward; both suffer from idiosyncrasies, ad-hoc solutions and sometimes confusion. My advice to a student would be: it's not just you, it's the topic. Take a deep breath and DECOMPOSE what you are reading.

Take, for example, the simplest formula in RL, `P(s',r|s,a)`. It often throws people off, because it is a joint distribution over s' and r. Forget that for a moment, assume that the reward given the next state, `P(r|s')`, is deterministic (that's what many coding examples do), and you'll be left with a much clearer form, `P(s'|s,a)`, with a deterministic function `R(s)`.

Another similar situation is the term "policy", and specifically the policy being probabilistic. I would advise a student reading the book for the first time to forget about this probabilistic nature of the policy. Having done that, we have a much clearer implementation of the policy function; in fact, it looks like the standard coding-interview dynamic programming question, such as the knapsack problem, where we populate a memoization table. That table is nothing other than the policy. And if, instead of storing it, we replace the table with a neural net, you get the idea. Having decomposed things this way, you'll see how everything downstream is going to be just this or that block augmented with one more variable or replaced with a neural net.

For more advanced students, I'd strongly advise looking into optimal control, as it's like a "smarter big brother of RL": the environment is known, opening doors for proper analytical solutions. You'll see RL better after visiting this parallel universe, and as a bonus, fall in love with the Laplace transform. PS: "Dynamic programming" is a misnomer; there is nothing dynamic about it, nor is it about the modern meaning of the word "programming" 🤦🏻‍♂.
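A minimal sketch of the decomposition described in this comment, assuming a tiny hypothetical MDP with 3 states and 2 actions. The arrays `P` and `R` and the function `value_iteration` are illustrative names, not taken from the video or the book. Transitions follow `P(s'|s,a)`, the reward is assumed to be a deterministic function of the state entered, and the resulting policy is a plain lookup table with one action per state.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions. State 2 is terminal (absorbing).
# P[s, a, s2] = P(s2 | s, a); each row over s2 sums to 1.
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]],   # from state 0
    [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]],   # from state 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # state 2 loops on itself
])
# Assumption: deterministic reward, a function of the state entered.
R = np.array([-1.0, -1.0, 0.0])

def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Return optimal state values and a deterministic policy table."""
    v = np.zeros(P.shape[0])
    while True:
        # q[s, a] = sum over s2 of P(s2 | s, a) * (R(s2) + gamma * v(s2))
        q = P @ (R + gamma * v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new

v, policy = value_iteration(P, R)
print(policy)  # one action index per state: the policy is just a table
```

Swapping the table `policy` for a function approximator is the step the comment alludes to; the surrounding loop would keep the same shape.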
@kariminem 8 months ago
Thanks for the comment Mat! I am currently doing my masters in RL and I do get the concepts but the real problem is in the code itself, trying to implement it from scratch is like reinventing the wheel I should say, but I think this is an important step for me to actually understand the underlying concepts. Understanding concepts from books is something and implementing it is literally something ELSE.
@polecat3 1 year ago
I found your plain English explanations of the equations particularly helpful. Thank you!
@andrewwalker8985 2 years ago
Absolutely phenomenal clarity in your explanation. A million thanks for producing this. I’m about to binge watch the series :-)
@Mutual_Information 2 years ago
Love it Andrew! Thank you! And if you know anyone else who is interested in RL, it would be huge for me if you shared it with them :)
@kiffeeify 1 year ago
I can only agree! Very, VERY, very! nice!
@AnasHawasli 4 months ago
dude you just uploaded the best RL explanation on the internet This is Gold thank you
@igwechi 3 months ago
I like how parts of the material are introduced as answers to reasonable questions, this style of teaching resonates with me.
@ansharihasanbasri 1 month ago
omg i can't express how grateful i am for this. i particularly love your animations of how the game goes like one around 9:49. i can clearly see that your understanding of the concept is still really fresh in your mind and your 'beginner's perspective' (i.e. aware of what might be confusing to a beginner and thus preemptively address them) is fresh too. i also particularly love how you don't say things like 'i know these equations look hairy but let me explain' and instead just straight-off explain it bit by bit; i found the former approach, while somewhat validating a beginner's potential 'overwhelm', only 'exacerbate' it unnecessarily. that is, your showing an equation no matter how hairy and just tackle it step by step without 'complaining' somehow inspires us to do the same, if that makes sense. anyhow, again, huge appreciation for making this :D
@offensivearch 1 year ago
I highly recommend that people here who are interested in RL study from Sutton and Barto. I own three RL books (including S&B) and 5-6 if you count books that include some RL methods but aren't entirely about it. S&B is honestly one of the best self-study textbooks I have encountered; it is a classic for a reason. It is much better and easier to learn RL from it than from the other books I have (and indeed any source). S&B does a great job of building intuitions and motivations, and it orders topics perfectly from start to finish (at least for parts I and II; I am less familiar with part III). MI's videos are great, but the material is hard to grasp without reading S&B first. If you really want to understand this stuff, I highly recommend reading from S&B directly in addition to videos like these.
@Mutual_Information 1 year ago
Agreed. There's no competing with the original source - it's a phenomenal text
@coconut_camping 1 year ago
My 3-year-old son would probably understand what RL is by watching this video. The clearest and most distinct video on RL I have found on KZbin so far. Thanks for sharing it!
@Mutual_Information 1 year ago
You have one smart son! :) Thanks for watching!
@nishanthshetty435 2 years ago
At 05:24, I think $r \in \mathcal{R} \subseteq \mathbb{R}$ is more appropriate than $r \in \mathcal{R} \in \mathbb{R}$. Loved the video. Looking forward to watching the whole series.
@Mutual_Information 2 years ago
Ah yes.. that is better.. silly mistake. Not quite a big enough error to warrant a re-upload, but I appreciate the note
@khaliliskarous2225 1 year ago
I really like how you go through a concrete episode. That makes the formalism come to life. Best intro to RL I've gone through. Thanks!
@Mutual_Information 1 year ago
Thanks - it's nice to hear from people who appreciate the same things I do
@ryandaniels3258 2 years ago
After the release of the AlphaTensor paper, this is a pretty timely video, and I wholeheartedly agree with your statement about RL becoming much more relevant outside the realms of simply solving games. Great start to the series, and I look forward to the rest.
@intptointp 1 year ago
17:00 Yes, it’s true that the assumptions about being able to reference a world state do not really apply. But I feel that essentially, this is a large part of what we learn when we develop a skill. Your memory and experience is a gradually developing world state. And thus, experts do repeatedly reference an internal world state as they conduct the task. The trained agent is like an experienced expert when we analogize this over to real life.
@grayhat_9x 1 month ago
For people looking at comments to see whether video is worth watching, it is, regardless of whether you are beginner or not
@munzutai 11 months ago
This video is criminally under-watched. It perfectly clears up the core taxonomy of RL that I was confused about up until now.
@tower1990 1 year ago
What a great lecture! I have been reading Sutton’s book, however the material is often dense and abstract. I like how you visualise the computation processes for us, it really helps understanding the concepts clearly. Thank you. Looking forward to the rest of the series.
@Mutual_Information 1 year ago
When I was reading the book originally, I was trying to think of these processes in my head. I was sure many others were trying the same.. and that's a big motivator for this series
@gorini 1 year ago
University of Alberta's Coursera course and Sutton & Barto's book hurt my ignorant self; but you ease some of the pain. Thank you.
@tamirtsogbayar3912 1 year ago
In order to learn deep learning, especially RL, I've been revising my algebra and calculus for 2 months. Extraordinarily, I found your channel
@joehindley6185 1 year ago
This is phenomenal maths communication. better than anything I have received in my three year undergraduate degree
@redoxepk 1 year ago
Really like how you break problems down to a low level, including spelling out explicitly what all the variables & their symbology mean. Ty!
@Mutual_Information 1 year ago
Yea, that's where the confusion happens. I do it because it confuses me when people don't!
@connorkapooh2002 1 year ago
@@Mutual_Information you've also nailed the balance too. it's equally as tedious when people read out and overly explain
@glitchAI 1 year ago
I don't comment on YT. but sir you are a great teacher. please keep going.
@Mutual_Information 1 year ago
Thanks, editing a video now, so no plans of slowing down
@JaishreeramCoder 3 months ago
Every second of the video is pure gold.
@BadiArt 9 months ago
Hands down the best explanation/teacher video I've seen in my entire life. Well done!
@Mutual_Information 9 months ago
Thank you dude!
@igwechi 3 months ago
This is beautiful work. I've watched the first video 3 times and each time I get an even deeper insight into the subject matter. Well done 👍
@Mutual_Information 3 months ago
Thank you - I try to keep the info density high, but yea, that probably makes rewatching a little more necessary.
@Marceloruiz 2 years ago
I found your channel looking for RL, I started watching your videos and you are amazing, your explanations and examples are very clear, I hope you continue making many more videos!
@Mutual_Information 2 years ago
Thank you! I'm doing my best :)
@qqq33 1 year ago
I wish these videos were available when I started learning RL years ago. Nice work!
@lawson1310 1 year ago
Wow, one of the most explanatory videos on RL.
@NicolasChanCSY 2 years ago
This is one of the best, if not the best, explanation I have seen! Can't wait to watch the upcoming videos in the series!
@Mutual_Information 2 years ago
You won't have to wait long :)
@hedgineering6547 1 year ago
AWESOME VIDEO! Best I've seen on the topic BY FAR
@Mutual_Information 1 year ago
ha yes you get what I'm going for! thanks!
@laotzunami 2 years ago
Reinforcement learning is so fascinating, I'm so looking forward to the next videos!
@AlisonStuff 2 years ago
Ooooooooo look at these energetic edits and the new lighting! Very nice!!
@alvinjamur1 1 year ago
your channel deserves 10 million followers. very good content! ⚡️
@Mutual_Information 1 year ago
A 1000X compliment - thank you!
@amitpatil4957 10 days ago
best video with clear thought and precise words..
@dmitrideklerk5701 5 months ago
Thank you so much for this video series, and all the useful resources you make available. thank you thank you thank you.
@dmitrideklerk5701 5 months ago
You should create a course, I would buy it, you've given so much value in these free videos. And you have a gift for teaching and breaking things down.
@MathVisualProofs 2 years ago
Great video! Excited to see the rest of this series. Nice work!
@Mutual_Information 2 years ago
Thanks MVP ;)
@japedr 2 years ago
Thanks for making this, the educational value is miles ahead of most material you can find online. Just a nitpick, if you don't mind: at 5:15 the second ∈ should be ⊂ to denote subset (or ⊆, depending on the convention)
@Mutual_Information 2 years ago
Yea someone else pointed that out. Oops! Not quite a big enough deal to warrant a re-upload, but thanks anyway
@AlisonStuff 2 years ago
@@Mutual_Information I demand a re-upload.
@Mutual_Information 2 years ago
@@AlisonStuff lol I accept demands from anyone other than my sister
@tobiasopsahl6163 1 year ago
Starting this series now, really looking forward to it! Your videos have been great so far.
@Mutual_Information 1 year ago
Love it Tobias - the series will treat you well!
@miriamshahidi7089 6 months ago
Found this channel too late. This is an amazing overview. I said "extremely cool" exactly when the instructor did in the video, bonus points!! :D
@Mutual_Information 6 months ago
We're on the same page :)
@ericvaish8841 3 months ago
This is so good, you have convinced me to start learning RL.
@Mutual_Information 3 months ago
And I'm still learning it (literally, at this very moment). Keep going - it's an interesting journey
@ericvaish8841 1 month ago
@@Mutual_Information Definitely will keep learning. All the best to you too in your learning journey!
@slowloris4346 1 year ago
My goodness you are an amazing teacher, keep it up. I have read up to the end of chapter 3 in Sutton and Barto so far - and it was all a bit nebulous in my mind but these videos really helped collate my intuitions on the basics of the topic.
@Mutual_Information 1 year ago
That's excellent to hear. You're exactly the audience I'm going for. Sometimes I wonder if these concepts are only clear as I write them, but not necessarily clear to the listener. So it's nice to know it lands.
@123ming1231 2 years ago
This is amazing work!!!! Please keep publishing RL videos, we are desperate to see them!!!!
@Mutual_Information 2 years ago
Ok I will! But I need a favor from you - tell EVERYONE! ;)
@AtulSain-d4d 6 months ago
Finally, someone understands his audience.
@mirolator 1 year ago
These videos are really quite good. I'm going through David Silver's course, then using your videos as a great review to reinforce my understanding. You really do a great job of identifying the concepts that noobs like me would have trouble with and then thoroughly explaining them. I also appreciate that there's no fluff, but lots of substance. Keep up the great work!
@Mutual_Information 1 year ago
Thank you very much Miro - glad they're helping. David Silver's lectures have been a solid source for these videos. Also, Hado Van Hasselt's recent series is excellent. I've put in the time for animations and info-density, but those guys are the pros!
@iamlegend3964 4 months ago
Very satisfying and simple explanations.
@dashwudt8369 2 years ago
Big thanks for comeback!
@prithvidhyani2002 5 months ago
wonderfully explained, had to watch it twice but I'm glad I did.
@heyna88 2 years ago
Always the best. I was looking forward to your next block of videos... and they didn't disappoint! :)
@Mutual_Information 2 years ago
Happy to hear that :) More on its way
@finlandvickahe-ty2vu 1 year ago
I do hope you can keep this up when you become famous
@NoNTr1v1aL 1 year ago
Absolutely amazing playlist! Also, at 5:11, calR is a subset of the real numbers; not an element of the real numbers.
@Mutual_Information 1 year ago
ha yea you found it. That's a mistake, but too subtle to warrant a re-upload
@phafid 2 years ago
I bought this book 2 months ago. I just couldn't understand it. Something was a bit off. You don't know how valuable your video is in giving me an idea of what the book is trying to tell me. Thank you! I am so excited to continue my learning.
@Mutual_Information 2 years ago
You are welcome! That's exactly the circumstance I was trying to solve for :)
@simonhradetzky7055 2 years ago
My Man! Right on time as my RL project in university is starting haha ty♥️
@_tnk_ 2 years ago
amazing work, looking forward to all the parts
@augustinlieber8157 2 months ago
Very clearly explained, thank you!
@marcin.sobocinski 2 years ago
Oh gosh, I had no clue what you were talking about while watching the Fisher Information video... but that RL one is pure gold! I read the mentioned RL brick (e.g. the book by Sutton and Barto) and I can hardly imagine a better explanation and a better summary of the MDP and the general RL concept. I hope you can follow up with Python code. If that's going to be as simple, concise, precise and informative as the video, it will be wonderful. Thank you!
@Mutual_Information 2 years ago
Thank you! There is *some* Python code, but not that much. The Python code gets light usage, so I haven't invested in it very much. But! I do have a strong habit of answering questions here. So ask away if you have questions. And you can ask many - totally happy to get you to a place where you think you have a good grasp of the subject.
@marcin.sobocinski 2 years ago
@@Mutual_Information Sorry for a stupid question: what do you mean by "Python code gets light usage"? It's my English 🙁
@Mutual_Information 2 years ago
@@marcin.sobocinski ah sorry - i meant “in the past, people don’t use the Python code very much. So I haven’t spent a lot of time on Python code for these videos.”
@devnachi 1 year ago
Awesome Content, Clearly Explained🔥🔥
@dominichafliger4974 7 months ago
Props to you man!! This video was absolutely great and explains the topic so good. Thanks a lot and i hope i will understand the next video as easy as this one...
@AllemandInstable 2 years ago
Discovered your channel looking for Fisher Information explanations, and I can tell your channel's content is really good for an overview and has nice explanations. I love that you mention that you need to get into books and maths to go into the details. Love your content, keep it up!
@des6309 2 years ago
Looking forward!!
@rudolfromisch954 2 years ago
Very nice introduction into this topic! I was just about to start in RL and i find this video extremely motivating
@Mutual_Information 2 years ago
The hope is to lower the cost of learning (from reading a textbook to watching a video) - sounds like it's working! Though, you should def still read the book lol
@rudolfromisch954 2 years ago
@@Mutual_Information You are doing a great service. I know i know, i will read it haha
@karthikmurthy2511 1 year ago
Thanks for this series.....@9:50, you nailed it.
@hehehe5198 5 months ago
thanks man, this is very good and clear
@piero8284 1 year ago
Great content. The only thing that irritates me a little bit, and even the book does not develop it much, is what the expectation in the value function exactly expresses mathematically. I mean, what is the expectation E_pi[G | s] taken over, given that G can be an infinite sum of rewards that depends on previous states and actions? For me this notation just expresses the concept in a high-level way.
@Mutual_Information 1 year ago
I know what you're referring to. E[] is an operator where you have to imagine an integration and the specifics of that integration are omitted, but they matter! Ultimately, you just get used to it..
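One way to unpack the omitted integration, assuming the finite MDP notation of Sutton and Barto (policy $\pi(a \mid s)$, dynamics $p(s', r \mid s, a)$, discount $\gamma$): the expectation averages over every action, reward and state that can follow $s$ when actions are drawn from $\pi$. Written one step at a time, it becomes the Bellman equation for $v_\pi$:

```latex
v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]
         = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)
           \left[ r + \gamma\, v_\pi(s') \right]
```

The infinite sum inside $G_t$ is absorbed into $v_\pi(s')$ on the right-hand side, which is why the compact notation hides so much.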
@mrnogood5326 10 months ago
Thank you ! Very good video 👍
@paulhofmann3798 7 months ago
Assuming that a process is Markov is not a strong assumption. All processes can be made Markov by including all relevant (past) variables/states in the state variables.
@Mutual_Information 7 months ago
That's true in principle, but it's hard/impossible in practice for real world environments. E.g. Markovian models are used a lot for financial prediction, but no one has written down a state space that fully captures the history of the economy and stock market. If you just say the latent space *is* all historical observations, that's a huge state space, and so you're not getting anything out of the Markov assumption.
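Both sides of this exchange can be seen in a toy sketch, assuming a hypothetical process whose next value depends on the last two values (the functions `step`, `augmented_step` and `n_augmented_states` are illustrative names). Pairing up consecutive values restores the Markov property, but the number of augmented states grows exponentially with the history length.

```python
from itertools import product

def step(prev, curr, modulus=5):
    """Second-order dynamics: the next value depends on the last TWO values."""
    return (prev + curr) % modulus

def augmented_step(state, modulus=5):
    """Same dynamics, now Markov: the state is the pair (prev, curr)."""
    prev, curr = state
    return (curr, step(prev, curr, modulus))

# Not Markov in `curr` alone: the same current value has different successors.
assert step(0, 2) != step(1, 2)

def n_augmented_states(n_values, history_length):
    """Count the states needed when the state is a tuple of past values."""
    return len(list(product(range(n_values), repeat=history_length)))

# Markov in the tuple, but the state space is n_values ** history_length.
print(n_augmented_states(5, 2), n_augmented_states(5, 6))
```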
@cizbargahyt 1 year ago
This is so detailed. I love it! :)
@Mutual_Information 1 year ago
As you know, the details matter - glad you like it!
@definitelynorandomvideos24 1 year ago
Great intro! I ended up with a small question though: at 8:16 you present the formula of the return with the summation starting at k=t+1, while t should be an element of (let's just say it like that for the sake of this question) the whole numbers {0,1,2,...}. Now, would that mean that the exponent is 0 in all cases? Thereby reducing the gamma factor to 1, since it's always raised to the power of 0? E.g.: t=5, k=5+1=6, gamma^(6-5-1)=1. Hope I didn't express myself too confusingly...
@Mutual_Information 1 year ago
G_t is the sum of future rewards at time t. So if there are 4 periods in an episode and the reward is always 1.2, then e.g. G_1 = gamma^(2-1-1)*1.2 + gamma^(3-1-1)*1.2 + gamma^(4-1-1)*1.2. In other words, k changes from 2 to 3 to 4 in the exponent of gamma and t is fixed over the sum, so gamma isn't always raised to a fixed power. Does that answer your Q?
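The worked example in this reply can be checked with a short sketch. The function name `discounted_return`, the padding convention for `rewards`, and the value `gamma = 0.5` are assumptions made for illustration; only the 4-period episode and the reward of 1.2 come from the reply.

```python
def discounted_return(rewards, t, gamma):
    """G_t = sum over k = t+1..T of gamma**(k - t - 1) * R_k.

    rewards[k] holds R_k; rewards[0] is unused padding so that the
    indices match the R_1..R_T notation.
    """
    T = len(rewards) - 1
    return sum(gamma ** (k - t - 1) * rewards[k] for k in range(t + 1, T + 1))

rewards = [None, 1.2, 1.2, 1.2, 1.2]  # R_1..R_4, reward always 1.2
gamma = 0.5                           # assumed discount, for illustration

g1 = discounted_return(rewards, t=1, gamma=gamma)
# k runs over 2, 3, 4 while t stays at 1, so the exponents are 0, 1, 2.
manual = gamma ** 0 * 1.2 + gamma ** 1 * 1.2 + gamma ** 2 * 1.2
print(g1, manual)
```

Only the first term has exponent 0; later rewards are discounted more heavily because k moves while t stays fixed.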
@definitelynorandomvideos24 1 year ago
@@Mutual_Information yes, thank you very much. I didn't watch the full video yet, cause I looked at it in my lunch break and time was up, but I'll make sure to watch the rest today. Thanks a lot for the swift response!
@azizjedidi1180 1 year ago
Thank you.
@grahamjoss4643 1 year ago
Keep pushing through. I bet your videos will hit big soon! So your videos are very concrete and technical. Do you have any video ideas for the big picture around reinforcement learning or other data science topics? Perhaps like how do you think like a data scientist or how do you foster creativity when working with a data set?
@Mutual_Information 1 year ago
Fortunately I've become quite good at being patient with growth. I'm keeping my expectations quite reasonable. Regarding big picture stuff - I'd like to do advice pieces, but they aren't high in the queue currently. I'm eager to have some solid KZbin wins under my belt before I speak very generally.. but I will one day. Maybe starting early next year.
@IgorAherne 2 years ago
This is such a beautiful explanation. I spent several years getting my head around RL, q-learning, policy gradients, etc and have several "gaps" in my understanding. Your way of explaining is so much on point and yet simple to understand, - a mark of real knowledge. I'm looking forward to watching these lessons. Thank you! PS. came here from Yannic Kilcher :) kzbin.info/www/bejne/qGnamnV3aL-Uh6c
@Mutual_Information 2 years ago
Glad you enjoyed it Igor ! Yea, Yannic's shoutout was a nice boost :)
@billy.n2813 2 months ago
Thank you very much for this
@Laétudiante 2 years ago
Truly great video!!!
@rogiervdw 2 years ago
Terrific, very well explained and well paced. Minor typo: @16:02, where it says -16 it should say -14 (doesn’t affect the best policy but might avoid confusion of the careful observer)
@Mutual_Information 2 years ago
Ah damn, you're right.. man I really thought I checked the sh*t out of that. lol oh well. I've included a note in the description - thank you!
@lenishpandey192 7 months ago
Wow! Just beautiful.
@marcin.sobocinski 2 years ago
Thank you.
@lukasbieberich1501 7 months ago
Well explained! But at 13:40, in the episodic case when we have a time limit T, I think the action and state value functions are time (t) dependent, because here the expected sum of rewards starting in state s really depends on how many possible rewards are left. So in this case t should be another argument of the functions. Independence only makes sense for the infinite case, I think
@lukasbieberich1501 7 months ago
I thought about it and it also makes sense when T is the stopping time defined as the first t for which S_t enters a final state. But this is not completely trivial, I think. If T is some finite constant, the value should be time dependent. Just in case anybody else stumbles across the same problem..
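The fixed-deadline case can be illustrated with a toy sketch, assuming a hypothetical single-state process that pays a constant reward at every step until a constant horizon T (the function `finite_horizon_value` and its defaults are illustrative). The state never changes, yet the value shrinks as t approaches T, so it has to be indexed by time.

```python
def finite_horizon_value(t, T, reward=1.0, gamma=1.0):
    """Value at time t of a one-state process that pays `reward`
    at steps t+1..T, where T is a fixed constant deadline."""
    return sum(gamma ** (k - t - 1) * reward for k in range(t + 1, T + 1))

T = 5
values = [finite_horizon_value(t, T) for t in range(T + 1)]
print(values)  # same state at every t, different value at every t
```

When T is instead the first time a terminal state is entered, the remaining horizon is a function of the current state alone, which is why the time index can be dropped in that setting.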
@azaih 1 year ago
Thank you for your excellent content
@Mutual_Information 1 year ago
and thank you for watching
@theHDfiremaster 9 months ago
Honestly, I rarely comment but this is an amazing video!
@yuktikaura 1 year ago
Keep up the awesome work
@korigamik 1 month ago
I loved your video. Can you explain what you use to create the animations and how you sync the video and animations together?
@Mutual_Information 1 month ago
I use the python plotting library Altair, and then I wrote a library to turn those into videos. The syncing is done with blood, sweat and tears. I don't recommend it! I actually need to figure out how to make these videos easier to produce. It's a huge time sink aligning and timing everything.
@korigamik 1 month ago
@@Mutual_Information are you able to create these manim style avg animations using Altair? I really want to know
@TimScarfe 2 years ago
Great production value!!
@Mutual_Information 2 years ago
As does ML Street Talk! Thanks for the love
@alexanderskusnov5119 1 year ago
How about using RL instead of PID regulation? (like adaptive control)
@Mutual_Information 1 year ago
Before I make that comparison, I'd do a quick video on PID controllers. It would be harder to discuss using RL *instead* of PID because I'm not familiar with any data/papers which discuss what happens when that migration happens. But PID controls themselves are quite cool. Very simple and very effective tools. Now I want to do that video.. hmm
@gustavojuantorena 2 years ago
Wow. Really amazing topic.
@Mutual_Information 2 years ago
Thank you - yea it's quite a hot topic (though not as hot as it was 4 years ago).
@bzehtabian2402 28 days ago
Thanks a million!
@CielMC 11 days ago
Might be splitting hairs, but shouldn't the relationship between R and the reals at 5:20 be "subset of" instead of "element of"?
@Mutual_Information 7 days ago
Oh yes, good catch. I attempt to be rigorous, but something always falls through the cracks.
@SphereofTime 7 months ago
1:00 5:18 r is from finite subset of real number
@ba_ababa 23 days ago
Would be keen to hear your updated thoughts on RL in 2025
@НиколайНовичков-е1э 1 year ago
Thank you!
@ultimateblue4568 18 days ago
it was great. thanks
@Mewgu_studio 1 year ago
6:27 Does anyone know a use case that helps to understand the policy pi as a probability distribution? In RL, in most cases I have encountered, the policy pi is a specific choice of action. Thanks in advance.
@Mutual_Information 1 year ago
There are some problems where the optimal strategy is not deterministic. The book gives a toy example (I think in the policy gradient chapter). Outside of that, I could imagine poker being an environment where it pays to be genuinely random at times.
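A minimal sketch of a policy stored as a probability distribution, assuming a hypothetical problem with 2 states and 3 actions (the arrays `pi_stochastic` and `pi_deterministic` and the function `act` are illustrative names). A deterministic policy is then just the special case where every row is one-hot.

```python
import numpy as np

rng = np.random.default_rng(0)

# pi[s, a] = pi(a | s); each row is a distribution over 3 hypothetical actions.
pi_stochastic = np.array([
    [0.2, 0.5, 0.3],
    [1 / 3, 1 / 3, 1 / 3],   # e.g. a state where mixing actions is useful
])
# A deterministic policy is the special case with one-hot rows.
pi_deterministic = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
])

def act(pi, state):
    """Sample an action index from pi(. | state)."""
    return rng.choice(pi.shape[1], p=pi[state])

samples = [act(pi_stochastic, 1) for _ in range(1000)]
print(np.bincount(samples, minlength=3))  # counts per action in state 1
```

The same `act` function serves both policies, which is one practical reason to keep the distribution form even when the policy happens to be deterministic.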
@Mewgu_studio 1 year ago
@@Mutual_Information Thanks so much for the response, never expected to get a response from the man himself, never mind this fast... :D Thanks for the clarification.
@fedahussainmuzaffari1910 1 year ago
Finally 🫡
@kmishy 4 months ago
6:50 if the policy involves no randomness, then why not simply write pi(a | s) = 1?
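One way to read this, assuming a deterministic policy is written as a function $\mu$ from states to actions (the symbol $\mu$ is an assumption here, not notation from the video): the conditional form still applies, but it equals 1 only for the chosen action and 0 for every other action.

```latex
\pi(a \mid s) =
\begin{cases}
  1 & \text{if } a = \mu(s) \\
  0 & \text{otherwise}
\end{cases}
```

Writing $\pi(a \mid s) = 1$ without the case split would claim that every action has probability 1, so the conditional notation is kept because it covers both the deterministic and the random case.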
@santielewaut 2 years ago
Excellent series, but i do need to get that wallpaper haha
@Mutual_Information 2 years ago
Here you go: github.com/Duane321/mutual_information/blob/main/computer_background/background.png
@rockapedra1130 6 months ago
Thanks!
@Mutual_Information 6 months ago
Wow! Biggest donation ever! Thank you!!
@iamr0b0tx 2 years ago
Thanks!
@Mutual_Information 2 years ago
No thank you Abdulfatah!
@hudhuduot 2 years ago
As a control system researcher, how can I make use of this for making a contribution and producing research papers
@Mutual_Information 2 years ago
Hm, well this is just an intro series - you won't find anything cutting edge, but maybe it'll inspire some directions?
@patrickorone1149 1 year ago
I already subscribed
@user-wr4yl7tx3w 2 years ago
Awesome!
@TuemmlerTanne11 2 years ago
Are you still working at Lyft full-time? Can't imagine how much work goes into these videos... no way you are able to do this in your free time?! Anyways, happy to see new videos from you, keep it up :)
@Mutual_Information 2 years ago
Ha thank you - yea I have a full time job. That’s why I hadn’t posted all year - it does take time!