enough exploration for a good youtube lecture on ml. i should keep exploiting this guy. 0 regret guaranteed :)
@Sad-mm8tm 4 years ago
I hope you will continue making videos forever. Your explanations are the best I've ever seen anywhere + the wide choice of topics gives me food for thought when dealing with my own optimization problems.
@ritvikmath 4 years ago
Thank you :) I'm happy to help
@VahidOnTheMove 2 years ago
If he makes videos forever, we'll get zero regrets.
@Xaka-waka 11 months ago
@ritvikmath don't let this channel die, man
@marcelobeckmann9552 3 years ago
Your explanations, didactics, and dynamism are amazing, way better than several university professors. Well done!
@faadi4536 3 years ago
What an amazing explanation. I am taking a machine learning course, and the instructor tried to explain the concept using bandits, but I couldn't quite grasp it in detail. I understood what we were trying to figure out but wasn't quite there yet. You have made it so much easier. Kudos to you, brother.
@bilalbayrakdar7100 2 years ago
Bro, I completed my CS degree with your help, and now I've been accepted into a master's program and you are still here to help. You are a true man, thx mate.
@savasozturk00 7 months ago
After watching 5 videos, I finally found the best lecturer for this topic. The examples are great, thanks.
@111dogger 3 years ago
This is the best explanation I have come across so far for the Upper Confidence Bound concept. Thank you!
@itssosh 2 years ago
It would be great if you made a whole playlist explaining the statistics behind machine learning, covering the formulas in the intuitive way you always do (you make me understand them all). For example, explain the various distributions and their meanings, statistical tests (p-values), etc. Thank you so much for the work you do and the knowledge you share!
@abdulsami5843 3 years ago
A thing I absolutely like is how palatable you make these concepts: not too mathematical/theoretical and not overly simplified, just the right balance (ε-greedy is set right 😉)
@Dr.RegulaSrilakshmi 9 months ago
You are just awesome; any person who doesn't have any knowledge of reinforcement learning can understand. Keep up the spirit... cheers
@pranavlal2289 a month ago
Best explanation of MAB ever. Thanks
@AnasHawasli 2 months ago
Thank you so much for this simple explanation. It was impossible for me to understand this concept without your video. NOT EVERYONE HAS SPENT HIS LIFE IN A CASINO, and I am not familiar with this armed-bandit trash. Here is a sub!
@malice112 a year ago
What a great and easy-to-understand explanation of MAB - thank you for this!!!!
@shahnazmalik6553 4 years ago
Your teaching method is highly appreciated. Please make lectures on statistics and machine learning algorithms
@softerseltzer 4 years ago
Love your videos, the quality just keeps going up! P.S. The name of the slot machine is "one-armed bandit", because of the long arm-like lever that you pull to play.
@irishryano 4 years ago
...And the bandit because it has the WORST odds in every casino
@spicytuna08 2 years ago
I guess the slot machine is a bandit because it keeps robbing money from the players.
@SURBHIGUPTA-o4w 6 months ago
Thanks Ritvik! This is the best explanation I have come across so far!
@spicytuna08 2 years ago
We need a person like you to democratize these important concepts. I cannot express how grateful I am to finally understand these concepts that I have struggled with in the past.
@heteromodal 3 years ago
Great video, and it's really nice listening to you! Thank you :)
@maxencelaisne4141 4 years ago
Thank you so much, I passed my exam thanks to your explanation :)
@ritvikmath 4 years ago
Glad it helped!
@hameddadgour 2 years ago
I just realized that I need to explore more to maximize my happiness. Thank you Multi-Armed Bandit :)
@adanulabidin 8 months ago
What an amazing explanation! Thank you so much. Keep making such videos.
@llescarini 4 years ago
Subscribed a few days ago; your videos are more than excellent! Amazing skill for teaching, thanks a lot.
@ritvikmath 4 years ago
Awesome, thank you!
@CaioCarneloz 2 years ago
The way you explain is stunning. What an awesome lesson.
@jonathanarias2729 2 years ago
Why is 330 the regret in the exploitation example? Shouldn't it be 3000 - 2396 = 604?
@khaalidmcmillan9260 2 years ago
Well said, needed a refresher after not seeing this for a while and this nailed it. Hopefully you've gone into more advanced topics like MAB reinforcement learning
@bobo0612 4 years ago
Hi! Thank you for your video. I have a question at 6:28. Why is the rho not simply 3000 - 2396?
@senyksia 4 years ago
2396 was the happiness for that specific case, where restaurant #2 was chosen to exploit. 330 is the (approximate) average regret over every case. So 3000 - 2396 would be correct if you were only talking about that unique case.
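A rough way to write it down (my own notation, not from the video): with T = 300 days and μ* the best restaurant's true mean happiness, expected regret ≈ Σ_k P(commit to restaurant k) × T × (μ* − μ_k). The 604 is essentially one term of that sum (the case where you commit to restaurant #2); the 330 averages over all the ways the first three days can turn out.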
@myoobies 4 years ago
@senyksia Hey, what do you mean by average regret for every case? I'm still having trouble wrapping my head around this step. Thanks!
@madmax2442 3 years ago
@Bolin WU I know it's been 8 months already, but I wanted to know whether you got the answer or not. I also have the same doubt.
@vahidsohrabi94 3 years ago
I'm grateful to you for this great tutorial.
@gabrieldart9943 a year ago
This is so cool! Thanks for your clear explanation.
@nassehk 4 years ago
I am new to your channel. You have a talent for teaching, my friend. I enjoy your content a lot. Thanks.
@ritvikmath 4 years ago
Thanks!
@traiancoza5214 3 years ago
Perfectly explained. Genius.
@raphaeldayan 3 years ago
Amazing explanation, very clear. Thank you, sir.
@anaydongre1226 4 years ago
Thanks so much for explaining this in detail !!
@ritvikmath 4 years ago
You are so welcome!
@nintishia 3 years ago
Very clear explanation. Thanks for this video.
@sahanar8612 3 months ago
Great explanation! Thank you 😊
@jinpark9871 4 years ago
Thanks, your work is really awesome.
@ritvikmath 4 years ago
Thank you too!
@soundcollective2240 3 years ago
Thanks, it was quite useful, heading to your Thompson Sampling video :)
@DarkNinja-24 3 years ago
Wow, great example and amazing explanation!
@A.n.a.n.d.k.r. a year ago
Awesome, cool technique; just got hooked on this.
@jroseme a year ago
This was a useful supplement to my read of Reinforcement Learning by Sutton & Barto. Thanks.
@ritvikmath a year ago
Glad it was helpful!
@krittaprottangkittikun7740 3 years ago
This is so clear to me. Thank you for making this video!
@rikki146 a year ago
I cannot thank you enough for making this excellent vid!
@SDKIM0211 2 years ago
Love your videos. To understand the average regret value for exploitation, which extra material should we refer to? Why not 604?
@prasadbbd a month ago
Love this video, and the explanation is very clear.
@kunalkasodekar8562 5 months ago
Perfect Explanation!
@nastya831 4 years ago
thanks man, this is truly helpful! 6 min at 2x and I got it all
@ritvikmath 4 years ago
Great to hear!
@hypebeastuchiha9229 2 years ago
My exam is in 2 days and I'm so close to graduating with the highest grades. Thanks for your help!
@warreninganji7881 4 years ago
Crystal clear explanation, worth a subscription for more 👌
4 months ago
Awesome! Thank you! You helped me a lot!
@dr.kingschultz 2 years ago
You are very good! Please explore this topic more. Also, include the code and explain it.
@aryankr a year ago
Thank you for a great explanation!!
@michaelvogt7787 6 months ago
Multi-armed bandit is a misnomer, really... it should be the multi-one-armed-bandit problem. Slot machines were called one-armed bandits because they have a single arm that is pulled, and the odds of winning are stacked against the player, making them bandits. The goal is not so much to find out which machine to play, which would become apparent given enough plays, but to determine which mix of N plays to spread across the group, settling on the best mix to balance exploration against exploiting the best-returning bandit. I am a career research scientist, pioneering in this field for 40 years... I am always reviewing videos to share with students and learners, and YOURS have Returned the greatest value for my Exploration, so I will be Exploiting YOURS by sharing them the most with my students. It's the best compliment I can think of. Cheers. Dr. Vogt ;- )
@amirnouripour5501 2 years ago
Thanks a lot. Very insightful!
@NoNTr1v1aL 3 years ago
Amazing video!
@yongnaguo8772 3 years ago
Thanks! Very good explanation!
@stanislavezhevski2877 4 years ago
Great explanation! Can you leave a link to the code you used in the simulations?
@ritvikmath4 жыл бұрын
Thanks! I have a follow up video on Multi-Armed Bandit coming out next week and the code will be linked in the description of that video. Stay tuned!
@fridmamedov270 11 months ago
Simple and accurate. That is it. Thanks!!!
@abogadorobot6094 3 years ago
WOW! That was brilliant! Thank you!
@TheMuser a year ago
I have explored and finally decided that I am going to exploit you! *Subscribed*
@dr.nalinfonseka7072 2 years ago
Excellent explanation!
@Status_Bleach a year ago
Thanks for the vid boss. How exactly did you calculate the average rewards for the Exploit Only and Epsilon-Greedy strategies though?
@debashishbhattacharjee1112 a year ago
Hello Ritvik, this was a very helpful video. You have explained the concept so simply. I hope you continue making such informative videos. Best wishes.
@ritvikmath a year ago
Thanks so much!
@francisliubin a year ago
Thanks for the great explanation. What is the essential difference between the contextual bandit (CB) problem and the multi-armed bandit (MAB) problem? How does the difference impact the strategy?
@velocfudarks8488 3 years ago
Thanks a lot! Really good representation!
@rifatamanna7895 4 years ago
It was an awesome technique 👍👍 thanks
@ritvikmath 4 years ago
thanks for your words!
@michaelvogt7787 6 months ago
Nicely done.
@zahrashekarchi6139 2 years ago
Thanks a lot for this video! Just one thing I would like to find out: where do we store the result of our learning? Like a policy or a parameter that gets updated?
@victorkreitton2268 2 years ago
What ML books do you recommend or use?
@rutgervanbasten2159 2 years ago
really nice job! thank you
@vijayjayaraman5990 6 months ago
Very helpful. How is the regret 330 in the second case? Shouldn't it be 3000 - 2396 = 604?
@davidkopfer3259 4 years ago
Very nice explanation, thanks!
@ritvikmath 4 years ago
Glad it was helpful!
@seowmingwei9426 3 years ago
Well explained! Thank you!
@welidbenchouche a year ago
This is more than enough for me
@TheFobJang a year ago
Would you say the exploit-only strategy is the same as the explore-then-commit strategy (also known as explore-then-exploit)?
@bassamry a year ago
Very clear and simple explanation!
@ritvikmath a year ago
Glad it was helpful!
@jams0101 4 years ago
Awesome video! Thanks so much.
@annahuo6694 3 years ago
Great videos! Thanks for the clarification; it's much clearer for me now. But I just wonder how you calculated the 330 regret in the exploitation-only case?
@ritvikmath 3 years ago
Good question. You can get that number by considering all the possible outcomes of visiting each restaurant on the first three days: what is the probability that restaurant 1 looks best after those visits, vs. restaurant 2, etc., and what regret does each case lead to? You can do this with pencil and paper, but I'd recommend writing a simple computer simulation instead.
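Here is roughly what such a simulation could look like (a minimal sketch; the three mean-happiness values and the noise level are made-up stand-ins rather than the exact numbers from the video, so the printed regret will only be in the right ballpark):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 restaurants with a mean happiness per visit.
# The best mean (10/day) matches the 3000-over-300-days benchmark;
# the other two means and the noise level are assumptions.
means = np.array([10.0, 8.0, 5.0])
noise = 5.0
days = 300
trials = 10_000

regrets = []
for _ in range(trials):
    # Explore: visit each restaurant once on the first 3 days.
    first_tastes = rng.normal(means, noise)
    # Exploit: commit to whichever looked best for the remaining 297 days.
    pick = int(np.argmax(first_tastes))
    happiness = first_tastes.sum() + rng.normal(means[pick], noise, days - 3).sum()
    regrets.append(days * means.max() - happiness)

print(f"average regret of exploit-only: {np.mean(regrets):.0f}")
```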
@annahuo6694 3 years ago
@ritvikmath Thank you for this prompt response. I think I get the idea from the epsilon-greedy formula (option number 3 in the example). Thanks a lot, your video is really helpful :)
@alirezasamadi5804 2 years ago
You explained it so well.
@josemuarnapoleon 3 years ago
Nice explanation!
@georgiak7877 2 years ago
This is amazing!
@snehotoshbanerjee1938 8 months ago
Best explanation!!
@ritvikmath 8 months ago
Glad you think so!
@avadheshkumar1488 3 years ago
excellent explanation!!! thanks
@sbn0671 a year ago
Well explained!
@shahulrahman2516 7 months ago
Great video
@yitongchen75 4 years ago
Cool explanation. Can you also talk about Upper Confidence Bound Algorithm relating to this?
@ritvikmath 4 years ago
Good timing! I have a video scheduled about UCB for Multi-Armed Bandit. It will come out in about a week :)
@뇌공학박박사 a year ago
Best example ever!!!
@TheMuser a year ago
Nicely explained!
@yannelfersi3510 9 months ago
Can you share the calculation for the regret in the case of exploitation only?
@wenzhang5879 2 years ago
Could you explain the difference between the MAB problem and the ranking and selection problem? Thanks
@PhilipKirkbride 4 years ago
Related to regret, we never really know the true distributions (since we can only infer from taking samples). Would you basically just use your estimated distributions at the end of the 300 days as the basis for calculating regret?
@joyo2122 10 days ago
can you explain why you need a random number generator?
@tariqrashid6748 3 years ago
Great explanation
@manabsaha5336 3 years ago
Sir, please make a video on the softmax approach.
@quanghoang3801 a year ago
Thanks! I really wish the RLBook authors could explain the k-armed bandit problem as clearly as you do; their writing is really confusing.
@sampadmohanty8573 4 years ago
I knew everything from the start. Ate at the same place for 299 days and got pretty bored. So I watched YouTube and found this video. Now I am stuck at this same restaurant on the 300th day to minimize my regret. Such a paradox. Just kidding. Amazing explanation and example.
@newwaylw 10 months ago
The regret for your exploit-only strategy should be 3000 - 2396 ≈ 604, no?
@thinkanime1 a year ago
Really good video
@ritvikmath a year ago
Thanks!
@qqabt24816 3 years ago
I love this vid! It would be great if you could also do more videos on online learning and regret minimization 😆😆😆
@shantanurouth6383 3 years ago
I could not understand how it turned out to be 330. Could you please explain?
@sunIess 4 years ago
Assuming a finite horizon (known beforehand), aren't you (in expectation) better off doing all the exploration before starting to exploit?
@ritvikmath 4 years ago
You've just made a very good point. One strategy I did not note is an epsilon-greedy strategy where the probability of exploring is very high in the beginning and then goes to 0 over time. This would likely be a good idea.
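A minimal sketch of that idea (the reward distributions and the decay schedule here are hypothetical, just to show the shape of the strategy):

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.array([10.0, 8.0, 5.0])  # hypothetical true mean happiness per restaurant
days = 300

counts = np.zeros(3)      # visits so far, per restaurant
estimates = np.zeros(3)   # running mean happiness, per restaurant
total = 0.0

for day in range(days):
    # Assumed schedule: epsilon starts near 1 (explore a lot) and decays toward 0.
    eps = 1.0 / (1.0 + day / 10.0)
    if rng.random() < eps:
        pick = int(rng.integers(3))        # explore: random restaurant
    else:
        pick = int(np.argmax(estimates))   # exploit: current best guess
    reward = rng.normal(means[pick], 5.0)
    counts[pick] += 1
    estimates[pick] += (reward - estimates[pick]) / counts[pick]  # incremental mean
    total += reward

print(f"total happiness over {days} days: {total:.0f} "
      f"(always picking the best restaurant would average {days * means.max():.0f})")
```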
@Trucmuch 4 years ago
Slot machines were not called bandits but one-armed bandits (they "stole" your money, and the bulky box with one lever on its side kind of looked like a one-armed man). So the name of this problem is kind of a pun: a slot machine with more than one lever you can pull (here three) is a multi-armed bandit. ;-)