Multi-Armed Bandit : Data Science Concepts

83,606 views

ritvikmath


Making decisions with limited information!

Comments: 137
@Sad-mm8tm 3 years ago
I hope you will continue making videos forever. Your explanations are the best I've ever seen anywhere + the wide choice of topics gives me food for thought when dealing with my own optimization problems.
@ritvikmath 3 years ago
Thank you :) I'm happy to help
@VahidOnTheMove 1 year ago
If he makes videos forever, we'll get zero regrets.
@abhinavbhatia5114 4 months ago
@ritvikmath don't let this channel die, man
@savasozturk00 9 days ago
After watching 5 videos, I finally found the best lecturer for this topic. The examples are great, thanks.
@khaalidmcmillan9260 1 year ago
Well said, needed a refresher after not seeing this for a while and this nailed it. Hopefully you've gone into more advanced topics like MAB reinforcement learning
@heteromodal 3 years ago
Great video, and it's really nice listening to you! Thank you :)
@simoneorlando2496 2 years ago
It would be great if you made a whole playlist where you explain the statistics for machine learning by explaining the formulas in an intuitive way like you do (you make me understand them all). For example, explain the various distributions and their meaning, statistical tests (p-value), etc. Thank you so much for the work you do and the knowledge you share!
@malice112 9 months ago
What a great and easy to understand explanation of MAB - thank you for this!!!!
@CaioCarneloz 2 years ago
The way you explain is stunning, what an awesome lesson.
@111dogger 3 years ago
This is the best explanation I have come across so far for the Upper Confidence Bound concept. Thank you!
@adanulabidin 2 months ago
What an amazing explanation! Thank you so much. Keep making such videos.
@marcelobeckmann9552 2 years ago
Your explanations, didactics, and dynamism are amazing, way better than several university professors. Well done!
@abdulsami5843 2 years ago
A thing I absolutely like is how palatable you make these concepts, not too mathematical/theoretical and not overly simplified, just the right balance (ε-greedy is set right 😉)
@vahidsohrabi94 2 years ago
I'm grateful to you for this great tutorial.
@krittaprottangkittikun7740 2 years ago
This is so clear to me. Thank you for making this video!
@faadi4536 2 years ago
What an amazing explanation. I am taking a machine learning course and our professor tried to explain the concept using bandits, but I couldn't quite grasp it in detail. I understood what we were trying to figure out but wasn't quite there yet. You have made it so much easier. Kudos to you, brother.
@gabrieldart9943 10 months ago
This is so cool! Thanks for your clear explanation.
@DarkNinja-24 2 years ago
Wow, great example and amazing explanation!
@bilalbayrakdar7100 1 year ago
Bro, I completed my CS degree with your help, and now I got accepted to a master's and you are still here to help. You are a true man, thx mate
@shahnazmalik6553 3 years ago
Your teaching method is highly appreciated. Please make lectures on statistics and machine learning algorithms
@raphaeldayan 3 years ago
Amazing explanation, very clear, thank you sir
@A.n.a.n.d.k.r. 9 months ago
Awesome, cool technique. Just got hooked on this.
@Dr.RegulaSrilakshmi 2 months ago
You are just awesome. Any person who doesn't have any knowledge of reinforcement learning can understand. Keep up the spirit... cheers
@rikki146 1 year ago
I cannot thank you enough for making this excellent vid!
@traiancoza5214 3 years ago
Perfectly explained. Genius.
@softerseltzer 3 years ago
Love your videos, the quality just keeps going up! P.S. The name of the slot machine is "one-armed bandit", because of the long arm-like lever that you pull to play.
@irishryano 3 years ago
...And the bandit because it has the WORST odds in every casino
@spicytuna08 1 year ago
I guess the slot machine is a bandit because it keeps robbing money from the players.
@llescarini 3 years ago
Subscribed a few days ago; your videos are more than excellent! Amazing skill for teaching, thanks a lot.
@ritvikmath 3 years ago
Awesome, thank you!
@aryankr 1 year ago
Thank you for a great explanation!!
@yongnaguo8772 3 years ago
Thanks! Very good explanation!
@warreninganji7881 3 years ago
Crystal clear explanation, worth a subscription for more 👌
@georgiakouvoutsaki7877 1 year ago
This is amazing!
@velocfudarks8488 3 years ago
Thanks a lot! Really good presentation!
@NoNTr1v1aL 2 years ago
Amazing video!
@nintishia 2 years ago
Very clear explanation. Thanks for this video.
@rutgervanbasten2159 1 year ago
really nice job! thank you
@amirnouripour5501 1 year ago
Thanks a lot. Very insightful!
@soundcollective2240 2 years ago
Thanks, it was quite useful, heading to your Thompson Sampling video :)
@abogadorobot6094 2 years ago
WOW! That was brilliant! Thank you!
@seowmingwei9426 3 years ago
Well explained! Thank you!
@jams0101 3 years ago
awesome video! thanks so much
@avadheshkumar1488 3 years ago
excellent explanation!!! thanks
@fridmamedov270 4 months ago
Simple and accurate. That is it. Thanks!!!
@anaydongre1226 3 years ago
Thanks so much for explaining this in detail !!
@ritvikmath 3 years ago
You are so welcome!
@spicytuna08 1 year ago
We need a person like you to democratize these important concepts. I cannot express how grateful I am to finally understand these concepts, which I have struggled with in the past.
@jc_777 1 year ago
Enough exploration for a good YouTube lecture on ML. I should keep exploiting this guy. Zero regret guaranteed :)
@jinpark9871 3 years ago
Thanks, your work is really awesome.
@ritvikmath 3 years ago
Thank you too!
@dr.nalinfonseka7072 1 year ago
Excellent explanation!
@welidbenchouche 10 months ago
This is more than enough for me
@alirezasamadi5804 2 years ago
You explained it so well
@user-bt5il9zq8p 9 months ago
Best example ever!!!
@josemuarnapoleon 2 years ago
Nice explanation!
@dr.kingschultz 2 years ago
You are very good! Please explore this topic more. Also include the code and explain it.
@SDKIM0211 2 years ago
Love your videos. To understand the average regret value for exploitation, which extra material should we refer to? Why not 604?
@hameddadgour 1 year ago
I just realized that I need to explore more to maximize my happiness. Thank you, Multi-Armed Bandit :)
@nassehk 3 years ago
I am new to your channel. You have a talent for teaching, my friend. I enjoy your content a lot. Thanks.
@ritvikmath 3 years ago
Thanks!
@sabanburaknazlm1381 5 months ago
Well explained!
@abdulrahmankerim2377 2 years ago
Thanks!
@shahulrahman2516 9 days ago
Great video
@TheMuser 10 months ago
Nicely explained!
@davidkopfer3259 3 years ago
Very nice explanation, thanks!
@ritvikmath 3 years ago
Glad it was helpful!
@lukekim8304 2 years ago
I love this vid! It would be great if you could also do more videos on online learning and regret minimization 😆😆😆
@jroseme 1 year ago
This was a useful supplement to my read of Reinforcement Learning by Sutton & Barto. Thanks.
@ritvikmath 1 year ago
Glad it was helpful!
@snehotoshbanerjee1938 2 months ago
Best explanation!!
@ritvikmath 2 months ago
Glad you think so!
@rifatamanna7895 3 years ago
It was an awesome technique 👍👍 thanks
@ritvikmath 3 years ago
thanks for your words!
@tariqrashid6748 3 years ago
Great explanation
@maxencelaisne4141 3 years ago
Thank you so much, I passed my exam thanks to your explanation :)
@ritvikmath 3 years ago
Glad it helped!
@nastya831 3 years ago
thanks man, this is truly helpful! 6 min at 2x and I got it all
@ritvikmath 3 years ago
Great to hear!
@bassamry 1 year ago
very clear and simple explanation!
@ritvikmath 1 year ago
Glad it was helpful!
@TheMuser 10 months ago
I have explored and finally decided that I am going to exploit you! *Subscribed*
@francisliubin 8 months ago
Thanks for the great explanation. What is the essential difference between the contextual bandit (CB) problem and the multi-armed bandit (MAB) problem? How does the difference impact the strategy?
@zahrashekarchi6139 1 year ago
Thanks a lot for this video! Just one thing I would like to find out: where do we store the result of our learning? Like some policy or parameter to be updated?
@hypebeastuchiha9229 2 years ago
My exam is in 2 days and I'm so close to graduating with the highest grades. Thanks for your help!
@sau002 1 year ago
Brilliant
@stanislavezhevski2877 3 years ago
Great explanation! Can you leave a link to the code you used in the simulations?
@ritvikmath 3 years ago
Thanks! I have a follow up video on Multi-Armed Bandit coming out next week and the code will be linked in the description of that video. Stay tuned!
@Status_Bleach 1 year ago
Thanks for the vid boss. How exactly did you calculate the average rewards for the Exploit Only and Epsilon-Greedy strategies though?
@thinkanime1 1 year ago
Really good video
@ritvikmath 1 year ago
Thanks!
@calinobrocea7502 2 years ago
Hello, thank you for the awesome explanation, it really helped me a lot. But I want to ask one additional question on this topic: do you know a method for tuning the epsilon parameter? I tried searching on Google, but I did not find anything helpful. Thank you!
@PhilipKirkbride 3 years ago
Related to regret, we never really know the true distributions (since we can only infer from taking samples). Would you basically just use your estimated distributions at the end of the 300 days as the basis for calculating regret?
@yitongchen75 3 years ago
Cool explanation. Can you also talk about the Upper Confidence Bound algorithm relating to this?
@ritvikmath 3 years ago
Good timing! I have a video scheduled about UCB for Multi-Armed Bandit. It will come out in about a week :)
@annahuo6694 3 years ago
Great videos! Thanks for your clarification; it's much clearer for me now. But I just wonder how you calculated the 330 regret in the exploitation-only case?
@ritvikmath 3 years ago
Good question. You can get that number by considering all possible cases of visiting each restaurant on the first three days. Something like: given the first three days of visits, what is the probability that restaurant 1 looks best, vs. the probability that restaurant 2 looks best, etc.? You can do this via pencil and paper, but I'd recommend writing a simple computer simulation instead.
@annahuo6694 3 years ago
@ritvikmath Thank you for this prompt response. I think I get the idea from the epsilon-greedy formula (option number 3 in the example). Thanks a lot, your video is really helpful :)
@debashishbhattacharjee1112 1 year ago
Hello Ritvik, this was a very helpful video. You have explained a concept so simply. I hope you continue making such informative videos. Best wishes.
@ritvikmath 1 year ago
Thanks so much!
@victorkreitton2268 2 years ago
What ML books do you recommend or use?
@jonathanarias2729 2 years ago
Why is 330 the answer in the exploitation example? Shouldn't it be 3000 - 2396 = 604?
@TheFobJang 1 year ago
Would you say the exploit-only strategy is the same as the explore-then-commit strategy (also known as explore-then-exploit)?
@bobo0612 3 years ago
Hi! Thank you for your video. I have a question at 6:28: why is the regret ρ not simply 3000 - 2396?
@senyksia 3 years ago
2396 was the happiness for that specific case, where restaurant #2 was chosen to exploit. 330 is the (approximate) average regret for every case. So 3000 - 2396 would be correct if you were only talking about that unique case.
@myoobies 3 years ago
@senyksia Hey, what do you mean by average regret for every case? I'm still having trouble wrapping my head around this step. Thanks!
@madmax2442 2 years ago
@Bolin WU I know it's been 8 months already, but I wanted to know whether you got the answer or not. I have the same doubt.
@wenzhang5879 1 year ago
Could you explain the difference between the MAB problem and the ranking and selection problem? Thanks
@l2edz 3 years ago
Coolest prof ever! 😎
@ritvikmath 3 years ago
haha!
@quanghoang3801 8 months ago
Thanks! I really wish the RLBook authors could explain the k-armed bandit problem as clearly as you do; their writing is really confusing.
@sampadmohanty8573 3 years ago
I knew everything from the start. Ate at the same place for 299 days and got pretty bored. So watched youtube and found this video. Now I am stuck at this same restaurant on the 300th day to minimize my regret. Such a paradox. Just kidding. Amazing explanation and example.
@yannelfersi3510 2 months ago
Can you share the calculation for the regret in the exploitation-only case?
@wenlouismao 3 years ago
Any books on this topic you'd recommend? How did you come across this topic?
@jimbocho660 2 years ago
Try 'Bandit Algorithms for Website Optimization: Developing, Deploying, and Debugging' by John Myles White. It's short and sweet.
@shantanurouth6383 3 years ago
I could not understand how it turned out to be 330, could you explain please?
@manabsaha5336 2 years ago
Sir, please make a video on the softmax approach.
@Phil-oy2mr 3 years ago
In the exploit only case, would there be a way to compute the regret mathematically without a simulation?
@softerseltzer 3 years ago
You could calculate the probability of picking one restaurant over the others and then sum the expected rewards weighted by those probabilities. So, for example, if one of the restaurants is clearly much better, you will most likely pick it in the initial one-shot exploration phase, so its probability will be close to 1. The probability of picking one restaurant over another could perhaps be derived using cumulative distribution functions of the initial reward distributions. One could imagine a simple example with discrete instead of continuous distributions, say with each restaurant having only three options: a certain probability of a bad meal (reward 1), a mediocre meal (reward 2), and a good meal (reward 3).
@user-sl6gn1ss8p 2 years ago
*insert Nassim Taleb talking about the possibility that a meal kills you or changes your whole life* or something like that :p
@Trucmuch 3 years ago
Slot machines were not called bandits but one-armed bandits (they "stole" your money, and the bulky box with one lever on its side kind of looked like a one-armed man). So the name of this problem is kind of a pun: a slot machine with more than one lever you can pull (here three) is a multi-armed bandit. ;-)
@ritvikmath 3 years ago
Wow I did not know that, thanks !!
@sunIess 3 years ago
Assuming a finite horizon (known beforehand), aren't you (in expectation) better off doing all the exploration before starting to exploit?
@ritvikmath 3 years ago
You've just made a very good point. One strategy I did not note is an epsilon-greedy strategy where the probability of exploring is very high in the beginning and then decays to 0 over time. This would likely be a good idea.
@chyldstudios 3 years ago
Why didn't you discuss the best strategy: Bayesian bandits with uniform priors coming from a Beta distribution?
@EugeneBos 3 years ago
What if we go to the worst restaurant for 299 days and then go to the best one? I think it is possible to get very high happiness (close to 3k for 1 visit), but the regret may be high as well...
@EugeneBos 3 years ago
😂😂😂
@shraddhashah1284 3 months ago
In the epsilon-greedy method, how do you get 2907? Kindly show the calculation.
@geraldwu5753 3 years ago
With only 3 options, wouldn't exploration naturally become exploitation?