Monte Carlo in Reinforcement Learning

  7,388 views

CodeEmporium

6 months ago

Let's talk about how Monte Carlo methods can be used in reinforcement learning
RESOURCES
[1] Other Monte Carlo Video: • Running Simulations as...
PLAYLISTS FROM MY CHANNEL
⭕ Reinforcement Learning: • Reinforcement Learning...
Natural Language Processing: • Natural Language Proce...
⭕ Transformers from Scratch: • Natural Language Proce...
⭕ ChatGPT Playlist: • ChatGPT
⭕ Convolutional Neural Networks: • Convolution Neural Net...
⭕ The Math You Should Know : • The Math You Should Know
⭕ Probability Theory for Machine Learning: • Probability Theory for...
⭕ Coding Machine Learning: • Code Machine Learning
MATH COURSES (7 day free trial)
📕 Mathematics for Machine Learning: imp.i384100.net/MathML
📕 Calculus: imp.i384100.net/Calculus
📕 Statistics for Data Science: imp.i384100.net/AdvancedStati...
📕 Bayesian Statistics: imp.i384100.net/BayesianStati...
📕 Linear Algebra: imp.i384100.net/LinearAlgebra
📕 Probability: imp.i384100.net/Probability
OTHER RELATED COURSES (7 day free trial)
📕 ⭐ Deep Learning Specialization: imp.i384100.net/Deep-Learning
📕 Python for Everybody: imp.i384100.net/python
📕 MLOps Course: imp.i384100.net/MLOps
📕 Natural Language Processing (NLP): imp.i384100.net/NLP
📕 Machine Learning in Production: imp.i384100.net/MLProduction
📕 Data Science Specialization: imp.i384100.net/DataScience
📕 Tensorflow: imp.i384100.net/Tensorflow

Comments: 23
@Akshaylive 6 months ago
One important reason to use MC methods is for cases where we do not have access to the Markov decision process (MDP) dynamics. The example in this video does have a known MDP, so it can also be solved using the Bellman equations.
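To make this comment concrete: when the transition model is unknown, Monte Carlo methods estimate values purely from sampled episodes. A minimal first-visit MC prediction sketch (the toy episodes and discount factor below are hypothetical, not from the video):

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=0.9):
    """Estimate state values from sampled episodes alone; no
    transition model (MDP dynamics) is needed."""
    returns = defaultdict(list)
    for episode in episodes:  # episode = [(state, reward), ...]
        g = 0.0
        visited = {}
        # Walk backwards, accumulating the discounted return; the
        # final overwrite per state is the return from its FIRST visit.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            g = reward + gamma * g
            visited[state] = g
        for state, g_first in visited.items():
            returns[state].append(g_first)
    # Value estimate = average of observed first-visit returns.
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# Two toy episodes through states "s1" -> "s2" -> terminal.
episodes = [[("s1", 0.0), ("s2", 1.0)], [("s1", 0.0), ("s2", 0.0)]]
values = first_visit_mc(episodes)  # V(s1) = 0.45, V(s2) = 0.5
```

With a known MDP, the same values would instead fall out of solving the Bellman equations directly, as the comment notes.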
@0xabaki 3 months ago
I would use Monte Carlo to predict if there will be food at the office tomorrow, because it's so unpredictable when I have to bring in food lol
@syeshwanth6790 5 months ago
Loved the way the decision making of a robot using a Q-table was explained in this video.
@CodeEmporium 5 months ago
Glad the explanation was good. Thanks for the comment :)
@NG-ec2th 3 months ago
In S1 (8:08) the greedy action is to go up, actually...
@AshmaBhagad 11 days ago
There is no grid cell to go up to in s1. Where it starts, there are only two options: right and down.
@NG-ec2th 10 days ago
@@AshmaBhagad Then why is there a payoff value for up?
@syeshwanth6790 5 months ago
0.5 sq units. The area of the square = 1*1 = 1 sq unit. Half of the balls dropped fell into the diamond, which means the diamond occupies half the area of the square (area of diamond = (1/2) * 1 sq unit = 0.5 sq units).
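The ball-dropping argument above is exactly Monte Carlo area estimation, and it can be simulated directly. A small sketch (the diamond here is assumed to be the one inscribed in the unit square, touching the midpoints of its sides):

```python
import random

def estimate_diamond_area(n_samples: int, seed: int = 0) -> float:
    """Estimate the area of the diamond inscribed in the unit square
    by dropping random points ("balls") and counting the fraction
    that land inside the diamond."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        # The inscribed diamond satisfies |x - 0.5| + |y - 0.5| <= 0.5.
        if abs(x - 0.5) + abs(y - 0.5) <= 0.5:
            inside += 1
    # The square has area 1, so area ~= fraction of points inside.
    return inside / n_samples

estimate_diamond_area(100_000)  # converges toward the true area, 0.5
```

With enough samples the estimate settles near 0.5, matching the "half the balls fell inside" reasoning in the comment.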
@CodeEmporium 5 months ago
Ding ding ding. That’s correct :)
@syeshwanth6790 5 months ago
@@CodeEmporium Question 2) B. Frank was updating Q-values based on observed rewards from simulation.
@devinbrown9925 1 month ago
For Quiz Time 1 at 3:47, shouldn't the answer be B: 0.5 sq units? I think the entire premise is that you know the area of one region, you know the ratio of balls dropped in the two regions, and the ratio of balls dropped equals the ratio of areas. Therefore you can use this information to determine the unknown area.
@AakashKumarDhal 1 month ago
Answer for Quiz 2: Option B. Frank was updating Q-values based on observed rewards from simulated episodes.
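For reference, "updating Q-values based on observed rewards from simulated episodes" usually means nudging Q(s, a) toward the return G observed after taking that action. A minimal sketch (the state/action names, step size, and return value are made up for illustration):

```python
def mc_update(q: dict, state, action, g: float, alpha: float = 0.1) -> dict:
    """Incremental Monte Carlo update: move Q(s, a) a fraction alpha
    of the way toward the return G observed in a sampled episode."""
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (g - old)
    return q

q = {}
q = mc_update(q, "s1", "right", g=1.0)
# Starting from 0.0, one update moves Q(s1, right) to 0.1.
```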
@florianneugebauer3042 2 months ago
Where is the number of states coming from? Where is state 17??
@chinmaibarai1750 6 months ago
Are you from Bharat 🇮🇳
@WeeHooTM 1 month ago
8:09 I stopped watching when he thinks 1.5 is greater than 2.1 lmao
@AshmaBhagad 11 days ago
In state s1 the agent didn't actually have the option to go up. So maybe that's why 2.1 doesn't matter: the agent can only select the best action among those available in its current state. At the start he clearly said that the environment has 9 grid cells (states).
@BizillionAtoms 6 months ago
I think you should include the answers to the quizzes in the video at some point. Also, at 8:00 you said the highest is 1.5, but it is 2.1. Most importantly, I think the Frank moments were cringe and distracted me from focusing. The target audience is most likely not kids (at least I think so), so they would probably find it cringe too. No offense.
@juliaplanidina1565 5 months ago
Found it funny; not a kid, but it helped me concentrate more 😂
@Falcon8856 5 months ago
Didn't find it funny, am a kid, but appreciate the light humor and effort put into these videos. Didn't really distract me.
@servicer6969 6 days ago
Stop being such a hater. The reason 1.5 is the highest is that the action with assumed reward 2.1 is illegal in state 1 (you can't move up because of the wall). P.S. Using the word cringe is cringe.
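The point this thread keeps circling is action masking: the greedy choice is taken over legal actions only, so an illegal high Q-value never wins. A small sketch (the Q-values and state layout below are assumptions mirroring the numbers debated above, not taken from the video):

```python
# Hypothetical Q-values for one state of the video's grid world.
q_table = {
    ("s1", "up"): 2.1,    # largest value, but "up" hits a wall in s1
    ("s1", "right"): 1.5,
    ("s1", "down"): 0.7,
}

# Actions actually available in each state (assumed layout).
legal_actions = {"s1": ["right", "down"]}

def greedy_legal_action(state: str) -> str:
    """Greedy action restricted to the legal actions in `state`,
    so an illegal high Q-value (like 2.1 for "up") is never picked."""
    return max(legal_actions[state], key=lambda a: q_table[(state, a)])

greedy_legal_action("s1")  # "right" (Q=1.5), not "up" (Q=2.1)
```

This is why 1.5 can be "the highest" even though 2.1 appears in the table.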
@ayoubelmhamdi7920 6 months ago
this is the difficult way to teach Monte Carlo 😂
@swphsil3675 6 months ago
Difficult for absolute beginners, I think; otherwise the video was easy for me to follow.
@ayoubelmhamdi7920 6 months ago
@@swphsil3675 Do you think Monte Carlo should first be taught from how randomness is governed by mathematical probability? For example, a coin has two possibilities; no one knows whether they will win a single toss, but over many plays each outcome comes up about 50% of the time.