What Does an RL Parameter Space Look Like?

5,038 views

Elliot Waite

Comments: 37
@giovannipizzato6888 4 years ago
Awesome tutorial, gave me a different angle to view NN optimization, really great!
@elliotwaite 4 years ago
Thanks!
@VikasGupta1812 5 years ago
Very good explanation. Thanks!
@tresorkoffi6435 4 years ago
Do you know the meaning of > in cart velocity and pole angle?
@kamaljeetsahoo4752 4 years ago
This video is really good!!! It got me thinking about AI from different angles.
@elliotwaite 4 years ago
Thanks, kamaljeet! I'll be making more reinforcement learning related videos soon.
@ronniey4231 a year ago
This is just AMAZING!
@elliotwaite a year ago
Thanks!
@jonatan01i 5 years ago
Very interesting video, thank you!
@middleclass3782 6 years ago
That was a nice video, thanks. I'll be waiting for the future videos you mentioned.
@Stalkie 3 years ago
Cart-Pole Tunnel Syndrome, ayeeeee! *fingerguns* :D I love the plots, were they done with matplotlib?
@elliotwaite 3 years ago
:D The plots were actually made with Plotly: plotly.com/python/3d-scatter-plots/
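(The original plotting code was lost, but here is a minimal sketch, using random placeholder data, of how a 3D scatter plot like the one in the video might be made with Plotly. The data and variable names are just for illustration.)

```python
import numpy as np
import plotly.graph_objects as go

# Placeholder data: random points projected onto the unit sphere, with a
# fake "reward" per point that controls marker size and color.
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)  # project to unit sphere
rewards = rng.uniform(0, 200, size=500)

fig = go.Figure(
    data=go.Scatter3d(
        x=points[:, 0],
        y=points[:, 1],
        z=points[:, 2],
        mode="markers",
        marker=dict(size=rewards / 20, color=rewards, colorscale="Viridis"),
    )
)
fig.show()
```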
@MachinaMusings 4 years ago
What text editor are you using? Also, can you share your code?
@elliotwaite 4 years ago
The text editor is PyCharm. I like it mainly for its refactoring abilities, and I also like its visual style. As for sharing my code, I looked for those files but haven't been able to find them. I got a new computer since then and I must have misplaced those files in the upgrade. If I come across them in the future, though, I'll share them on GitHub and post a comment here.
@tresorkoffi6435 4 years ago
@elliotwaite Do you know the meaning of > in cart velocity and pole angle?
@tresorkoffi6435 4 years ago
Nice video... though, what's the meaning of > in cart velocity and pole angle?
@bnproduction9298 4 years ago
Velocity can be any number.
@arashchitgar7445 3 years ago
In the scatter plot, each data point represents two values: angle and velocity. However, in an episode where the cart made it for 200 steps, we should have 200 pairs of angle and velocity values. So does each data point represent the last pair, or the mean over all steps?
@elliotwaite 3 years ago
Ah, I probably should have named the axis labels differently. Instead of "pole angle" for the x-axis and "pole velocity at tip" for the y-axis, I should have named them something like "the coefficient that gets multiplied by the pole angle" for the x-axis and "the coefficient that gets multiplied by the pole velocity at the tip" for the y-axis. These values stay the same throughout the entire run. For a data point at (x: -1, y: 2), at each time step we would calculate (-1) * (pole angle) + (2) * (pole velocity at the tip), and then if that value is greater than 0, we accelerate to the right, otherwise we accelerate to the left. The size of the dot is then based on how much reward was received for the entire run using that same equation to choose an action at each step. Hope that helps clear it up. Let me know if anything is still confusing.
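(A rough sketch of that idea in code, not the video's original script. It assumes the classic Gym CartPole API, where reset() returns just the observation and step() returns four values; newer Gym/Gymnasium versions return extra values. The function name and defaults are my own.)

```python
import gym

def run_episode(angle_coeff, velocity_coeff, max_steps=200):
    """Play one CartPole episode with a fixed linear policy:
    push right when angle_coeff * pole_angle + velocity_coeff * pole_tip_velocity > 0,
    otherwise push left. The returned total reward sets the dot's size in the plot."""
    env = gym.make("CartPole-v0")
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # obs = [cart position, cart velocity, pole angle, pole velocity at tip]
        action = 1 if angle_coeff * obs[2] + velocity_coeff * obs[3] > 0 else 0
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    env.close()
    return total_reward

# Example: the data point at (x: -1, y: 2) in the scatter plot.
print(run_episode(angle_coeff=-1.0, velocity_coeff=2.0))
```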
@arashchitgar7445 3 years ago
@elliotwaite Oh, I see. Thank you so much 🙏🙏 I have another question, but I completely understand if my basic questions take your time, and I really don't want to be annoying or bore you with my questions. 🙏 How did you map the data points onto a circle with a radius of 1?
@elliotwaite 3 years ago
@arashchitgar7445 For the 3D sphere, I added one more dimension that represents a new coefficient that gets multiplied by the cart's velocity and added to the previous equation, and if this new total is greater than zero the cart is accelerated to the right, otherwise to the left. The x, y, and z values of each point are then all scaled uniformly so that the point lies on the unit sphere. This scaling just makes the data easier to visualize, and it doesn't misrepresent the data point, since uniformly scaling the values by a positive number doesn't change which actions are chosen (the actions only depend on whether the total is greater than zero). For example, if the x, y, and z values were all multiplied by 2 (uniformly scaled by 2), the new totals would be the same as the previous totals multiplied by 2, and any number greater than zero stays greater than zero when multiplied by 2, and any number less than or equal to zero stays less than or equal to zero. So the output actions would be the same as before the scaling. Hope that helps. And I don't mind the questions. All questions are welcome.
@arashchitgar7445 3 years ago
@elliotwaite Thanks a lot for describing it in detail 🙏🙏 My problem is with the "uniformly scaled" part. I don't know how to scale (transform) the values to go from what we have at 4:30 to what we have at 4:35.
@elliotwaite 3 years ago
@arashchitgar7445 Ah. To do that, we divide the x and y values by the distance the point is from the origin. So x becomes x / sqrt(x^2 + y^2), and y becomes y / sqrt(x^2 + y^2). This makes the point a length of 1 away from the origin, but still in the same direction from the origin as before. This is usually referred to as normalization, or vector normalization.
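(In NumPy, that normalization looks something like the sketch below, for both the 2D circle and the 3D sphere. This is just an illustration of the formula above, not code from the video.)

```python
import numpy as np

def normalize(v):
    """Scale a vector so it has length 1 while keeping its direction.
    Since the policy only checks whether the weighted sum is > 0,
    this rescaling doesn't change which actions get chosen."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)  # v / sqrt(v[0]**2 + v[1]**2 + ...)

print(normalize([3.0, 4.0]))       # 2D: [0.6, 0.8], now on the unit circle
print(normalize([1.0, 2.0, 2.0]))  # 3D: [0.333..., 0.666..., 0.666...], now on the unit sphere
```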
@cyber7000 4 years ago
Hey, I want to make a bot for a 3D game. What data should I feed it?
@elliotwaite 4 years ago
The more closely your training data matches what the bot would get as input in the game, the better. However, it might be a good idea to downsample the data if you are trying to train on image data. Adding some reward shaping should also help it learn if the game's rewards are sparse. Not sure if that answered your question, but let me know if you have any other questions.
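(As a rough illustration of reward shaping, here is a hypothetical sketch with made-up names: it adds a small bonus for moving closer to a goal, so the learning signal isn't purely sparse.)

```python
import numpy as np

def shaped_reward(raw_reward, prev_pos, pos, goal_pos, scale=0.01):
    """Add a small bonus proportional to how much closer the bot got to the
    goal this step, on top of the game's own (possibly sparse) reward."""
    prev_dist = np.linalg.norm(np.asarray(prev_pos) - np.asarray(goal_pos))
    new_dist = np.linalg.norm(np.asarray(pos) - np.asarray(goal_pos))
    return raw_reward + scale * (prev_dist - new_dist)
```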
@cyber7000 4 years ago
@elliotwaite Ok, so the data that it needs most, like how to move and stuff like that?
@elliotwaite 4 years ago
@cyber7000, ah, maybe I misunderstood your original question. I was thinking you were trying to train it using reinforcement learning, in which case it would learn how to move on its own and you would only feed it the data it would have about the state of the game at that time. However, if you are trying to directly feed it the data about how to move without using reinforcement learning, that's a hard question and I don't think I have a good answer. Maybe I'm still misunderstanding what you are trying to do.
@cyber7000 4 years ago
@elliotwaite It's probably me not making my question clear enough, but yeah, I want it to move according to the state of the game. I don't have a proper place to train it, nor the data to feed it. Plus, I shouldn't be asking a question like this when I don't script with Python or make machine learning agents.
@elliotwaite 4 years ago
@cyber7000, ah, got it. Yeah, those things might need to be figured out first to be able to train a bot with reinforcement learning. No worries, I appreciate the comment.
@NeuralNerdHub 6 years ago
nice (y)
@kenchang3456 a year ago
OK, the video was great, the joke was cringe-worthy. LOLOLOL :-)
@elliotwaite a year ago
Haha, 😁