Awesome tutorial, gave me a different angle to view NN optimization, really great!
@elliotwaite 4 years ago
Thanks!
@VikasGupta1812 5 years ago
Very good explanation. Thanks !!
@tresorkoffi6435 4 years ago
do you know the meaning of > in cart velocity and pole angle?
@kamaljeetsahoo4752 4 years ago
This video is really good!!! It forced me to think about AI from different angles
@elliotwaite 4 years ago
Thanks, kamaljeet! I'll be making more reinforcement learning related videos soon.
@ronniey4231 1 year ago
This is just AMAZING!
@elliotwaite 1 year ago
Thanks!
@jonatan01i 5 years ago
Very interesting video, thank you!
@middleclass3782 6 years ago
That was a nice video, thanks. I'll wait for the future videos you mentioned.
@Stalkie 3 years ago
Cart-Pole Tunnel Syndrome, ayeeeee! *fingerguns* :D I love the plots, are they done with matplotlib?
@elliotwaite 3 years ago
:D The plots were actually made with Plotly: plotly.com/python/3d-scatter-plots/
@MachinaMusings 4 years ago
What text editor are you using? Also, can you share your code?
@elliotwaite 4 years ago
The text editor is PyCharm. I like it mainly for its refactoring abilities, and I also like its visual style. As for sharing my code, I looked for those files but haven't been able to find them. I got a new computer since then and I must have misplaced those files in the upgrade. If I come across them in the future, though, I'll share them on GitHub and post a comment here.
@tresorkoffi6435 4 years ago
@@elliotwaite do you know the meaning of > in cart velocity and pole angle?
@tresorkoffi6435 4 years ago
Nice video... though, what's the meaning of > in cart velocity and pole angle?
@bnproduction9298 4 years ago
velocity can be any number
@arashchitgar7445 3 years ago
In the scatter plot, each data point represents 2 values, angle, and velocity. However, in an episode in which the cart made it for 200 steps, we should have 200 pairs of angle and velocity values. So does each data point represent that last pair or the mean of all steps?
@elliotwaite 3 years ago
Ah, I probably should have named the axis labels differently. Instead of "pole angle" for the x-axis and "pole velocity at tip" for the y-axis, I should have named them something like "the coefficient that gets multiplied by the pole angle" for the x-axis and the "the coefficient that gets multiplied by the pole velocity at the tip" for the y-axis. These values stay the same throughout the entire run. For a data point at (x: -1, y: 2), at each time step we would calculate (-1) * (pole angle) + (2) * (pole velocity at the tip), and then if that value is greater than 0, we accelerate to the right, otherwise we accelerate to the left. And then we plot the size of the dot based on how much reward was received for the entire run using that same equation for choosing an action at each step. Hope that helps make it more clear. Let me know if there is anything that is still confusing.
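To make that decision rule concrete, here is a minimal sketch (the function name and example numbers are my own, not code from the video):

```python
def choose_action(coeffs, observation):
    """Linear policy: accelerate right (action 1) if the weighted sum of
    the observation is greater than 0, otherwise accelerate left (action 0)."""
    total = sum(c * o for c, o in zip(coeffs, observation))
    return 1 if total > 0 else 0

# A data point at (x: -1, y: 2) means these two fixed coefficients:
coeffs = (-1.0, 2.0)

# One time step: pole angle 0.1, pole velocity at the tip 0.3.
# Total = (-1) * 0.1 + 2 * 0.3 = 0.5 > 0, so accelerate right.
print(choose_action(coeffs, (0.1, 0.3)))  # prints 1
```

The same pair of coefficients is reused at every step of a run, and the dot's size reflects the total reward collected over that run.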
@arashchitgar7445 3 years ago
@@elliotwaite Oh, I see. Thank you so much 🙏🙏 I have another question. But I completely understand if my basic questions take your time, and I really don't want to annoy or bore you with my questions. 🙏 How did you map the data points onto a circle of radius 1?
@elliotwaite 3 years ago
@@arashchitgar7445 for the 3D sphere I added one more dimension that represents a new coefficient that gets multiplied by the cart's velocity and is added to the previous equation, and then if this new total is greater than zero the cart is accelerated to the right, otherwise it is accelerated to the left. And then the x, y, and z values of each point are all scaled uniformly so that the point lies on the unit sphere. And this scaling to the unit sphere just makes the data easier to visualize, and it actually doesn't misrepresent the data point since scaling the values uniformly by a positive number wouldn't change the actions that are chosen since the actions are only based on if the total is greater than zero or not. So for example, if the x, y, and z values were all multiplied by 2 (uniformly scaled by 2) then the new totals we would get from using these new coefficients would be the same as if we just multiplied the previous totals by 2, and any number greater than zero that gets multiplied by 2 will still be greater than zero, and any number less than or equal to zero that gets multiplied by 2 will still be less than or equal to zero. So the output actions for these new totals would be the same as they were before the scaling. Hope that helps. And I don't mind the questions. All questions are welcome.
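A quick sketch of that scale-invariance argument (the numbers are hypothetical, not from the video): multiplying all three coefficients by the same positive number never flips the sign of the total, so the chosen actions are unchanged.

```python
def action(coeffs, obs):
    # Right (1) if the weighted sum is positive, else left (0).
    return 1 if sum(c * o for c, o in zip(coeffs, obs)) > 0 else 0

w = (-1.0, 2.0, 0.5)                # angle, tip velocity, cart velocity coefficients
obs = (0.05, -0.2, 0.4)             # one example observation
scaled = tuple(2.0 * c for c in w)  # uniform positive scaling

# The scaled coefficients pick the same action as the originals.
assert action(w, obs) == action(scaled, obs)
```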
@arashchitgar7445 3 years ago
@@elliotwaite Thanks a lot for describing it in detail 🙏🙏 My problem is with that "uniformly scaled" part. I don't know how to scale (transform) the values to go from what we have at 4:30 to what we have at 4:35
@elliotwaite 3 years ago
@@arashchitgar7445, ah. To do that we divide the x and y values by the distance the point is from the origin. So x becomes x / sqrt(x^2 + y^2), and y becomes y / sqrt(x^2 + y^2). This makes it so that the point is now a length of 1 away from the origin, but still in the same direction away from the origin as it was before. This is usually referred to as normalization or vector normalization.
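As a small illustration of that normalization step (my own example values, not from the video):

```python
import math

def normalize(x, y):
    """Scale (x, y) onto the unit circle while keeping its direction."""
    length = math.sqrt(x**2 + y**2)
    return x / length, y / length

nx, ny = normalize(3.0, 4.0)   # length is 5, so this gives (0.6, 0.8)
print(nx**2 + ny**2)           # ≈ 1.0, i.e. the point lies on the unit circle
```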
@cyber7000 4 years ago
Hey, I want to make a bot for a 3D game. What data should I feed it?
@elliotwaite 4 years ago
The more closely your training data can match what the bot would get as input in the game the better. However, it might be a good idea to downsample the data if you are trying to train on image data. And then adding some reward shaping to help it learn if the game rewards are sparse should also help. Not sure if that answered your question, but let me know if you have any other questions.
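As an illustration of the reward-shaping idea (the helper name and the 0.1 weight are made up for this sketch, not from any specific library):

```python
def shaped_reward(env_reward, prev_distance, distance):
    """Add a small bonus for moving closer to the goal, so the agent
    gets a learning signal even when the environment reward is sparse."""
    progress_bonus = 0.1 * (prev_distance - distance)
    return env_reward + progress_bonus

# The raw reward is 0, but moving from distance 5.0 to 4.0 earns a bonus.
print(shaped_reward(0.0, 5.0, 4.0))  # prints 0.1
```

The shaping term rewards progress rather than only the final outcome, which tends to speed up learning in sparse-reward games.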
@cyber7000 4 years ago
@@elliotwaite OK, so the data it needs most, like how to move and stuff like that
@elliotwaite 4 years ago
@@cyber7000, ah, maybe I misunderstood your original question. I was thinking you were trying to train it using reinforcement learning, in which case it would learn how to move on its own and you would only feed it the data that it would have about the state of the game at that time. However, if you are trying to directly feed it the data about how to move without using reinforcement learning, that's a hard question and I don't think I have a good answer. Maybe I'm still misunderstanding what you are trying to do.
@cyber7000 4 years ago
@@elliotwaite It's probably me not making my question clear enough, but yeah, I want it to move according to the state of the game. I don't have a proper place to train it, nor the data to feed it. Plus, I shouldn't be asking a question like this when I don't script with Python or make machine learning agents.
@elliotwaite 4 years ago
@@cyber7000, ah, got it. Yeah, those things might need to be figured out first to be able to train a bot with reinforcement learning. No worries, I appreciate the comment.
@NeuralNerdHub 6 years ago
nice (y)
@kenchang3456 1 year ago
OK, the video was great, the joke was cringe-worthy. LOLOLOL :-)