The concept of being able to virtually attend the class as it happens at MIT is awesome. Thank you Lex for taking the time to share the videos!
@bamoida427 жыл бұрын
It's even better. You can pause at any moment and continue only after you've understood what was just said. Or you can skip ahead, using the provided slides as a reference. This is so much better than the old-school university I attended back in the day. Oh, the time I wasted trying not to fall asleep!
@turhancan97 Жыл бұрын
It's amazing how technology allows us to access such high-quality educational content from anywhere in the world. Huge thanks to Lex for sharing these insightful and inspiring videos with us!
@jacobrafati42007 жыл бұрын
You are the best machine learning teacher I have ever seen. Thanks for your generosity in sharing your knowledge :)
@arthurk72707 жыл бұрын
What about Andrew Ng? :(
@cynicalmedia7573 жыл бұрын
Learning about one of my favorite topics from Lex is just awesome. Thanks to this humble legend for sharing this!
@jayhu60755 жыл бұрын
First, you are a good human and a fantastic teacher, because you share your knowledge with people who have not had the possibility to study at a university. Thanks for that and God bless you.
@HarishNarayanan7 жыл бұрын
I enjoy how this lecture series slips in so many life lessons while masquerading as a series on the technical aspects of deep learning.
@generalqwer7 жыл бұрын
umm wat? It's not masquerading as anything. This literally is a series (of videos) on the technical aspects of deep learning.
@shanequillen53427 жыл бұрын
I agree completely. Your comment, though, reminds me of Steve Jobs when he said to learn a computer language, as it will teach you how to think.
@anonymoususer37416 жыл бұрын
The Greeks had a word for it... metanoia. Indeed, it is fascinating to understand the inner workings of the self through abstraction.
@marsilzakour1727 жыл бұрын
Thanks a lot, it really helped me understand reinforcement learning. A year ago I was new to machine learning and knew only the basic structure of neural networks and the feed-forward process. I didn't know anything about backprop or derivative-based optimization, so I made a neural network that plays a simple game similar to the Atari games and trained it with a genetic algorithm that maximizes the player's score during a simulation round. Now it is really nice to find academic material about that.
@Tibetan-experience7 жыл бұрын
Thank you Lex for uploading this awesome video here for free.
@khayyam23024 жыл бұрын
The trainer's presentation style is awesome.
@Sasipano7 жыл бұрын
Thanks very much Lex. I greatly appreciate your efforts of sharing your knowledge with the world. You Rock :)
@pauldacus45906 жыл бұрын
At 36:56 it seems like you can reduce the reward to Q(t+1) - Q(t), or just the simple increase in the "value" of the state in time period t+1 over period t. Then the discount rate (y) can be applied to that gain to discount it back to time t. The learning rate (a) then becomes a "growth of future state" valuation. Then the most important thing is whether y * a > 1, or your learning never overcomes the burden of the discount rate. This is really similar to the dividend growth model of stock valuation: D/(k-g), where D = dividend at time 0, k = discount rate, g = growth rate. The strange similarity is that when the "learning rate" (feels like this should be "applied learning rate") is greater than the discount rate, there is "growth" in future states; otherwise there is contraction (think the Dark Ages). In the dividend discount model, whenever the growth rate is extrapolated into infinity as higher than the discount rate, the denominator goes to zero and below, and the valuation goes to infinity. Yeah, I like this guy's analogies translating the bedrock of machine learning, etc., to fundamental life lessons. Never stop learning... and then doing!
@daryoushmehrtash76017 жыл бұрын
It is interesting that many NAND gates can be used to implement a sum function, and then many sum functions can be used to learn NAND.
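As a toy illustration of the second direction (my own sketch, not from the lecture), a single weighted-sum neuron with a step activation can already represent NAND exactly:

// one neuron: step(-2*a + -2*b + 3) reproduces NAND for binary inputs a, b
function nandNeuron(a, b) {
    var z = -2 * a + -2 * b + 3; // weighted sum plus bias
    return z > 0 ? 1 : 0;        // step activation
}
// nandNeuron(0,0) = 1, nandNeuron(1,0) = 1, nandNeuron(0,1) = 1, nandNeuron(1,1) = 0

Since NAND is universal, stacking such neurons can in principle express any Boolean function; learning the weights from data is then what backprop is for.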
@alverlopez27367 жыл бұрын
NAND-ception
@Nightlurk7 жыл бұрын
Am I the only one who finds the explanations quite cumbersome and not easily digestible!? I'm having a hard time following some things; I have to pause, go back, rewatch segments, speculate on a lot of things, extrapolate on those speculations, and then rewatch, hoping to match the speculations against stated facts to confirm my understanding is correct. I'm not an expert in teaching, nor am I a genius, but when a lesson leaves so many loose ends and raises more questions than it answers, it might not be properly optimized for teaching. I do appreciate the effort, though, and acknowledge the fact that it's a difficult subject. I'm a visual learner and it's a pain in the ass to find material on this subject that suits me.
@joshuat61246 жыл бұрын
Have you tried Coursera yet? Machine Learning by Andrew Ng.
@mchrgr20006 жыл бұрын
Totally agree with you. I appreciate the effort, but it is not exactly what I expected from a top-level uni.
@TpBrass07 жыл бұрын
wow now we have subtitles! Thank you very much!
@fandu71817 жыл бұрын
FYI - The reinforcement learning part starts at 19:46
@amirsadrpour34567 жыл бұрын
Love when people do this, saves us all so much time... thank you!!
@tresorkoffi64356 жыл бұрын
@@amirsadrpour3456 sure lol
@tresorkoffi64356 жыл бұрын
thank u dear
@QuantamTrails6 жыл бұрын
This is so refreshing! To break down the human psyche into mathematical terms! Mind blown 🤯!! You nailed it!! When science and psychology come together so beautifully like this, it is an inspiring sight! You got my attention 😊
@sharp3892 жыл бұрын
I like Lex a lot, but I think the objective function for Q is wrong (32:49). Optimal Q-values are intended to maximize the *cumulative* future reward, not just the reward at the next time step. One could easily imagine that the best action to take in one's current state delivers a loss at the next step, but in the long term achieves the greatest net gain in reward.
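For reference, here is a rough sketch of the one-step target as I understand it from other sources (my own notation, not taken from the slide); the max over next-state Q-values is what recursively folds the discounted cumulative future reward into the objective:

// TD target for updating Q(s, a) after observing reward r and next state s'
// gamma is the discount factor; qNext holds Q(s', a') for every action a'
function tdTarget(r, gamma, qNext) {
    return r + gamma * Math.max.apply(null, qNext);
}
// e.g. tdTarget(-1, 0.9, [5, 2, 7]) = -1 + 0.9 * 7 = 5.3

So an action that loses a little reward now can still come out ahead if it leads to states with high Q-values later.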
@enochsit7 жыл бұрын
Thanks, and it is a good way to deliver the material, in a motivational-speech style!!!
@T4l0nITA5 жыл бұрын
I can see he has a lot of "Work" behind him.
@amothe834 ай бұрын
This lecture is gold
@BhargavBardipurkar427 жыл бұрын
thank you for the lecture.
@prest0n7554 жыл бұрын
At 7:31 your slide shows a threshold activation function in the equation, but the animation shows a sigmoid activation. That might confuse some MIT folks.
@anonymoususer37416 жыл бұрын
54:31 So basically you try to predict the result R' of performing A' on a past state S on which you did A and got result R, and then readjust your weights to bring your prediction closer to the actual R' you got?
@arulkumar84337 жыл бұрын
Thanks for the great lecture! I have one doubt. At 56:43, in the demo video (Atari breakout) of the model trained after 240 minutes, how did the ball go to the top portion of the game without any path from the bottom (i.e., without making a path from the bottom by breaking the tiles)?
@mankaransingh25995 жыл бұрын
lol wth
@gal1leo8852 жыл бұрын
Thanks! Great lecture!!
@yanadsl7 жыл бұрын
Hi, I wonder why the Q-learning equations are different on the same slide. At 36:11 the equation above is Q[s', a'] = ..., but the pseudocode below it is Q[s, a] = ...
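My best guess at the standard tabular update, from other references (so take it with a grain of salt), is that both should assign to the current pair Q[s, a], with the next pair s', a' appearing only inside the max:

// one Q-learning step after taking action a in state s,
// then observing reward r and next state sNext
// alpha = learning rate, gamma = discount factor
function qUpdate(Q, s, a, r, sNext, alpha, gamma) {
    var bestNext = Math.max.apply(null, Q[sNext]); // max over a' of Q[s'][a']
    Q[s][a] = Q[s][a] + alpha * (r + gamma * bestNext - Q[s][a]);
}

At least that is the form I see in other textbooks.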
@mariaterzi17636 жыл бұрын
excellent lecture! Thank you!
@judenchd7 жыл бұрын
Thanks Lex!
@AdityaVithaldas7 жыл бұрын
Probably the most intuitive explanation of how neural networks operate, get scored, and feed forward, with actual examples, despite the lecture being about a much more specific topic (SDC). Thank you for that. Though I'm still not able to intuitively visualize the backpropagation steps and the weights updating with live values. Any other reference that can help with that?
@salman31127 жыл бұрын
I am not sure about the live examples part, but CS231n's explanation of Backprop is pretty intuitive. I finally understood backprop there.
@offchan7 жыл бұрын
People usually draw a neural network as a bunch of connections, and that is not good for understanding backpropagation because one neuron does so many things: it adds, multiplies, and applies a non-linearity. I find that drawing the neural network in the form of a data-flow graph helps you visualize things better, and backprop is also easier to understand that way.
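As a rough illustration (just a toy sketch, not something from the lecture): for a single neuron y = sigmoid(w*x + b) with squared loss, the backward pass is nothing more than chaining the local derivative of each node in that graph:

// forward pass through the graph: multiply-add node, sigmoid node, loss node
var x = 1.5, w = 0.8, b = -0.2, target = 1.0;
var z = w * x + b;
var y = 1 / (1 + Math.exp(-z));
var loss = 0.5 * (y - target) * (y - target);
// backward pass: multiply local derivatives along the path back to each weight
var dLoss_dy = y - target;
var dy_dz = y * (1 - y);              // local derivative of the sigmoid node
var dLoss_dw = dLoss_dy * dy_dz * x;  // gradient used to update w
var dLoss_db = dLoss_dy * dy_dz;      // gradient used to update b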
@AdityaVithaldas7 жыл бұрын
One of the definitions I have heard of deep learning is that it is typically a neural network which is more than one layer deep. However, in this lecture we seem to be referring to (deep) reinforcement learning as a broad Q-table implementation which learns over time (not necessarily implemented using a neural network, but more of a weight update over time). I was curious to know: in typical parlance, when we refer to deep learning, is it always meant to be using neural network based approaches?
@lexfridman7 жыл бұрын
Yes, deep learning is strictly referring to neural network based approaches. This lecture builds up to deep reinforcement learning by first introducing traditional reinforcement learning.
@prannakumify7 жыл бұрын
Is there an offline version of the simulator with visualization, such that I could download ConvNetJS and run it? I would like to play with it and understand the effects of max speed, number of lanes, the training algorithm, etc.!
@AlexeyKrivitsky6 жыл бұрын
I find the explanations of Q and deep Q a bit unclear
@stanleyman41007 жыл бұрын
Great lesson
@AnjaliChadha7 жыл бұрын
How did we decide the image size should be 28*28? Are those 784 pixels neither too many nor too few for the training model?
@太郎田中-i2b7 жыл бұрын
Great lecture! The slides seem to be the ones from lecture 1.
@lexfridman7 жыл бұрын
Fixed. Thanks for catching that.
@TheGodlessGuitarist5 жыл бұрын
Lex, why do you have a digital shadow? It's freaking me out, man.
@Dartneige4 жыл бұрын
Best comment ever!
@niazhimselfangels7 жыл бұрын
56:40 You mention that the input is just the pixels. But shouldn't the algorithm be aware of the 'score' and the 'lives remaining' parameters? If the algorithm is just fed the pixels, without passing these values as integers, wouldn't it have to decipher the image like OCR and build up an understanding of these visually represented parameters?
@posie.2 жыл бұрын
Amazing!
@markjing91247 жыл бұрын
thank you
@ankishbansal4206 жыл бұрын
Thanks for sharing such a great lecture. But I'm stuck at 45:13, where in the Atari game we have 4 images to decide the Q parameter, each image has dimension H * W, and each pixel takes one of 256 levels (grayscale), so I thought the total size would be 256 * H * W * 4. But how are there 256^(H * W * 4) rows in the Q table? Can anyone please explain?
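Thinking about it more, I believe the exponent comes from counting distinct states rather than pixels (my own sanity check, not from the slide): each row of the table is one complete stack of screens, and a tiny example makes the counting clear:

// with 2 pixels and 4 gray levels there are 4^2 = 16 possible images, not 4 * 2 = 8
var levels = 4, pixels = 2;
console.log(Math.pow(levels, pixels)); // 16
// scaling up: 256 gray levels over H * W * 4 pixels gives 256^(H * W * 4)
// possible states, and the Q table needs one row per possible state

So 256 * H * W * 4 measures the size of one state, while 256^(H * W * 4) counts how many different states there can be.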
@kaizhou73317 жыл бұрын
Hi, it seems the link to the slides of the lecture does not work. I am wondering if you could provide the new links? Thanks
@hellyesOo7 жыл бұрын
Thanks for sharing your lecture, it's just that the website is not working.
@aryankumarn53234 жыл бұрын
Can anyone tell me the step-by-step roadmap for learning machine learning? I am a beginner. I have just completed Python programming and have done some small projects. Please help me, I don't know where to start.
@scadasystem37907 жыл бұрын
good stuff.. thanks
@Jeff-KK3 жыл бұрын
Lex, you look like a kid here! Are you sure this was only 4 years ago?
@YourFriend9617 жыл бұрын
What does "observe initial state s", or "observe reward r and new state s' " exactly mean in the Q-learning algorithm's pseudocode?
@LeoVerto7 жыл бұрын
This is pretty nitpicky, but doesn't the size of the Q-table mentioned around 45:00 assume that, for the consecutive images, any image can follow any other image, which is obviously not true in the context of the game? Obviously this doesn't practically mean anything, as it's still a huge number. Nevertheless, this should not detract from the fact that I do really enjoy following this course.
@lexfridman7 жыл бұрын
@Leo The part you said that is "obvious" is actually the thing we're trying to learn. The biggest challenge for general AI is to learn (without prior expert-defined knowledge) the things that seem obvious to us as humans.
@deeplearningpartnership6 жыл бұрын
Nice talk.
@bikrambaruah89147 жыл бұрын
Can someone please explain what the input to the algorithm is? Is it just one snapshot of the game, or multiple snapshots taken while humans are playing it, or a video of a human playing it?
@NealOGrady3 жыл бұрын
Just started watching this series and realized the game has long since been taken down :'(
@박지우-e1s7 жыл бұрын
@15:48 Can anybody explain in more depth why we only need 4 outputs to represent 0 to 9?
@lexfridman7 жыл бұрын
Sure. You can represent 0 through 9 in binary using 4 digits like this:
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = 8
1001 = 9
More examples here: l3d.cs.colorado.edu/courses/CSCI1200-96/binary.html
@sahidbarbar51307 жыл бұрын
Karel the robot 2.0, interesting.
@mshahzaib41957 жыл бұрын
Hey, it is my first time with deep learning or with any kind of machine learning. I am lost on how I am supposed to do the deeptraffic project. The lecture didn't actually cover how I am supposed to train / build the neural network. Can someone point me to some resource from which I can learn how to actually do it in practice? Should I learn ConvNetJS first from somewhere?
@RobetPaulG7 жыл бұрын
So SUBMISSIONS TO THE CONTEST ENDED FOUR DAYS AGO!!! And you just put up the video of how to submit. selfdrivingcars.mit.edu/leaderboard/ Submissions toward the competition will end 11 AM Friday 1/20/17
@lexfridman7 жыл бұрын
The first iteration of the contest was just for the students who attend the class in person. We'll do more future iterations. Register on the website for updates. It's free and open.
@RobetPaulG7 жыл бұрын
Can you get to 75 mph by only changing lanesSide, patchesAhead, patchesBehind and num_neurons? I got to 73 mph by tuning those, but if it takes many hours of tuning to squeeze out the extra two mph, I will move on to the next topic. Thank you for putting these lectures up and the website. I find it very rewarding and fun.
@zamazalotta7 жыл бұрын
I got 70 mph with this code. It's not AI but it gives you a base(dumb)line :)
lanesSide = 0;
patchesAhead = 8;
patchesBehind = 0;
learn = function (state, lastReward) {
    if ((state[1] + state[4]) / 2 > 0.99) {    // no car ahead / not stuck?
        action = 1;                            // then floor it
    } else {
        action = 3 + Math.round(Math.random()); // otherwise steer randomly left/right
    }
    return action;
};
@bamoida427 жыл бұрын
Thank you for making the contest available to the public. The lecture is already great, but the contest is what motivated me to sit down and actually start doing it. Out of personal interest, do you plan on publishing statistics about how outsiders participated?
@lexfridman7 жыл бұрын
Thanks! Yes, I'm working on a paper that includes those statistics. There are few interesting insights in those numbers. Stay tuned.
@jonathanplasky59967 жыл бұрын
So... do the Q-learning values get stored in an array? How does it draw from previous actions?
@odedwolff38787 жыл бұрын
Thanks for the lecture, it is very interesting. I was wondering about Q-learning: if the world is deterministic, like in Atari, then we always get the same reward for the same (s, a) pair, right? So is the old estimate of Q being factored in for worlds where the reward for a given (s, a) varies with time (or is it also the case with Atari)?
@denisf.74097 жыл бұрын
Thank you!
@sumanthnandamuri21687 жыл бұрын
At 28:36, how is that the optimal policy? I am unable to understand; someone please explain.
@debjyotibiswas37937 жыл бұрын
Can someone tell me why we are doing this in the browser? Is the training happening in the cloud or on the local system? What is the logic of using the browser?
@lashinakimkulov37967 жыл бұрын
I am only on lecture 2 so I may be missing something, but to my best knowledge and guess, the HTML allows the project reviewers to see your code's output without having to run the code.
@Ghanzo7 жыл бұрын
This is fascinating, and I'd like to learn how to use the code provided on the website. Someone please make a tutorial video and link me to it. Or send me a message so I can ask you some questions. Would be more than grateful to receive a message from someone!
@murtazaburhani40223 жыл бұрын
Thank you 🙏🙏🙏
@imvivekawasthi7 жыл бұрын
Sometimes I find some concepts difficult to understand...
@beattoedtli10403 жыл бұрын
Can you please use bikes instead of cars? Cars are polluting, outdated, harmful means of transport.
@therealalexjones34493 жыл бұрын
Dude what? Bikes aren’t outdated tho???🤣
@beattoedtli10403 жыл бұрын
@@therealalexjones3449 They are the future. Almost CO2-neutral. What did you have in mind?
@therealalexjones34493 жыл бұрын
@@beattoedtli1040 bikes are almost as outdated as horses bud
@beattoedtli10403 жыл бұрын
@@therealalexjones3449 You may heat up the planet by 2+ degrees, or we can start understanding that burning carbon is not the way forward. No, bikes are appropriate in cities, and trains outside of them. Cars and planes are primitive, nonrenewable means of transport. This might be a new perspective to you, but just because you know the vehicle does not mean we understand tomorrow's means of transport. We have 15 more years until carbon emissions must be at zero. So better start thinking anew.
@therealalexjones34493 жыл бұрын
@@beattoedtli1040 let me know when u join the rest of us in reality
@OEFarredondo5 жыл бұрын
28 people don’t have email
@csankar697 жыл бұрын
Very poor and hand-wavy explanation of some of the key concepts. It is not at all clear what the deep Q-learning loss function is, why it is chosen that way, or how it is evaluated. The instructor seems to assume that you already have a pretty good basic understanding of the content talked about in the lecture.