Sir, you were a student at Lalpani School, Shimla. You were the topper in +2. I am very happy for you. You have reached a level where you truly belong. I wish you more and more success.
@ASHISHDHIMAN1610 1 year ago
I am from Nahan, and I’m watching this from Ga Tech :)
@schobihh2703 1 year ago
MIT simply has the best teaching around. Really deep insights again. Thank you.
@sukhjinderkumar2723 2 years ago
Hands down one of the most interesting lectures. The way the professor showed research ideas here, there, and almost everywhere just blows me away. It was very, very interesting, and the best part is that it is accessible to non-math people too. (Though that's coming from a math guy; still, I feel the math part was small, and it leaned more toward the intuitive side of SGD.)
@Vikram-wx4hg 3 years ago
What a beautiful, beautiful lecture! Thank you, Prof. Suvrit!
@trevandrea8909 3 months ago
I love the way the professor teaches in this lecture and video. Thank you so much!
@minimumlikelihood6552 1 year ago
That was the kind of lecture that deserved applause!
@RAJIBLOCHANDAS 2 years ago
Really extraordinary lecture. Very lucid yet highly interesting. My research is on adaptive signal processing; even so, I enjoyed this lecture the most. Thank you.
@rogiervdw 4 years ago
This is truly remarkable teaching. It greatly helps understanding and intuition of what SGD actually does. Prof. Sra's proof of SGD convergence for non-convex optimization is in Prof. Strang's excellent book "Linear Algebra and Learning from Data," p. 365.
@georgesadler7830 3 years ago
Professor Suvrit Sra, thank you for a beautiful lecture on stochastic gradient descent and its impact on machine learning. This powerful lecture helped me understand something about machine learning and its overall impact on large companies.
@BorrWick 4 years ago
I think there is a very small mistake in the graph of (a_i x − b_i)^2: the region of confusion is bounded not by a_i/b_i but by b_i/a_i.
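A quick check (a sketch in the lecture's 1-D least-squares setup, assuming each a_i ≠ 0): each summand is a parabola, and setting its derivative to zero gives the minimizer.

```latex
f_i(x) = (a_i x - b_i)^2, \qquad
f_i'(x) = 2 a_i (a_i x - b_i) = 0
\;\Longrightarrow\; x_i^* = \frac{b_i}{a_i}
```

So the region of confusion runs from min_i b_i/a_i to max_i b_i/a_i.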
@tmusic99 2 years ago
Thank you for an excellent lecture! It gives me a clear track for development.
@holographicsol2747 2 years ago
Thank you, you are an excellent teacher and I learned a lot. Thank you!
@3g1991 4 years ago
Does anyone have the proof he didn't have time for, regarding stochastic gradient descent in the non-convex case?
@nayanvats3424 5 years ago
Couldn't have been better... great lecture! :)
@jfjfcjcjchcjcjcj9947 4 years ago
Very clear and nice, to the point.
@taasgiova8190 2 years ago
Fantastic, excellent lecture, thank you.
@NinjaNJH 4 years ago
Very helpful, thanks! ✌️
@grjesus9979 1 year ago
So when using TensorFlow or Keras, if you set batch size = 1, there are as many iterations as there are samples in the training dataset. My question is: where does the randomness in "stochastic" gradient descent come from?
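In case it helps later readers: the order is still random, because Keras's fit() reshuffles the training set every epoch by default (shuffle=True), and the theory picks indices at random. A minimal NumPy sketch of that loop (toy data and a hypothetical step size, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(100)        # toy 1-D inputs
b = rng.standard_normal(100)        # toy targets
x, step = 0.0, 0.01                 # parameter and hypothetical step size

for epoch in range(5):
    order = rng.permutation(len(a))  # fresh shuffle each epoch: the "stochastic" part
    for i in order:                  # batch size 1: one sample per update
        grad_i = 2 * a[i] * (a[i] * x - b[i])  # gradient of (a_i x - b_i)^2
        x -= step * grad_i
print(x)
```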
@cevic2191 2 years ago
Many thanks. Great!!!
@josemariagarcia9322 5 years ago
Simply brilliant
@haru-1788 2 years ago
Marvellous!!!
@SHASHANKRUSTAGII 3 years ago
Andrew Ng didn't explain it in this much detail. That is why MIT is MIT. Thanks, professor.
@JTFOREVER26 3 years ago
Could anyone explain how, in the one-dimensional example, choosing a point outside R guarantees that the stochastic gradient and the full gradient have the same sign? (Around 30:30–31:00 in the video.) Thanks in advance!
@ashrithjacob4701 1 year ago
f(x) can be thought of as a sum of quadratic functions (one per data point), each with its minimum at b_i/a_i. When we are outside the region R, the minima of all the functions lie on the same side of where we are, and as a result all their gradients have the same sign.
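Spelling that out (a short derivation under the lecture's assumption that every a_i ≠ 0):

```latex
f(x) = \sum_{i=1}^{n} (a_i x - b_i)^2, \qquad
f_i'(x) = 2 a_i (a_i x - b_i) = 2 a_i^2 \Bigl( x - \frac{b_i}{a_i} \Bigr)
```

Since 2 a_i^2 > 0, the sign of f_i'(x) is the sign of x − b_i/a_i. So for x > max_i b_i/a_i every f_i'(x) is positive, and for x < min_i b_i/a_i every f_i'(x) is negative; outside R, any single-sample gradient has the same sign as the full gradient f'(x) = Σ_i f_i'(x).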
@ac2italy 4 years ago
He cited images as an example of a large feature set: nobody uses standard ML for images; we use convolutions.
@elyepes19 3 years ago
I understand he is referring to convolutional neural networks as a tool for image analysis, as a generalized example.
@elyepes19 3 years ago
For those of us who are newcomers to ML, it's most enlightening to learn that, unlike "pure optimization," which aims to find the most exact minimum possible, ML aims instead to get "close enough" to the minimum to train the model: if you get too close to the minimum, you might overfit your training data. Thank you so much for the clarification.
@rembautimes8808 3 years ago
Amazing for MIT to make such high-quality lectures available worldwide. Well worth the time investment to go through these lectures. Thanks, Prof. Strang, Prof. Sra, and MIT.
@akilarasan3288 1 year ago
I would use MCMC to estimate the sum over n terms, to answer the question at 14:00.
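For context, a plain Monte Carlo version of that idea (uniform subsampling, a simpler relative of MCMC; toy numbers): estimate a sum of n terms from a small random sample instead of touching every term, which is the same trick SGD plays with the sum of gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
terms = rng.standard_normal(1_000_000)   # the n summands (toy data)

k = 1_000                                # sample size, k << n
sample = rng.choice(terms, size=k)       # uniform draws with replacement
estimate = terms.size * sample.mean()    # unbiased estimator of the full sum

print(estimate, terms.sum())             # close, at a fraction of the cost
```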
@gwonchanyoon7748 4 months ago
Beautiful classroom!
@cobrasetup703 2 years ago
Amazing lecture, I am delighted by the smooth explanation of this complex topic! Thanks.
@MohanLal-of8io 4 years ago
What GUI software is Professor Sra using to change the step size instantly?
@brendawilliams8062 3 years ago
I don't know, but it would have to transpose numbers up to a certain limit, it seems to me.
@anadianBaconator 4 years ago
This guy is fantastic!
@benjaminw.2838 10 months ago
Amazing class!!! Not only for ML researchers but also for ML practitioners.
@notgabby604 1 year ago
Very nice lecture. I will seemingly go off topic here and say that an electrical switch is one-to-one when on and zero out when off: when on, 1 volt in gives 1 volt out, 2 volts in gives 2 volts out, etc. ReLU is one-to-one when its input x is >= 0 and zero out otherwise. To convert a switch to a ReLU, you just need an attached switching decision x >= 0. A ReLU neural network is then composed of weighted sums that are connected to and disconnected from each other by the switch decisions. Once the switch states are known, you can simplify the composed weighted sums using simple linear algebra: each neuron output anywhere in the net is some simple weighted sum of the input vector. AI462 blog.
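A small numerical check of that claim (a sketch with made-up weights, not from the blog): freeze the ReLU on/off pattern for a given input, and a two-layer net collapses to a single matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((5, 3))   # first-layer weights
W2 = rng.standard_normal((2, 5))   # second-layer weights
x = rng.standard_normal(3)

# Ordinary forward pass with ReLU
h = W1 @ x
y = W2 @ np.maximum(h, 0.0)

# Freeze the switch states: a 0/1 diagonal mask from the signs of h
D = np.diag((h >= 0).astype(float))

# With the switches fixed, the whole net is the single linear map W2 D W1
A = W2 @ D @ W1
assert np.allclose(y, A @ x)       # same output: y is a weighted sum of x
```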
@TrinhPham-um6tl 3 years ago
Just a little typo that I came across throughout this perfect lecture, in the "confusion region": min(a_i/b_i) and max(a_i/b_i) should be min(b_i/a_i) and max(b_i/a_i). Generally speaking, this lecture is the best explanation of SGD I have ever seen. Again, thank you, Prof. Sra, and thank you, MIT OpenCourseWare, so so much 👍👏 P.S.: Every other resource I've read explains SGD so complicatedly 😔
@neoneo1503 2 years ago
"Shuffle" in practice vs. "random pick" in theory, at 42:00.
@scorpio19771111 2 years ago
Good lecture. Intuitive explanations with specific illustrations.
@brendawilliams8062 3 years ago
It appears, from an engineering-math point of view, that that is where the problem lies.
@pbawa2003 2 years ago
This is a great lecture, though it took me a little time to prove that the full gradient lies within the region of confusion, with the min and max given by the individual sample gradients.
@BananthahallyVijay 2 years ago
Wow! That was one great talk. Prof. Suvrit Sra has done a great job of giving examples just light enough to drive home the key ideas of SGD.
@vinayreddy8683 4 years ago
The professor assumed all the variables are scalars. So, while moving the loss downhill toward a local minimum, how is the loss function guided to the minimum without any direction (scalars having no direction)?
@watcharakietewongcharoenbh6963 2 years ago
Where can we find his five-line proof of why SGD works? It is fascinating.
@fishermen708 5 years ago
Great.
@kethanchauhan9418 5 years ago
What is the best book or resource to learn the whole mathematics behind stochastic gradient descent?
@mitocw 5 years ago
The textbook listed in the course is: Strang, Gilbert. Linear Algebra and Learning from Data. Wellesley-Cambridge Press, 2019. ISBN: 9780692196380. See the course on MIT OpenCourseWare for more information at: ocw.mit.edu/18-065S18.
@brendawilliams8062 3 years ago
Does this view and branch of math hold that there is an unanswered Riemann hypothesis?
@hj-core 11 months ago
An amazing lecture!
@KumarHemjeet 3 years ago
What an amazing lecture !!
@rababmaroc3354 4 years ago
Well explained, thank you very much, professor.
@robmarks6800 2 years ago
Leaving the proof as a cliffhanger, almost worse than Fermat…
@papalau6931 1 year ago
You can find the proof by Prof. Suvrit Sra in Prof. Gilbert Strang's book "Linear Algebra and Learning from Data."
@xiangyx 3 years ago
Fantastic!
@sadeghadelkhah6310 2 years ago
At 10:31, the [INAUDIBLE] word is "weight."
@mitocw 2 years ago
Thanks for the feedback! The caption has been updated.
@fatmaharman3842 4 years ago
Excellent!
@tuongnguyen9391 2 years ago
Where can I obtain Professor Sra's slides?
@mitocw 2 years ago
The course does not have slides of the presentations. The materials that we do have (problem sets, readings) are available on MIT OpenCourseWare at: ocw.mit.edu/18-065S18. Best wishes on your studies!
@tuongnguyen9391 2 years ago
@mitocw Thank you, I guess I just noted everything down.
@shivamsharma8874 5 years ago
Please share the slides of this lecture.
@mitocw 5 years ago
It doesn't look like there are slides available. I see a syllabus, instructor insights, problem sets, readings, and a final project. Visit the course on MIT OpenCourseWare to see what materials we have at: ocw.mit.edu/18-065S18.
@vinayreddy8683 4 years ago
Take screenshots and prepare them yourself!!!
@Tevas25 5 years ago
A link to the MATLAB simulation Prof. Sra shows would be great.