Sir, you were a student at Lalpani School, Shimla. You were the topper in +2. I am very happy for you. You have reached a level where you truly belong. I wish you more and more success.
@ASHISHDHIMAN1610 1 year ago
I am from Nahan, and I’m watching this from Ga Tech :)
@schobihh2703 1 year ago
MIT simply has the best teaching around. Really deep insights again. Thank you.
@sukhjinderkumar2723 2 years ago
Hands down one of the most interesting lectures. The way the professor showed research ideas here, there, and almost everywhere just blows me away. It was very, very interesting, and the best part is that it is accessible to non-math people too. (Though that's coming from a math guy; still, I feel the math part was small, and it leaned more toward the intuitive side of SGD.)
@Vikram-wx4hg 3 years ago
What a beautiful, beautiful lecture! Thank you, Prof. Suvrit!
@trevandrea8909 3 months ago
I love the way the professor teaches in this lecture and video. Thank you so much!
@minimumlikelihood6552 1 year ago
That was the kind of lecture that deserved applause!
@RAJIBLOCHANDAS 2 years ago
Really extraordinary lecture. Very lucid yet highly interesting. My research is on adaptive signal processing; even so, I enjoyed this lecture the most. Thank you.
@rogiervdw 4 years ago
This is truly remarkable teaching. It greatly helps understanding and intuition of what SGD actually does. Prof. Sra's proof of SGD convergence for non-convex optimization is in Prof. Strang's excellent book "Linear Algebra and Learning from Data," p. 365.
@georgesadler7830 3 years ago
Professor Suvrit Sra, thank you for a beautiful lecture on stochastic gradient descent and its impact on machine learning. This powerful lecture helped me understand something about machine learning and its overall impact on large companies.
@BorrWick 4 years ago
I think there is a very small mistake in the graph of (a_i x − b_i)^2: the region of confusion is bounded not by a_i/b_i but by b_i/a_i.
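A quick check (a sketch in the lecture's 1-D least-squares setup, assuming each a_i ≠ 0): each summand is a parabola, and setting its derivative to zero gives the minimizer.

```latex
f_i(x) = (a_i x - b_i)^2, \qquad
f_i'(x) = 2 a_i (a_i x - b_i) = 0
\;\Longrightarrow\; x_i^* = \frac{b_i}{a_i}
```

So the region of confusion runs from min_i b_i/a_i to max_i b_i/a_i.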
@tmusic99 2 years ago
Thank you for an excellent lecture! It gives me a clear track for development.
@holographicsol2747 2 years ago
Thank you, you are an excellent teacher and I learned a lot. Thank you!
@3g1991 4 years ago
Does anyone have the proof he didn't have time for, regarding stochastic gradient descent in the non-convex case?
@nayanvats3424 5 years ago
Couldn't have been better... great lecture! :)
@jfjfcjcjchcjcjcj9947 4 years ago
Very clear and nice, to the point.
@taasgiova8190 2 years ago
Fantastic, excellent lecture, thank you.
@NinjaNJH 4 years ago
Very helpful, thanks! ✌️
@grjesus9979 1 year ago
So when using TensorFlow or Keras, if you set batch size = 1, there are as many iterations as there are samples in the training dataset. My question is: where does the randomness in "stochastic" gradient descent come from?
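In case it helps later readers: the order is still random, because Keras's fit() reshuffles the training set every epoch by default (shuffle=True), and the theory picks indices at random. A minimal NumPy sketch of that loop (toy data and a hypothetical step size, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(100)        # toy 1-D inputs
b = rng.standard_normal(100)        # toy targets
x, step = 0.0, 0.01                 # parameter and hypothetical step size

for epoch in range(5):
    order = rng.permutation(len(a))  # fresh shuffle each epoch: the "stochastic" part
    for i in order:                  # batch size 1: one sample per update
        grad_i = 2 * a[i] * (a[i] * x - b[i])  # gradient of (a_i x - b_i)^2
        x -= step * grad_i
print(x)
```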
@cevic2191 2 years ago
Many thanks. Great!!!
@josemariagarcia9322 5 years ago
Simply brilliant
@haru-1788 2 years ago
Marvellous!!!
@SHASHANKRUSTAGII 3 years ago
Andrew Ng didn't explain it in this much detail. That is why MIT is MIT. Thanks, professor.
@JTFOREVER26 3 years ago
Could anyone explain how, in the one-dimensional example, choosing a point outside R guarantees that the stochastic gradient and the full gradient have the same sign? (Around 30:30–31:00 in the video.) Thanks in advance!
@ashrithjacob4701 1 year ago
f(x) can be thought of as a sum of quadratic functions (one per data point), each with its minimum at b_i/a_i. When we are outside the region R, the minima of all the functions lie on the same side of where we are, and as a result all their gradients have the same sign.
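Spelling that out (a short derivation under the lecture's assumption that every a_i ≠ 0):

```latex
f(x) = \sum_{i=1}^{n} (a_i x - b_i)^2, \qquad
f_i'(x) = 2 a_i (a_i x - b_i) = 2 a_i^2 \Bigl( x - \frac{b_i}{a_i} \Bigr)
```

Since 2 a_i^2 > 0, the sign of f_i'(x) is the sign of x − b_i/a_i. So for x > max_i b_i/a_i every f_i'(x) is positive, and for x < min_i b_i/a_i every f_i'(x) is negative; outside R, any single-sample gradient has the same sign as the full gradient f'(x) = Σ_i f_i'(x).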
@ac2italy 4 years ago
He cited images as an example of a large feature set: nobody uses standard ML for images; we use convolutions.
@elyepes19 3 years ago
I understand he is referring to convolutional neural networks as a tool for image analysis, as a generalized example.
@elyepes19 3 years ago
For those of us who are newcomers to ML, it's most enlightening to learn that, unlike "pure optimization," which aims to find the most exact minimum possible, ML aims instead to get "close enough" to the minimum to train the model: if you get too close to the minimum, you might overfit your training data. Thank you so much for the clarification.
@rembautimes8808 3 years ago
Amazing for MIT to make such high-quality lectures available worldwide. Well worth the time investment to go through these lectures. Thanks, Prof. Strang, Prof. Sra, and MIT.
@akilarasan3288 1 year ago
I would use MCMC to estimate the sum over n terms, to answer the question at 14:00.
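For context, a plain Monte Carlo version of that idea (uniform subsampling, a simpler relative of MCMC; toy numbers): estimate a sum of n terms from a small random sample instead of touching every term, which is the same trick SGD plays with the sum of gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
terms = rng.standard_normal(1_000_000)   # the n summands (toy data)

k = 1_000                                # sample size, k << n
sample = rng.choice(terms, size=k)       # uniform draws with replacement
estimate = terms.size * sample.mean()    # unbiased estimator of the full sum

print(estimate, terms.sum())             # close, at a fraction of the cost
```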
@gwonchanyoon7748 4 months ago
Beautiful classroom!
@cobrasetup703 2 years ago
Amazing lecture, I am delighted by the smooth explanation of this complex topic! Thanks.
@MohanLal-of8io 4 years ago
What GUI software is Professor Sra using to change the step size instantly?
@brendawilliams8062 3 years ago
I don't know, but it would have to transpose numbers up to a certain limit, it seems to me.
@anadianBaconator 4 years ago
This guy is fantastic!
@benjaminw.2838 10 months ago
Amazing class!!! Not only for ML researchers but also for ML practitioners.
@notgabby604 1 year ago
Very nice lecture. I will seemingly go off topic here and say that an electrical switch is one-to-one when on and zero out when off: when on, 1 volt in gives 1 volt out, 2 volts in gives 2 volts out, etc. ReLU is one-to-one when its input x is >= 0 and zero out otherwise. To convert a switch to a ReLU, you just need an attached switching decision x >= 0. A ReLU neural network is then composed of weighted sums that are connected to and disconnected from each other by the switch decisions. Once the switch states are known, you can simplify the composed weighted sums using simple linear algebra: each neuron output anywhere in the net is some simple weighted sum of the input vector. AI462 blog.
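A small numerical check of that claim (a sketch with made-up weights, not from the blog): freeze the ReLU on/off pattern for a given input, and a two-layer net collapses to a single matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((5, 3))   # first-layer weights
W2 = rng.standard_normal((2, 5))   # second-layer weights
x = rng.standard_normal(3)

# Ordinary forward pass with ReLU
h = W1 @ x
y = W2 @ np.maximum(h, 0.0)

# Freeze the switch states: a 0/1 diagonal mask from the signs of h
D = np.diag((h >= 0).astype(float))

# With the switches fixed, the whole net is the single linear map W2 D W1
A = W2 @ D @ W1
assert np.allclose(y, A @ x)       # same output: y is a weighted sum of x
```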
@TrinhPham-um6tl 3 years ago
Just a little typo that I came across throughout this perfect lecture, in the "confusion region": min(a_i/b_i) and max(a_i/b_i) should be min(b_i/a_i) and max(b_i/a_i). Generally speaking, this lecture is the best explanation of SGD I have ever seen. Again, thank you, Prof. Sra, and thank you, MIT OpenCourseWare, so so much 👍👏 P.S.: Every other resource I've read explains SGD so complicatedly 😔
@neoneo1503 2 years ago
"Shuffle" in practice vs. "random pick" in theory, at 42:00.
@scorpio19771111 2 years ago
Good lecture. Intuitive explanations with specific illustrations.
@brendawilliams8062 3 years ago
It appears, from an engineering-math point of view, that that is where the problem lies.
@pbawa2003 2 years ago
This is a great lecture, though it took me a little time to prove that the full gradient lies within the region of confusion, with the min and max given by the individual sample gradients.
@BananthahallyVijay 2 years ago
Wow! That was one great talk. Prof. Suvrit Sra has done a great job of giving examples just light enough to drive home the key ideas of SGD.
@vinayreddy8683 4 years ago
The professor assumed all the variables are scalars. So, while moving the loss downhill toward a local minimum, how is the loss function guided to the minimum without any direction (scalars having no direction)?
@watcharakietewongcharoenbh6963 2 years ago
Where can we find his five-line proof of why SGD works? It is fascinating.
@fishermen708 5 years ago
Great.
@kethanchauhan9418 5 years ago
What is the best book or resource to learn the whole mathematics behind stochastic gradient descent?
@mitocw 5 years ago
The textbook listed in the course is: Strang, Gilbert. Linear Algebra and Learning from Data. Wellesley-Cambridge Press, 2019. ISBN: 9780692196380. See the course on MIT OpenCourseWare for more information at: ocw.mit.edu/18-065S18.
@brendawilliams8062 3 years ago
Does this view and branch of math hold that there is an unanswered Riemann hypothesis?
@hj-core 11 months ago
An amazing lecture!
@KumarHemjeet 3 years ago
What an amazing lecture !!
@rababmaroc3354 4 years ago
Well explained, thank you very much, professor.
@robmarks6800 2 years ago
Leaving the proof as a cliffhanger, almost worse than Fermat…
@papalau6931 1 year ago
You can find the proof by Prof. Suvrit Sra in Prof. Gilbert Strang's book "Linear Algebra and Learning from Data."
@xiangyx 3 years ago
Fantastic!
@sadeghadelkhah6310 2 years ago
At 10:31, the [INAUDIBLE] word is "weight."
@mitocw 2 years ago
Thanks for the feedback! The caption has been updated.
@fatmaharman3842 4 years ago
Excellent!
@tuongnguyen9391 2 years ago
Where can I obtain Professor Sra's slides?
@mitocw 2 years ago
The course does not have slides of the presentations. The materials that we do have (problem sets, readings) are available on MIT OpenCourseWare at: ocw.mit.edu/18-065S18. Best wishes on your studies!
@tuongnguyen9391 2 years ago
@mitocw Thank you, I guess I just noted everything down.
@shivamsharma8874 5 years ago
Please share the slides of this lecture.
@mitocw 5 years ago
It doesn't look like there are slides available. I see a syllabus, instructor insights, problem sets, readings, and a final project. Visit the course on MIT OpenCourseWare to see what materials we have at: ocw.mit.edu/18-065S18.
@vinayreddy8683 4 years ago
Take screenshots and prepare them yourself!!!
@Tevas25 5 years ago
A link to the MATLAB simulation Prof. Sra shows would be great.