man, your video is better than my 3-hour lecture at uni. finally I found a video that goes straight to the key point
@cambridgebreaths3581 3 years ago
This episode is utterly amazing. Please make a comprehensive series covering every essential concept in data science, for absolute beginners. Truly brilliant explanation. Thank you kindly
@ritvikmath 3 years ago
Thanks!
@barkinkaratay7951 3 years ago
man this must be the best explanation of gradient descent I've found so far. thank you very much
@amithnambiar9818 3 years ago
Ritvik, this is so well explained! Thank you so much man.
@ritvikmath 3 years ago
of course! thanks for watching!
@MrMoore0312 3 years ago
Excellent! Great new twist on standard GD; love the practical, objective-oriented decision-making process explained with math and code!! Thanks!
@rahulahuja1412 3 years ago
Very helpful. Would love to see another video that incorporates advanced concepts like momentum etc. Thanks a lot for your videos. They have helped me enhance my understanding of complex concepts.
@ritvikmath 3 years ago
Great suggestion! And thanks!
@andrewjolly319 3 years ago
This is really great, and having the notebooks to follow along with is so helpful. I'm at the start of my PhD in physics and we were taught basically zero stats, and now suddenly everything is gradient descent and MCMC. Despite other textbooks/youtube videos, I'd not been able to get my head around it until now.
@DidierDurand 6 months ago
Great explanations: very easy to understand!
@JO-vj9kn 2 years ago
Thanks for a great tutorial! Like many others, I was stuck with cryptic lecture notes from a genius but not very pedagogical professor, and this type of material is way more approachable.
@richardqualis4780 a year ago
Very well explained!!!
@katieangelopoulos4706 4 months ago
These 10 minutes were worth more than 3 hours of lectures in a master's.
@steve8180 2 years ago
dude thanks for this, it was really really well explained
@amnont8724 a year ago
6:05 The minus sign is there because we want to find a minimum point, right? Because if the gradient is negative, we want to move forward toward the minimum, and if it's positive, we want to move backwards to find the minimum. So if we wanted to find a maximum, we would turn the minus sign into a plus?
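A quick sketch in Python (my own toy functions, not the example from the video) confirming the reasoning in this comment: with a minus sign the update steps against the gradient and walks downhill to a minimum, and flipping it to a plus (gradient ascent) steps with the gradient and walks uphill to a maximum.

```python
# f(w) = (w - 3)^2 has a minimum at w = 3; g(w) = -(w - 3)^2 has a maximum at w = 3
def f_grad(w):
    return 2 * (w - 3)

def g_grad(w):
    return -2 * (w - 3)

lr = 0.1
w_descent, w_ascent = 0.0, 0.0
for _ in range(200):
    w_descent -= lr * f_grad(w_descent)  # minus: step against the gradient -> minimum of f
    w_ascent += lr * g_grad(w_ascent)    # plus: step with the gradient -> maximum of g

print(w_descent, w_ascent)  # both end up near 3
```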
@otmarjr 3 years ago
I feel so lucky I found your content! You've got a serious talent for explaining stuff. In the future, could you please shoot a video recommending books and courses for someone interested in becoming a data scientist?
@sairaj6875 a year ago
Thank you so much. This was a really nice demonstration.
@aniket1983 2 years ago
Thank you so much... you are the only one who clarified defining your own loss function... I was stuck thinking that if there are only L1 and L2 losses, why bother, since the gradient descent would be the same... so thanks a ton for the clarity...
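A minimal sketch of the point this comment makes (the toy data and the exp(-x) weighting below are my own illustration, not necessarily the exact setup from the video): gradient descent itself doesn't care whether the loss is L1, L2, or something you invented, as long as you can write down its gradient and plug it into the same update rule.

```python
import numpy as np

# Hypothetical toy data for a simple model y ~ w * x
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 100)
y = 2.0 * x + rng.normal(0, 1, 100)

def l2_grad(w):
    # d/dw of mean((y - w*x)^2)
    return np.mean(-2 * x * (y - w * x))

def custom_grad(w):
    # same squared error, but each point weighted by exp(-x),
    # which emphasizes fitting the small-x points
    return np.mean(-2 * np.exp(-x) * x * (y - w * x))

def descend(grad_fn, w=0.0, lr=0.01, steps=2000):
    # identical update rule regardless of which loss you choose
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

print(descend(l2_grad), descend(custom_grad))  # two different fits from two loss choices
```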
@EW-mb1ih 3 years ago
of course we want videos on how the step size of the descent is obtained and how you treat non-convex functions :)
@martand_05 3 years ago
Nicely explained bro🔥🔥
@ritvikmath 3 years ago
Thanks!
@chacmool2581 3 years ago
I was able to, more or less, follow this, and I haven't even gotten started with my data science studies!
@ritvikmath 3 years ago
Nice!
@chacmool2581 3 years ago
@ritvikmath My engineering background and those punishing four semesters of calculus (integral, differential, diff eqs, multivariate) are still alive in my aging brain. How much advanced calculus do you really need to know, though? I mean, max/min problems and differentiation are the stuff that high school seniors tackle in Advanced Placement courses.
@LimeObeans 3 years ago
do one with adaptive momentum and one where you have no function or derivative please!! u rock
@ragstoriches919 a year ago
Amazing video🎉
@ritvikmath a year ago
Thanks 😁
@othsaj1467 2 years ago
Amazing vid 💯
@chunchen3450 3 years ago
Any topics on adding weights to time series analysis? For instance, higher weights for more recent observations, to get better forecasting? Is this often used in practice?
@ritvikmath 3 years ago
I've definitely seen cases of higher weights for more recent examples
@danielwiczew 3 years ago
@ritvikmath Are you maybe planning an introduction to Cholesky, LU, and SVD matrix decompositions?
@ritvikmath 3 years ago
Good suggestion!
@srinivasanbalan2469 3 years ago
Good content. Thanks
@ritvikmath 3 years ago
Thanks!
@rutu.dances.to.express a year ago
Great intuition, playing with the loss function in the 2nd part of the video. I had a small doubt here: the idea of adding e^-i to the loss function is to give more weight to smaller values of x, right? What if we have multiple input features (x1, x2, x3, and so on)? In that case, does it mean we care about predicting accurately for smaller values of x1, x2, x3, ... irrespective of high or low values in a particular record? (For example, a single record can have a high value of x1 but a very low value of x2, and so on.) Or, since you've used the variable i in e^-i, does it mean that rows 1, 2, 3, ... (the initial rows) will get more weight than later rows?
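A tiny sketch (made-up numbers, not from the notebook) contrasting the two readings asked about here: weighting each residual by the feature value versus by the row index gives different losses unless the rows happen to be sorted by x.

```python
import numpy as np

x = np.array([3.0, 0.5, 4.0, 1.0])   # hypothetical feature values, deliberately not sorted
y = np.array([6.2, 0.9, 8.3, 2.1])   # hypothetical targets
resid_sq = (y - 2.0 * x) ** 2        # squared residuals for a candidate w = 2

# Reading 1: weight each term by exp(-x_i) -> rows with small feature values dominate
by_value = np.exp(-x) * resid_sq

# Reading 2: weight each term by exp(-i) over the row index -> early rows dominate
idx = np.arange(len(x))
by_index = np.exp(-idx) * resid_sq

print(by_value, by_index)
```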
@ihgnmah 3 years ago
Thank you for the vid. I just wonder what kind of loss function would take care of the data points on the curve rather than the small cluster below.
@areskapoor1371 2 years ago
Interesting video! How did you choose what model to use for the data initially, and also, for the second part, why did you choose e specifically for e^-i? Is this a common mathematical thing to do?
@buckylc9345 3 years ago
Thank you so much for explaining a couple of concepts I didn't find in other, more popular videos! Is there a cost function we can use to effectively treat the lower-left corner as outliers?
@houyao2147 3 years ago
Does this mean we should identify those outliers first and then remove them before training?
@user-or7ji5hv8y 3 years ago
but don't you also derive the loss function using MLE? or did I mix that part up with something else?
@aRoryBorealis 9 months ago
Thank you so much
@ritvikmath 9 months ago
You're most welcome
@muhammadal-qurishi7110 3 years ago
can you please explain CRF?
@ritvikmath 3 years ago
Great suggestion!
@adaloveless3904 2 years ago
I'm dumb with this stuff... you lost me at "derivative of the loss function with respect to W"... all of a sudden the X dimension comes into play... what happens in a neural network where there are many dimensions?
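A hedged sketch of what this question is getting at (synthetic data and a plain linear model rather than a neural net): X shows up in the derivative because of the chain rule, and with many dimensions the derivative with respect to W is simply a vector of partial derivatives, one per weight, while the update rule stays the same.

```python
import numpy as np

# Synthetic data, purely illustrative: n samples, d features
rng = np.random.default_rng(1)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, size=n)

def loss_grad(w):
    # loss = mean((y - X @ w)^2); the chain rule pulls X into the derivative,
    # and the result is a length-d vector: one partial derivative per weight
    return -2.0 / n * X.T @ (y - X @ w)

w = np.zeros(d)
lr = 0.1
for _ in range(500):
    w -= lr * loss_grad(w)  # same update as the 1D case, applied component-wise

print(w)  # close to w_true
```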