CS231n Winter 2016: Lecture 3: Linear Classification 2, Optimization

143,472 views

Andrej Karpathy

Comments: 77
@KhueLe 9 years ago
A big thank you to Andrej and the teaching staff for uploading the videos. They are of high quality and really make the class more interesting to follow.
@jenishah9825 1 year ago
This content here is GOLD.
@leixun 4 years ago
*My takeaways:*
1. Multiclass SVM loss 4:15
2. Regularization 20:04: e.g. L2, L1
3. Softmax classifier 29:40
4. Optimization 50:10: gradient descent
4.1 Mini-batch gradient descent 1:00:40
4.2 Optimizers 1:05:00: SGD, momentum, Adagrad, RMSProp, Adam
@sezaiburakkantarci 10 months ago
You are one of the best, Andrej. You make learning so fun, with moments like 27:45 😄 Forever grateful.
@njordeggen4209 2 years ago
The best thing about the internet is access to great information and education like this series. Thank you Andrej and everyone involved!
@NehadHirmiz 2 years ago
These lectures are amazing. I still go back to review these every few years. Thank you sir.
@bayesianlee6447 6 years ago
Not only the lecture itself but also his level of insight and interpretation of DL is quite interesting. Thanks for sharing; I'm grateful for these classes.
@Shkencetari 5 years ago
Thank you very much for this great lecture. Just to help others out, optimization starts at 50:11
@ralphblanes8370 9 years ago
Andrej's a great teacher! Keep up the good work!
@XTLi-xb8iv 9 years ago
Thanks for uploading the video online, it really helps!
@geoffreyanderson4719 8 years ago
The instructor's loss math is off at 8:31. For the frog column the loss should be 12.9, but the instructor says the total loss of the frog column is 10.9:
In [10]: max(0, 2.2 - (-3.1) + 1)  # cat row, frog column
Out[10]: 6.300000000000001
Etc.
@yoniker83 8 years ago
Yeah I was about to say that when I saw your comment!
@questforprogramming 4 years ago
Good observation
@Kerphelio01 1 year ago
Ah, thank you - I was going nuts over this. (I have a history of messing up trivial arithmetic, so I have zero self-confidence.)
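For reference, here is the full frog-column check, a minimal sketch using the scores quoted in this thread (correct class frog with score -3.1, margin 1):

def svm_loss(scores, correct_idx, margin=1.0):
    # Multiclass SVM (hinge) loss for one example, summing over the incorrect classes.
    correct_score = scores[correct_idx]
    return sum(max(0.0, s - correct_score + margin)
               for i, s in enumerate(scores) if i != correct_idx)

# Frog column from the slide: [cat, car, frog] scores, correct class = frog
print(svm_loss([2.2, 2.5, -3.1], correct_idx=2))  # 6.3 + 6.6 = 12.9, not 10.9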
@pauldacus4590 6 years ago
17:00 starts a good discussion of regularization...
@Siwon-vv5mi 1 year ago
At 39:00, what did he mean by jiggling the scores?
@atrenta123 5 years ago
The calculation for frog class at 8:32 is wrong. The sum should be 12.9
@michaelgomezchen7718 4 years ago
I think he forgot to add the +1.
@vaibhavravichandran 7 years ago
Thank you for the lecture series. They've helped me understand NNs much much better.
@joehan4255 9 years ago
On slide 11, shouldn't max(0,5.3) + max(0,5.6) be max(0,6.3) + max(0,6.6)?
@poyrazus 8 years ago
+Joe Han You're right, it should be.
@atrenta123 5 years ago
yes you're right
@MauriceMauser 7 years ago
[33:17] The (-)0.89 is log base 10. Assuming we want the natural log: -ln(0.13) ≈ 2.04?
@edisonchen7682 6 years ago
Yup, how come it's 0.89? I don't get it.
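A quick check, assuming the slide's cat-example scores are [3.2, 5.1, -1.7] (which gives the ~0.13 probability being discussed here):

import math

scores = [3.2, 5.1, -1.7]                      # unnormalized scores: [cat, car, frog]
exps = [math.exp(s) for s in scores]
probs = [e / sum(exps) for e in exps]          # softmax probabilities, ~[0.13, 0.87, 0.00]

p_correct = probs[0]                           # correct class is cat
print(-math.log10(p_correct))                  # ~0.89, the number on the slide (log base 10)
print(-math.log(p_correct))                    # ~2.04, the natural-log cross-entropy loss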
@zahrasoltani8630 5 years ago
Thanks Andrej for the nice lecture. Just one point: in the early minutes of the lecture, the car score is good but it is described as bad. The score is high, which means high probability, so it is not a bad score.
@kemalware4912 1 year ago
Thanks internet
@drdrsh 8 years ago
There is a very interesting artifact around Andrej's head: the symbols on the whiteboard seem to jiggle around as his head moves. Warping space-time somehow?
@deepuraveendran1903 7 years ago
The loss for the frog column at 8:36 is 12.9, not 10.9; he forgot to add the +1.
@deanfang8673 8 years ago
Always the best class~
@uncoded0 3 years ago
Around 35:00 to 37:00 he's using log base 10.
@ocixot94 4 years ago
At 14:00, I did not get why the loss at the very beginning is (number of classes - 1).
@lizardking640 4 years ago
Because the weights are close to 0 -> scores close to 0 -> max(0, close_to_zero + 1) gives us 1. Then we sum that over every class **but one** (the correct class is skipped), and that's your -1.
@ocixot94 4 years ago
@@lizardking640 Thank you so much!
@bornslippy9109 6 years ago
@24:30 how does the (4,1).(1,4) matrix product result in 1?
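If this is about the L2 regularization example around that point (w1 = [1,0,0,0], w2 = [0.25,0.25,0.25,0.25], x = [1,1,1,1] -- an assumption about which slide is meant), the 1 is the scalar from the inner product with the row vector on the left. A small shape sketch:

import numpy as np

x  = np.array([[1.0], [1.0], [1.0], [1.0]])    # shape (4, 1), column vector
w1 = np.array([[1.0, 0.0, 0.0, 0.0]])          # shape (1, 4), row vector
w2 = np.array([[0.25, 0.25, 0.25, 0.25]])      # shape (1, 4)

print(w1 @ x)          # (1,4) @ (4,1) -> shape (1,1), value [[1.]]
print(w2 @ x)          # also [[1.]]: same score, but w2 has the smaller L2 penalty
print((x @ w1).shape)  # the reversed order (4,1) @ (1,4) is an outer product, shape (4,4)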
@completelyboofyblitzed 5 years ago
I'm sorry, but where's this link for interactive examples? I really want to see 🙏
@walk-in-the-woods 4 years ago
vision.stanford.edu/teaching/cs231n-demos/linear-classify/
@AvielLivay 1 year ago
37:49 They ask the professor why we look for the minimum of -log(p) instead of just looking directly for the maximum of p. He says that it makes the math more convenient, but I thought the reason was that you want to maximize the joint probability p1*p2*p3*…*pN over your N samples. So instead of maximizing a product, you maximize the sum of the logs: log(p1)+log(p2)+log(p3)+…+log(pN).
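Both views are consistent; since log is monotonic, the maximizer does not change. A short worked version of the point above:

\arg\max_\theta \prod_{i=1}^{N} p_i
  \;=\; \arg\max_\theta \sum_{i=1}^{N} \log p_i
  \;=\; \arg\min_\theta \Big( -\sum_{i=1}^{N} \log p_i \Big)

So minimizing the summed -log(p) per sample is exactly maximizing the joint likelihood, and it is also numerically nicer than multiplying many small probabilities.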
@ShangDaili 7 years ago
I looked at the notes, and I still don't understand why analytical gradient is prone to errors. Can somebody explain? Thanks.
@KartikSubbarao 7 years ago
Given that f() already exists, the numerical gradient simply involves iterating over each of the weights w[i] and computing ( f(w[i] + h) - f(w[i]) ) / h -- see slide 3-56. The key is that you don't have to know anything about the internals of f() in order to write eval_numerical_gradient(). So there isn't much room for human error. Whereas the analytical gradient involves delving into the internals of f() and *manually* applying the chain rule to get the analytic expression for the partial derivative for each weight in the network. When you program a function to evaluate the analytic gradient, you have to use these analytic expressions that you hand-created to output the gradient. That's why this approach is more error prone -- there is plenty of room for human error when manually calculating partial derivatives. In theory, if you expressed f() in such a way that you could use Mathematica or some other tool to programmatically compute the partial derivatives, then these sorts of human errors could be potentially eliminated.
@aashudwivedi 7 years ago
It's possible that the function is not differentiable at some points, and the numerical gradient still gives you a value there because it's just comparing the value at x with the value at x + h.
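For reference, a minimal sketch of the kind of numerical gradient check described above (illustrative code, not the course's exact implementation):

import numpy as np

def eval_numerical_gradient(f, w, h=1e-5):
    # Forward difference (f(w_i + h) - f(w_i)) / h for each weight, as in the comment above.
    # Only calls f(); it needs no knowledge of f's internals, so there is little room for error.
    grad = np.zeros_like(w)
    fw = f(w)
    it = np.nditer(w, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = w[idx]
        w[idx] = old + h
        grad[idx] = (f(w) - fw) / h   # slope along this one coordinate
        w[idx] = old                  # restore the weight
        it.iternext()
    return grad

# Gradient check on a toy loss whose analytic gradient we know (2w):
f = lambda w: np.sum(w ** 2)
w = np.random.randn(3, 4)
print(np.max(np.abs(eval_numerical_gradient(f, w) - 2 * w)))  # should be tiny, ~1e-5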
@lifeisbeautifu1 1 year ago
I love you Andrej ❤
@EurekaAILabs 12 days ago
Can someone explain what's happening at minute 39?
@Enliden 7 years ago
Why is the analytic gradient error-prone? Because it is theoretically deduced? I mean, the point of mathematics is proving the correctness of analytic results...
@onewinter1900s 8 years ago
incredible lecture...love it
@ThienPham-hv8kx 3 years ago
Summary: the loss function tells us how well the randomly initialized weights we chose are doing (bad, good, getting worse, getting better). To minimize the loss, we try to push it as close to zero as possible over the whole dataset (maybe millions of samples). It is like a blind person 👨‍🦯 trying to walk down a hill ⛰️, making sure at each step that they are still heading downhill.
@serhangul 9 years ago
Great lectures, thanks!
@notanape5415 4 years ago
The big historical shift since 2012: rather than cherry-picking the features of a class and building a single feature vector from them, we train our learning algorithms on the raw pixels themselves, which generalizes much better.
@anibaloal 8 years ago
I don't fully understand mini-batch gradient descent; I hope someone can help me. Say we are working with images and we send a mini-batch of images to training. Is a single loss and a single gradient computed for each image and then averaged (or something similar), or can you somehow compute a total loss and a total gradient for the complete set of images?
@anibaloal 8 years ago
+Aníbal Sigüenza OK, update: I think I know the answer. Apparently you can either sum or average the loss; the mean is used more often so that it does not depend on the mini-batch size.
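A minimal sketch of what "average over the mini-batch" means in practice (loss_and_grad_single is a hypothetical per-example helper; real code vectorizes this instead of looping):

import numpy as np

def minibatch_step(W, X_batch, y_batch, loss_and_grad_single, lr=1e-3):
    # Average the per-example losses and gradients, then take one descent step.
    # Using the mean rather than the sum keeps the step size independent of batch size.
    losses, grads = [], []
    for x, y in zip(X_batch, y_batch):
        L_i, dW_i = loss_and_grad_single(W, x, y)  # loss and gradient for one example
        losses.append(L_i)
        grads.append(dW_i)
    loss = np.mean(losses)
    dW = np.mean(grads, axis=0)
    W = W - lr * dW                                # vanilla gradient descent update
    return W, loss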
@carlosvazquez8984 7 years ago
what is the textbook used in this class?
@aashudwivedi 7 years ago
I don't think there's a recommended textbook; I'm actually finding the lecture notes more than enough.
@rishabhbhardwaj6174 6 years ago
Are the lecture notes posted somewhere?
@markroxor 6 years ago
cs231n.github.io
@jonathanr4242 2 years ago
Did you hear about the mathematician mountain climber? He got stuck at a local maximum
@mikeba3809 2 months ago
Andrej is so good
@jenishah9825 7 years ago
Can anyone explain the loss at initialization? If the scores are nearly 0, how do we get n-1?
@BryanChianghau 7 years ago
The differences between the scores of the not-true classes and the score of the correct class will all be around 0. But a margin of 1 is added to each of those differences, and there are n-1 not-true classes out of n classes in total, so the loss for each example will be about n-1 => full loss ≈ n-1.
@ekandrot 7 years ago
Except for the columns that are correct and therefore zero. So we will be averaging columns of zero and columns of n-1, which means it will be around n-1 with small values of W.
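A quick numerical check of this with small random weights (a sketch, CIFAR-10-style shapes assumed: 10 classes, 3073 inputs including the bias):

import numpy as np

np.random.seed(0)
n_classes, dim, n_samples = 10, 3073, 500
W = 0.0001 * np.random.randn(n_classes, dim)   # small weights -> scores near 0
X = np.random.randn(dim, n_samples)
y = np.random.randint(n_classes, size=n_samples)

scores = W @ X                                  # (n_classes, n_samples), all close to 0
margins = np.maximum(0, scores - scores[y, np.arange(n_samples)] + 1)
margins[y, np.arange(n_samples)] = 0            # the correct class is not summed
print(margins.sum(axis=0).mean())               # ≈ n_classes - 1 = 9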
@dingguodong5840 9 years ago
Does anybody know where I can find the slides?
@Aksahnsh 9 years ago
cs231n.stanford.edu/syllabus.html
@dingguodong5840 9 years ago
Akshansh Singh Thanks a lot!
@piewpok3127 1 year ago
Day -2 . 3 lectures and counting...
@Sgoose105 5 years ago
What accent is his?
@Landonismo 7 years ago
that gradient descent error manifold description tho...
@fengzhang8640 9 years ago
hah good course!
@abhisheksanghani9680 7 years ago
Anyone in New York, going through these? We could team up! :)
@aashudwivedi 7 years ago
I am going through this in India, Bangalore. Would love to team up over hangouts/github.
@bingeltube 6 years ago
Recommendable, but some inexperience in giving lectures still shows.
@zahrasoltani8630 5 years ago
Who cares, the lecture was quite clear and easy to understand (the most important thing, which you often can't find in lectures from the most EXPERIENCED lecturers).
@zoriiginalx7544 4 days ago
He was a grad student, so of course there was some inexperience. Nevertheless, he did a fantastic job.