CS231n Winter 2016: Lecture 3: Linear Classification 2, Optimization

143,472 views

Andrej Karpathy

Comments: 77
@KhueLe 9 years ago
A big thank you to Andrej and the teaching staff for uploading the videos. They are of high quality and really make the class more interesting to follow.
@jenishah9825 1 year ago
This content here is GOLD.
@leixun 4 years ago
*My takeaways:*
1. Multiclass SVM loss 4:15
2. Regularization 20:04: e.g. L2, L1
3. Softmax classifier 29:40
4. Optimization 50:10: gradient descent
4.1 Mini-batch gradient descent 1:00:40
4.2 Optimizers 1:05:00: SGD, momentum, Adagrad, RMSProp, Adam
@sezaiburakkantarci 10 months ago
You are one of the best, Andrej. You make learning so fun, with moments like 27:45 😄 Forever grateful.
@njordeggen4209 2 years ago
The best thing about the internet is access to great information and education like this series. Thank you Andrej and everyone involved!
@NehadHirmiz 2 years ago
These lectures are amazing. I still go back to review these every few years. Thank you sir.
@bayesianlee6447 6 years ago
Not only the lecture itself but also his level of insight and interpretation of DL is quite interesting. Thanks for sharing; I'm grateful for these classes.
@Shkencetari 5 years ago
Thank you very much for this great lecture. Just to help others out, optimization starts at 50:11
@ralphblanes8370 9 years ago
Andrej's a great teacher! Keep up the good work!
@XTLi-xb8iv 9 years ago
Thanks for uploading the video online, it really helps!
@geoffreyanderson4719 8 years ago
The instructor's loss math is off at 8:31. For the frog column the loss should be 12.9, but the instructor says the total loss of the frog column is 10.9:
In [10]: max(0, 2.2 - (-3.1) + 1)  # cat row, frog column
Out[10]: 6.300000000000001
Etc.
@yoniker83 8 years ago
Yeah I was about to say that when I saw your comment!
@questforprogramming 4 years ago
Good observation
@Kerphelio01 1 year ago
Ah, thank you - I was going nuts over this. (I have a history of messing up trivial arithmetic, so I have zero self-confidence.)
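For reference, here is the full frog-column check, a minimal sketch using the scores quoted in this thread (correct class frog with score -3.1, margin 1):

def svm_loss(scores, correct_idx, margin=1.0):
    # Multiclass SVM (hinge) loss for one example, summing over the incorrect classes.
    correct_score = scores[correct_idx]
    return sum(max(0.0, s - correct_score + margin)
               for i, s in enumerate(scores) if i != correct_idx)

# Frog column from the slide: [cat, car, frog] scores, correct class = frog
print(svm_loss([2.2, 2.5, -3.1], correct_idx=2))  # 6.3 + 6.6 = 12.9, not 10.9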
@pauldacus4590 6 years ago
17:00 starts a good discussion of regularization...
@Siwon-vv5mi 1 year ago
At 39:00, what did he mean by jiggling the scores?
@atrenta123 5 years ago
The calculation for frog class at 8:32 is wrong. The sum should be 12.9
@michaelgomezchen7718 4 years ago
I think he forgot to add the +1.
@vaibhavravichandran 7 years ago
Thank you for the lecture series. They've helped me understand NNs much much better.
@joehan4255 9 years ago
On slide 11, shouldn't max(0,5.3) + max(0,5.6) be max(0,6.3) + max(0,6.6)?
@poyrazus 8 years ago
+Joe Han You're right, it should be.
@atrenta123 5 years ago
yes you're right
@MauriceMauser 7 years ago
[33:17] The (-)0.89 is log base 10. Assuming we want the natural log: -ln(0.13) ≈ 2.04?
@edisonchen7682 6 years ago
Yup, how come it's 0.89? I don't get it.
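A quick check, assuming the slide's cat-example scores are [3.2, 5.1, -1.7] (which gives the ~0.13 probability being discussed here):

import math

scores = [3.2, 5.1, -1.7]                      # unnormalized scores: [cat, car, frog]
exps = [math.exp(s) for s in scores]
probs = [e / sum(exps) for e in exps]          # softmax probabilities, ~[0.13, 0.87, 0.00]

p_correct = probs[0]                           # correct class is cat
print(-math.log10(p_correct))                  # ~0.89, the number on the slide (log base 10)
print(-math.log(p_correct))                    # ~2.04, the natural-log cross-entropy loss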
@zahrasoltani8630 5 years ago
Thanks Andrej for the nice lecture. Just one point: in the early minutes of the lecture, the car score is good but it is described as bad. The score is high, which means high probability, so it is not a bad score.
@kemalware4912 1 year ago
Thanks internet
@drdrsh 8 years ago
There is a very interesting artifact around Andrej's head: the symbols on the whiteboard seem to jiggle around as his head moves. Warping space-time somehow?
@deepuraveendran1903 7 years ago
The loss for the frog column at 8:36 is 12.9, not 10.9; he forgot to add the +1.
@deanfang8673 8 years ago
Always the best class~
@uncoded0 3 years ago
Around 35:00 to 37:00 he's using log base 10.
@ocixot94 4 years ago
At 14:00, I did not get why the loss at the very beginning is (number of classes - 1).
@lizardking640 4 years ago
Because the weights are close to 0 -> scores close to 0 -> max(0, close_to_zero + 1) gives us 1. Then we sum that over every class **but one** (the correct class is skipped), and that's your -1.
@ocixot94 4 years ago
@@lizardking640 Thank you so much!
@bornslippy9109 6 years ago
@24:30 how does the (4,1).(1,4) matrix product result in 1?
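If this is about the L2 regularization example around that point (w1 = [1,0,0,0], w2 = [0.25,0.25,0.25,0.25], x = [1,1,1,1] -- an assumption about which slide is meant), the 1 is the scalar from the inner product with the row vector on the left. A small shape sketch:

import numpy as np

x  = np.array([[1.0], [1.0], [1.0], [1.0]])    # shape (4, 1), column vector
w1 = np.array([[1.0, 0.0, 0.0, 0.0]])          # shape (1, 4), row vector
w2 = np.array([[0.25, 0.25, 0.25, 0.25]])      # shape (1, 4)

print(w1 @ x)          # (1,4) @ (4,1) -> shape (1,1), value [[1.]]
print(w2 @ x)          # also [[1.]]: same score, but w2 has the smaller L2 penalty
print((x @ w1).shape)  # the reversed order (4,1) @ (1,4) is an outer product, shape (4,4)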
@completelyboofyblitzed 5 years ago
I'm sorry, but where's this link for interactive examples? I really want to see 🙏
@walk-in-the-woods 4 years ago
vision.stanford.edu/teaching/cs231n-demos/linear-classify/
@AvielLivay 1 year ago
37:49 They ask the professor why we look for the minimum of -log(p) instead of just looking directly for the maximum of p. He says that it makes the math more convenient, but I thought the reason was that you want to maximize the joint probability p1*p2*p3*…*pN over your N samples. So instead of maximizing a product, you maximize the sum of the logs: log(p1)+log(p2)+log(p3)+…+log(pN).
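Both views are consistent; since log is monotonic, the maximizer does not change. A short worked version of the point above:

\arg\max_\theta \prod_{i=1}^{N} p_i
  \;=\; \arg\max_\theta \sum_{i=1}^{N} \log p_i
  \;=\; \arg\min_\theta \Big( -\sum_{i=1}^{N} \log p_i \Big)

So minimizing the summed -log(p) per sample is exactly maximizing the joint likelihood, and it is also numerically nicer than multiplying many small probabilities.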
@ShangDaili 7 years ago
I looked at the notes, and I still don't understand why analytical gradient is prone to errors. Can somebody explain? Thanks.
@KartikSubbarao 7 years ago
Given that f() already exists, the numerical gradient simply involves iterating over each of the weights w[i] and computing ( f(w[i] + h) - f(w[i]) ) / h -- see slide 3-56. The key is that you don't have to know anything about the internals of f() in order to write eval_numerical_gradient(). So there isn't much room for human error. Whereas the analytical gradient involves delving into the internals of f() and *manually* applying the chain rule to get the analytic expression for the partial derivative for each weight in the network. When you program a function to evaluate the analytic gradient, you have to use these analytic expressions that you hand-created to output the gradient. That's why this approach is more error prone -- there is plenty of room for human error when manually calculating partial derivatives. In theory, if you expressed f() in such a way that you could use Mathematica or some other tool to programmatically compute the partial derivatives, then these sorts of human errors could be potentially eliminated.
@aashudwivedi 7 years ago
It's possible that the function is not differentiable at some points, and the numerical gradient still gives you a value there because it's just comparing the value at x with the value at x + h.
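For reference, a minimal sketch of the kind of numerical gradient check described above (illustrative code, not the course's exact implementation):

import numpy as np

def eval_numerical_gradient(f, w, h=1e-5):
    # Forward difference (f(w_i + h) - f(w_i)) / h for each weight, as in the comment above.
    # Only calls f(); it needs no knowledge of f's internals, so there is little room for error.
    grad = np.zeros_like(w)
    fw = f(w)
    it = np.nditer(w, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = w[idx]
        w[idx] = old + h
        grad[idx] = (f(w) - fw) / h   # slope along this one coordinate
        w[idx] = old                  # restore the weight
        it.iternext()
    return grad

# Gradient check on a toy loss whose analytic gradient we know (2w):
f = lambda w: np.sum(w ** 2)
w = np.random.randn(3, 4)
print(np.max(np.abs(eval_numerical_gradient(f, w) - 2 * w)))  # should be tiny, ~1e-5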
@lifeisbeautifu1 1 year ago
I love you Andrej ❤
@EurekaAILabs 12 days ago
Can someone explain what's happening at minute 39?
@Enliden 7 years ago
Why is the analytic gradient error-prone? Because it is theoretically deduced? I mean, the point of mathematics is proving the correctness of analytic results...
@onewinter1900s 8 years ago
incredible lecture...love it
@ThienPham-hv8kx 3 years ago
Summary: the loss function tells us how well the randomly initialized weights we chose are doing (bad, good, getting worse, getting better). To minimize the loss, we try to push it as close to zero as possible over the whole dataset (maybe millions of samples). It is like a blind person 👨‍🦯 trying to walk down a hill ⛰️, making sure at each step that they are still heading downhill.
@serhangul 9 years ago
Great lectures, thanks!
@notanape5415 4 years ago
The big historical shift since 2012: rather than cherry-picking the features of a class and building a single feature vector from them, we train our learning algorithms on the raw pixels themselves, which generalizes much better.
@anibaloal 8 years ago
I don't fully understand mini-batch gradient descent; I hope someone can help me. Say we are working with images and we send a mini-batch of images to training. Is a single loss and a single gradient computed for each image and then averaged (or something similar), or can you somehow compute a total loss and a total gradient for the complete set of images?
@anibaloal 8 years ago
+Aníbal Sigüenza OK, update: I think I know the answer. Apparently you can either sum or average the loss; the mean is used more often so that it does not depend on the mini-batch size.
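A minimal sketch of what "average over the mini-batch" means in practice (loss_and_grad_single is a hypothetical per-example helper; real code vectorizes this instead of looping):

import numpy as np

def minibatch_step(W, X_batch, y_batch, loss_and_grad_single, lr=1e-3):
    # Average the per-example losses and gradients, then take one descent step.
    # Using the mean rather than the sum keeps the step size independent of batch size.
    losses, grads = [], []
    for x, y in zip(X_batch, y_batch):
        L_i, dW_i = loss_and_grad_single(W, x, y)  # loss and gradient for one example
        losses.append(L_i)
        grads.append(dW_i)
    loss = np.mean(losses)
    dW = np.mean(grads, axis=0)
    W = W - lr * dW                                # vanilla gradient descent update
    return W, loss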
@carlosvazquez8984 7 years ago
what is the textbook used in this class?
@aashudwivedi 7 years ago
I don't think there's a recommended textbook; I'm actually finding the lecture notes more than enough.
@rishabhbhardwaj6174 6 years ago
Are the lecture notes posted somewhere?
@markroxor 6 years ago
cs231n.github.io
@jonathanr4242 2 years ago
Did you hear about the mathematician mountain climber? He got stuck at a local maximum
@mikeba3809 2 months ago
Andrej is so good
@jenishah9825 7 years ago
Can anyone explain the loss at initialization? If the scores are nearly 0, how do we get n-1?
@BryanChianghau 7 years ago
The differences between the scores of the not-true classes and the score of the correct class will all be around 0. But a margin of 1 is added to each of those differences, and there are n-1 not-true classes out of n classes in total, so the loss for each example will be about n-1 => full loss ≈ n-1.
@ekandrot 7 years ago
Except for the columns that are correct and therefore zero. So we will be averaging columns of zero and columns of n-1, which means it will be around n-1 with small values of W.
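A quick numerical check of this with small random weights (a sketch, CIFAR-10-style shapes assumed: 10 classes, 3073 inputs including the bias):

import numpy as np

np.random.seed(0)
n_classes, dim, n_samples = 10, 3073, 500
W = 0.0001 * np.random.randn(n_classes, dim)   # small weights -> scores near 0
X = np.random.randn(dim, n_samples)
y = np.random.randint(n_classes, size=n_samples)

scores = W @ X                                  # (n_classes, n_samples), all close to 0
margins = np.maximum(0, scores - scores[y, np.arange(n_samples)] + 1)
margins[y, np.arange(n_samples)] = 0            # the correct class is not summed
print(margins.sum(axis=0).mean())               # ≈ n_classes - 1 = 9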
@dingguodong5840 9 years ago
Does anybody know where I can find the slides?
@Aksahnsh 9 years ago
cs231n.stanford.edu/syllabus.html
@dingguodong5840 9 years ago
Akshansh Singh Thanks a lot!
@piewpok3127 1 year ago
Day -2 . 3 lectures and counting...
@Sgoose105 5 years ago
What accent is his?
@Landonismo 7 years ago
that gradient descent error manifold description tho...
@fengzhang8640 9 years ago
hah good course!
@abhisheksanghani9680 7 years ago
Anyone in New York, going through these? We could team up! :)
@aashudwivedi 7 years ago
I am going through this in India, Bangalore. Would love to team up over hangouts/github.
@bingeltube 6 years ago
Recommendable, but some inexperience in giving lectures still shows.
@zahrasoltani8630 5 years ago
Who cares, the lecture was quite clear and easy to understand (the most important thing, which you often can't find in lectures from the most EXPERIENCED lecturers).
@zoriiginalx7544 4 days ago
He was a grad student, so of course there was some inexperience. Nevertheless, he did a fantastic job.