You are one of the best, Andrej. You make learning so fun, with moments like 27:45 😄 Forever grateful.
@njordeggen4209 · 2 years ago
The best thing about the internet is access to great information and education like this series. Thank you Andrej and everyone involved!
@NehadHirmiz · 2 years ago
These lectures are amazing. I still go back to review these every few years. Thank you sir.
@bayesianlee6447 · 6 years ago
Not only is the lecture itself great, but his level of intelligence and depth of interpretation of DL are quite interesting. Thanks for sharing; grateful for these classes.
@Shkencetari · 5 years ago
Thank you very much for this great lecture. Just to help others out, optimization starts at 50:11
@ralphblanes8370 · 9 years ago
Andrej's a great teacher! Keep up the good work!
@XTLi-xb8iv · 9 years ago
Thanks for uploading the video online, it really helps!
@geoffreyanderson4719 · 8 years ago
The instructor's loss arithmetic is off at 8:31. For the frog column the loss should be 12.9, but the instructor gives the total loss of the frog column as 10.9:
In [10]: max(0, 2.2 - (-3.1) + 1)  # cat row, frog column
Out[10]: 6.300000000000001
Etc.
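For reference, a minimal sketch of the whole frog column, assuming the slide's scores (cat 2.2, car 2.5, frog -3.1, with frog the correct class and a margin of 1):

scores = {'cat': 2.2, 'car': 2.5, 'frog': -3.1}   # frog example's scores from the slide
correct = 'frog'
margin = 1.0
# multiclass SVM (hinge) loss sums over the incorrect classes only
loss = sum(max(0.0, s - scores[correct] + margin)
           for cls, s in scores.items() if cls != correct)
print(loss)   # 6.3 + 6.6 = 12.9, not the 10.9 on the slide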
@yoniker83 · 8 years ago
Yeah I was about to say that when I saw your comment!
@questforprogramming · 4 years ago
Good observation
@Kerphelio01 · 1 year ago
Ah, thank you - I was going nuts over this. (I have a history of messing up trivial arithmetic, so I have zero self confidence).
@pauldacus4590 · 6 years ago
17:00 starts a good discussion of regularization...
@Siwon-vv5mi · 1 year ago
At 39:00, what did he mean by jiggling the scores?
@atrenta123 · 5 years ago
The calculation for frog class at 8:32 is wrong. The sum should be 12.9
@michaelgomezchen7718 · 4 years ago
I think he forgot to add the +1.
@vaibhavravichandran · 7 years ago
Thank you for the lecture series. They've helped me understand NNs much much better.
@joehan4255 · 9 years ago
On slide 11, shouldn't max(0,5.3) + max(0,5.6) be max(0,6.3) + max(0,6.6)?
@poyrazus · 8 years ago
+Joe Han You're right, it should be.
@atrenta123 · 5 years ago
yes you're right
@MauriceMauser · 7 years ago
[33:17] The (-)0.89 is log base 10. Assuming we want the natural log: -ln(0.13) = 2.04?
@edisonchen7682 · 6 years ago
Yup, how come it's 0.89? I don't get it.
@zahrasoltani8630 · 5 years ago
Thanks Andrej for the nice lecture. Just one point: in the early minutes of the lecture, the car score is good but it is described as bad. The score is high, which means high probability, so it is not a bad score.
@kemalware4912 · 1 year ago
Thanks internet
@drdrsh · 8 years ago
There is a very interesting artifact around Andrej's head: the symbols on the whiteboard seem to jiggle around as his head moves. Warping space-time somehow?
@deepuraveendran1903 · 7 years ago
The loss for the frog column at 8:36 is 12.9, not 10.9; he forgot to add the +1.
@deanfang8673 · 8 years ago
Always the best class~
@uncoded0 · 3 years ago
Around 35:00 to 37:00 he's using log base 10.
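A quick check of both bases for the 0.13 probability used on the slide (a minimal sketch):

import math
p = 0.13                  # softmax probability of the correct class from the slide
print(-math.log10(p))     # ~0.886 -> the 0.89 shown on the slide (log base 10)
print(-math.log(p))       # ~2.04  -> what the natural log would give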
@ocixot94 · 4 years ago
At 14:00, I did not get why the loss at the very beginning is (number of classes - 1).
@lizardking640 · 4 years ago
Because the weights are close to 0 -> every score is close to 0 -> each max(0, close_to_zero + 1) term gives roughly 1. Then the loss sums one such term for every class **but the correct one**, so you get (number of classes - 1); that's your -1.
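A quick numerical sanity check, as a sketch with made-up sizes (10 classes, CIFAR-10-style inputs):

import numpy as np
np.random.seed(0)
num_classes, dim = 10, 3072
W = 0.0001 * np.random.randn(num_classes, dim)   # tiny random weights
x = np.random.randn(dim)
y = 3                                            # correct class (arbitrary)
scores = W.dot(x)                                # all scores are near 0
margins = np.maximum(0, scores - scores[y] + 1)
margins[y] = 0                                   # the correct class is not summed
print(margins.sum())                             # ~9, i.e. num_classes - 1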
@ocixot94 · 4 years ago
@@lizardking640 Thank you so much!
@bornslippy9109 · 6 years ago
@24:30 how does the (4,1)·(1,4) matrix product result in 1?
@completelyboofyblitzed · 5 years ago
I'm sorry, but where's this link for interactive examples? I really want to see 🙏
37:49 they ask the professor why we look for the minimum of -log(p) instead of just looking directly for the maximum of p. He says that it makes the math more convenient, but I thought the reason was that you want to maximize the joint probability p1*p2*p3*…*pN over your N samples. So instead of maximizing a product, you maximize the sum of the logs: log(p1)+log(p2)+log(p3)+…+log(pN).
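A minimal sketch with made-up per-sample probabilities, showing the two objectives agree (and why the sum is nicer numerically):

import math
probs = [0.13, 0.87, 0.42, 0.05]                  # hypothetical p_i for the correct classes
joint = math.prod(probs)                          # p1*p2*...*pN, underflows for large N
sum_neg_logs = sum(-math.log(p) for p in probs)   # the sum we actually minimize
print(joint)                                      # ~0.00238
print(math.exp(-sum_neg_logs))                    # same value: minimizing sum(-log p) maximizes the product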
@ShangDaili · 7 years ago
I looked at the notes, and I still don't understand why the analytic gradient is prone to errors. Can somebody explain? Thanks.
@KartikSubbarao · 7 years ago
Given that f() already exists, the numerical gradient simply involves iterating over each of the weights w[i] and computing ( f(w[i] + h) - f(w[i]) ) / h -- see slide 3-56. The key is that you don't have to know anything about the internals of f() in order to write eval_numerical_gradient(). So there isn't much room for human error. Whereas the analytical gradient involves delving into the internals of f() and *manually* applying the chain rule to get the analytic expression for the partial derivative for each weight in the network. When you program a function to evaluate the analytic gradient, you have to use these analytic expressions that you hand-created to output the gradient. That's why this approach is more error prone -- there is plenty of room for human error when manually calculating partial derivatives. In theory, if you expressed f() in such a way that you could use Mathematica or some other tool to programmatically compute the partial derivatives, then these sorts of human errors could be potentially eliminated.
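A rough sketch of that idea (not the course's exact eval_numerical_gradient, just the same finite-difference recipe applied to a toy f):

import numpy as np

def numerical_gradient(f, w, h=1e-5):
    # Treats f as a black box: bump each coordinate by h and take the difference
    fx = f(w)
    grad = np.zeros_like(w)
    for i in range(w.size):
        old = w.flat[i]
        w.flat[i] = old + h
        grad.flat[i] = (f(w) - fx) / h
        w.flat[i] = old            # restore the coordinate
    return grad

w = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda v: np.sum(v**2), w))   # ~[2, -4, 6], matching the analytic 2w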
@aashudwivedi · 7 years ago
Also, it's possible that the function is not differentiable at some points, and the numerical gradient still gives you a value there because it's just comparing the value at x with the value at x + h.
@lifeisbeautifu1 · 1 year ago
I love you Andrej ❤
@EurekaAILabs · 12 days ago
Can someone explain what's happening at the 39-minute mark?
@Enliden · 7 years ago
Why is the analytic gradient error-prone? Because it is theoretically deduced? I mean, the point of mathematics is proving the correctness of analytic results...
@onewinter1900s · 8 years ago
incredible lecture...love it
@ThienPham-hv8kx · 3 years ago
Summary: the loss function tells us how well the randomly initialized weights we chose are doing (bad, good, getting worse, getting better). To minimize the loss, we try to push it as close to zero as possible over the whole dataset (maybe millions of samples). It's like a blind person 👨🦯 trying to walk down a hill ⛰️, checking at each step that they are still heading downhill.
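That "feel your way downhill" loop is just vanilla gradient descent; a minimal sketch with a toy loss and a made-up learning rate:

import numpy as np

def loss_fn(w):                  # toy stand-in for the full dataset loss
    return np.sum((w - 3.0) ** 2)

def grad_fn(w):                  # its gradient
    return 2.0 * (w - 3.0)

w = np.zeros(5)                  # starting point
learning_rate = 0.1              # step size: how big a step downhill
for step in range(100):
    w -= learning_rate * grad_fn(w)   # step against the gradient
print(loss_fn(w))                # ~0: we walked down to the bottom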
@serhangul · 9 years ago
Great lectures, thanks!
@notanape5415 · 4 years ago
The big historical shift since 2012: rather than hand-picking the features of a class and building a single feature vector from them, we train our learning algorithms on every pixel, and that generalizes everything.
@anibaloal · 8 years ago
I don't fully understand mini-batch gradient descent; I wish someone could help me. Say we are working with images, so we send a mini-batch of images into training. Is a single loss and a single gradient computed for each image and then averaged (or something similar), or is a total loss and a total gradient somehow computed for the complete set of images?
@anibaloal · 8 years ago
+Aníbal Sigüenza OK, update: I think I know the answer. Apparently you can either sum or average the losses; the mean is used more often so that the result does not depend on the mini-batch size.
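A sketch of that averaging, with a made-up per_example_loss_and_grad helper standing in for whatever loss you actually use:

import numpy as np

def per_example_loss_and_grad(w, x, y):
    # hypothetical stand-in: squared error of a linear score against the label
    score = w.dot(x)
    return (score - y) ** 2, 2.0 * (score - y) * x

def minibatch_loss_and_grad(w, xs, ys):
    losses, grads = zip(*(per_example_loss_and_grad(w, x, y) for x, y in zip(xs, ys)))
    # average so the magnitude does not depend on the mini-batch size
    return np.mean(losses), np.mean(grads, axis=0)

w = np.zeros(4)
xs = np.random.randn(32, 4)     # a mini-batch of 32 examples
ys = np.random.randn(32)
loss, grad = minibatch_loss_and_grad(w, xs, ys)
print(loss, grad.shape)         # one scalar loss, one gradient the same shape as w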
@carlosvazquez8984 · 7 years ago
what is the textbook used in this class?
@aashudwivedi · 7 years ago
I don't think there's a recommended textbook; I'm actually finding the lecture notes more than enough.
@rishabhbhardwaj6174 · 6 years ago
Are the lecture notes posted somewhere?
@markroxor · 6 years ago
cs231n.github.io
@jonathanr4242 · 2 years ago
Did you hear about the mathematician mountain climber? He got stuck at a local maximum
@mikeba3809 · 2 months ago
Andrej is so good
@jenishah9825 · 7 years ago
Can anyone explain the loss at initialization? If the scores are nearly 0, how do we get n-1?
@BryanChianghau · 7 years ago
The differences between the scores of all the not-true classes and the score of the correct class will be around 0. But a margin of 1 is added to each of those n-1 differences (for n classes in total), so the loss for each example will be n-1 => full loss = n-1.
@ekandrot · 7 years ago
Except for the columns that are correct and therefore zero. So we will be averaging columns of zero and columns of n-1, which means it will be around n-1 with small values of W.
@dingguodong5840 · 9 years ago
Does anybody know where I can find the slides?
@Aksahnsh · 9 years ago
cs231n.stanford.edu/syllabus.html
@dingguodong5840 · 9 years ago
Akshansh Singh Thanks a lot!
@piewpok3127 · 1 year ago
Day -2 . 3 lectures and counting...
@Sgoose105 · 5 years ago
What accent is his?
@Landonismo · 7 years ago
that gradient descent error manifold description tho...
@fengzhang8640 · 9 years ago
hah good course!
@abhisheksanghani9680 · 7 years ago
Anyone in New York, going through these? We could team up! :)
@aashudwivedi · 7 years ago
I am going through this in India, Bangalore. Would love to team up over hangouts/github.
@bingeltube · 6 years ago
Recommendable, but some inexperience in giving lectures still shows.
@zahrasoltani8630 · 5 years ago
Who cares? The lecture was quite clear and easy to understand (the most important thing, and something you often can't find in lectures from the most EXPERIENCED lecturers).
@zoriiginalx7544 · 4 days ago
He was a grad student, so of course there was some inexperience. Nevertheless, he did a fantastic job.