The project report mentioned at 27:13 was accepted to one of the ICLR workshops the following year and has over 500 citations to date. Impressive stuff.
@leixun (4 years ago)
*My takeaways:*
1. Parameter updates: optimizers such as momentum, Nesterov momentum, AdaGrad, RMSProp, Adam 3:53
2. Learning rate 28:20
3. 2nd-order optimizers 30:53
4. Evaluation: model ensembles 36:19
5. Regularization: dropout 38:25
6. Gradient checking 56:55
7. Convolutional Neural Networks 57:35
@ze2411 (4 years ago)
Thank you!
@leixun (4 years ago)
@@ze2411 You’re welcome
@citiblocsMaster (7 years ago)
1:04:50 "This might be useful in self-driving cars." One year later: head of AI at Tesla.
@notanape5415 (4 years ago)
(Car)-(Path)y - It was always in the name.
@666enough (3 years ago)
@@notanape5415 Wow this coincidence is unbelievable.
@irtazaa8200 (9 years ago)
Did CS231n in 2015. Great to see the videos released to the public now. Good job, Stanford!
@twentyeightO1 (8 months ago)
This is helping me quite a lot, thanks!!!
@champnaman (7 years ago)
At 15:10, Andrej says that according to recent work, local minima are not a problem for large networks. Could anyone point me to these papers? I'm interested in reading these results.
@vivekloganathan9386 (2 years ago)
For anyone curious like me about 43:34: someone's Siri mistakenly tried to recognize the lecture and said, "I am not sure what you said."
@nguyenthanhdat93 (8 years ago)
The best AI course I have ever taken. Thank you, Andrej!!!
@ThienPham-hv8kx (3 years ago)
Summary: start from simple gradient descent: x += - learning_rate * dx. Applied to a big dataset this is slow, because each update requires computing the gradient over every training example. So we use Stochastic Gradient Descent (SGD), which estimates the gradient from a small random mini-batch of examples instead of the whole training set, making each update much cheaper. SGD still converges slowly because the random sampling makes the updates jitter on the way to the minimum. So we have other methods that help it converge faster: SGD + momentum, AdaGrad, RMSProp, Adam. Each of them has a learning rate, and we should find a good learning rate for each dataset (for example, the default learning rate of Adam in Keras is 0.001). We can use dropout to prevent overfitting. In general there are three ways to fight overfitting: increase the dataset, simplify the network (dropout, reduce the number of layers), and preprocess the data (data augmentation).
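In code, the vanilla and Adam updates look roughly like this; a minimal NumPy sketch in the x += - learning_rate * dx style used above, with example hyperparameter values (the toy f(x) = sum(x**2) objective at the end is just for illustration):

```python
import numpy as np

def vanilla_update(x, dx, learning_rate=1e-3):
    # Plain (stochastic) gradient descent: step straight down the gradient.
    x += -learning_rate * dx
    return x

def adam_update(x, dx, m, v, t, learning_rate=1e-3,
                beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: keep running averages of the gradient (m) and its square (v),
    # correct their startup bias, then take a per-parameter scaled step.
    m = beta1 * m + (1 - beta1) * dx
    v = beta2 * v + (1 - beta2) * (dx ** 2)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    x += -learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Toy usage on f(x) = sum(x**2), whose gradient is 2*x.
x = np.ones(3)
m, v = np.zeros(3), np.zeros(3)
for t in range(1, 101):
    dx = 2 * x
    x, m, v = adam_update(x, dx, m, v, t)
```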
@ArdianUmam (7 years ago)
At 9:12, does "SGD" mean literal SGD (training with only one randomly chosen example at a time) or mini-batch SGD? The web lecture notes say the term SGD is often used to mean mini-batch SGD rather than the strict single-example version.
@krishnadusad5624 (7 years ago)
It refers to mini-batch SGD.
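Roughly speaking, each update estimates the gradient from a small random batch of examples rather than from a single example or the full training set. A minimal runnable sketch on a made-up linear-regression problem (the data, loss, and hyperparameter values are placeholders for illustration, not from the lecture):

```python
import numpy as np

# Made-up linear-regression data just so the loop below runs end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))              # 1000 examples, 10 features
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=1000)

weights = np.zeros(10)
step_size = 0.1
batch_size = 256

for it in range(200):
    # Mini-batch SGD: estimate the gradient from a small random batch,
    # not from a single example and not from the full training set.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    X_b, y_b = X[idx], y[idx]
    grad = (2.0 / batch_size) * X_b.T @ (X_b @ weights - y_b)  # MSE gradient
    weights += -step_size * grad  # the same vanilla update as before
```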
@ArdianUmam (7 years ago)
Ok, thanks :)
@iliavatahov9517 (8 years ago)
Great job! The end is so inspiring!
@mihird9 (5 years ago)
57:30 Intro to CNN
@sokhibtukhtaev9693 (6 years ago)
At 44:06, is he saying that dropout is applied to different neurons in each epoch? Say I have an x-10-10-y network (x: input, y: output, the 10s: hidden layers). In one epoch (that's forward prop + backprop), dropout is applied to, say, the 3rd, 5th, and 9th neurons of the first hidden layer and the 2nd, 5th, and 8th neurons of the second hidden layer; in the second epoch, dropout is applied to the 5th, 6th, and 10th neurons of the first layer and the 1st, 7th, and 10th neurons of the second hidden layer. Does that mean we effectively have as many models as epochs? Can someone clear this up for me?
@souvikbhattacharyya2480 (3 months ago)
Yes, I think you are right, but I think you mean mini-batches rather than epochs. An epoch is one full pass (forward prop + backprop) over every data point in your training set, i.e. 1 epoch = several mini-batches.
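Concretely, a fresh dropout mask is drawn on every forward pass, i.e. once per mini-batch iteration, so over training you effectively sample many different thinned sub-networks that share weights. A minimal sketch of inverted dropout (the keep probability p and the array shapes are example values):

```python
import numpy as np

p = 0.5  # probability of keeping a unit (an example value, not prescribed)

def dropout_forward(h, train=True):
    # Inverted dropout: draw a NEW random mask on every forward pass and
    # rescale by 1/p so no extra scaling is needed at test time.
    if not train:
        return h
    mask = (np.random.rand(*h.shape) < p) / p
    return h * mask

h1 = np.maximum(0, np.random.randn(4, 10))  # fake ReLU hidden activations
out_a = dropout_forward(h1)  # one mini-batch iteration drops these units...
out_b = dropout_forward(h1)  # ...and the next iteration drops different ones
```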
@tthtlc (7 years ago)
Just found that the slides on the Stanford website have been updated with the 2017 slides + videos. Is there any way to get the original 2016 slides? These lectures are as classic as Andrew Ng's.
@ArdianUmam (7 years ago)
43:35 xD
@vcothur7 (6 years ago)
What was that person saying?
@giorgigiglemiani4334 (6 years ago)
"I'm not sure what you said." It was someone's Siri, apparently.
@questforprogramming (4 years ago)
@@giorgigiglemiani4334 lol
@randywelt8210 (8 years ago)
8:50 noisy signal: how about using a Kalman filter?
@boringmanager9559 (7 years ago)
Some guy playing Dota against a neural network - over a million views. A genius explaining how to build a neural network - 40k views.
@pawelwiszniewski (6 years ago)
People can relate to playing a game much more easily. Understanding how it works is hard :)
@WahranRai (6 years ago)
33:12 L-BFGS, not to be confused with LGBT
@omeryalcn5797 (6 years ago)
There is one issue: it's suggested not to normalize the data when it's an image, but when we use batch normalization we do normalize the data. Is that a problem?
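For context, batch normalization standardizes a layer's activations over each mini-batch and then applies a learnable scale and shift; this is a separate choice from whether you normalize the raw image pixels. A minimal sketch with made-up shapes:

```python
import numpy as np

def batchnorm_forward(h, gamma, beta, eps=1e-5):
    # Batch normalization of a layer's activations: standardize each feature
    # over the current mini-batch, then let learnable gamma/beta rescale it.
    mu = h.mean(axis=0)
    var = h.var(axis=0)
    h_hat = (h - mu) / np.sqrt(var + eps)
    return gamma * h_hat + beta

h = np.random.randn(32, 50)          # fake activations: batch of 32, 50 units
gamma, beta = np.ones(50), np.zeros(50)
out = batchnorm_forward(h, gamma, beta)
```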
@qwerty-tf4zn (3 years ago)
It's getting exponentially difficult
@mostinho7 (4 years ago)
The intro to convnets here is all history - skip to the next lecture for the actual convnets.
@jiexc4385 (4 years ago)
What happened at 43:38? What's so funny?
@vivekloganathan9386 (2 years ago)
Someone's Siri mistakenly tried to recognize the lecture and said, "I am not sure what you said."
@bayesianlee6447 (6 years ago)
What is vanilla update? :)
@nikolahuang1919 (7 years ago)
The big idea of the momentum update is smart! But it seems obvious that this update method is an interim one.
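For reference, a minimal sketch of the momentum update and the Nesterov variant that refines it (mu is the momentum/"friction" coefficient; the values and the toy f(x) = x**2 objective are just illustrative):

```python
mu = 0.9                  # momentum / "friction" coefficient (a typical value)
learning_rate = 1e-2

def momentum_update(x, dx, v):
    # Build up velocity along directions whose gradient is consistent,
    # then step along the velocity instead of the raw gradient.
    v = mu * v - learning_rate * dx
    x = x + v
    return x, v

def nesterov_update(x, dx_ahead, v):
    # Nesterov momentum evaluates the gradient at the "looked-ahead"
    # position x + mu * v (that gradient is passed in here as dx_ahead).
    v = mu * v - learning_rate * dx_ahead
    x = x + v
    return x, v

# Toy usage on f(x) = x**2 (gradient 2*x), starting from x = 1.0.
x, v = 1.0, 0.0
for _ in range(100):
    x, v = momentum_update(x, dx=2 * x, v=v)
```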