Understanding Dropout (C2W1L07)

152,430 views

DeepLearningAI

Comments: 50
@manuel783 3 years ago
Clarification about Understanding Dropout: please note that from around 2:40-2:50, the dimension of w[1] should be 7x3 instead of 3x7, and w[3] should be 3x7 instead of 7x3. In general, the number of neurons in the previous layer gives us the number of columns of the weight matrix, and the number of neurons in the current layer gives us the number of rows.
@supreethmv 3 years ago
It basically depends on the way you multiply the weights with the data from the previous layer, so the place you studied might have followed the other convention. Just make sure the corresponding weights are multiplied with the nodes of the previous layer. This doubt comes up when the basics of linear algebra need to be made firmer; watch the linear algebra playlist by 3b1b, it's a must-know. Hope that helps, bud. ♥️
@krogubueva7856 3 years ago
@@supreethmv The OP is confused because the opposite convention was used earlier in this course. No need to be so condescending...
@rawandbamoki9873 1 year ago
From my experience, in the input layer each feature represents a column and each row represents a sample. Here in the input layer we have 3 features so the shape of the input is Nx3, where N can be any integer because it just gives us the number of samples. In the hidden layer, the weights of one neuron are in the same column and the number of neurons gives us the number of columns; we have 7 neurons with 3 weights each, hence the shape of the hidden layer is 3x7. I believe this is the convention that is mostly used.
@sungyunpark1366 5 years ago
I think the dimension of each weight w1 should be [7][3], not [3][7]. And w3 should be [3][7]...
@davidz6828 5 years ago
I agree. In coding and in math, the sizes are opposite to each other.
@shwetashaw9978 5 years ago
The dimension of w[l] is (n[l-1], n[l]), and that of w[l].T is (n[l], n[l-1]).
@NuXxCr0 5 years ago
@@shwetashaw9978 That is wrong. The dimensions of W[l] are (n[l], n[l-1]), n[l] being the number of neurons in the current layer and n[l-1] the number of neurons in the previous layer (or input features).
@ritapravadutta7939 5 years ago
It does not matter which convention you use; you always have the option of transposing.
@BCS_AliAbbas 3 years ago
@@shwetashaw9978 The dim of w[l] is always (n[l], n[l-1]); Andrew is mistaken here.
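A minimal NumPy sketch of the shape debate above, assuming the 3-7-7-3 network from the video and the column-vector convention Andrew uses (activations of shape (n[l], m), W[l] of shape (n[l], n[l-1])); the variable names are illustrative, not taken from the course code:

```python
import numpy as np

# Forward pass through a 3-7-7-3 network to make the weight shapes explicit.
np.random.seed(0)
m = 5                                    # number of examples (arbitrary)
layer_sizes = [3, 7, 7, 3]               # n[0] .. n[3]

A = np.random.randn(layer_sizes[0], m)   # X has shape (n[0], m) = (3, m)
for l in range(1, len(layer_sizes)):
    W = np.random.randn(layer_sizes[l], layer_sizes[l - 1])  # (n[l], n[l-1])
    b = np.zeros((layer_sizes[l], 1))
    Z = W @ A + b                        # (n[l], n[l-1]) @ (n[l-1], m) -> (n[l], m)
    A = np.maximum(0, Z)                 # ReLU, just to keep the sketch runnable
    print(f"W[{l}].shape = {W.shape}, A[{l}].shape = {A.shape}")
```

This prints W[1] as (7, 3), W[2] as (7, 7) and W[3] as (3, 7). Under the "samples as rows" convention mentioned in one of the replies, X would instead be (m, 3) and the first weight matrix (3, 7); the two layouts are simply transposes of each other.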
@alonewalker3050 24 days ago
The video does not play for me, on either laptop or mobile. Is there any particular reason?
@preetysingh7672 6 months ago
"Funny scaling factor"... 😂 It's very polite of you to call all the tech blunders FUNNY and to add humour effortlessly without being rude 🤩. Great teacher!!
@marcellinuschrisnada7613 4 years ago
Just to make sure I understand: the downside of using dropout is that we cannot use the loss function (or J, as previously stated in the video) as an indicator of whether our model is actually converging or diverging, because the neurons in the hidden layers are constantly changing across iterations. Therefore we simply cannot compare, since the data is treated differently every epoch. Is that correct?
@winsonwijaya5592 4 years ago
It's more that, without dropout, computing J involves a forward prop through every node in the network, whereas with dropout we only propagate through some of the nodes. So when computing J we are only using part of the network, which makes J less well defined, i.e. not an accurate representation of how well our model is doing (since the cost is calculated over some nodes instead of all of them).
@rahultripathi9457 4 years ago
Also, to check whether your model is heading in the right direction, you can switch off dropout by setting keepProb = 1 and see whether the plot of the cost function J is converging. If it is, you are good to go for training with dropout turned on.
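A minimal sketch of inverted dropout with the keepProb = 1 debugging trick described above; the function name and the example values are illustrative, not from the course notebooks:

```python
import numpy as np

def dropout_layer(A, keep_prob, training=True):
    """Inverted dropout on activations A; with keep_prob = 1 (or training=False) it is a no-op."""
    if not training or keep_prob == 1.0:
        return A
    mask = np.random.rand(*A.shape) < keep_prob   # keep each unit with probability keep_prob
    return (A * mask) / keep_prob                 # scale up so expected activations are unchanged

np.random.seed(1)
A = np.random.randn(7, 5)
A_check = dropout_layer(A, keep_prob=1.0)   # identical to A: use this setting to verify J decreases
A_train = dropout_layer(A, keep_prob=0.8)   # roughly 20% of units zeroed, the rest scaled by 1/0.8
print(np.allclose(A, A_check), (A_train == 0).mean())
```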
@travel6142 2 years ago
Do we use a different random dropout mask at each iteration? Suppose we selected keep_prob = 0.8 for layer 3; at each iteration it picks a different random 20% of units to shut off, as far as I understood. Can anyone confirm this?
@valentinfontanger4962 2 years ago
According to the previous video, you choose a new set of nodes after each batch
@alboz1327 2 years ago
At each iteration (batch) you take each neuron of layer 3 and decide with probability 0.2 to shut it down. That is, at each iteration roughly 20% of your layer-3 neurons are shut down, but which ones are shut down is random.
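A short sketch of that behaviour, assuming layer 3 has 7 units and keep_prob = 0.8 as in the question; the names A3 and D3 mirror the course's notation, but the values are made up:

```python
import numpy as np

keep_prob = 0.8
np.random.seed(2)
A3 = np.ones((7, 4))        # stand-in activations of layer 3 for a mini-batch of 4 examples

for iteration in range(3):
    # A brand-new Bernoulli(keep_prob) mask is sampled on every iteration, so *which*
    # ~20% of units get shut off changes from mini-batch to mini-batch.
    D3 = np.random.rand(*A3.shape) < keep_prob
    A3_dropped = (A3 * D3) / keep_prob
    print(f"iteration {iteration}: units kept per example = {D3.sum(axis=0)}")
```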
@deveshnagar1732 4 years ago
What if, instead of dropping hidden units randomly, I trained my NN with a limited number of units and made it less deep?
@joelphilip2942 4 years ago
Then there is a chance that the model will not perform well even on your training set, as it will not be able to compute the complex features with fewer neurons... Remember: training error can be reduced by increasing the number of layers or increasing the number of neurons in a layer. Test-set error can be reduced by using regularization strategies that ensure the model has not overfitted the training data, like L1 regularization, L2 regularization, dropout and some other techniques... For more, follow Andrew Ng's Deep Learning specialization. It consists of 5 courses and all of them are very good....... Follow the link: www.coursera.org/specializations/deep-learning?
@mandarchincholkar5955 3 years ago
@@joelphilip2942 Thanks a lot for your comment..
@indiangirl4081 5 months ago
Half of the things he says are incomprehensible to me. God knows what he's saying!!
@travelwithadatascientist 5 years ago
Thanks a lot!
@NolanZewariligon 4 years ago
2:00 If L2 is more adaptive, what is the advantage of using dropout? Is it the robustness? It seems that dropout directly enforces that the network should be robust.
@chinmaymaganur7133 4 years ago
@Ran Su So "weight distribution" means the shape of the weight matrix???
@jpzhang8290 5 years ago
How can dropout be related to L2 regularization? L1 is more plausible.
@WeilongYou 5 years ago
Why?
@o0Yozora0o 4 years ago
I was thinking the same thing, since L1 regularization shrinks some parameters to zero (a similar effect to eliminating nodes with dropout). However, in this video Andrew said that dropout spreads out the weights, which has the effect of shrinking the weights of all the previous units, like L2 regularization.
@zxzhaixiang 4 years ago
Dropout can be seen as an L0 regularization (not exactly, though).
@ahmedb2559 1 year ago
Thank you !
@ABC2007YT 4 years ago
At what point is a dropout rate too high? 50% sounds like a lot if the training step is called frequently. I'm afraid it throws out useful weights before they converge.
@sandipansarkar9211 3 years ago
Very important lecture. Need to watch it again.
@phumlanimbabela-thesocialc3285 3 years ago
Thanks.
@MinhTran-ew3on 5 years ago
So why does dropout work?
@nafisadahodwala134 5 years ago
After the whole Ramayana, you're asking who Ram is!
@dragosdima8758 5 years ago
He presents the intuition; if you want a more in-depth analysis, I recommend reading this paper: www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
@mumcarpet109 5 years ago
drop out has to work to get money, since they don't have any college degree
@andrewtoland1933 4 years ago
@@mumcarpet109 drop out is most effective when applied in conjunction with tune-in and turn-on
@kswill4514 4 years ago
I had a question regarding 3:15, since I expected a low keep_prob to be assigned to hidden layer 1 instead of hidden layer 2. As Andy mentioned, dropout shrinks the weights coming from the input nodes, which could otherwise cause overfitting, so I assumed keep_prob should be low for both hidden layers 1 and 2.
@chinmaymaganur7133 4 years ago
Hidden layer 1 receives input (weights) from 3 inputs, so its weight matrix is (7,3), whereas hidden layer 2 receives its input from hidden layer 1, which has 7 nodes, so its weight matrix is (7,7). Since that is a lot more parameters, keep_prob for hidden layer 2 is lower than for hidden layer 1.
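A small sketch of that per-layer choice for the 3-7-7-3 network; the particular keep_prob values below only illustrate the principle (lower keep_prob where the incoming weight matrix is largest) and are not numbers taken from the video:

```python
layer_sizes = [3, 7, 7, 3]               # the 3-7-7-3 network from the video
# Hidden layer 2 has the largest incoming weight matrix, W of shape (7, 7),
# so it gets the most aggressive dropout; the small output layer keeps everything.
keep_probs = {1: 0.9, 2: 0.7, 3: 1.0}    # keep_prob applied to the output of each layer

for l in range(1, len(layer_sizes)):
    W_shape = (layer_sizes[l], layer_sizes[l - 1])
    n_params = W_shape[0] * W_shape[1]
    print(f"layer {l}: W{W_shape}, {n_params} weights, keep_prob = {keep_probs[l]}")
```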
@11hamma 4 years ago
"Andy" :) :)
@coolamigo00 5 years ago
Do we just keep randomly changing the dropped-out neurons in every iteration once we start? How would that be useful? We need to find the best combination in the real world.
@WeilongYou 5 years ago
I guess that means all weights are penalized?
@ritapravadutta7939 5 years ago
@@WeilongYou right
@joelphilip2942 4 years ago
Well, as we randomly select neurons to deactivate in each iteration, the neurons in a layer get inputs from the neurons in the previous layer in a balanced fashion, none of them too high or too low (we scale up the inputs of that layer to compensate for the loss of neurons)... Hence the network is not dependent on a single feature that a particular neuron may have extracted in the previous layer..... So it can work even when some features that were present in the training data are not present in the test data...
@joelphilip2942 4 years ago
Consider an example where you want to detect a face. Suppose you use a 1-layer NN, where the input features are fed to a neuron which outputs whether the image is a face or not.. Suppose we take 4 features: 1) eyes 2) ears 3) nose 4) mouth... Now suppose all the training data consisted of images of people with specs... Then the NN would conclude that everyone with a face wears specs... this is a case of overfitting (somewhat extreme, I admit).... So when you balance the weights of these inputs using dropout, the inputs to the subsequent neuron from these input neurons are somewhat balanced... Hence the NN gives equal preference to eyes, ears, nose and mouth... So when it sees a person without specs, it sees that the image has the other features like ears, nose and mouth... So it is a face..... Hope this explains it....
@coolamigo00 4 years ago
@@joelphilip2942 Thanks! So in operation mode, will the network have all neurons or a selected combination?
@paulferro6509 6 years ago
Thanks!
@luna-kr3zc 5 years ago
Good voice, good explanation.