I am doing the DL specialization after finishing your old ML course. By far you are the best teacher out there. Thank you so much for this.
@zekarias9888 4 years ago
Wow. After I watched 5 other videos about ResNets I was still lost. Then I found this video and it cleared up my misunderstandings. Super cool!
@viniciussantana8737 4 years ago
Andrew is simply the best instructor on neural networks out there. Helped me a lot.
@JohnDoe-vr4et 4 years ago
Me after listening to most people explaining ResNet: "What? Why? Why do you do this?" Me after listening to Andrew: "Makes sense. Easy peasy."
@Cicatka.Michal 3 years ago
Finally got it! Sometimes it is hard for me to grasp even the basic concepts if I don't know what I should understand and take away from the topic. Thank you very much for these videos that first tell you what problem you are trying to solve and why you should solve it, and then clearly explain the solution. Thumbs up! :)
@sanjurydv 5 years ago
Never seen tutorial videos with such a clear explanation. He is the best.
@davidtorres5012 3 years ago
You are the best, Andrew
@Alex-xx8ij 2 years ago
Your explanation is very clear! Thank you for the lecture.
@swfsql 1 year ago
I think we could use the ResNet concept to improve on Dropout, creating a "shutdown" regularization: select a layer (or rather, nodes from that layer) that ought to be shut down, and instead act only on the cost function by adding a cost for that layer not being an identity layer. The network is then free to gradually adapt itself (hopefully reducing train-set overfitting and generalizing better) so as to push that layer ever closer to an identity. If the layer manages to become an identity, it can be permanently shut down. This could be a way to reduce network size, and maybe it could be applied automatically in the high-variance, low-bias case.

As far as the linear Z functions go, one way for a layer to be an identity is if it has the same number of nodes as inputs and you add a cost for each node[j] so that only its weight[j] is 1 while all other weights are 0; this would give something like an "identity" Z layer. Making the activation function an identity as well is a hassle, but even ignoring the activation, if you could shut down the Z-function nodes and fold the posterior activation back into the previous activation, that would already be a network simplification.

Edit: We could also try to simplify the activation functions by generalizing and re-parametrizing them. E.g. for a ReLU activation, we could turn it into a leaky ReLU whose leaky-side parameter starts at zero (so it is just a normal ReLU), then add a cost on that parameter staying at zero and let backprop push it towards 1, at which point the previously-ReLU activation has turned into the identity activation and can be gracefully shut down.
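A rough sketch of that "shutdown" idea, assuming PyTorch; the layer sizes, the penalty weight lam, and the helper name identity_penalty are made up for illustration, not something from the lecture:

```python
import torch
import torch.nn as nn

layer = nn.Linear(64, 64)                    # candidate layer (square, so an identity mapping is possible)
act = nn.PReLU(num_parameters=1, init=0.0)   # learnable negative slope, starts out as a plain ReLU

def identity_penalty(layer, act, lam=1e-3):
    """Cost that grows the further the layer is from an identity mapping."""
    eye = torch.eye(layer.out_features)
    w_term = ((layer.weight - eye) ** 2).sum()   # push weights toward the identity matrix
    b_term = (layer.bias ** 2).sum()             # push bias toward 0
    a_term = ((act.weight - 1.0) ** 2).sum()     # push the PReLU slope toward 1 (identity activation)
    return lam * (w_term + b_term + a_term)

# Training-loop sketch: loss = task_loss(prediction, target) + identity_penalty(layer, act)
# If the penalty drives the layer close enough to the identity, it can be removed ("shut down") entirely.
```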
@HexagonalClosePacked 5 years ago
I'm trying to understand the components behind Semantic Segmentation and your videos really helped!
@altunbikubra 4 years ago
Thank you, it was a very brief and simplified explanation, loved it.
@iasonaschristoulakis6932 2 years ago
Excellent both theoretically and technically
@promethful 4 years ago
So the skip connections don't literally skip layers, but rather add the original input onto the output of the 'skipped' layers?
@АннаКопатько 3 years ago
I think so too, at least it is the only explanation that I understand
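Exactly. A minimal NumPy sketch of one residual block's forward pass, with made-up shapes and random weights (real ResNet blocks use convolutions and batch norm, but the addition works the same way):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(a_l, W1, b1, W2, b2):
    """Two 'skipped' layers; a_l is added back just before the final activation."""
    z1 = W1 @ a_l + b1        # z[l+1]
    a1 = relu(z1)             # a[l+1]
    z2 = W2 @ a1 + b2         # z[l+2], still computed as usual
    return relu(z2 + a_l)     # a[l+2] = g(z[l+2] + a[l])  <- the shortcut

n = 4
a_l = np.random.randn(n)
W1, b1 = np.random.randn(n, n), np.zeros(n)
W2, b2 = np.random.randn(n, n), np.zeros(n)
print(residual_block(a_l, W1, b1, W2, b2).shape)   # (4,)
```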
@Ganitadava 2 years ago
Sir, very nice explanation as always, thanks a lot.
@MrRynRules 3 years ago
Thank you!
@rahuldogra7171 3 years ago
What is the benefit of adding identity blocks and then skipping them? Instead of skipping, why are we adding?
@kumarabhishek5652 3 years ago
Why does the training error increase in practice, as opposed to in theory, for the plain model?
@academicconnorshorten6171 5 years ago
Do you broadcast a[l] to make it match the dimensionality of a[l+2]?
@MeAndCola 5 years ago
"man pain" 2:10 😂
@altunbikubra 4 years ago
omg he is really writing that :D
@chrlemes 2 years ago
I think the title of the paper is wrong. The correct one is "Deep Residual Learning for Image Recognition".
@ravivaghasiya5680 2 years ago
Hello everyone. In this video at 5:20 you mention that as the number of layers in a plain network increases, the training error increases in practice. Could you please explain, or share some references on, why this actually occurs? One reason I found is the vanishing gradient problem, which can be addressed using ReLU. So one can use ReLU in a plain network; why, then, has ResNet become so standard?
@amir06a 4 years ago
I have a very silly doubt: if skip layers/connections exist, aren't the actual layers in play = total layers / 2?
@whyitdoesmatter2814 4 years ago
Wait! Shouldn't z_{l+1} normally be equal to W_{l} a_{l} + b_{l+1}??
@patrickyu8470 2 years ago
Just a question for those out there - has anyone been able to use techniques from ResNets to improve the convergence speed of deep fully connected networks? Usually people use skip connections in the context of convolutional neural nets but I haven't seen much gain in performance with fully connected ResNets, so just wondering if there's something else I may be missing.
@rahulrathnakumar785 5 years ago
If a_l skips two layers to directly enter the final ReLU, how do we get the z_(l+2) in the final equation a_(l+2) =g(z_(l+2) + a_(l))? Thanks!
@IvanOrsolic 5 years ago
You still calculate them; you just keep a copy of the original a_l value and add it back in just before the final activation that produces a[l+2]. Why would you even do that? That's explained in the next video.
@mohe4ever514 3 years ago
@@IvanOrsolic If we add this value back into the network, then what is the benefit of skipping the layers? We still go through the layers to calculate a[l+2]; we just added one more term, so how does the skip connection help?
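To make both paths explicit in the video's notation (a sketch of the same equations, not a quote from the lecture):

$$z^{[l+1]} = W^{[l+1]} a^{[l]} + b^{[l+1]}, \qquad a^{[l+1]} = g\big(z^{[l+1]}\big)$$
$$z^{[l+2]} = W^{[l+2]} a^{[l+1]} + b^{[l+2]}, \qquad a^{[l+2]} = g\big(z^{[l+2]} + a^{[l]}\big)$$

The main path is computed exactly as in a plain network; the only change is the extra a^{[l]} added just before the last activation. The benefit is not that computation is skipped, but that the block gets an easy identity fallback and gradients get a short path back to earlier layers.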
@astropiu4753 4 years ago
there's some high-frequency noise in many of this specialization's videos which is hurting my ears.
@ntumbaeliensampi6305 4 years ago
use a low pass filter. Lol
@swapnildubey6428 5 years ago
How are the dimensions handled? I mean, the dimension of a[l] could happen to be unequal to that of a[l+2].
@SuperVaio123 4 years ago
Padding
@s.s.1930 3 years ago
Padding with zeros, or using a 1x1 Conv inside the skip connection.
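A minimal sketch of the second option, assuming PyTorch; the channel counts, stride, and class name are arbitrary examples, and batch norm is left out for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownsampleResBlock(nn.Module):
    def __init__(self, in_ch=64, out_ch=128, stride=2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1)
        # Projection shortcut: a 1x1 conv makes a[l] match a[l+2] in channels and spatial size.
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + self.proj(x))   # add the projected shortcut before the final ReLU

x = torch.randn(1, 64, 32, 32)
print(DownsampleResBlock()(x).shape)        # torch.Size([1, 128, 16, 16])
```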
@mannansheikh 4 years ago
Great
@steeltwistercoaster 4 years ago
+1 this is great
@adityaniet 6 years ago
Hi Andrew, I have a question: when we are calculating a[l+2] we need both a[l] and z[l+2]. But z[l+2] can only be calculated by first calculating a[l+1], so how will we get that? Many thanks :-)
@larryguo2529 6 years ago
If I understand your question correctly, a[l+2] = activation of z[l+2]...
@freee8838 6 years ago
Just like in the formula a[l+1] = g(z[l+1])...
@_ashout 6 years ago
Yep this confuses me as well. How can we have both z[l+2] and a[l+1] if a[l+1] skips a layer?
@wliw3034 3 years ago
Good
@lorenzosorgi6088 4 years ago
is there any theoretical motivation justifying the increasing error of a deep plain network during training?
@mohnish.physics 4 years ago
Theoretically, the error should go down. But in practice, I think the exploding gradients in a network with a large number of layers increase the error.
@tumaaatum 3 years ago
Yes there is. I am not sure why Andrew Ng didn't touch on this. Basically, once you add the skip connection you are including an additive term inside the non-linearity. The additive term can only increase the function space (the range of the function, www.intmath.com/functions-and-graphs/2a-domain-and-range.php), since it sits inside the original function (the idea of nested function classes). Hence, you allow the network to have more approximating/predictive capacity in each layer. You can visit the D2L section about this: d2l.ai/chapter_convolutional-modern/resnet.html?highlight=resnet
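A compact way to state that argument in the video's notation (a sketch, assuming the second layer's parameters are driven to zero, e.g. by weight decay):

$$a^{[l+2]} = g\big(W^{[l+2]} a^{[l+1]} + b^{[l+2]} + a^{[l]}\big) \;\xrightarrow{\;W^{[l+2]},\,b^{[l+2]} \to 0\;}\; g\big(a^{[l]}\big) = a^{[l]}$$

(the last step uses ReLU and the fact that a^{[l]} is already non-negative). So a residual block can always fall back to the identity, which means stacking more blocks cannot shrink the set of functions the network can represent; it can only grow.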
@shivani404sheth4 3 years ago
Meet the ML god
@kavorkagames 5 years ago
I find a ResNet behaves like a shallower net. It gives a solution that resembles that of a (roughly) four- to six-layer net when it is eight layers deep. ResNets are out for me.
@okktok 5 years ago
Kavorka Games ResNets are now the state of the art for image recognition. Every new architecture uses them, and it doesn't make sense anymore to use plain networks.
@amir06a 4 years ago
@@okktok But aren't the actual layers in play = total layers / 2, since we are providing a shortcut? So, on a broader note, aren't they just plain networks that look bigger?
@trexmidnite 3 years ago
Sounds like terminator..
@mikebot5361 5 years ago
If we use ResNets, are we losing the information in between the layers?
@s.s.1930 3 years ago
No, we're not losing it; we just add x back in after a number of layers (in this example, 2 layers). That's our ResNet block.