Cracking video, Rupert. Well animated and explained. I am already satisfied with my understanding of ResNets after this.
@AhmedThahir2002 14 days ago
This has to be the best explanation of ResNet ever. Amazing work, Rupert!
@Cypher195 a year ago
Thanks. Been out of touch with AI for far too long so this summary is very helpful.
@rupert_ai a year ago
Thanks Aziz, good luck with getting back in touch with AI
@sarthakpatwari7988 a year ago
Mark my words, if he becomes consistent, this channel will become one of the next big things in AI
@prammar1951 4 months ago
Everyone is praising the video; maybe it's just me, but I really didn't understand what the residual connection hopes to achieve, or how it achieves it. The video didn't make that clear.
@TheJDen a month ago
“Residuals” are what mathematicians call the difference between the actual and predicted data values. Imagine you had a simple dataset that looked linear but with some oscillating variation (like putting x + sin(3x) into a graphing calculator). One option would be to train a network directly on each x and y; in that case the model has to learn both the underlying linear trend (x) and the oscillation (sin(3x)). Alternatively, we could estimate the slope of the line (without the variations) and then feed the estimated height of the line at x into the network whenever it trains on an (x, y) pair. That way the model only has to learn the oscillation, the difference between the line and the data, the residual (sin(3x)). It makes the model's job easier because it doesn't have to learn and keep track of the linear trend (x), since we remind it every few steps. In the more complex examples shown in the video, this means the network doesn't have to learn both how to maintain a good representation of a flower and how to increase the resolution, only how to increase the resolution (because it always has access to the original flower).
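A minimal sketch of that idea in PyTorch (the data, network size, and training loop here are my own illustration, not anything from the video or the comment above):

```python
# Toy residual learning: we assume the coarse trend (y ≈ x) is already known,
# so the small network only has to model what is left over, i.e. sin(3x).
import torch
import torch.nn as nn

x = torch.linspace(-3, 3, 256).unsqueeze(1)   # inputs, shape (256, 1)
y = x + torch.sin(3 * x)                      # targets: linear trend + oscillation

residual_net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(residual_net.parameters(), lr=1e-2)

for step in range(2000):
    pred = x + residual_net(x)                # known trend + learned residual
    loss = ((pred - y) ** 2).mean()           # the net only needs to fit sin(3x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Dropping the `x +` in the prediction line gives the "plain" setup, where the same small network has to learn the trend and the oscillation at once.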
@poopenfarten4222 a year ago
Legit one of the best explanations I found
@rupert_ai a year ago
Thanks myyy dude!
@sergioorozco7331 10 months ago
Is the right-hand side of the addition supposed to have height and width dimensions of 32x32 at 7:08? I think there is a small typo in the visual.
@logon2778 2 years ago
You say that the identity is added element-wise at the end of the block. So say I have an identity [1, 2] and the result of the block is [3, 4]; would the output of the layer be [4, 6]? So it's not a concatenation with the identity, which would give [1, 2, 3, 4], correct? You basically ensure the identity has the same dimensionality as the output of the block and then add them element-wise.
@rupert_ai 2 years ago
Hey Logon, great question, you are totally correct: the output from your example (identity [1, 2] and block output [3, 4]) would be [4, 6], i.e. you simply add the values at matching positions. You don't concatenate! And yes, the last section on dimension matching covers the scenario where the dimensions don't match (and therefore you can't add them element-wise until you modify them).
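To make the distinction concrete, here is a two-line check in PyTorch (the tensors are just the values from the example above):

```python
import torch

identity = torch.tensor([1.0, 2.0])
block_out = torch.tensor([3.0, 4.0])

print(identity + block_out)              # tensor([4., 6.])  -> element-wise addition (what ResNet does)
print(torch.cat([identity, block_out]))  # tensor([1., 2., 3., 4.])  -> concatenation (not what ResNet does)
```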
@logon2778 2 years ago
@@rupert_ai So in the case of the 1x1 convolutions where there are 3 input channels and 6 output channels of equal size... how are they added element-wise? Are the input features added element-wise twice, once for each group of 3 output channels? Or does it only add element-wise to the first 3 output channels and leave the other 3 untouched?
@rupert_ai 2 years ago
Hi @@logon2778, as is standard with convolutional neural networks, each 1x1 convolution takes contributions from all channels (in this case all 3 channels of the input). So to get 6 output channels you have 6 separate 1x1 convolutions, each taking contributions from all 3 input channels. To halve the height and width you skip every other pixel (i.e. a stride of 2); that is simply what the original paper uses, and other approaches work too. Now you have a 6-channel output with half the height and width, which matches the block output's dimensions, and you can do element-wise addition as usual. Have a watch of the video again and look up convolution basics - I have a video on this actually - hopefully that might shed some light on things kzbin.info/www/bejne/bIezap5ojLJpoZI
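A quick shape check of that projection shortcut, written as a rough sketch rather than the paper's exact code:

```python
# 6 separate 1x1 kernels, each looking at all 3 input channels, stride 2 to halve H and W.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)                        # (batch, channels, height, width)
project = nn.Conv2d(3, 6, kernel_size=1, stride=2, bias=False)

print(project(x).shape)       # torch.Size([1, 6, 32, 32]) -> matches the block output
print(project.weight.shape)   # torch.Size([6, 3, 1, 1])   -> 6 filters, each spanning 3 channels
```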
@logon2778 2 years ago
@@rupert_ai I understand how convolution works for the most part. At 8:45 you show that there are 6 output channels of equal size to the input. But how can you element-wise add 3 input channels to 6 output channels of equal size? In my mind you have double the dimensions: you have 6 output channels of 64x64, but only 3 input channels of 64x64. So how can you element-wise multiply them?
@rupert_ai 2 years ago
@@logon2778 The section you mention discusses what must be done to the copy of the identity along the residual connection BEFORE you do element-wise addition with the output from the ResNet block. The process follows this logic:
1) Save a copy of your input as the identity (e.g. 3 channels, 64x64).
2) Run your input through the main block; this outputs a new tensor, which can have the same dimensions as the input or different ones (e.g. 6 channels, 32x32). If the dimensions changed, proceed to step 3; if they are the same, go straight to step 4.
3) Take the copy of the identity from step 1 and apply 6 1x1 convolution kernels with stride 2 to it; this outputs 6 channels at 32x32.
4) Do element-wise addition of your identity and your ResNet block output. Note that if the dimensions changed, you also changed your identity in step 3 to ensure the element-wise addition is possible.
Element-wise addition simply adds each pair of corresponding values, e.g. the value in the top-left corner of channel 2 of the first tensor is added to the value in the top-left corner of channel 2 of the second tensor. You don't do element-wise multiplication as you mention. Hope that clears it up!
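Those four steps written out as a simplified PyTorch module (my own sketch of the standard basic block, not the exact code from the paper or torchvision):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut (step 3): only needed when the block changes the tensor's shape.
        self.project = None
        if stride != 1 or in_ch != out_ch:
            self.project = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        identity = x                                                     # step 1
        out = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))   # step 2
        if self.project is not None:
            identity = self.project(x)                                   # step 3
        return self.relu(out + identity)                                 # step 4

block = BasicBlock(3, 6, stride=2)
print(block(torch.randn(1, 3, 64, 64)).shape)   # torch.Size([1, 6, 32, 32])
```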
@agenticmark 10 months ago
lol, I have fought that exact trendline so many times in ML :D Great humor. Great video work.
@samruddhisaoji7195 2 months ago
9:02 I have a doubt: how do the number of features on the LHS and RHS match? LHS = w*h*c, RHS = (w/2)*(h/2)*(2*c). Thus RHS = 2*LHS
@Bryanvas25 2 months ago
Actually RHS = (1/2) * LHS, and yes, I also don't understand that part.
@samruddhisaoji7195 2 months ago
@@Bryanvas25 Yes, you're right about RHS = LHS/2. My bad!
@Omsip123 a year ago
I pushed it to exactly 1k likes, cause it deserves it ... and many more
@heathernapthine8775 a month ago
Is the zero padding only done for layers which increase the size, or is it done for downsampling layers too? Intuitively, if we zero-padded the output in order to add a larger input, that doesn't seem like a downsampled layer?
@egesener1932 2 years ago
Everyone says ResNet solves the vanishing/exploding gradient problem, but don't we already use the ReLU function instead of sigmoid to solve that? Also, section 4.1 of the paper says the plain counterpart with batch normalization doesn't suffer from vanishing gradients, yet it still has a higher error rate when the layers are increased from 18 to 34. Can you explain that?
@rupert_ai a year ago
1) There are multiple things that help solve the vanishing/exploding gradient problem. Residual connections in general help massively with the learning process, as they ground the learning process around the desired result: you learn the difference between what you have and the correct result (the residual). 2) Batch normalisation also helps with the vanishing/exploding gradient problem, as it gives the features of each layer a normalised distribution that is scaled so it won't explode/vanish, etc. 3) On your point about 4.1: they are saying that networks without residual connections (plain) have worse error when they have more layers (18 vs 34), for the exact reason I stated in part 1) of this answer. It is a difficult optimisation problem for the network to solve without the residual; when you add residuals you aren't penalised for adding more layers to your network. Hope that makes sense!
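A tiny autograd check of point 1), with purely illustrative numbers: with a skip connection the output is f(x) + x, so its gradient is f'(x) + 1, and the identity path keeps the gradient alive even when the block's own gradient is almost zero.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
f = 1e-6 * x ** 2        # stand-in for a block whose own gradient is nearly zero
y_plain = f              # no skip connection
y_res = f + x            # with skip connection

g_plain, = torch.autograd.grad(y_plain, x, retain_graph=True)
g_res, = torch.autograd.grad(y_res, x)
print(g_plain.item())    # ~4e-06: gradient has almost vanished
print(g_res.item())      # ~1.000004: the identity path contributes a clean 1
```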
@firefistace8569 a year ago
What is the residual in the image classification task?
@rupert_ai a year ago
Good question! It can be tricky to see what the residual is in the image classification task, since it is more abstract than in the super-resolution task. Essentially, you use the feature maps from previous layers and learn the 'residual' between the previous layers and the current layer; this makes a very powerful block of computation that is grounded by the skip connections, and it lets the network process the image in a more comprehensive way, which makes classification easier. There really isn't an 'end-to-end' residual in image classification like there is with super resolution. I hope that answers your question.
@firefistace8569 a year ago
@@rupert_ai Thanks!
@TheBlendedTech 2 years ago
Thank you, this was well put together and very useful.
@rupert_ai 2 years ago
Thanks!
@devanshsharma5159 a year ago
love the animation! Thanks for the clean and clear explanation!
@ciciy-wm5ik 4 months ago
At 2:09, image1 - image2 = image3 does not imply image1 + image3 = image2
@gunasekhar8440 3 months ago
I mean, we need to assume it like that. In the paper they said: let h(x) be our desired mapping, x the input, and f(x) some transformation, so f(x) = h(x) - x.
@ShahidulAbir a year ago
Amazing explanation. Thank you for the video
@rupert_ai a year ago
Thank you Shahidul!
@xagent6327 a month ago
The solution to pad with zeros fixed the number of channels, but how did they then reduce the dimensions from 64x64 to 32x32?
@mohamed_akram1 a year ago
Nice video. Did you use Manim?
@rupert_ai a year ago
Hey Mohamed! Yes I did - my first video using manim! I hope to use it for some more complex things in the future :)
@louisdante8457 4 months ago
7:53 Why is there a need to preserve the time complexity per layer?
@samruddhisaoji7195 2 months ago
The number of elements in the input and output of the convolution layer should remain the same, as later we will be performing an element-wise operation.
@wege8409 6 months ago
6:38 this is the part that really made me understand, thank you
@januarchristie615 a year ago
Hello, I apologize for my question, but I still don't quite understand why learning residuals improves model predictions. Thank you
@giovannyencinia9239 a year ago
I think that is because this architecture can realise the identity function. First you have an input a^[l], and this is passed forward through the convolutions, batch normalization, activation function, etc., and finally there is an output z^[l+2] (this output of the hidden layers depends on some parameters theta). Here is where the architecture adds a^[l]: ReLU(z^[l+2] + a^[l]). Then, in the back-propagation step, there is the possibility that the optimal parameters producing z^[l+2] are 0, so the result is just a^[l] (because you apply a ReLU activation function), and this means the intermediate layers won't be used. If you build a big, deep NN, this architecture can skip the layers (residual blocks) that do not help reach the optimum.
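A small sketch of that point (the layer sizes are arbitrary, and zeroing the weights by hand just stands in for what the optimiser might converge to):

```python
# If the block's weights and biases are all zero, ReLU(z + a) collapses to ReLU(a) = a
# for non-negative activations (e.g. outputs of a previous ReLU), so the block is skipped.
import torch
import torch.nn as nn

a = torch.relu(torch.randn(1, 8, 16, 16))   # activations coming from an earlier ReLU
block = nn.Sequential(
    nn.Conv2d(8, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1),
)
for p in block.parameters():
    nn.init.zeros_(p)                        # pretend the optimum is "do nothing"

out = torch.relu(block(a) + a)
print(torch.allclose(out, a))                # True -> the block behaves like an identity
```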
@panjak323 a year ago
Idk why, but simply adding a bicubically upscaled image to the output of a CNN with a pixel-shuffle layer achieves much better results than any number of residual blocks. Also it's much faster.
@謝其宏-p3z 9 months ago
It's amazing. Both ResNet and this explanation.
@RadenRenggala a year ago
Hello, is the term "residual" referring to the convolutional feature maps from the previous layer that are then added to the feature maps output by the current layer?
@rupert_ai a year ago
The residual is actually the 'difference' between two sets of features! In ResNets the feature maps from previous layers are added onto the current feature maps, which means the current layer can learn the 'residual' function, where it only needs to learn the difference.
@RadenRenggala a year ago
@@rupert_ai So, the residual is the difference between the current feature map and the previous feature map, and to obtain the residual we need to perform an addition between those feature maps? Thank you.
@djauschan 10 months ago
Amazing explanation of this concept. Thank you very much
@datascience8775 2 years ago
Good content, just subscribed, keep sharing.
@rupert_ai 2 years ago
Thanks, will do :)
@doudouban a year ago
2:06, the rearrangement of the equation seems problematic.
@ColorfullHD a year ago
Hey, it's 3blue1brown! All jokes aside, great explanation, cheers
@rupert_ai a year ago
Hahaha, well it is using his animation library ;) All hail Grant Sanderson
@dapr98 a year ago
Great video! Thanks. Would you recommend a ResNet over a plain CNN for music classification?
@nxtboyIII a year ago
Great video well explained thanks!
@nxtboyIII a year ago
I liked the visuals too
@rupert_ai a year ago
@@nxtboyIII Thank you Lucas 🙏
@christianondo9637 10 months ago
great video, super intuitive explanation
@swedenontwowheels a year ago
Great content! Thank you for the effort!
@rupert_ai a year ago
Thanks Terence! :)
@MuhammadHamza-o3r 4 months ago
Very well explained
@the_random_noob9860 9 months ago
Lifesaver! Also, for classification it's inevitable that the spatial dimensions go down and the channel count goes up across the network. But the 1x1 convolution on the input features to 'match the dimensions' kind of defeats the original purpose, i.e. to retain/boost the original signal. In a sense it's another conv operation whose output is no longer the input (it could be similar, but certainly not as similar as the input features themselves). The original idea was to have the same input features available, so that the weights could be zeroed out if no transformation is needed. At least the result is not as different from the input features as what the usual conv block (conv, pooling, batch norm and activation) produces. Let me know if I am missing anything
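For comparison, the paper's parameter-free shortcut (its "option A", as I understand it) keeps the original channels untouched: subsample spatially with stride 2 and zero-pad the extra channels. A rough sketch with shapes matching the 3-to-6-channel example discussed above:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64)

shortcut = x[:, :, ::2, ::2]                    # stride-2 spatial subsampling -> (1, 3, 32, 32)
shortcut = F.pad(shortcut, (0, 0, 0, 0, 0, 3))  # append 3 all-zero channels   -> (1, 6, 32, 32)

print(shortcut.shape)                           # torch.Size([1, 6, 32, 32])
```

The first 3 output channels are exactly the (subsampled) input, so zeroing the block's weights still leaves the original signal intact; the 1x1 projection shortcut trades that property for a bit more flexibility.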
@SakshamGupta-em2zw 6 months ago
Love the Music
@SakshamGupta-em2zw 6 months ago
And love that you used manim, keep it up
@krishnashah6654 10 months ago
I'd just say thank you so much man!
@rezajavadzadeh5597 a year ago
thank you so much
@rupert_ai a year ago
Thanks Reza!
@jamesnorton4953 2 years ago
🔥
@enzogurijala5464 2 years ago
great video
@JoydurnYup 2 years ago
great vid sir
@rupert_ai 2 years ago
Thanks Joydurn! :)
@moosemorse1 a year ago
Subscribed. Thank you so much
@tanmayvaity9437 2 years ago
nice video
@rupert_ai 2 years ago
Thanks Tanmay!
@carolinavillamizar795 a year ago
Thanks!!
@gusromul3356 8 months ago
cool info, thanks rupert ai
@BABA-oi2cl 9 months ago
Thanks a lot ❤
@lifeisbeautifu1 9 months ago
that was good!
@cocgamingstar6990 a year ago
Very bad
@rupert_ai a year ago
Feel free to leave some constructive feedback :) Or did you mean to write badass? If so, thanks!