ResNet (actually) explained in under 10 minutes

75,339 views

rupert ai

1 year ago

Want an intuitive and detailed explanation of Residual Networks? Look no further! This video is an animated guide to the paper 'Deep Residual Learning for Image Recognition', created using Manim.
Sources / credits
ResNet paper: arxiv.org/abs/1512.03385
Manim animation library: www.manim.community/
PyTorch ResNet implementation: github.com/pytorch/vision/blo...
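
For readers who want to see the structure in code, below is a minimal sketch of a basic residual block in PyTorch. It is only loosely in the spirit of the torchvision implementation linked above; the class name, layer choices and shapes are illustrative, not copied from that file.

```python
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    """Two 3x3 convs plus a skip connection (illustrative, not the torchvision code)."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # When the block changes resolution or channel count, project the identity
        # with a 1x1 convolution so the element-wise addition still lines up.
        self.downsample = None
        if stride != 1 or in_channels != out_channels:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.downsample is not None:
            identity = self.downsample(x)
        return self.relu(out + identity)  # element-wise addition, then the final activation


# Quick shape check: 3 -> 6 channels while halving 64x64 to 32x32.
block = BasicResidualBlock(3, 6, stride=2)
print(block(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 6, 32, 32])
```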

Comments: 68
@nialperry9563 9 months ago
Cracking video, Rupert. Well animated and explained. I am already satisfied with my understanding of ResNets after this.
@sarthakpatwari7988 9 months ago
Mark my words: if he becomes consistent, this channel will become one of the next big things in AI.
@poopenfarten4222 1 year ago
legit one of the best explanations i found
@rupert_ai 1 year ago
Thanks myyy dude!
@devanshsharma5159 8 months ago
love the animation! Thanks for the clean and clear explanation!
@Cypher195 1 year ago
Thanks. Been out of touch with AI for far too long so this summary is very helpful.
@rupert_ai 1 year ago
Thanks Aziz, good luck with getting back in touch with AI
@djauschan 3 months ago
Amazing explanation of this concept. Thank you very much
@TheBlendedTech 1 year ago
Thank you, this was well put together and very useful.
@rupert_ai 1 year ago
Thanks!
@user-mg3ey1uq8f 1 month ago
It's amazing. Both ResNet and this explanation.
@christianondo9637 3 months ago
great video, super intuitive explanation
@ShahidulAbir 1 year ago
Amazing explanation. Thank you for the video
@rupert_ai 1 year ago
Thank you Shahidul!
@terencechengde 1 year ago
Great content! Thank you for the effort!
@rupert_ai 1 year ago
Thanks Terence! :)
@agenticmark 3 months ago
lol, I have fought that exact trendline so many times in ML :D Great humor. Great video work.
@sergioorozco7331 3 months ago
Is the right-hand side of the addition supposed to have height and width dimensions of 32x32 at 7:08? I think there is a small typo in the visual.
@mahneh7121 9 months ago
great video man
@datascience8775 1 year ago
Good content, just subscribed, keep sharing.
@rupert_ai 1 year ago
Thanks, will do :)
@moosemorse1 9 months ago
Subscribed. Thank you so much
@Omsip123 4 months ago
I pushed it to exactly 1k likes, cause it deserves it ... and many more
@RadenRenggala 11 months ago
Hello, is the term "residual" referring to the convolutional feature maps from the previous layer that are then added to the feature maps output by the current layer?
@rupert_ai 10 months ago
The residual is actually the 'difference' between two features! In ResNets the feature maps from previous layers are added onto the current feature maps. This means the current layer can learn the 'residual' function, where it only needs to learn the difference.
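In code, the idea in that reply boils down to the following (an editor's sketch with made-up shapes; the single conv layer just stands in for whatever the block learns):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)                      # feature maps coming into the block
residual_branch = nn.Conv2d(64, 64, 3, padding=1)   # stands in for the block's learned layers F(.)
out = residual_branch(x) + x                        # block output H(x) = F(x) + x, so F only models the difference
print(out.shape)                                    # torch.Size([1, 64, 32, 32])
```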
@RadenRenggala 10 months ago
@@rupert_ai So, the residual is the difference between the current feature map and the previous feature map, and to obtain the residual we need to perform an addition between those feature maps? Thank you.
@nxtboyIII 1 year ago
Great video well explained thanks!
@nxtboyIII 1 year ago
I liked the visuals too
@rupert_ai 1 year ago
@@nxtboyIII Thank you Lucas 🙏
@the_random_noob9860 2 months ago
Lifesaver! Also, for classification it's inevitable that the dimensions go down and the channels go up across the network. But the 1x1 convolution on the input features to 'match the dimensions' kind of loses the original purpose, i.e. to retain/boost the original signal. In a sense it's another conv operation whose output is no longer similar to the input (it could be similar, but certainly not as similar as the input features themselves). The original idea was to have the same input features so that we could zero out the weights if no transformation is needed. At least they're not as different as the input features transformed across the usual conv block (conv, pooling, batch norm and activation). Let me know if I am missing anything.
@krishnashah6654 2 months ago
i'd just say thank you so much man!
@logon2778 1 year ago
You say that the identity function is added element-wise at the end of the block. So say I have an identity [1,2] and the result of the block is [3,4]. Would the output of the layer be [4,6]? So it's not a concatenation of the identity function, which would be [1,2,3,4], correct? You basically ensure the identity function has the same dimensionality as the output of the block, then add them element-wise.
@rupert_ai 1 year ago
Hey Logon, great question, you are totally correct: the output from your example (identity [1,2] and block output [3,4]) would be [4,6], i.e. you simply add the values based on their twin positions. You don't concatenate! Yes, the last section on dimension matching covers the scenario where the dimensions don't match (and therefore you can't add them element-wise until you modify them).
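To make the addition-versus-concatenation distinction concrete, here is a two-line check in PyTorch (an editor's sketch using the numbers from the question above):

```python
import torch

identity = torch.tensor([1.0, 2.0])
block_output = torch.tensor([3.0, 4.0])

print(block_output + identity)              # tensor([4., 6.])  <- element-wise addition (what ResNet does)
print(torch.cat([identity, block_output]))  # tensor([1., 2., 3., 4.])  <- concatenation (not what ResNet does)
```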
@logon2778 1 year ago
@@rupert_ai So in the case of the 1x1 convolutions where there are 3 input channels and 6 output channels of equal size... how are they added element-wise? Are the input features added element-wise twice, once for each pair of 3 output channels? Or does it only add element-wise to the first 3 output channels and leave the other 3 untouched?
@rupert_ai 1 year ago
Hi @@logon2778, as is standard with convolutional neural networks, each 1x1 convolution takes contributions from all channels (in this case all 3 channels of the input). So in order to have 6 output channels you have 6 lots of 1x1 convolutions that each take contributions from all 3 channels. In order to halve the size you skip every other pixel (i.e. a stride of 2). That is simply what is used in the original paper; obviously other approaches work too. Now you have a 6-channel output which is half the height and width, which matches the network dimensions, and you can do element-wise addition as usual. Have a watch of the video again and look up convolution basics - I have a video on this actually - hopefully that might shed some light on things kzbin.info/www/bejne/bIezap5ojLJpoZI
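A quick shape check of that projection in PyTorch (an editor's sketch; the 3-to-6-channel, stride-2 numbers come from the discussion above):

```python
import torch
import torch.nn as nn

identity = torch.randn(1, 3, 64, 64)                # the saved input: 3 channels, 64x64
project = nn.Conv2d(3, 6, kernel_size=1, stride=2)  # six 1x1 kernels, each reading all 3 input channels
print(project(identity).shape)                      # torch.Size([1, 6, 32, 32]) -> now it matches the block output
```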
@logon2778 1 year ago
@@rupert_ai I understand how convolution works for the most part. At 8:45 you show that there are 6 output channels of equal size to the input. But how can you element-wise add 3 input channels to 6 output channels of equal size? In my mind you have double the dimensions: you have 6 output channels of 64x64, but 3 input channels of 64x64. So how can you element-wise multiply them?
@rupert_ai 1 year ago
@@logon2778 The section you mention discusses what must be done to the copy of the identity along the residual connection BEFORE you do element-wise addition with the output from the ResNet block. The process follows this logic:
1) Save a copy of your input as the identity (e.g. 3 channels, 64x64).
2) Run your input through the main block; this outputs a new tensor, which can have the same dimensions or different dimensions (e.g. 6 channels, 32x32). If the dimensions differ, proceed to step 3; if they are the same, go straight to step 4.
3) Take the copy of the identity from step 1 and apply 6 1x1 convolution kernels with stride 2 to it; this outputs 6 channels of 32x32.
4) Do element-wise addition of your identity and your ResNet block output.
Note that if the dimensions changed, then you also changed your identity in step 3 to ensure you can do element-wise addition. Element-wise addition simply adds each pair of corresponding values, e.g. the value in the top-left corner of channel 2 of the first tensor is added to the value in the top-left corner of channel 2 of the second tensor. You don't do element-wise multiplication as you mention. Hope that clears it up!
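The numbered steps above map directly onto a few lines of PyTorch (an editor's sketch with made-up layer shapes, not code from the video or the paper):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)                          # step 1: keep the input around as the identity
main_block = nn.Conv2d(3, 6, 3, stride=2, padding=1)   # stand-in for the block's conv/bn/relu stack
projection = nn.Conv2d(3, 6, kernel_size=1, stride=2)  # step 3: 1x1 convolutions with stride 2

out = main_block(x)         # step 2: block output is (1, 6, 32, 32), so the dimensions changed
identity = projection(x)    # step 3: project the identity to (1, 6, 32, 32) as well
result = out + identity     # step 4: element-wise addition now lines up
print(result.shape)         # torch.Size([1, 6, 32, 32])
```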
@enzogurijala5464 1 year ago
great video
@egesener1932 1 year ago
Everyone says ResNet solves the vanishing gradient problem, but don't we already use the ReLU function instead of sigmoid to solve that? Also, section 4.1 of the paper says the plain counterpart with batch normalization doesn't cause vanishing gradients, but it still has a higher error rate when the layers are increased from 18 to 34. Can you explain that?
@rupert_ai 1 year ago
1) There are multiple things that help solve the vanishing/exploding gradient problem. Residual connections in general help massively with the learning process, as they ground the learning process around the desired result: you learn the difference between what you have and the correct result (the residual).
2) Batch normalisation also helps with the vanishing/exploding gradient problem, as it allows the features of each layer to have a normalised distribution that is scaled so it won't explode/vanish, etc.
3) On your point about section 4.1: they are saying that networks without residual connections (plain) have worse error when they have more layers (18 vs 34) for the exact reason stated in part 1) of this answer - it is a difficult optimisation problem for the network to solve without the residual, whereas when you add residuals you aren't penalised for adding more layers to your network. Hope that makes sense!
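A toy illustration of point 1) (an editor's sketch, not from the video): even if the learned branch has collapsed to zero, the identity path of a residual connection still carries gradient back to the input.

```python
import torch

x = torch.ones(3, requires_grad=True)
w = torch.zeros(3)                  # a learned branch whose weights have collapsed to zero

(w * x).sum().backward()
print(x.grad)                       # tensor([0., 0., 0.]) -> no gradient reaches x through the plain branch

x.grad = None
(w * x + x).sum().backward()        # residual form: F(x) + x
print(x.grad)                       # tensor([1., 1., 1.]) -> the identity path still passes gradient through
```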
@BABA-oi2cl 2 months ago
Thanks a lot ❤
@mohamed_akram1 1 year ago
Nice video. Did you use Manim?
@rupert_ai 1 year ago
Hey Mohamed! Yes I did - my first video using manim! I hope to use it for some more complex things in the future :)
@Antagon666 8 months ago
Idk why, but simply adding a bicubically upscaled image to the output of a CNN with a pixel-shuffle layer achieves much better results than having any number of residual blocks. It's also much faster.
@JoydurnYup 1 year ago
great vid sir
@rupert_ai 1 year ago
Thanks Joydurn! :)
@gusromul3356 1 month ago
cool info, thanks rupert ai
@lifeisbeautifu1 2 months ago
that was good!
@rezajavadzadeh5597 1 year ago
thank you so much
@rupert_ai 1 year ago
Thanks Reza!
@januarchristie615 10 months ago
Hello, I apologize for my question, but I still don't quite understand why learning residuals improves model predictions. Thank you
@giovannyencinia9239 10 months ago
I think that is because this architecture can apply the identity function. First you have an input a^[l], and this passes forward through the convolutions, batch normalization, activation function, etc., and finally there is an output z^[l+2] (this output in the hidden layers has some parameters theta). Here is where the architecture adds a^[l], i.e. ReLU(z^[l+2] + a^[l]). Then in the backpropagation step there is the possibility that the optimal parameters in z^[l+2] are 0, so the result is a^[l] (this is because you apply a ReLU activation function), and this means the intermediate layers won't be used. If you build a big, deep NN, this architecture can skip the layers (residual blocks) that do not help to reach the local optimum.
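A small sketch of that collapse-to-identity argument (an editor's illustration with made-up shapes): if the block's parameters are zero, ReLU(z^[l+2] + a^[l]) reduces to a^[l], since a^[l] is already non-negative after its own ReLU.

```python
import torch
import torch.nn as nn

block = nn.Conv2d(8, 8, 3, padding=1, bias=False)
nn.init.zeros_(block.weight)                  # pretend the optimal residual parameters are zero

a = torch.relu(torch.randn(1, 8, 16, 16))     # a^[l]: activations from the previous layer, non-negative
out = torch.relu(block(a) + a)                # ReLU(z^[l+2] + a^[l]) with z^[l+2] = 0
print(torch.equal(out, a))                    # True -> the block behaves as the identity
```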
@firefistace8569 11 months ago
What is the residual in the image classification task?
@rupert_ai 10 months ago
Good question! It can be tricky to understand what the residual might be in the image classification task, as it is more abstract than in the super-resolution task. Essentially, you use the feature maps from previous layers and learn the 'residual' between the previous layers and the current layer - in essence this makes a very powerful block of computation that is grounded by the skip connections. This makes image classification easier, as the network itself can process the image in a more comprehensive way. There really isn't any 'end-to-end' residual in image classification like there is with super resolution. I hope that answers your question.
@firefistace8569 10 months ago
@@rupert_ai Thanks!
@carolinavillamizar795 7 months ago
Thanks!!
@jamesnorton4953 1 year ago
🔥
@dapr98 5 months ago
Great video! Thanks. Would you recommend ResNet over CNN for music classification?
@ColorfullHD 9 months ago
Hey, it's 3blue1brown! All jokes aside, great explanation, cheers
@rupert_ai 9 months ago
Hahaha well it is using his animation library ;) All hail Grant Sanderson
@tanmayvaity9437 1 year ago
nice video
@rupert_ai 1 year ago
Thanks Tanmay!
@doudouban 5 months ago
At 2:06, the equation shift seems problematic.
@cocgamingstar6990 1 year ago
Very bad
@rupert_ai 1 year ago
Feel free to leave some constructive feedback :) Or did you mean to write badass? If so, thanks!