All Convolution Animations Are Wrong (Neural Networks)

50,548 views

Animated AI

A year ago

Patreon: / animated_ai
All the neural network 2d convolution animations you've seen are wrong.
Check out my animations: animatedai.github.io/

Comments: 142
@randyekrer431 · a year ago
You should've started with the typical RGB 3-layer input image and animated convolutions on that; that's where most people start to get lost as to how the weights match with inputs, translating from the 2D mental model to 3D.
@amitamola2014 · 7 months ago
Bang on right. So he made a video about how others weren't doing it right, but then didn't start from the start itself to explain what actually goes on correctly. I mean, what good is this new one then :/
@avidrucker · a year ago
A major thing that feels missing to me in the animations is clear textual labeling. It's fine that you label them out loud, but text labels would also be more accessible for folks with hearing challenges or cognitive challenges. My crit aside, this animation is lovely, and I'm very impressed with what you've done. You've earned yourself a new subscriber :)
@spider853 · a year ago
Oh man, I'm so glad someone took a direct approach to this problem. When I was learning I was so confused by all these animations and explanations in 2D, and then seeing the resulting tensor shapes got me super confused: where did the depth go and where did it appear? Thanks for bringing this video to the world!
@thomasprimidis9360 · a year ago
All these wrong illustration and animation trends have been among the many problems where you think "why the hell have we been doing this all wrong, all the time, everywhere?". Finally, someone came and did the obvious. Thank you!
@cfranc1s · 11 months ago
They are not wrong. They just show a special case. They use the special case because the focus is on things like stride, dilation, padding, etc. It's good to make the 3D tensor animations, but don't call the existing ones wrong. I think I would have still found it easier to understand the existing ones first and then move on to the 3D animations.
@MrShadowjockey · a year ago
Thank you for this. Recently I tried to explain why the input and output shapes behave the way they do, and what gets combined with what. These animations will make it sooo much easier!!
@PeppeMarino · a year ago
The use of all these misleading animations is the primary cause of misconceptions about convolutional neural networks; you have finally provided a good visualization. I am happy to share this content with my colleagues.
@peabrane8067 · a year ago
The animation is just meant as an abstraction of the spatial convolution operation itself. A spatial CNN layer consists of spatial convolution operations across multiple input and output channels (which is what you are referring to).
@alessandropolidori9895 · a year ago
Love it. I always thought there was no accurate visualization on the internet too. Good job
@schorsch7400 · a month ago
Wow, really great, thanks for your work! I was struggling with the very problem you mentioned in the video: bringing together the 2D conv visualizations with the multi-channel 3x3 convolutions that are common in modern CNNs. Thanks to your work, I now understand it.
@Leibniz_28 · a year ago
Awesome. Finally a good representation of these computations. Thanks for your hard work!!!
@aintgonhappen · a year ago
This is some amazing content. Thank you, buddy!
@rezhaadriantanuharja3389 · a year ago
Premise 1: All convolution animations are wrong. Premise 2: This is a convolution animation. Conclusion: This is wrong.
@Q_20 · a year ago
oh shit
@zyansheep · a year ago
0:02, "all convolution animations you've seen _up to this point_ are wrong"
@LimitedWard · a year ago
Maybe it was a proof by contradiction
@Firestorm-tq7fy · a month ago
No?
@krischalkhanal9591 · a month ago
Someone just read discrete algebra. (Kudos!)
@sscswist · a year ago
Finally. You are the best. When I was learning this, I was always looking at all those original animations and I was always so confused ...
@twbjr2 · 5 months ago
Thank you for making this video! I have been trying to visualize this using all the horrible diagrams from papers. I immediately understood what they were trying to convey after watching your video!
@logon2778 · a year ago
Forget the animation itself (even though it's great). I just appreciate a non-moving camera. It bothers me so much when people spin the camera around a nice animation in a circle. Makes me feel like I am on a carnival ride.
@alimurreza · 6 months ago
Excellent visualization! I will definitely show these visualizations to my students in my Machine Learning course. They will love it.
@bala.dhinesh · a year ago
This is what I expected for a long time. This explains everything clearly. Thanks for posting this.
@jazzvids · 10 months ago
Thank you so much for this! Worth mentioning that the animation has a stride of 2.
@alexgamingtv7118 · a year ago
Great work, such an animation for grouped convolution would be nice too
@ayamaitham8430 · a year ago
Such hard work! Thank you so much
@eliagonzalezmolina366 · 2 months ago
I was looking for something like this to dispel my doubts and it worked! thanks :) (I think you are right, common animations are super misleading)
@hos42 · a year ago
Thank you for putting this out!
@momolight2468 · a year ago
Best video about neural convolution and filters!? YES!!! Thank you so much!
@kartikpodugu · 10 months ago
Amazing. You have cleared all my doubts in a single shot.
@jkkuusis · a year ago
Thanks for the great video! I'm at the moment trying to learn these concepts, so I might be suggesting something that is incorrect. Here it goes anyway: an example with an input of depth 3 might be good to better understand this by thinking of RGB image data. It also seems to be the (special) case in your animations that the input depth is the same as the number of filters, leading to the same depth in both input and output. That is not always the case, if I've understood this right.
@animatedai · a year ago
I intentionally avoided using an input depth of 3, because that's a special case. Most convolutional layers in a CNN will have an input depth much higher than 3. It's better to think of the convolutional layer as "feature map in, feature map out" rather than "image in, feature map out". By that same logic, I should have made the input depth different than the output depth, because that's also a special case, like you said. I had thought about this at one point, but sadly it didn't make it into the final animation. I'll probably fix that on GitHub in the future.
@naasvanrooyen2894 · a year ago
Thanks so much for this. Also really struggled to find proper animations. Would have liked to see how this looks in the actual neural network, i.e. how the filter can be visualized as the weights, or how the filter parameters are trained. Would greatly appreciate a video on GANs and LSTMs. The LSTM diagrams are terrible. Really struggled to visualize how they connect to the overall network.
@rukshanj.senanayaka1467 · 5 months ago
This video feels like an iPhone moment - a video I didn't know I needed until I saw it. Thanks a lot!
@simonpikalov6136 · a year ago
Finally a good animation video !
@laplaceha6700 · a year ago
this conv2d animation you do is right, thanks a lot
@pere_gin · a year ago
"A 2D convolution actually takes in a 3D tensor as input and has a 3D tensor as output" - well, it depends, right? If you have a single-channel/grayscale image then the input is in fact a 2D tensor, and each filter outputs a 2D tensor that is joined with all the others in the feature map. So if you have a grayscale image with a single filter, the animations would in fact be correct. I think the animations are perfectly fine, as they simplify a concept to its most basic form for easy understanding. But it is true that after you understand the basic concept, a 3D - 3D representation is also nice for understanding more common and complex examples. Disclaimer that I could be wrong as I am by no means an expert, but this is my take from my current understanding of convolutions :)
@animatedai · a year ago
I'm not aware of a library where it depends, i.e., where the depth dimension is optional. PyTorch's Conv2D will accept a 3D tensor or a 4D tensor (batched 3D tensors); the functional interface only accepts 4D. Keras's Conv2D layer will only accept a 4D tensor. TensorFlow's conv2d operation will accept anything with at least 4 dimensions (the last three are treated as the height, width, and channels, and all the others before that are treated as batch dimensions). NVIDIA's cuDNN implementation of 2D convolution takes a 4D tensor. And in all of these cases the weights will be 4D, which can be thought of as a 3D weight for each filter corresponding to the size of the 3D patch that the filters operate on in the input. So as far as the industry standard goes, there just isn't a 2D convolution where you don't have a depth dimension in the input. Your grayscale case will only have a depth of 1 but will in fact be a 3D tensor. If you're able to find a mainstream library where the depth dimension in the input is optional, let me know.
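[Editor's note] For readers who want to see exactly what those library calls compute, here is a minimal pure-Python sketch (a hypothetical helper, not any library's actual code) of one unbatched 2D convolution with stride 1 and no padding: the input is 3D, each filter is 3D, and the output is 3D.

```python
# Naive single-sample 2D convolution, channels-first layout.
# Illustrative only; real libraries run heavily optimized equivalents.

def conv2d_naive(x, w):
    """x: input, shape (C_in, H, W); w: filters, shape (C_out, C_in, K, K)."""
    c_in, h, wdt = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    out_h, out_w = h - k + 1, wdt - k + 1  # stride 1, no padding
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(c_out)]
    for f in range(c_out):             # one output feature per 3D filter
        for i in range(out_h):
            for j in range(out_w):
                s = 0.0
                for c in range(c_in):  # each filter spans ALL input channels
                    for di in range(k):
                        for dj in range(k):
                            s += w[f][c][di][dj] * x[c][i + di][j + dj]
                out[f][i][j] = s
    return out

# A depth-8 input, like the animation: shape (8, 6, 6)
x = [[[1.0] * 6 for _ in range(6)] for _ in range(8)]
# 4 filters, each 3D with shape (8, 3, 3), packed into a 4D weight tensor
w = [[[[1.0] * 3 for _ in range(3)] for _ in range(8)] for _ in range(4)]
y = conv2d_naive(x, w)
print(len(y), len(y[0]), len(y[0][0]))  # output is a 3D tensor: 4 4 4
```

Note that the weights form exactly the 4D tensor described above: filter count × input depth × kernel height × kernel width.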
@TheBlindfischLP · a year ago
@@animatedai Still, the concept doesn't need the 3D implementation, as the different features are worked on independently anyway. I definitely think it's a stretch, and comes off as really condescending, to call all other animations wrong.
@kshamanthkumar6042 · 10 months ago
Amazing work 😍
@pew_pew_pew4377 · a year ago
Well, not to speak for the existing animations/figures, I won't say they are wrong. They have some issues, but essentially they are correct. When talking about 2D convolution, we should know the input and output are 3D, as the input is a picture and the output is also a picture/feature map.
@slime67 · a year ago
Great work!
@menkiguo7805 · 4 months ago
This question has been confusing me for a long time, thank you
@GamesEngineer · a year ago
Thank you! Great animation. However, I do have a technical nitpick. Your animation shows an operation known as cross-correlation, which is related to convolution, but mirrored. "Convolutional neural networks" use cross-correlations in the feed-forward phase and convolutions in the backpropagation phase.
@ayushpandey7190 · a year ago
Beautiful 💙
@bluvalor7443 · a year ago
Lol, I literally learned this the hard way just about 2 months ago, when the shape for my 2d convolution required 3 parameters, and this made me super confused :,)
@kristianmlbachlian5042 · a year ago
Discovered your channel just now. If only I had these resources at my disposal when learning about these topics myself. Keep it up, you will save many careers!
@PurnenduPrabhat · 3 days ago
Good job
@xuerobert5336 · 6 months ago
brilliant !
@bediosoro7786 · a year ago
The first animation you say is wrong shows the contribution of one filter's operations, which is quite accurate. If you consider the number of input channels to be 1 and the number of output channels to be 1, that is the right figure for the whole operation. The conv2d operations are all element-wise matrix multiplications with shifting windows. The 3D animation you did looks great but lacks that notion. That is my opinion. I stick with the 2D.
@almightysapling · a year ago
Thank you! He never explains why his animation is "correct" and in my opinion it simply isn't. 2D convolutions act 2D on 2D data. The fact that we have *multiple data* and *multiple filters* leads to us frequently blocking things in 3D, but the convolution itself is still fundamentally 2D. And if someone doesn't understand that, it's not because of bad animations.
@orenAm · 7 months ago
Your videos are very cool! I wonder if you've thought about how to present Conv3d; it is a challenge when considering more than one channel.
@rukshanj.senanayaka1467 · 5 months ago
Super helpful. Any plans to make this open source, or to make interactable cases where we can change the stride and see the variation?
@animatedai · 5 months ago
Thank you! On my GitHub page (animatedai.github.io), you can see a few different variations, and I'm also working on an interactive WebGL app where you can pick the parameters.
@FelipeGustavoSilvaTeodoro · a year ago
Amazing!
@user-pc7ed3oi3y · a year ago
Adding the bias term that is applied after the convolution would make this a representation of the full process. Anyway, great visualization!
@Shad0wWarr10r · a year ago
With those shapes of input and filter, is there even any reason to have them 3D over 2D? I get the output as you layer filters, but if the input and filter are just the same thing all the way through, representing them by single squares is not wrong.
@guyindisguise · a year ago
Nice animation, are you planning on making animations for Transformers as well?
@mariogonzalezotero · a year ago
amazing!
@menkiguo7805 · 4 months ago
amazing
@gabrielchan3255 · a year ago
this is great
@andreas.karatzas · a year ago
@animatedai How did you learn Blender? What were your sources?
@koktszfung · a year ago
Convolution is not only used in neural networks
@dereklust3480 · a year ago
Great video, but you didn't explain why the input is a 3d tensor. If we are convolving a 2d image, where does the 3d tensor come from?
@animatedai · a year ago
The short answer is that both the input and output are feature maps. Check out this video explaining it: kzbin.info/www/bejne/m367pp5vbLOYias
@dereklust3480 · a year ago
@@animatedai A better way to ask my question is this: how is the first feature map created in a convolutional network? Surely the *first* time the 2d image is convolved we have a 2d tensor and 2d filters, just like the typical animation, right? I get that the output of this convolution will be a 3d feature map, and thus all further convolutions will look like your animation that is 3d to 3d.
@animatedai · a year ago
A 2D image is represented as a 3D feature map with 3 features: red, green, and blue. So even the first convolution has a 3D input.
@simonpointner7545 · a year ago
@@animatedai You may take a grayscale image as an input because for many cases it is sufficient, and for learning it's a good simplification. I would not consider this a wrong animation; it just assumes you have a grayscale image as an input. I get the point of your video, but the title is clickbait.
@mizoru_ · a year ago
Nice!
@curious_one1156 · a year ago
Bravo !
@davidebic · a year ago
Is there a way to access your course online? I'm really interested in this subject!
@animatedai · a year ago
The course is a work-in-progress. You can see the videos that are completed so far in this playlist: kzbin.info/www/bejne/m367pp5vbLOYias
@connorvaughan356 · a year ago
Why does the input shape have so many layers in this animation? Wouldn't it have a shape equal to the image shape, then 3 layers, 1 each for RGB?
@animatedai · a year ago
I'm glad you asked. In general, convolution takes a feature map as input, which can have any number of features (depth). An image is a special kind of feature map with 3 features: red, green, and blue. However, this typically only applies to the first layer of convolution in a neural network, and the other convolutional layers will have more features as input, e.g., 32, 64, 128, ... 1024, 2048. So to better represent the general case and to encourage viewers to consider more than just the special case of an image, I chose to use 8 for the animations. Although 3 would also be perfectly valid.
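[Editor's note] The depth bookkeeping in this reply can be sketched as a small shape calculator (hypothetical helper and layer sizes, chosen for illustration; 'same' padding is assumed so the spatial size stays fixed at stride 1):

```python
# Shape bookkeeping for a stack of conv layers: the first layer sees
# depth 3 (RGB, the special case); later layers see 32, 64, ... features.

def conv2d_out_shape(in_shape, filter_count, k=3, stride=1, pad=1):
    """in_shape = (depth, height, width); returns the output feature-map shape.
    Filter depth always equals the input depth; output depth = filter count."""
    d, h, w = in_shape
    oh = (h + 2 * pad - k) // stride + 1
    ow = (w + 2 * pad - k) // stride + 1
    return (filter_count, oh, ow)

shape = (3, 224, 224)          # RGB image: depth 3, only at the first layer
for filters in (32, 64, 128):  # deeper layers: the general case
    shape = conv2d_out_shape(shape, filters)
    print(shape)
# the depth follows the filter count, not the input depth
```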
@kasuha · a year ago
I find these "new and correct" animations confusing; I have no idea what's happening there. I assume this is just "the correct way to display convolution" for AI models? As an old-school person who used convolutions mainly for 2D image processing (blur/edge detection), I don't see anything wrong with the old animations; that's exactly what we used to do there.
@user-ld8lc4ex4m · 10 months ago
tysm
@Shamisen100 · a year ago
Any plans to add your animations to Wikimedia Commons? :)
@macewindont9922 · 7 days ago
sick
@RojinaPanta1 · a year ago
I really appreciate the effort and it is a good one, but I would still go with the 2D one; this is way too jittery for me, with so many things happening at once, and the choice of colors.
@JonDornaletetxe · a year ago
🔥
@jazzvids · 10 months ago
3:30 - the final animation
@ChaoticNeutralMatt · 6 months ago
Oh cool, didn't know Blender had all that.
@crhistianribeiro6897 · a year ago
👏👏👏
@andreansihombing6780 · a year ago
So the output is a feature map? I don't get why the feature map on the right is stacked like that. Can anyone explain it?
@animatedai · a year ago
Yep, the output is a feature map. Each filter produces one feature of the output, and these get stacked together like you see. I've got a video covering that concept here: kzbin.info/www/bejne/m367pp5vbLOYias.
@andreansihombing6780 · a year ago
@@animatedai Thank you for the explanations. I barely understood it with others' visualizations, but you really do a good job.
@devjeonghwan · a year ago
Unfortunately, only half right. What about when we need to understand the 4D or 5D convolution situation? Humans understand 2D most intuitively, and I think this is the reason those 2D-based animations were made (and 2D convolution can be extended to larger dimensions). Also, deep learning convolution is unfortunately not mathematically organized: it is derived from the "filter" in image processing, and "filter" in turn derives from cross-correlation, from long before. Your animation has multiple kernels, which just depicts an argument called "channels" that is only used by neural network frameworks.
@peabrane8067 · a year ago
Just abstract/generalize it to higher dimensions then. This is the same as saying "why do we visualize vectors in 2d coordinate systems, even though Nd vectors are well-defined, or even infinite-dimensional vectors (Hilbert space)". Visualizations are meant to capture an intuitive/simplified example, not meant for generality. The generality comes from formal mathematical reasoning, which no visualization can capture.
@TheBlindfischLP · a year ago
Convolution is well defined mathematically, and has been way before the invention of image processing.
@carnap355 · 6 months ago
I don't understand why filters are 3D. 8 deep = 8 channels? In the end you get as many channels as there are filters?
@animatedai · 6 months ago
Good questions! First: for a good explanation of "why" the filters are 3D, check out this video: kzbin.info/www/bejne/jpW3n2ijmZikiq8. And second: the filter depth matches the input depth, and the output depth matches the filter count.
@shuninc9273 · a year ago
Not wrong, bro. They are just incomplete.
@Karmush21 · 8 months ago
Is a cube in the filter (or image) a pixel? Or is it a combination of channels?
@animatedai · 8 months ago
Good question. Each cube is a single floating-point value.
@Karmush21 · 8 months ago
@@animatedai Thank you for the answer. By the way, why do you think the typical animations are wrong? Can't we just do a 2D convolution on each slice of an input image and then stack the slices together to get the same feature map as with your animation?
@animatedai · 8 months ago
That's a great question. So good, in fact, that I'm planning to make a follow-up video explaining it. I think a lot of people are struggling with this idea, and your question's phrasing really helped me understand where the misconception is. The short answer is that what you're proposing wouldn't be equivalent, because in neural network convolution, each filter sees all the features/channels of the input, not just a slice of the input. That's why the filters themselves are 3D. More concretely, let's say a bumblebee-detecting neural network wanted to look for the color yellow, so it needed a filter that detected yellow. That filter couldn't just look at the red channel or just the green channel or just the blue channel. It needs to look at all of them together to distinguish yellow from red or from green or from white or any other color. So we can't slice the input up into red/green/blue and then operate on them separately. Does that make sense?
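[Editor's note] The bumblebee example can be made concrete with a toy 1x1 filter (made-up weights, purely illustrative): a single dot product across all three channels separates yellow from colors that any single channel slice would confuse.

```python
# A 1x1 "yellow detector" with per-channel weights (R, G, B) = (1, 1, -2).
# No single channel can tell yellow from red, green, or white on its own;
# the dot product across ALL channels can.

def respond(weights, pixel):
    # Filter response at one spatial position: dot product over channels
    return sum(w * p for w, p in zip(weights, pixel))

yellow_filter = (1.0, 1.0, -2.0)
pixels = {
    "yellow": (1.0, 1.0, 0.0),
    "red":    (1.0, 0.0, 0.0),
    "green":  (0.0, 1.0, 0.0),
    "white":  (1.0, 1.0, 1.0),
}
for name, rgb in pixels.items():
    print(name, respond(yellow_filter, rgb))
# yellow scores 2.0; red and green score 1.0; white scores 0.0
```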
@Karmush21 · 8 months ago
@@animatedai Yeah, it makes total sense once you realize (which is quite obvious when you think about it) that the input channels pretty much always have some sort of correlation between them. In your example, we need all 3 channels to see how much red, green, and blue we have to get this certain type of yellow. I talked with my professor about this topic (where I referenced your video), and he believed that the only reason the 2D convolution and stacking of slices would be better is if you don't have that much data. Also, the training will be faster. I think a lot of people would appreciate a video exploring the difference between these two ideas in more detail. At least I would. Thank you already for your animations and answers!
@animatedai · 8 months ago
I meant to add that the splitting and stacking isn't a crazy idea as long as you understand the limitation (compared to standard convolution) and compensate for it. In fact, it's the basis of the depthwise-separable convolution, which can be much more efficient than standard convolution. I've got a video on it that you might like: kzbin.info/www/bejne/rIfEg5uQjdSpmNk
@BornAgainstAll · a year ago
Oh.
@havehalkow · 3 months ago
I'm just curious if this visualisation helps someone who doesn't know what convolution is.
@whitemagickh · a year ago
I'm a little confused by the video, because I still don't understand WHY 2d convolution pictures are wrong. What determines the depth of the first input? Same with the convolutional layer. Is this because we have RGBA layers, or?? What's the benefit of drawing it as 3D instead of 2d? What's the benefit to us of having a tensor instead of an array of convolutional outputs? I'm sure this sounds like thoughtless complaining but I really am curious, and there must be something about convolution in AI that I'm missing in my own knowledge. Thanks for reading this.
@animatedai · a year ago
Those are good questions. 1) Why are the 2D animations wrong? Short answer: they simplify away the feature dimension. Long answer: the 2D convolution that you'll find in neural network libraries (all the way down to NVIDIA's hardware interface) is "conceptually" performed on 3D data. These end up being batched, so the interfaces technically take 4D tensors, but that's just multiple convolution operations on 3D data performed in parallel. If your only understanding of convolution came from the 2D animations, you wouldn't understand how to create the 4D tensors (or the 3D piece of data for a particular sample in the batch). In fact, you wouldn't know why the operation took 4D tensors at all. If you'd like more information on the feature dimension, my first video on the fundamental algorithm (kzbin.info/www/bejne/m367pp5vbLOYias) should provide enough information to understand what the feature dimension is and why it's essential. 2) What determines the depth of the first input? This depends on what data you have. A color image in RGB format would have a depth of 3: red, green, and blue. A color image with transparency in RGBA format would have a depth of 4: red, green, blue, and alpha. A grayscale image with a single brightness value for each pixel would have a depth of 1: brightness. 3) What determines the depth of the output? Check out my video on filter count: kzbin.info/www/bejne/j4SxfYCEo9GBrZo 4) What's the benefit of drawing it as 3D instead of 2D? 2D convolution conceptually operates on 3D data (2 spatial dimensions and one feature dimension), so drawing it in 3D shows everything and doesn't simplify anything away from the viewer. 5) What's the benefit to us of having a tensor instead of an array of convolutional outputs? Could you clarify this question? I'm not sure I understand it. Are you asking why we pack all the output values into a single 3D tensor instead of multiple 2D tensors (maybe one for each feature)? It's rare to want the features separated out like that, so it's convenient to have everything together in one tensor. It also has performance benefits from memory locality.
@TheBlindfischLP · a year ago
@@animatedai I still don't think the other animations are wrong though. The mathematical concept is the same for an input feature depth of 1, and for higher input depths you are just performing multiple mathematical convolutions at once. I don't think that makes convolution a _fundamentally_ 3d concept, just because it's computationally opportune to package multiple convolutions as one operation.
@alexeychernyavskiy4193 · a year ago
They are not wrong. They are a simplification that helps to understand the concept. As any simplification, they are incomplete. But not wrong. It's sad that you use clickbait titles.
@37window57 · 9 months ago
3D, what tool are you using? Blender?
@animatedai · 9 months ago
I'm using Blender with a lot of Geometry Nodes.
@37window57 · 9 months ago
@@animatedai Thank you. Is the source not open?
@NileGold · a year ago
Why does it move by 2 and not 3, and why doesn't it go to the end of the big cube?? 3:55
@animatedai · a year ago
It moves by 2 because it has a stride of 2. This setting is independent of the kernel size. animatedai.github.io shows stride examples of 1 and 2, which are the most common. It doesn't go to the end because there is only one column remaining, so it's unable to move by 2. When this happens, the filter window moves to the next row, ignoring the remaining column. This is useful to be aware of because you could end up losing data. In this case, the final column is just a padding column, so no real input data is skipped.
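[Editor's note] The arithmetic in this reply can be checked in a few lines (toy numbers matching the animation: 8 columns counting padding, a 3-wide filter, stride 2):

```python
# Column positions a 3-wide filter visits across 8 columns at stride 2.
# The next step after start index 4 would be 6, which needs columns 6-8,
# and column index 8 doesn't exist, so the final column is never visited.

width, k, stride = 8, 3, 2
starts = list(range(0, width - k + 1, stride))
covered = max(starts) + k              # columns 0..covered-1 are touched
print(starts)                          # [0, 2, 4]: three window positions
print(width - covered)                 # 1: one column is skipped entirely
out_width = (width - k) // stride + 1
print(out_width)                       # 3 output columns
```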
@Antagon666 · 8 months ago
I don't really care about the animations; the problem is when they start describing convolutions as 2D operations and don't go into detail on the effect of having multiple input and output channels. I wish I had found this video sooner, but anyway it's easy enough to derive the solutions yourself from 200 Google search results (Google really sucks nowadays). It's actually a good mental exercise to imagine the 3d/4d filter sliding across a batch of images... But good luck finding the correct padding for strided convolutions during backpropagation of both Conv and TransConv layers... I had to derive everything by hand, because the internet has incorrect and, even worse, conflicting formulas for that... 😂
@leslietetteh7292 · 10 months ago
As someone who has never built a convolutional neural network, but who has done lots of convolution in image-processing algorithms, the convolution they are showing is normal 2d convolution: 3x3 pixel values in -> convolution kernel operation -> single pixel value out. For showing what is actually going on during an operation with a convolution kernel, those first animations are perfect. As someone that's built at least a couple of neural networks with linear transformations, and knows exactly how convolution kernels work, I'd hoped to be able to intuit what's going on in a convolutional neural network from your animation, but your animation is super confusing without any context. What is your input, a 3D tensor - I thought the input was a 2D image? What is the output - I thought the output was just "features" extracted from the image? What you've created is so abstract that it literally made me more confused than I was to start with. In my opinion, the best diagram is what you've shown between 1:11 and 1:33. Except for the concept of 'pooling', it's 100% clear what the actual mechanics within the neural network are, and it explains the process of convolution. With no prior knowledge of convolutional neural network mechanics, I understand it roughly, save the concept of 'pooling'. If you think that information is too much for a student to be able to put together, you're doing too much of the thinking for them. Maybe with knowledge of convolutional neural networks the animation you've made would make sense, or with context, but for an introductory course on convolutional neural networks, it is so abstract as to be worse than useless. It's actively confusing.
@carnap355 · 6 months ago
The input is 3D because it has multiple channels, so those are like 3D convolutions, but they are commonly called 2D because the stride and padding are 2D. Let's say you have an image with 3 channels, so it's 100x100x3. If your layer has 16 output channels, it will have 16 convolution filters of size height*width*3. So you will end up with a 100x100x16 feature map.
@WordsThroughTheSky
@WordsThroughTheSky 6 ай бұрын
wait a friggin' minute.... you're telling me that the filter or kernel is 3D? I always thought it's a 2d 3x3 filter that goes to through each "layer" of the input and it recreates a 3d output tensor. Are you sure it's a 3D filter? where is this stated?
@animatedai
@animatedai 6 ай бұрын
Haha, I'm sure :) You can check the documentation for your favorite neural network library to verify. The conv2d operation will actually take a 4D tensor for the filters. Each filter is 3D and you pack all the filters together in one tensor to get a 4D tensor. To convince yourself that it only makes sense for the filters to be 3D, check out the sequel video: kzbin.info/www/bejne/jpW3n2ijmZikiq8 Sources: www.tensorflow.org/api_docs/python/tf/nn/conv2d pytorch.org/docs/stable/generated/torch.nn.functional.conv2d.html
@WordsThroughTheSky
@WordsThroughTheSky 6 ай бұрын
@@animatedai you're right... my mind is actually blown, it's all been a lie, ty for the response
@allNicksAlreadyTaken
@allNicksAlreadyTaken Жыл бұрын
They are not wrong. They are just displaying a different case than what you are interested in. Maybe they are misplaced in the material you were looking at, but if they were animations for different things, like convolution filters in image processing, they wouldn't be wrong. Have some humility.
@TheBlindfischLP
@TheBlindfischLP Жыл бұрын
Also convolutions as an idea are way older and more general than just image processing or neural networks. He comes off as ignorant of this.
@Firestorm-tq7fy
@Firestorm-tq7fy Ай бұрын
No, they are wrong. They give the impression that each feature map, once convolved, stands alone and is post-processed separately, which just isn't true. Together they form a new 3D image, which then gets treated as such.
@yunusbilece8690
@yunusbilece8690 Жыл бұрын
I liked the idea, but the title is overblown for this kind of correction
@osmmanipadmehum
@osmmanipadmehum Жыл бұрын
4:04 why isn't the last column scanned?
@animatedai
@animatedai Жыл бұрын
I'm glad you noticed. I picked an even column number specifically to demonstrate that. This happens because there's a stride of 2, an even number of columns (8 in this case, counting padding), and an odd-sized filter (3x3). So after it's taken 3 steps, there's only 1 pixel of width remaining, and it can't move 2 more spaces. Convolution handles this by simply ignoring the remaining data and moving to the next row. This is important to know because it could cause you to lose data (which could accumulate over many layers into significant chunks of your input). In this case, the last column was just padding anyway, so no real data is lost. Note: the last row is scanned because, unlike the columns, we have an odd number of rows, 7 counting padding.
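The arithmetic in this reply follows from the standard one-axis output-size formula, (size - kernel) // stride + 1. A small sketch (generic, not code from the video) that also counts the trailing positions the convolution silently drops:

```python
def conv_output_size(size, kernel, stride):
    # number of valid filter placements along one axis
    return (size - kernel) // stride + 1

def unused_trailing(size, kernel, stride):
    # input positions past the last filter placement (data the conv ignores)
    last_start = (conv_output_size(size, kernel, stride) - 1) * stride
    return size - (last_start + kernel)

# 8 columns (counting padding), 3-wide filter, stride 2: last column unscanned
print(conv_output_size(8, 3, 2), unused_trailing(8, 3, 2))  # 3 1
# 7 rows, same filter and stride: every row is covered
print(conv_output_size(7, 3, 2), unused_trailing(7, 3, 2))  # 3 0
```

With width 8 the filter starts at columns 0, 2, 4 and never reaches column 7; with height 7 the last placement at row 4 covers rows 4-6, so nothing is dropped.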
@hjtvgfhjtghvfg5919
@hjtvgfhjtghvfg5919 Жыл бұрын
If they're all wrong, then why should I watch this one?
@user-xu4vz9hi4m
@user-xu4vz9hi4m Жыл бұрын
I hear you
@HyperFocusMarshmallow
@HyperFocusMarshmallow Жыл бұрын
Honestly, just write down the formula… Nice work though!
@axelanderson2030
@axelanderson2030 Жыл бұрын
Geonodes were easier?
@LokeKS
@LokeKS 22 күн бұрын
There is no convolution
@tomo9908
@tomo9908 10 ай бұрын
Instead of spending 95% of the video ranting about how other animations are bad, I would have appreciated it more if you had spent that time explaining how this animation works. I don't think I learned anything from this video... How do you go from an input RGB image of size W * H * 3 to some cube of size 5 * 5 * 5 (+padding)? You lost me at step 1...
@animatedai
@animatedai 10 ай бұрын
Check out this 100% rant-free playlist to learn more! kzbin.info/www/bejne/m367pp5vbLOYias
@kuanarxiv
@kuanarxiv Жыл бұрын
The example is just a concept. I don't agree with this sensational title.
@pommes9966
@pommes9966 3 ай бұрын
Most arrogant voiceover I've ever heard
@vinner1997
@vinner1997 Жыл бұрын
Amazing!