All Convolution Animations Are Wrong (Neural Networks)

Рет қаралды 67,712

Күн бұрын

Patreon: / animated_ai
All the neural network 2d convolution animations you've seen are wrong.
Check out my animations: animatedai.git...

Пікірлер: 162

@randyekrer431 2 жыл бұрын

you should've started with the typical RGB 3 layer input image, and animate convolutions on that; that's where most people start to get lost as to how the weights match with inputs, translating from the 2D mental model to 3D.

@amitamola2014 Жыл бұрын

Bang on right. So he made a video of how others weren't doing it right but then didn't start from the start itself to explain what actually goes on correctly. I mean, what good is this new one then :/

@avidrucker 2 жыл бұрын

A major thing that feels missing to me in the animations is clear textual labeling. It's fine that you label them out loud, and then, also, it would be more accessible for folks with hearing challenges or cognitive challenges. My crit aside, this animation is lovely, and I'm very impressed with what you've done. You've earned yourself a new subscriber :)

@thomasprimidis9360 2 жыл бұрын

All these wrong illustration and animation trends have been among the many problems where you would think "why the hell have we been doing this all wrong, all the time, everywhere?". Finally, someone came and did the obvious. Thank you!

@cfranc1s Жыл бұрын

They are not wrong. They just show a special case. They use the special case because the focus is on things like stride, dilation, padding etc. It's good to make the 3D tensor animations, but don't call the existing ones wrong. I think I would have still found it easier to understand the existing ones first and then move on to the 3D animations.

@spider853 2 жыл бұрын

Oh man, I'm so glad someone took a direct approach to this problem, when I was learning I was so confused by all these animations and explanations in 2D, and then seeing resulting tensor shapes got me super confused, where the depth go and where did it appear? Thanks for bringing this video to the world!

@kartikpodugu Жыл бұрын

amazing. you have cleared all my doubts in single shot

@Leibniz_28 Жыл бұрын

Awesome . Finally a good representation of this computations. Thanks for your hard work!!!

@felipelourenco8054 4 ай бұрын

Thanks for that. It was really confusing before your animation came up!

@schorsch7400 10 ай бұрын

Wow, really great, thanks for your work! I was struggling with the very problem you mentioned in the video - bringing together the 2D conv visualizations with the multi-channel 3x3 convolutions that are common in modern CNNs. Thanks to your work, I now understood it.

@peabrane8067 2 жыл бұрын

The animation is just meant as an abstraction of the spatial convolution operation itself. A spatial CNN layer consists of spatial convolution operations across multiple input and output channels (which is what you are referring to)

@lizcoultersmith 6 ай бұрын

These videos are outstanding! Finally, true visualisations that get it right. I'm sharing these with my ML Masters students. Thank you for your considerable effort putting these together.

@exammaterial7511 14 күн бұрын

Amazing Great work

@logon2778 2 жыл бұрын

Forget the animation itself (even though its great). I just appreciate a non-moving camera. It bothers me so much when people spin the camera around a nice animation in a circle. Makes me feel like I am on a carnival ride.

@bala.dhinesh Жыл бұрын

This is what I expected for a long time. This explains everything clearing. Thanks for posting this.

@jazzvids Жыл бұрын

Thank you so much for this! worth mentioning that the animation has a stride of 2

@alessandropolidori9895 2 жыл бұрын

Love it. I always thought there were no accurate visualization on the internet too. Good job

@involuntaryoccupant 3 ай бұрын

i finally understood why convolution makes more channels. thank you so so much

@rezhaadriantanuharja3389 2 жыл бұрын

Premise 1: All convolution animations are wrong Premise 2: This is a convolution animation Conclusion: this is wrong

@Q_20 2 жыл бұрын

oh shit

@zyansheep 2 жыл бұрын

0:02, "all convolution animations you've seen _up to this point_ are wrong"

@LimitedWard 2 жыл бұрын

Maybe it was a proof by contradiction

@Firestorm-tq7fy 10 ай бұрын

No?

@krischalkhanal9591 9 ай бұрын

Someone just read discrete algebra. (Kudos!)

@twbjr2 Жыл бұрын

Thank you for making this video! I have been trying to visualize this using all the horrible diagrams from papers. I immediately understood what they were trying to convey after watching your video!

@momolight2468 2 жыл бұрын

Best video about neural convolution and filters!? YES!!! Thank you so much!

@PeppeMarino 2 жыл бұрын

The use of all these misleading animations is the primary cause of misconception about convolutional neural networks; you have finally provided a good visualization. I am happy to share this content with my colleagues.

@MrShadowjockey 2 жыл бұрын

Thank you for this, recently I tried to explain why the input and output shapes behave the way they do, and what gets combined with what. These animations will make it sooo much easier!!

@pew_pew_pew4377 2 жыл бұрын

well, not speak for the existing animations/figs, i won't say they are wrong, they have some issues, but essentially they are correct. When talking about 2D convonlution, we should know the input and output are 3D as input is a picture and output is also a picture/feature map.

@alimurreza Жыл бұрын

Excellent visualization! I will definitely show these visualizations to my students in Machine Learning course. They will love it.

@sscswist 2 жыл бұрын

Finnaly. You are the best. When i was learning this, i was always looking at all those original animations and i was always so confused ...

@pere_gin 2 жыл бұрын

"a 2D convolution actually takes in a 3D tensor as input and has a 3D convolution as output", well, it depends right? If you have a single channel/grayscale image then the input is in fact a 2D tensor, and each feature outputs a 2D tensor that is joined with all others in the feature map. So if you have a grayscale image with a single feature, the animations would in fact be correct. I think the animations are perfectly fine, as they simplify a concept to it's most basic form for easy understanding. But it is true that after you understand the basic concept, a 3D - 3D representation is also nice to understand more common and complex examples. Disclaimer that I could be wrong as I am by no means an expert, but this is my take from my current understanding of convolutions :)

@animatedai 2 жыл бұрын

I'm not aware of a library where it depends, i.e., where the depth dimension is optional. PyTorch's Conv2D will accept a 3D tensor or a 4D tensor (batched 3D tensors). The functional interface only accepts 4D. Kera's Conv2D layer will only accept a 4D tensor. TensorFlow's conv2d operation with accept anything with at least 4 dimensions (the last three are treated as the height, width, and channels and all the others before that are treated as batch dimensions. NVIDIA's cudnn implementation of 2D convolution takes a 4D tensor. And in all of these cases the weights will be in 4D which can be thought of as a 3D weight for each filter corresponding to the size of the 3D patch that the filters operate on in the input. So as far as the industry standard, there just isn't a 2D convolution where you don't have a depth dimension in the input. Your grayscale case will only have a depth of 1 but will in fact be a 3D tensor. If you're able to find a mainstream library where the depth dimension in the input is optional, let me know.

@Henriiyy 2 жыл бұрын

@@animatedai Still the concept doesn't need the 3d implementation, as the different features are worked on independently anyway. I definitely think it's a stretch and comes off really condescending, to call all other animations wrong.

@aintgonhappen 2 жыл бұрын

This is some amazing content. Thank you, buddy!

@eliagonzalezmolina366 11 ай бұрын

I was looking for something like this to dispel my doubts and it worked! thanks :) (I think you are right, common animations are super misleading)

@rukshanj.senanayaka1467 Жыл бұрын

This video feels like an iPhone moment - a video I didn't know I needed until I saw it. Thanks a lot!

@____-gy5mq 8 ай бұрын

best generalization ever, covers all the corner cases

@hos42 2 жыл бұрын

Thank you for putting this out!

@ayamaitham8430 2 жыл бұрын

Such a hard work ! Thank you so much

@koktszfung 2 жыл бұрын

Convolution is not only used in neural network

@laplaceha6700 Жыл бұрын

this conv2d animation you do is right, thanks alot

@bluvalor7443 2 жыл бұрын

Lol, I literally learned this the hard way just about 2 months ago, when the shape for my 2d convolution required 3 parameters, and this made me super confused :,)

@alexgamingtv7118 2 жыл бұрын

Great work, such a animation for grouped convolutiion would be nice too

@adimaqsood3040 3 ай бұрын

So the point he is trying to make is that an "image" is represented as a 3d object/array , that is height(pixels) , width (pixels) and RGB components, but when we talk about 2d images we usually means a "grayscale image" , which doesn't require the RGB part mentioning explicitly, although it is still a 3d image , . So for a colored image of 128x128 pixels , tenser shape would be (128,128,3) & for a grayscale image it would be (128,128,1)

@GamesEngineer 2 жыл бұрын

Thank you! Great animation. However, I do have a technical nit pick. Your animation shows an operation known as cross-correlation, which is related to convolution, but it is mirrored. "Convolutional neural networks" use cross-correlations in the feed-forward phase and convolutions in the backpropagation phase.

@menkiguo7805 Жыл бұрын

This question has been confusing Mr for a long time thank you

@jkkuusis 2 жыл бұрын

Thanks for the great video! I'm at the moment trying to learn these concepts, so I might be suggesting something that is incorrect. Here it goes anyway: an example with input of depth 3 might be good to better understand this by thinking of rgb image data. It also seems to be the (special) case in your animatioms, that the input depth is the same as the number filters leading to same depth in both input and output. That is not always the case, if I've understood this right.

@animatedai 2 жыл бұрын

I intentionally avoided using an input depth of 3, because that's a special case. Most convolutional layers in a CNN will have an input depth much higher than 3. It's better to think of the convolutional layer as "feature map in, feature map out" rather than "image in, feature map out". By that same logic, I should have made the input depth different than the output depth, because that's also a special case like you said. I had thought about this at one point, but sadly it didn't make it into the final animation. I'll probably fix that on GitHub in the future.

@kshamanthkumar6042 Жыл бұрын

Amazing work 😍

@bediosoro7786 2 жыл бұрын

The first animation you say is wring shows the contribution of one filter operations which is quite accurate. is you considered the number of input channels one and out put channels 1 that is the right figure for the whole operation. the conv2d operation are all element-wise matrices multiplication with shifting windows. the 3D animation you did look great but lack of that notion . that is my option. i stick with the 2D.

@almightysapling 2 жыл бұрын

Thank you! He never explains why his animation is "correct" and in my opinion it simply isn't. 2D convolutions act 2D on 2D data. The fact that we have *multiple data* and *multiple filters* leads to us frequently blocking things in 3D, but the convolution itself is still fundamentally 2D. And if someone doesn't understand that, it's not because of bad animations.

@ГригорийПогорелов-и7о 2 жыл бұрын

Adding a bias term added after convolutions would be a full process representation. Anyway, great visualization!

@hyahyahyajay6029 5 ай бұрын

I have been struggling to mentally visualize convolutions, specially going from one dimension to others. I was reading the book Understanding Deep Learning by Simon Prince and I realized what i thought i looked like was wrong ( The 2D to 2D animations from the beginning). I wish I would have stumbled upon yours before having to imagine what was explained in the books XD (Good book tho)

@kristianmlbachlian5042 2 жыл бұрын

Discovered your channel just now. If only I had these resources at my disposal when learning about these topics myself. Keep it up, you will save many careers!

@naasvanrooyen2894 2 жыл бұрын

Thanks so much for this. Also really struggled to get proper animations. Would have liked to see how this looks in the actual neural network. i.e. how the filter can be visualized as the weights. Or show who the filter parameters are trained. Would greatly appreciate a video of GAN and LSTM. The LSTM diagrams are terrible. Really struggled to visualized how they connect to the overall network

@kuanarxiv 2 жыл бұрын

The example just a concept. I don't agree with this sensational title.

@alexeychernyavskiy4193 2 жыл бұрын

They are not wrong. They are a simplification that helps to understand the concept. As any simplification they are incomplete. But not wrong. It's sad that you use clickbait titles.

@leslietetteh7292 Жыл бұрын

As someone who has never built a convolutional neural network, but as someone who has done lots of convolution in image processing algorithms, the convolution they are showing is normal 2d convolution. 3x3 pixel values in -> convolution kernel operation -> single pixel value out. For showing what is actually going on during an operation with a convolution kernel those first animations are perfect. As someone that's built at least a couple of neural networks with linear transformations, and knows exactly how convolution kernels work, I'd hoped to be able to intuit what's going on in a convolutional neural network from your animation, but your animation is super confusing without any context. What is your input, a 3D tensor - I thought the input was a 2D image? What is the output, I thought the output was just "features" extracted from the image? What you've created is so abstract that it literally made me more confused than I was to start with. In my opinion, the best diagram is what you've is shown between 1:11 and 1:33. Except for the concept of 'pooling' it's 100% clear what the actual mechanics within the Neural Network are, and it explains the process of convolution. With no prior knowledge of a convolutional neural network mechanics, I understand it roughly, save the concept of 'pooling'. If you think that information is too much for a student to be able to put together, you're doing too much of the thinking for them. Maybe with knowledge of convolutional neural networks the animation you've made would make sense, or with context, but for an introductory course to convolutional neural networks, it is literally so abstract as to be worse than useless. Its actively confusing.

@carnap355 Жыл бұрын

input is 3D because it has multiple channels, so those are like 3D convolutions but they are commonly called 2D because stride and padding are 2D. Lets say you have an image with 3 channels so its 100x100x3. If your layer has 16 output layers, it will have 16 convolution filters with size height*width*3. So you will end up with a 100x100x16 image

@leslietetteh7292 Жыл бұрын

@@carnap355 I have done a little bit of work with convolutional neural networks now, I apologize if my response came off as rude or blunt, I tend to not use as many niceties as I would in a usual conversation when it comes to the internet. I have to say I still find this confusing, though I get what you're saying, I don't understand what that big 3x3xN cuboid block in the middle is, with regard to each of the convolution kernels, and the image, and the image output (at best guess the 3d block is all the stacked results of the previous convolution operations, being subjected to a new convolution operation?). What do you think of the "CNN Explainer" website (can google it)? That's how I understand convolutional neural networks as of now. I also understand the max pooling layer now to be akin to what I would understand from image processing as a "Maximum Filter" operation. So I "think" I have an understanding of what's going on in a convolutional network, bu tfeel free to correct me.

@kasuha 2 жыл бұрын

I find these "new and correct" animations confusing, I have no idea what's happening there. I assume this is just "the correct way to display convolution" for AI models? As an old school person who used convolutions mainly for 2D image processing (blur/edge detection) I don't see anything wrong about the old animations, that's exactly what we used to do there.

@simonpikalov6136 2 жыл бұрын

Finally a good animation video !

@grjesus9979 4 ай бұрын

So in case of a feature map input, 2d conv just replicate each 2d filter along the feature dimension and do multiplication wise? In the video, the filters are 2d really just replicate to fill in the the number of features? or does each 2d filter is in reality a 3d tensor to match the feature dimension?

@PurnenduPrabhat 8 ай бұрын

Good job

@liliang3786 12 күн бұрын

Awesome!!

@slime67 2 жыл бұрын

Great work!

@xuerobert5336 Жыл бұрын

brilliant !

@shuninc9273 2 жыл бұрын

Not wrong bro. They are just incomplete.

@orenAm Жыл бұрын

Your videos are very cool! I wonder if you thought about how to present Conv3d, it is a challenge when considering more than one channel

@captainjj7184 6 ай бұрын

I like it, really, love it! But... I don't see what's wrong with other illustrations and peculiarly I think yours just iterates what they already clearly illustrate. I was even expecting CNN representations in XYZ visuals. Am I missing some points here? Honest question, would appreciate any enlightenment! (btw, thank you for sharing the world with your own version of splendid animation!) PS: If you're up for the challenge, do Spiking NN, I'll buy you a beer in Bali!

@rukshanj.senanayaka1467 Жыл бұрын

Super helpful, any plans to make this open source or make interactable cases where we can change the stride and see the variation?

@animatedai Жыл бұрын

Thank you! On my GitHub page (animatedai.github.io), you can see a few different variations, and I'm also working on an interactive webgl app where you can pick the parameters.

@devjeonghwan 2 жыл бұрын

Unfortunately, only half right. How about if we need to understand 4D or 5D convolution situation? Humans can understand 2D most intuitively and I think this is a reason for why made that 2d based animations. (And 2d convolution can extending to a larger dimension.) And deep learning convolution is unfortunately not mathematically organized. It is derived from "filter" in image processing. and "filter" also derived from "cross correlation" long before. You are animation have a multiple kernels, It just depict an argument called "channels" that is only used by "Neural Network" frameworks.

@peabrane8067 2 жыл бұрын

Just abstract/generalize it to higher dimensions then. This is the same as saying "why do we visualize vectors in 2d coordinate systems, even though Nd vectors are well-defiend, or even infinite dimensional vectors (Hilbert space)". Visualizations are meant to capture an intuitive/simplified example, not meant for generality. The generality comes from formal mathematical reasoning which no visualization can capture.

@Henriiyy 2 жыл бұрын

Convolution is well defined mathematically, and has been way before the invention of image processing.

@mariogonzalezotero 2 жыл бұрын

amazing!

@tomo9908 Жыл бұрын

Instead of spending 95% of the video ranting about how other animations are bad, I would have appreciated it more if you had spend that time explaining how this animation works. I don't think I learned anything from this video.. How do you go from an input RGB image of size W * H * 3, to some cube of size 5 * 5 * 5 (+padding)? You lost me at step 1..

@animatedai Жыл бұрын

Check out this 100% rant-free playlist to learn more! kzbin.info/www/bejne/m367pp5vbLOYias

@surfaceoftheoesj 2 жыл бұрын

Beautiful 💙

@cfranc1s Жыл бұрын

@nikilragav 7 ай бұрын

What actually is the 3rd dimension in this context for the source giant cube? Is that multiple colors? A batch of multiple images?

@menkiguo7805 Жыл бұрын

amazing

@dereklust3480 2 жыл бұрын

Great video, but you didn’t explain why the input is a 3d tensor. If we are convoluting a 2d image, where does the 3d tensor come from?

@animatedai 2 жыл бұрын

The short answer is that both the input and output are feature maps. Check out this video explaining it: kzbin.info/www/bejne/m367pp5vbLOYias

@dereklust3480 2 жыл бұрын

@@animatedai A better way to ask my question is this - How is the first feature map created in a convolutional network? Surely the time time the 2d image is convoluted we have a 2d tensor and 2d filters, just like the typical animation, right? I get that the output of this convolution will be a 3d feature map, and thus all further convolution will look like your animation that is 3d to 3d.

@dereklust3480 2 жыл бұрын

the *first* time

@animatedai 2 жыл бұрын

A 2D image is represented as a 3D feature map with 3 features: red, green, and blue. So even the first convolution has a 3D input.

@simonpointner7545 2 жыл бұрын

@@animatedai You may take a grayscale image as an input because for many cases it is sufficient, and for learning it´s a good simplification, I would not consider this being a wrong animation, it just assumes you have a grayscale image as an input. I get the point of your video but the title is clickbait.

@jazzvids Жыл бұрын

3:30 - the final animation

@panjak323 Жыл бұрын

I don't really care about the animations, the problem is when they start describing convolutions as 2D operations and don't go into detail on the effect of having multiple input and output channels. I wish I found this video sooner, but anyway it's easy enough to derive the solutions yourself from 200 google search results. ( Google really sucks nowadays ). It's actually a good mental excercise to imagine the 3d/4d filter sliding across batch of images... But good luck finding a correct padding for strided convolutions during backpropagation of both Conv and TransConv layers... I had to derive everything by hand, because internet has incorrect and even worse conflicting formulas for that... 😂

@guyindisguise 2 жыл бұрын

Nice animation, are you planning on making animations for Transformers as well?

@ChaoticNeutralMatt Жыл бұрын

Oh cool, didn't know Blender had all that.

@allNicksAlreadyTaken 2 жыл бұрын

They are not wrong. They are just displaying a different case than what you are interested in. Maybe they are misplaced in the material you were looking at, but if they were animations for different things, like convolution filters in image processing, they wouldn't be wrong. Have some humility.

@Henriiyy 2 жыл бұрын

Also convolutions as an idea are way older and more general than just image processing or neural networks. He comes off as ignorant of this.

@Firestorm-tq7fy 10 ай бұрын

No, they are wrong. They give the feeling that each feature after convoluted is being a standalone one and post-processed accordingly, Which just isn’t true. They form a new 3d image, which then gets treated as such.

@hannahnelson4569 7 ай бұрын

Convolution is defined for any finite dimension of tensor, even 1 dimensional. While the claims made in this video are valid when looking from the domain of machine learning. I do aggree that calling diagrams describing a different purpose of a general structure 'wrong' because its not how your particular field uses it feels a bit sensationalist.

@Shad0wWarr10r 2 жыл бұрын

With those sahpes og input and filter, is there even any reason to have them 3d over 2d? I get the output as you layer filters, but if the input and filter is just the same thing all thru representing them by singular cubes is not wrong

@Shamisen100 2 жыл бұрын

Any plans to add your animations to Wikimedia commons? :)

@bibimblapblap 6 ай бұрын

Why is your input tensor so many dimensions? Shouldn’t the depth be only 3 (1 for each color channel)?

@andreas.karatzas 2 жыл бұрын

@animatedai How did you learn blender? Which were your sources?

@havehalkow Жыл бұрын

I'm just curious if this visualisation helps someone who doesn't know what convolution is.

@vinner1997 2 жыл бұрын

Amazing!

@yunusbilece8690 2 жыл бұрын

I liked the idea but title is too big for this kind of correction

@curious_one1156 Жыл бұрын

Bravo !

@RojinaPanta1 2 жыл бұрын

I really appreciate the effort and is good one, but I would still go with the 2D one this is way too much jittery for me with so many things happening at one and choice of colors.

@connorvaughan356 2 жыл бұрын

Why is the does the input shape have have so many layers in this animation? Wouldn't it have a shape equal to the image shape, then 3 layers, 1 for rgb respectively?

@animatedai 2 жыл бұрын

I'm glad you asked. In general, convolution takes a feature map as input, which can have any number of features (depth). An image is a special kind of feature map with 3 features: red, green, and blue. However, this typically only applies for the first layer of convolution in a neural network and the other convolutional layers will have more features for input, e.g., 32, 64, 128, ... 1024, 2048. So to better represent the general case and to encourage viewers to consider more than just the special case of an image, I chose to use 8 for the animations. Although 3 would also be perfectly valid.

@JonDornaletetxe 2 жыл бұрын

🔥

@gabrielchan3255 2 жыл бұрын

this is great

@Tyler-i2d 2 ай бұрын

I feel silly for asking, but the different colored blocks (in the middle) correspond to convolutions over different channels of the original matrix right?

@carnap355 Жыл бұрын

I don't understand why filters are 3D. 8 deep = 8 channels? In the end you get as many channels as there are filters?

@animatedai Жыл бұрын

Good questions! First: for a good explanation of "why" the filters are 3D, check out this video: kzbin.info/www/bejne/jpW3n2ijmZikiq8. And second: the filter depth matches the input depth, and the output depth matches the filter count.

@mizoru_ 2 жыл бұрын

Nice!

@hjtvgfhjtghvfg5919 2 жыл бұрын

If all are wrong, then why should i watch this one?

@날개달린_양 Жыл бұрын

tysm

@feddyxdx272 6 ай бұрын

thx

@kage-sl8rz 8 ай бұрын

cool even better add names to the objects like kernel etc would be helpful to new people

@macewindont9922 8 ай бұрын

sick

@jameshopkins3541 7 ай бұрын

Which is correct?????

@honourable8816 8 ай бұрын

Stride value was 2 pixel

@andreansihombing6780 2 жыл бұрын

So the output is a feature map? I don't get it why the feature map on the right, stacked like that. Anyone can explain it?

@animatedai 2 жыл бұрын

Yep, the output is a feature map. Each filter produces one feature of the output, which get stacked together like you see. I've got a video covering that concept here: kzbin.info/www/bejne/m367pp5vbLOYias.

@andreansihombing6780 2 жыл бұрын

@@animatedai Thank you for the explanations, I barely understand with others visualization, but you really do a good job.

@LokeKS 9 ай бұрын

There is no convolution

@davidebic 2 жыл бұрын

Is there a way to access your course online? I'm really interested in this subject!

@animatedai 2 жыл бұрын

The course is a work-in-progress. You can see the videos that are completed so far in this playlist: kzbin.info/www/bejne/m367pp5vbLOYias

@osmmanipadmehum 2 жыл бұрын

4:04 why isnt last column scanned

@animatedai 2 жыл бұрын

I'm glad you noticed. I picked an even column number specifically to demonstrate that. This happens because there's a stride of 2 and an even number of columns (8 in this case counting padding) and an odd sized filter (3x3). So after it's taken 3 steps, there's only 1 pixel of width remaining, and it can't move 2 more spaces. Convolution handles this by simply ignoring the remaining data and moving to the next row. This is important to know because it could cause you to lose data (which could accumulate over many layers to be significant chunks of your input). In this case, the last column was just padding anyway, so no real data is lost. Note: the last row is scanned because, unlike the columns, we have an odd number of rows, 7 counting padding.

@WordsThroughTheSky Жыл бұрын

wait a friggin' minute.... you're telling me that the filter or kernel is 3D? I always thought it's a 2d 3x3 filter that goes to through each "layer" of the input and it recreates a 3d output tensor. Are you sure it's a 3D filter? where is this stated?

@animatedai Жыл бұрын

Haha, I'm sure :) You can check the documentation for your favorite neural network library to verify. The conv2d operation will actually take a 4D tensor for the filters. Each filter is 3D and you pack all the filters together in one tensor to get a 4D tensor. To convince yourself that it only makes sense for the filters to be 3D, check out the sequel video: kzbin.info/www/bejne/jpW3n2ijmZikiq8 Sources: www.tensorflow.org/api_docs/python/tf/nn/conv2d pytorch.org/docs/stable/generated/torch.nn.functional.conv2d.html

@WordsThroughTheSky Жыл бұрын

@@animatedai you're right... my mind is actually blown, it's all been a lie, ty for the response

@Gabbyreel 2 жыл бұрын

I’m a little confused with the video, because I still don’t understand WHY 2d convolution pictures are wrong. What determines the depth of the first input? Same with the convolutional layer. Is this because we have RGBA layers, or?? What’s the benefit of drawing it as 3D instead of 2d? What’s the benefit to us to have a tensor instead of array of convolutional outputs? I’m sure this sounds like thoughtless complaining but I really am curious, and there must be something about convolution in AI that I’m missing in my own knowledge. Thanks for reading this.

@animatedai 2 жыл бұрын

Those are good questions. 1) Why are the 2D animations wrong? Short answer: they simplify away the feature dimension. Long answer: The 2D convolution that you'll find in neural network libraries (all the way down to NVIDIA's hardware interface) is "conceptually" performed on 3D data. These end up being batched so the interfaces technically take 4D tensors, but that's just multiple convolution operations on 3D data performed in parallel. If your only understanding of convolution came from the 2D animations, you wouldn't understand how to create the 4D tensors (or the 3D piece of data for a particular sample in the batch). In fact, you wouldn't know why the operation took 4D tensors at all. If you'd like more information on the feature dimension, my first video on the fundamental algorithm (kzbin.info/www/bejne/m367pp5vbLOYias) should provide enough information to understand what the feature dimension is and why it's essential. 2) What determines the depth of the first input? This depends on what data you have. A color image in RGB format would be a depth of 3: red, green, and blue. A color image with transparency in RGBA format would have a depth of 4: red, green, blue, and alpha. A grayscale image with a single brightness value for each pixel would have a depth of 1: brightness. 3) What determines the depth of the output? Check out my video on filter count: kzbin.info/www/bejne/j4SxfYCEo9GBrZo 4) What's the benefit of drawing it as 3D instead of 2D? 2D convolution conceptually operates on 3D data (2 spatial dimensions and one feature dimension), so drawing it in 3D shows everything and doesn't simplify anything away from the viewer. 5) What’s the benefit to us to have a tensor instead of array of convolutional outputs? Could you clarify this question? I'm not sure I understand it. Are you asking why we pack all the output values into a single 3D tensor instead of multiple 2D tensors (maybe one for each feature)? It's rare to want the features separated out like that, so it's convenient to have everything together in one tensor. It also has performance benefits from memory locality.

@Henriiyy 2 жыл бұрын

@@animatedai I still don't think the other animations are wrong though. The mathematical concept is the same for an input feature depth of 1, and for higher input depths you are just performing multiple mathematical convolutions at once. I don't think that makes convolution a _fundamentally_ 3d concept, just because it's computationally opportune to package multiple convolutions as one operation.