Kernel Size and Why Everyone Loves 3x3 - Neural Network Convolution

27,251 views · 1 day ago

Animated AI

Comments: 43
@IoannisKazlaris • 1 year ago
The basic reason we don't use (even number) x (even number) kernels is that those kernels don't have a "center". Having a "center" pixel (as in a 3x3 configuration) is very useful for max and average pooling; it's much more convenient for us.
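A minimal Python sketch of the "no center" point above (illustrative only, not from the thread):
```python
# For a kernel of size k, the geometric center sits at index (k - 1) / 2.
# That is an integer only when k is odd; even kernels have no center pixel.
for k in (2, 3, 4, 5):
    center = (k - 1) / 2
    print(f"kernel size {k}: center at index {center}")
# kernel size 3 -> 1.0 (an actual pixel); kernel size 4 -> 1.5 (between pixels)
```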
@deeps-n5y • 10 months ago
I didn't understand it. At the end of the day, for an even-sized filter you could consider any pixel to be the center pixel, right? It would end up giving similar values, though not the same. Also, max pooling and average pooling work on an output feature map, so how is this related?
@axelanderson2030 • 1 year ago
This is honestly the best video related to machine learning in general I have seen; amazing work. Most people just pull architectures out of thin air or make a clumsy disclaimer to experiment with the numbers. This video shows 3D visual representations of popular CNN architectures and really helps you build CNNs in general.
@FrigoCoder • 2 months ago
I have a few decades of hobbyist signal processing experience, and these new methods seem so amateurish compared to what we had in the past: FFT, FHT, DCT, MDCT, FIR filters, IIR filters, FIR design based on frequency response, edge-adapted filters (so no need for smaller outputs), filter banks, biorthogonal filter banks, window functions, wavelets, wavelet transforms, Laplacian pyramids, curvelets, contourlets, non-separable wavelets, multiresolution analysis, compressive sensing, sparse reconstruction, SIFT, SURF, BRISK, FREAK, yadda yadda. Yes, we even had even-length filters, and different filters for analysis than for synthesis.
@equationalmc9862 • 2 months ago
There are equivalents in AI model development and inference for those, though. Many of these signal processing techniques have analogs or are directly applicable in AI and machine learning:
- **FFT, FHT, DCT, and MDCT:** Used in feature extraction and preprocessing steps for machine learning models, especially in audio and image processing.
- **FIR and IIR Filters:** Used in preprocessing steps to filter and clean data before feeding it into models.
- **Wavelets and Wavelet Transforms:** Applied for feature extraction and data compression; useful in handling time-series data.
- **Compressive Sensing and Sparse Reconstruction:** Important in developing models that can work with limited data and in reducing the dimensionality of data.
- **SIFT, SURF, BRISK, and FREAK:** Feature detection and description techniques that are foundational in computer vision tasks like object recognition and image matching.
In AI, techniques like convolutional neural networks (CNNs) often use concepts from signal processing (like filtering and convolutions) to process data in a way that mimics these traditional methods. Signal processing principles help in designing more efficient algorithms and models, improving performance in tasks such as image recognition, speech processing, and time-series analysis.
@schorsch7400 • 5 months ago
Thanks for the effort of making this excellent visualization! It creates a very good intuition for how convolutions work and why 3x3 is dominant.
@matthewboughton8320 • 1 year ago
Such an amazing video. You're going to hit 50k soon! Keep this up!!
@newperspective5918 • 1 year ago
I think odd-sized filters are mainly used because we often use a stride of 1. Each pixel (except at the edges) is then filtered based on its surrounding pixels (defined by the kernel size). If the kernel size is even, the pixel the kernel represents would be the average of the 4 middle pixels, which introduces a sort of 0.5-pixel shift. It might be fine mathematically speaking, but it feels odd or wrong. Also, if you have worked with Gaussian filters (which I assume many CNN researchers have), you are literally forced to use odd-sized filters there.
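A small numpy sketch of the half-pixel shift described above (sigma and sizes are arbitrary, for illustration):
```python
import numpy as np

# A 1D Gaussian kernel peaks at a single sample only when its length is odd;
# with an even length, the peak falls between the two middle samples.
def gaussian_kernel(size, sigma=1.0):
    x = np.arange(size) - (size - 1) / 2  # offsets from the geometric center
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

print(gaussian_kernel(5))  # symmetric around the single middle sample
print(gaussian_kernel(4))  # two equal middle values, no single peak
```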
@maxlawwk • 1 year ago
Perhaps the 2x2 kernel is a common trick for a learnable stride-2 downsampling kernel and an upsampling deconvolution kernel. It is more likely about computational efficiency than network performance, because such kernels are almost equivalent to downsampling/upsampling followed by a 3x3 kernel. In this regard, a 2x2 kernel combined with stride-2 down/upsampling does not shrink the resulting feature map the way a 3x3 kernel does, which is possibly beneficial for image generation tasks. In GANs, 2x2 or 4x4 kernels are commonly found in discriminators, which favor non-overlapping kernels to avoid grid artifacts; see the sketch below.
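A sketch of the non-overlapping down/upsampling kernels mentioned above, assuming PyTorch (tensor sizes are illustrative):
```python
import torch
import torch.nn as nn

# A 2x2 kernel with stride 2 tiles the input exactly: neighboring output
# pixels never share input pixels, which avoids grid artifacts.
x = torch.randn(1, 16, 32, 32)
down = nn.Conv2d(16, 16, kernel_size=2, stride=2)         # 32x32 -> 16x16
up = nn.ConvTranspose2d(16, 16, kernel_size=2, stride=2)  # 16x16 -> 32x32

print(down(x).shape)      # torch.Size([1, 16, 16, 16])
print(up(down(x)).shape)  # torch.Size([1, 16, 32, 32])
```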
@josephpark2093 • 1 year ago
There was no reason I should have had this very question, and yet there had to be a great video on the internet telling me the exact reason why. Bless!
@sensitive_machine • 2 months ago
This is awesome and is inspiring me to learn Blender!
@pritomroy2465 • 5 months ago
In U-Net and GAN architectures, when a feature map half its original size needs to be generated, a 4x4 kernel is used.
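A sketch of that halving layer as used in DCGAN-style networks, assuming PyTorch (channel counts are illustrative):
```python
import torch
import torch.nn as nn

# Kernel 4, stride 2, padding 1 exactly halves an even spatial size:
# floor((H + 2 - 4) / 2) + 1 = H / 2.
x = torch.randn(1, 64, 64, 64)
halve = nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)
print(halve(x).shape)  # torch.Size([1, 128, 32, 32])
```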
@md.zahidulislam3548 • 1 year ago
Good work, amazing explanation.
@alansart5147 • 1 year ago
Freaking love your videos! Keep up the awesome work! :D
@rewanthnayak2972 • 1 year ago
Great work on the animation and research.
@naevan1 • 1 year ago
Wow, really beautiful animations, great job! However, I got kind of confused since I had always seen convolution in 2D, haha.
@animatedai • 1 year ago
Yes, I imagine that many AI students who have only seen 2D animations are surprised to learn that 2D convolutions actually work with 3D tensors (or 4D if batched). That was one of my main motivations for creating these animations :)
@haofanren6284 • 1 year ago
About the 2x2 filter, a paper may be helpful.
@yoursubconscious • 5 months ago
"we don't talk about the goose goblin" - MadTV
@ankitvyas8534 • 1 year ago
Good explanation. Looking forward to more.
@bengodw • 3 months ago
Hi Animated AI, thanks for your great video. I have a question: 4:45 indicates that the colors of the filters (i.e., red, yellow, green, blue) represent the "features". But a filter (e.g., the red one) is itself 3-dimensional (height, width, feature), so it also includes a "feature" axis. Thus, "feature" appears twice. Could you advise why we need "feature" twice?
@ocamlmail • 1 year ago
Super cool, thank you!
@travislee5486 • 1 year ago
Great work, your video does help me a lot 👍
@kznsq77 • 1 year ago
An even kernel size does not allow symmetric coverage of the area around a pixel.
@vikramsharma720 • 1 year ago
Great video, keep going like this 😊
@benc7910 • 1 year ago
This is amazing.
@Firestorm-tq7fy • 5 months ago
I don't see a reason for 1x1. All you achieve is losing information while also creating N features, each scaled by a certain factor. This can also be achieved within a normal layer (the scaling, I mean). There is really no point, obviously outside of the depthwise/pointwise combo. Please correct me if I'm missing something.
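For context on what a 1x1 convolution does do, a minimal PyTorch sketch (channel counts are illustrative):
```python
import torch
import torch.nn as nn

# A 1x1 convolution is a learnable linear layer applied at every pixel
# independently, mixing channels without touching spatial structure.
# Here it compresses 64 features to 16, e.g. to cut the cost of a
# following 3x3 layer (the depthwise/pointwise combo mentioned above).
x = torch.randn(1, 64, 32, 32)
mix = nn.Conv2d(64, 16, kernel_size=1)
print(mix(x).shape)  # torch.Size([1, 16, 32, 32])
```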
@danychristiandanychristian1060 • 1 year ago
Really helpful for understanding the concept. Correct me if I'm wrong: for the first conv2d layer, the input will always contain 1 feature for a black-and-white image and 3 features for an RGB image. After that, the number of features increases depending on the number of filters used in the convolution.
@animatedai • 1 year ago
Yes, that's correct
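A sketch of those first-layer channel counts, assuming PyTorch (filter counts are illustrative):
```python
import torch.nn as nn

# The first layer's input feature count is fixed by the image format
# (1 for grayscale, 3 for RGB); every later layer's output feature count
# equals its number of filters.
gray_first = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
rgb_first = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
second = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
```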
@j________k • 1 year ago
Nice video, I like it!
@tantzer6113 • 1 year ago
Wait, I didn't get why 5x5 or 7x7 works better for the first layer.
@animatedai • 1 year ago
Check out my reply to this comment for an explanation: kzbin.info/www/bejne/jGq9ind5o66nqJI&lc=UgweJZ_Bri8emvyNAMF4AaABAg.9hOBaZTROlX9hQHXzxdOZM
@fosheimdet • 1 year ago
Is there a good reason why even filter sizes aren't used at all, other than that the padding will be uneven when using "same"?
@thivuxhale • 1 year ago
At 4:23, you said that the exception to the rule "3x3 filters are more efficient than larger filters" is the first layer, since the input only has 3 channels. I still haven't gotten this part. I thought that when comparing the number of weights needed for each kind of filter, only the size of the filters matters, not the number of channels in the input.
@animatedai • 1 year ago
I've been trying to avoid equations in the videos, but the formula for the total number of weights needed is (filter width * filter height * filter count * input feature count). You can see this represented visually in my filter count video.

Assuming that the filter count is the same as the input feature count, it's more efficient to break large (5x5, 7x7, ...) filters into multiple 3x3 filters. A concrete example where all inputs and outputs have F feature dimensions:

One 7x7: 7 * 7 * F * F = 49 * F^2
Three 3x3s: (3 * 3 * F * F) + (3 * 3 * F * F) + (3 * 3 * F * F) = 27 * F^2

But for the first layer, the filter count is usually much higher than the input feature count of 3. It's more efficient to perform this dramatic increase in feature count with one large filter than with multiple smaller ones. A concrete example where the first layer has 16 filters:

One 7x7: 7 * 7 * 3 * 16 = 2352
Three 3x3s: (3 * 3 * 3 * 16) + (3 * 3 * 16 * 16) + (3 * 3 * 16 * 16) = 5040
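The arithmetic above can be checked directly; a minimal PyTorch sketch (bias disabled so only kernel weights are counted):
```python
import torch.nn as nn

def weights(*layers):
    return sum(layer.weight.numel() for layer in layers)

F = 16  # example feature count

# Middle of the network: F features in, F features out.
print(weights(nn.Conv2d(F, F, 7, bias=False)))  # 49 * F^2 = 12544
print(weights(nn.Conv2d(F, F, 3, bias=False),
              nn.Conv2d(F, F, 3, bias=False),
              nn.Conv2d(F, F, 3, bias=False)))  # 27 * F^2 = 6912

# First layer: 3 input features, 16 filters.
print(weights(nn.Conv2d(3, 16, 7, bias=False)))   # 2352
print(weights(nn.Conv2d(3, 16, 3, bias=False),
              nn.Conv2d(16, 16, 3, bias=False),
              nn.Conv2d(16, 16, 3, bias=False)))  # 5040
```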
@agmontpetit • 8 months ago
Thanks for taking the time to explain this, @animatedai!
@bangsa_puja • 8 months ago
What about the 1x7 and 7x1 kernels in Inception module C? Please help me.
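For reference, a sketch of that 1x7/7x1 factorization, assuming PyTorch (the 17x17 grid matches Inception v3's module C stage):
```python
import torch
import torch.nn as nn

# A 7x7 kernel is approximated by a 1x7 followed by a 7x1, cutting the
# per-filter weights from 7*7 = 49 to 7 + 7 = 14 while keeping a 7x7
# receptive field; asymmetric padding preserves the spatial size.
x = torch.randn(1, 32, 17, 17)
conv_1x7 = nn.Conv2d(32, 32, kernel_size=(1, 7), padding=(0, 3))
conv_7x1 = nn.Conv2d(32, 32, kernel_size=(7, 1), padding=(3, 0))
print(conv_7x1(conv_1x7(x)).shape)  # torch.Size([1, 32, 17, 17])
```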
@Antagon666 • 1 year ago
Wait, so why do we need larger filters in the first layer? To extract more features from only the 3 channels? And which is better: more chained filters with a lower channel count, or fewer chained filters with more channels?
@animatedai • 1 year ago
The filters in the first layer don't need to be larger. There's just no performance benefit to splitting them into a chain of smaller filters. And the reason for that is that the number of features increases dramatically from the input (typically 3 channels for RGB) to something like 16 or 32. The performance benefit of splitting a large filter into smaller filters assumes the number of features stays the same from input to output.

> And which is better: more chained filters with a lower channel count, or fewer chained filters with more channels?

This really depends on the data, how long the chain is, and how many filters you have. It's an ongoing area of research where researchers have found great results in both cases.
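A sketch of the receptive-field equivalence behind that splitting, assuming PyTorch:
```python
import torch
import torch.nn as nn

# Two chained 3x3 convolutions cover the same 5x5 receptive field as a
# single 5x5 convolution, which is what makes the split possible at all.
x = torch.randn(1, 8, 28, 28)
one_5x5 = nn.Conv2d(8, 8, 5)
two_3x3 = nn.Sequential(nn.Conv2d(8, 8, 3), nn.Conv2d(8, 8, 3))
print(one_5x5(x).shape)  # torch.Size([1, 8, 24, 24])
print(two_3x3(x).shape)  # torch.Size([1, 8, 24, 24]) -- same spatial coverage
```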
@minecraftermad • 22 days ago
Wanna bet e by e would somehow be mathematically optimal?
@ati43888 • 5 months ago
Nice