Filter Count - Convolutional Neural Networks

13,472 views

Animated AI

1 year ago

Patreon: / animated_ai
Learn about filter count and realistic methods for finding the best values
My Udemy course on High-resolution GANs: www.udemy.com/course/high-res...

Comments: 44
@bycloudAI · 1 year ago
This channel is YouTube's undiscovered gold mine, please keep up the amazing content!!
@ChrisRamsland94 · 1 year ago
For real, it's wild how much effort he puts into these.
@kinvert · 1 year ago
Thanks for another great video!
@marceloamado6223 · 9 months ago
Thank you for the video. I used to really stress out about this; now I'm calmer knowing it's a common problem and that solving it with heuristics is the way.
@arjunpalusa9421 · 1 year ago
Eagerly waiting for your next videos..
@carlluis2045 · 1 year ago
awesome videos!!!
@KJletMEhandelit · 6 months ago
This is wow!
@d_b_ · 1 year ago
3:24 "only 2 categories...so 512 features is enough" - this statement sounds like it comes from familiarity with the problem. Is there something more to it? Did you see that number of features used in past papers? Was it from your own experimentation against 256 or 1024 features? Is there some math that arrives at this? I'd like to understand this better, so any additional color you have on this would be helpful!
@animatedai · 1 year ago
You typically want more features than categories. So for something like ImageNet with 1000 categories, 512 wouldn't be enough; you'd want 2048 or higher. But this case only has 2 categories, so 512 easily meets that requirement. The exact value of 512 came from NVIDIA's StyleGAN papers, which is what I based that architecture on. I don't remember them giving a reason for that value, but it gave them good results, and a higher value wouldn't fit into memory during training on the Google Colab hardware. It's more of an art than a science, so let me know if that doesn't completely answer your question. I'm happy to answer follow-ups.
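[Editor's note: the "more features than categories" rule above plays out in the final classification head, where the network's pooled feature vector is projected down to one score per category. A minimal numpy sketch with illustrative sizes, not the video's actual model:]

```python
import numpy as np

# Hypothetical classification head: the network has condensed the image into
# a 512-dimensional feature vector, and one linear layer maps those 512
# features down to a score (logit) per category. Sizes are illustrative.
rng = np.random.default_rng(0)

features = rng.standard_normal(512)      # output of the final conv stage
weights = rng.standard_normal((2, 512))  # one weight row per category
logits = weights @ features              # shape (2,): one score per class

print(logits.shape)  # (2,)
```

With 1000 categories (ImageNet), the weight matrix would be (1000, 512), so 512 features would have to distinguish 1000 classes, which is why a larger feature count is preferred there.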
@d_b_ · 1 year ago
@@animatedai Thank you, that helps
@user-tg6dd7qh5l · 3 months ago
The best video on CNNs.
@buddhadevbhattacharjee1363 · 6 months ago
Please create a tutorial on conv3d as well, and on which would be better for video processing (conv2d or conv3d).
@umamusume1024 · 2 months ago
When it comes to one-dimensional signals, such as time-domain signals collected from speed sensors, what is the difference between visualizing a 1D CNN and a 2D one? Does it just change the height of the cuboid to 1? And what algorithms do you recommend for deep learning on one-dimensional time-domain signals? I would really appreciate your reply, because as a Chinese student doing an undergraduate graduation project, I can't find any visualization of a 1D CNN on the Chinese Internet.
@igorg4129 · 1 year ago
wow!
@lionbnjmn · 1 year ago
Hey! Do you have any sources for your statements at 2:21 (about doubling channels when downsampling) and 2:50 (downsampling units should be followed by dimension-steady units)? I'm currently writing a paper and trying to argue the same point, but I can't find any real research on it :)
@animatedai · 1 year ago
It's just a common pattern that I've seen. There's no shortage of examples if you want to cite them, from the original ResNet all the way up to modern diffusion architectures.
@whizziq · 1 year ago
I have one question: what does the number of features mean? For example, the initial image is 512x512x3 (the 3 in this case being the red, green, and blue colors). But what happens in the next layers? What are these 64, 128, and higher numbers of features? Why do we need so many instead of just 3? Thanks. Appreciate your videos!
@Anodder1 · 1 year ago
You have 3 dimensions: 2 spatial and 1 feature dimension. The 2 spatial dimensions encode where the information is, and the feature dimension encodes the different aspects under which the information can be interpreted. In the beginning you have the 3 color channels, but the next layer has a much larger feature dimension, in which each index represents one particular aspect, like "how much red-green contrast there is between left and right at this position". These aspects become increasingly high-level, like "is this a circle", so the feature dimension needs to grow to cover all the useful interpretations that could apply at that point. This agrees well with the shrinking spatial dimensions, because each pixel in a later layer represents a larger area of the original image, for which that many higher-level interpretations are necessary.
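[Editor's note: the shrinking-spatial / growing-feature trade-off described above can be sketched with illustrative layer sizes, not taken from any specific model:]

```python
# As the spatial dimensions shrink through the network, the feature
# dimension grows, so each "pixel" of a later layer summarizes a larger
# image area with a richer set of interpretations. Sizes are illustrative.
shape = (512, 512, 3)  # input: height x width x RGB channels
print(shape)
for features in (64, 128, 256, 512):
    # halve the spatial resolution, grow the feature dimension
    shape = (shape[0] // 2, shape[1] // 2, features)
    print(shape)
# ends at (32, 32, 512)
```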
@SpicyMelonYT · 1 year ago
How do you halve the resolution? The only way I can imagine is a kernel that is half the size of the input data plus one. Is that the right approach, or is something else happening?
@animatedai · 1 year ago
This will be covered in an upcoming video, but to give away the answer: you can pad with "SAME" (in TensorFlow; or manually pad the equivalent of it in PyTorch) and use a stride of 2.
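[Editor's note: the arithmetic behind that halving, using the documented SAME-padding output-size formula out = ceil(in / stride):]

```python
import math

# With "SAME" padding, the output size depends only on the stride:
# out = ceil(in / stride), regardless of kernel size.
# A stride of 2 therefore halves the resolution (rounding up).
def same_output_size(size: int, stride: int) -> int:
    return math.ceil(size / stride)

print(same_output_size(512, 2))  # 256: 512x512 becomes 256x256
print(same_output_size(7, 2))    # 4: odd sizes round up
```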
@SpicyMelonYT · 1 year ago
@@animatedai Oh I see, and that's still backpropagation-compatible? I guess it would be, but I have no clue how to do that little step backwards. I assume you just act like the data was always smaller for that step?
@animatedai · 1 year ago
Yes, that still works fine with backpropagation. Are you working with a library like TensorFlow or PyTorch? If so, they'll handle the backpropagation for you with their automatic differentiation. If you're using a kernel size of 1x1, it would work to act like the data was always smaller (specifically you would treat it like you dropped every other column/row of the data and then did a 1x1 convolution with a stride of 1). But for a larger kernel size like 3x3, all of the input data will be used so that won't work.
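[Editor's note: the 1x1-vs-3x3 distinction in the reply above can be checked numerically with a naive convolution; this is an illustrative sketch, not library code:]

```python
import numpy as np

def conv2d(x, w, stride=1):
    # Naive "VALID" convolution. x: (H, W, C_in); w: (kH, kW, C_in, C_out).
    kh, kw = w.shape[:2]
    hh = (x.shape[0] - kh) // stride + 1
    ww = (x.shape[1] - kw) // stride + 1
    out = np.zeros((hh, ww, w.shape[3]))
    for i in range(hh):
        for j in range(ww):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.tensordot(patch, w, axes=3)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
w1 = rng.standard_normal((1, 1, 3, 4))

# For a 1x1 kernel, stride 2 is the same as dropping every other
# row/column first and then convolving with stride 1:
a = conv2d(x, w1, stride=2)
b = conv2d(x[::2, ::2], w1, stride=1)
print(np.allclose(a, b))  # True

# For a 3x3 kernel, the two aren't even the same shape, because the
# strided convolution still reads the rows/columns it strides over:
w3 = rng.standard_normal((3, 3, 3, 4))
print(conv2d(x, w3, stride=2).shape, conv2d(x[::2, ::2], w3, stride=1).shape)
# (3, 3, 4) (2, 2, 4)
```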
@SpicyMelonYT · 1 year ago
@@animatedai Ah, I see, that makes sense. I'm actually building it from scratch in JavaScript. It's pretty slow, but I'm doing it to get a better understanding of it, and I also find it fun. Thank you for the responses; that's really cool. I think what you're doing with these videos is really sleek and useful. I personally would like it if you went into more depth about the actual math and numbers, but I completely understand that your goal here is to give a more intuitive explanation for people. Keep it up!
@animatedai · 1 year ago
Good luck on your javascript CNN project! And thank you; I appreciate your support. The math is something I plan to cover; I even have the rough draft of the script written for a future video that goes over the math in detail. I just wanted to focus on teaching the intuition separately first so that it doesn't get lost in the calculation details.
@omridrori3286 · 1 year ago
Are these the animations you also use in the course?
@animatedai · 1 year ago
I don't use these in my current course, but I'm planning to incorporate them into a new course that I'm working on now.
@bagussajiwo321 · 1 year ago
So if the filter count is 64, does that mean you stack the 512x512 photo 64 times, like stacking that face 64 times? Or is there a different pixel value for every filter?
@bagussajiwo321 · 1 year ago
Take for example a 3x3 matrix with 5 filters:
1 0 1
0 1 0
1 0 1
So is this value stacked 5 times because the filter count is 5?
@animatedai · 1 year ago
Have you seen my video on the fundamental algorithm? kzbin.info/www/bejne/m367pp5vbLOYias You can think of each filter as a different pattern that the algorithm is searching for in the image (or input feature map). Each output value (each cube) represents how closely that area of the input matched the pattern, so you get a 2D output for each filter. Those outputs are stacked depth-wise to form the 3D output feature map. If you have 64 filters, you'll stack 64 of these 2D outputs together.
@bagussajiwo321 · 1 year ago
@@animatedai I see!! Thanks for giving me the previous video's link. Sorry for my silly question 😅
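[Editor's note: the pattern-matching picture in the reply above, as a shape-level sketch. Each filter yields one 2D match map, and the maps are stacked depth-wise; sizes are small stand-ins, and the random values are placeholders for real match scores:]

```python
import numpy as np

# One 2D "match map" per filter, then stack depth-wise. 32x32 is a small
# stand-in for the 512x512 example; the random values stand in for the
# pattern-match scores a real convolution would produce.
rng = np.random.default_rng(0)
num_filters = 64

maps_2d = [rng.standard_normal((32, 32)) for _ in range(num_filters)]
output = np.stack(maps_2d, axis=-1)  # stack along the depth (feature) axis

print(output.shape)  # (32, 32, 64): depth equals the number of filters
```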
@superzolosolo · 1 year ago
How did you go from 512x512x3 to 512x512x64 while still using a kernel size of 3x3? Wouldn't you have to use a kernel size of 1x1 and have 64 kernels? That's the only way I understand it so far; I'm hoping you explain it later on. Other than that, this is super helpful. Thank you so much 🙏
@animatedai · 1 year ago
You're correct that you need 64 kernels, but the size of the kernels doesn't matter. It's fine to have a kernel size of 3x3 and 64 kernels.
@superzolosolo · 1 year ago
@@animatedai I see now, thanks for clarifying
@adityamurtiadi911plus · 4 months ago
@@animatedai Did you use a padding of 1 to keep the output dimensions the same?
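[Editor's note: a sketch tying this thread together. With 64 kernels and a zero padding of 1 (the SAME padding for a 3x3, stride-1 kernel), the spatial size is preserved and the output depth equals the kernel count. Naive illustrative code, not the video's implementation:]

```python
import numpy as np

def conv2d_same(x, w):
    # Naive stride-1 convolution with zero padding chosen so the spatial
    # size is preserved. x: (H, W, C_in); w: (k, k, C_in, C_out).
    k = w.shape[0]
    p = (k - 1) // 2  # padding of 1 for a 3x3 kernel
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((x.shape[0], x.shape[1], w.shape[3]))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.tensordot(xp[i:i + k, j:j + k], w, axes=3)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 3))    # small stand-in for 512x512x3
w = rng.standard_normal((3, 3, 3, 64))  # 64 kernels, each 3x3 with depth 3

print(conv2d_same(x, w).shape)  # (16, 16, 64): depth comes from kernel count
```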
@pi5549 · 9 months ago
2:08 I don't think we can go from 512x512x3 to 512x512xN if filterSize > 1. If filterSize = 3, we'd be going to 510x510xN, right? Thought experiment: 5 items, slidingWindowLen 3, gives 3 slide positions (123, 234, 345).
@pi5549 · 9 months ago
Hmm, I suppose a filter can extend beyond the image by a pixel; it might even collect useful information that tells it it's dealing with an edge. When solving a jigsaw puzzle, you usually collect the edge pieces and try to work with them first.
@animatedai · 9 months ago
This question is actually the perfect lead-in to my video on padding: kzbin.info/www/bejne/ppmXfaWao9mChNEfeature=shared That's the video that directly follows this one in my playlist on convolution. You can see the full playlist here: kzbin.info/aero/PLZDCDMGmelH-pHt-Ij0nImVrOmj8DYKbB&feature=shared
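[Editor's note: the sliding-window count in the thought experiment above checks out. Without padding ("VALID"), a window of size k over n items has n - k + 1 positions:]

```python
# Without padding, a window of size k over n items has n - k + 1 positions,
# which is why a 3-wide filter shrinks 512 down to 510.
def valid_positions(n: int, k: int) -> int:
    return n - k + 1

print(valid_positions(5, 3))    # 3 positions: (1,2,3), (2,3,4), (3,4,5)
print(valid_positions(512, 3))  # 510
```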
@tazanteflight8670 · 1 year ago
Why do your filter examples have a depth of 8?
@jntb3000 · 11 days ago
Because the input has a depth of eight.
@tazanteflight8670 · 10 days ago
@@jntb3000 Eight... what? And why was 7 insufficient, and why is 9 too much?
@jntb3000 · 10 days ago
I believe the depth of each kernel must match the depth of the input. In his example, the input has a depth of eight, hence the kernel has a depth of eight.
@jntb3000 · 10 days ago
@@tazanteflight8670 If the input only has a depth of 3 (like the RGB colors), then the kernel should have a depth of 3 as well. I guess we could also use a kernel depth of just one for all input sizes.
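[Editor's note: a tiny check of the depth-matching rule in the reply above. The kernel spans every input channel, so each kernel position yields a single scalar; sizes are illustrative:]

```python
import numpy as np

# The kernel's depth must equal the input's depth so the elementwise
# multiply-and-sum at each position covers every input channel.
rng = np.random.default_rng(0)
patch = rng.standard_normal((3, 3, 8))   # one 3x3 window of a depth-8 input
kernel = rng.standard_normal((3, 3, 8))  # kernel depth matches input depth

value = np.sum(patch * kernel)  # one scalar output per kernel position
print(np.ndim(value))  # 0: a single number
```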
@Looki2000 · 1 year ago
Imagine the ability to build your own TensorFlow neural network using such 3D visualization.
@conficturaincarnatus1034 · 1 year ago
The 3rd-dimension part seems a bit tedious; I believe 2D visualizations are more helpful in practice. Just write the feature count in text at the bottom and go to the next layer. And for building neural networks with 2D visualization, I've recently found KNIME to be amazing, although you're abstracting entire layers into a single box lmao.