I've been wanting a video like this for a long time; you nailed the balance between showing and explaining. I subscribed and liked the video.
@far1din2 жыл бұрын
Thank you my friend. Glad you got value out of it! 🔥
@swedenontwowheels Жыл бұрын
This is an excellent video explaining how convolutional layers work at a glance! Fantastically well done!
@fdshdsfdsqq Жыл бұрын
Man, you're amazing! This is a perfect visualization. Thank you, and may God give you health!
@far1din Жыл бұрын
Haha, thank you for the kind words my friend :D
@AnaS-lp4mf9 ай бұрын
Literally thank you so much for your video. You don’t know how much relief you brought me in understanding it! Your voice is so clear and your explanations were just spot on and easy to understand. I’ve spent like weeks trying to figure out how the pieces go together and you have completed the puzzle of confusion in my mind. Thank you so much !!!
@teofilciobanu Жыл бұрын
Hi there! Pretty cool video :D One suggestion I would have is to show how the output looks if we change the input. For example, at 2:50 and 4:30, a 2x10 and 4x10 grid respectively, where the ith column is the layer output for the ith digit.
@ArashSoleimanifar Жыл бұрын
You are fantastic! I look forward to watching your next videos. Please produce mooore videos!
@far1din Жыл бұрын
Thank you brother. Working on it! 😃🚀
@joaogustavo62688 ай бұрын
This is art! Never delete this video bro!
@sugurufoundation41277 ай бұрын
Thank you very much for this explanation. I have been struggling to understand how exactly convolutions work, and you explained it perfectly. Thanks, and keep up the good work, bro.
@Democracy_Manifest Жыл бұрын
Excellent video. We needed this, thank you. Keep it up!
@dndr4818 Жыл бұрын
One of the best videos explaining this topic. What a great job, and thank you!
@programmingwithchatti40632 жыл бұрын
Why is the background music soo soothing xD I want to focus, not get lost in thoughts xD
@far1din2 жыл бұрын
Haha, next time my friend. I will turn down the music 🚀
@r0cketRacoon6 ай бұрын
WWOWW! I needed a video visualizing a CNN layer by layer like this, and I came across your channel. The best CNN visualization for me right now. Thank you!
@SterileNeutrino4 ай бұрын
Nice. I remember working on digit recognition using handcoded analysis of pixel runs a long time ago. It never worked properly 😂 And it was computationally intensive.
@maxedin79102 жыл бұрын
Amazing visualization! I wish for more short vids like this!
@boramin30777 ай бұрын
Amazing work! Please continue to make more videos on other ML topics. I find your videos really helpful for understanding the concepts.
@RAHUL11819952 жыл бұрын
This was really helpful... Thank you so much for the visualization... Keep up the good work... Looking forward to your future uploads.
@khayyamnaeem56012 жыл бұрын
This is my favourite channel. I especially like it when you explain everything so nicely. I wish you a lot of success with the channel and a happy life.
@far1din2 жыл бұрын
That’s great to hear. I wish you all the best! 😃
@mehdismaeili3743 Жыл бұрын
Excellent. 3 videos and 1.89k subscribers!!! It's amazing, and you are amazing.
@far1din Жыл бұрын
Thank you my friend! 💯
@AutobahnBuilder Жыл бұрын
I gotta say, your animation is great and the way you explain it is really good. It feels like 3Blue1Brown and is just as well explained. Keep up the good work!
@far1din Жыл бұрын
Thank you my friend 🔥
@KarolBura11 ай бұрын
thank you from Silicon Valley Startup
@Sowmya_codes9 ай бұрын
Hello user-zn1hm6tk1o, do let me know if you're in need of an intern!
@alfashow24392 жыл бұрын
Great video! You got me interested!
@_aitalks_2 жыл бұрын
Nice walkthrough!
@far1din2 жыл бұрын
I am glad you enjoyed the content!
@zafer-x1v28 күн бұрын
Wonderful explanation. I appreciate it.
@hackeymabel8296 Жыл бұрын
Thank you, really needed this.
@websciencenl79942 жыл бұрын
Great video, Thank you!
@lapusikrid4752 жыл бұрын
Great video! Such a good and accessible explanation! Thank you, it helped me a lot!)
@locngoxuan13292 жыл бұрын
It's really informative for me. Thanks bro!!
@far1din2 жыл бұрын
Thank you brother. I will try to post a more in depth video soon!
@sivaladeepak1538 Жыл бұрын
Great Viz!!!
@fiya..2 жыл бұрын
Nice video! Thank you!
@jaberfarahani66455 ай бұрын
the best channel❤
@suthanraja16575 ай бұрын
Thank you so much!
@ForbiddenPrime4 ай бұрын
Thank you for the source code. This will help me create some content for my syllabus. With love.
@far1din3 ай бұрын
Glad it was helpful, although it's not the most ideal code 😂
@randomtypebeats73482 жыл бұрын
keep em coming brodie!
@far1din2 жыл бұрын
For sure! :)
@IvoVacca2 жыл бұрын
Very cool.
@emadilabid7 ай бұрын
Bravo dude!
@nvmvel2 жыл бұрын
Wow. Such high-quality content with great effort. You deserve a million subscribers. Hope you'll get popular soon and make more videos 🙌
@far1din2 жыл бұрын
Haha, stay tuned! 🚀🔥
@KhanhTran-qp5pm Жыл бұрын
Love your video.
@emmanuelsheshi9616 ай бұрын
nice work sir
@mattialoszach64059 ай бұрын
Awesome video🫶
@EpromLee11 ай бұрын
thank u!!
@Dreadwinner2 жыл бұрын
🧡🖤
@richardfoldesi39802 жыл бұрын
best
@far1din2 жыл бұрын
🚀
@Leuel48Fan Жыл бұрын
4:24 It's kind of wild how, after this layer, there's no resemblance or perceivable logic to our eyes in how it's characterizing the digit. Can you maybe show 10 examples of each of the 10 different digits at this step (just the output)? I wonder if they will look similar in pattern to each other right before the fully connected final layer.
@far1din Жыл бұрын
It is very strange how all of this just «works». I will try to do that in one of my next videos! 😊
@ratfuk93402 жыл бұрын
4:10 How are the outputs of the two convolutions for each filter combined to get one feature map per filter?
@far1din2 жыл бұрын
So, you multiply the values from one filter with the corresponding input layer. Then you add the outputs together plus the bias. Then you pass them through an activation function (in this case the sigmoid). After this you move the filter one step to the right (since the stride is one) and do the same thing until everything is covered. This is hard to explain in text. I will do my best to get out an in-depth video by this weekend where everything will be explained in detail! 💯
@ratfuk93402 жыл бұрын
@@far1din I think I get it. Anyway thanks for making these videos, they help a lot.
@far1din2 жыл бұрын
@@ratfuk9340 kzbin.info/www/bejne/oHXIZnV3qLllY7s
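A rough NumPy sketch of the multiply, sum, add bias, activate, then slide procedure described in the reply above; the function names and the random example data are illustrative and not taken from the video's source code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_single(image, kernel, bias, stride=1):
    """Slide one kernel over a 2-D image: multiply, sum, add the bias, then activate."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel) + bias
    return sigmoid(out)

rng = np.random.default_rng(0)
image = rng.random((28, 28))          # stand-in for one 28x28 digit
kernel = rng.standard_normal((5, 5))  # one 5x5 filter
print(conv2d_single(image, kernel, bias=0.1).shape)  # (24, 24)
```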
@ahamedismail1630 Жыл бұрын
So we use filters to catch the features, and max pooling to capture the most active feature? Thus if we use max pooling with larger dimensions, we lose the other, less significant features, which reduces model complexity but results in a loss of information?
@far1din Жыл бұрын
That is the idea, and it seems to work in practice. However, people use average pooling and min pooling as well. It all comes down to what yields the best result in the end! :D A lot of this is just trial and error until the best performance is achieved.
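For readers who want to see the pooling idea in code, a minimal NumPy sketch (illustrative names, not the video's source code) showing max pooling and, for comparison, average pooling:

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2-D feature map by taking the max (or mean) of each window."""
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    reduce_fn = np.max if mode == "max" else np.mean
    for i in range(h):
        for j in range(w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = reduce_fn(window)
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fm, mode="max"))  # keeps only the strongest activation per 2x2 window
print(pool2d(fm, mode="avg"))  # average pooling retains more of the weaker signal
```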
@abdullahhashmi6542 жыл бұрын
For the convolution, you have two filters through which you pass the original image, and you get two images which have been passed through the filters. But for the second convolution, you have 4 filters for each of the two images, so a total of 8 filters for the two images: 4 for one and 4 for the other. But in the visualization, unlike the first convolution, you run both filters through both images in parallel and then combine them. So instead of getting 8 images from the second convolution, we get 4. Why?
@far1din2 жыл бұрын
Great question my friend. This is one of the reasons why I felt I had to make this animation. First of all, in the second convolution, we have one “image” consisting of two channels. You can think of an RGB (colored) image. These images consist of three channels that you stack together. See Reference 1 for an image. This basically implies that if you have a 28 x 28 pixel RGB image, when you extract the values for the three channels (Red, Green and Blue) you will have three matrices. The dimensions are therefore actually 28 x 28 x 3, where 3 is the number of channels. Now, when you build a convolutional neural network, you choose how many filters you want in each layer. If you have X channels in the inputted image, each of the filters will have X channels. Example: the input is 28 x 28 x 3 and you choose five filters with dimensions 5 x 5; you will technically have five filters that are 5 x 5 x 3. Let’s say you run the convolution. Your outputted activation layer will have the dimensions 24 x 24 x 5, where 5 is the number of channels. This implies that if you choose to have a convolutional layer next with two filters with the dimensions 7 x 7, these will actually have the dimensions 7 x 7 x 5. I hope this made sense, and I know it can be a bit confusing. I highly recommend you watch Andrew Ng’s video on convolutions over volumes (Reference 2), as he explains the concept very well. Reference 1: media.geeksforgeeks.org/wp-content/uploads/Pixel.jpg Reference 2: kzbin.info/www/bejne/gYWlkIJ8pKaEmcU
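A small NumPy sketch of the shape bookkeeping described in the reply above, i.e. convolution over volumes, where each filter carries as many channels as its input; the names and data are illustrative, not the video's code:

```python
import numpy as np

def conv_over_volume(image, filters, biases):
    """image: (H, W, C). filters: (num_filters, kh, kw, C). Output: (H-kh+1, W-kw+1, num_filters)."""
    n_f, kh, kw, _ = filters.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, n_f))
    for f in range(n_f):
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i:i + kh, j:j + kw, :]                   # 5x5x3 patch
                out[i, j, f] = np.sum(patch * filters[f]) + biases[f]  # all channels summed into one value
    return out

rng = np.random.default_rng(0)
rgb = rng.random((28, 28, 3))                # 3-channel input image
filters = rng.standard_normal((5, 5, 5, 3))  # five filters, each technically 5x5x3
print(conv_over_volume(rgb, filters, np.zeros(5)).shape)  # (24, 24, 5)
```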
@nithyaprakash5812 Жыл бұрын
Great job... You got a new subscriber 🎉😊
@ektor12129 күн бұрын
But how does the network manage to recognize the original number if, after the final max pooling and the final flattened layer, the reference number can no longer be made out?
@vivek65522 Жыл бұрын
Great video! A quick question: do the kernel values (in every conv. layer) change during the training of a CNN? Does the kernel change itself so as to only extract exactly the shapes and patterns required for classification?
@kneehaw8093 Жыл бұрын
Convolutional kernels will train their weights such that they “learn” whatever feature detection is most desirable. I believe that is a “yes” to your question.
@far1din Жыл бұрын
1. Do the kernel values change? It depends. Normally they do, but you can also freeze the network and train, for example, only the last 3 layers. This is common for transfer learning: kzbin.info/www/bejne/fIKwYmZ-oKqZeM0 2. It definitely looks like that is the case if you look at a network post-training. Why it extracts certain shapes and patterns is still unknown to me.
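A hedged sketch of point 1 above, freezing a network and training only the last layer, written with PyTorch; the toy model and its layer sizes are made up for illustration and are not the video's code:

```python
import torch
import torch.nn as nn

# A toy CNN standing in for a pretrained network.
model = nn.Sequential(
    nn.Conv2d(1, 8, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 12 * 12, 10),
)

# Freeze every parameter, then unfreeze only the final linear layer,
# so training updates just those weights (transfer-learning style).
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)
```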
@leesohaeng Жыл бұрын
Hi, how do you make the 5*5 kernel? Just random?
@far1din Жыл бұрын
It is initialized randomly if that is what you were asking! :)
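To make "initialized randomly" concrete, a tiny NumPy sketch of one common scheme (He initialization); this is illustrative and not necessarily what the video's code does:

```python
import numpy as np

rng = np.random.default_rng(42)
fan_in = 5 * 5 * 1  # kernel height * kernel width * input channels
# He initialization: zero-mean Gaussian scaled by sqrt(2 / fan_in),
# a common choice for layers followed by ReLU.
kernel = rng.standard_normal((5, 5)) * np.sqrt(2.0 / fan_in)
print(kernel.shape, kernel.std())
```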
@triplemmm32 жыл бұрын
What do you use for making videos?
@far1din2 жыл бұрын
Manim! www.manim.community
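For anyone curious what a Manim (Community Edition) scene looks like, a minimal sketch; the scene name is arbitrary and this is not the video's actual animation code:

```python
from manim import Scene, Square, Create

class PixelGrid(Scene):
    def construct(self):
        # Draw a single square; real CNN animations compose many such mobjects.
        self.play(Create(Square()))

# Rendered from the command line, e.g.:  manim -pql this_file.py PixelGrid
```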
@Brandonator245 ай бұрын
I'm curious, why is the first convolution using ReLU and then later convolutions using sigmoid? Edit: Also, when convolving over the previous convolution + max-pooling output, we have two images; how are the convolutions from these two separate images combined? Is it just adding them together?
@far1din5 ай бұрын
Hey Brandon! 1. The ReLU and sigmoid are just serving as examples to showcase different activation functions. This video is just a «quick» visualization of the longer, in-depth version. 2. Not sure if I understood, but if you're referring to the filters, they are added. I go through the math behind this with visualizations in the in-depth video. I believe it should clarify your doubts! 😄
@Brandonator244 ай бұрын
@@far1din Will be checking that out, thanks!
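For reference, the two activation functions discussed in this thread, sketched in NumPy (illustrative only):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)        # used after the first convolution in the video

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # used after the later convolution

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), sigmoid(x))
```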
@very_old_dog Жыл бұрын
I didn't get why, in the 1st convolution, after passing a 28*28 image through a 5*5 filter, we ended up with an image of size 24*24. The stride is 1, hence the filter should march perfectly to the end of the image without bypassing certain pixels. I'm probably missing something, but what?
@far1din Жыл бұрын
Brother, for example, let's say you have an image with the size of 7 x 7 and a filter that is 5 x 5. You start at the left, where the left side of the filter touches the left side of the image. You slide it to the end, where the filter's **right** side touches the right side of the image. Since the filter is 5 x 5, you can only slide it two times to the right. Essentially, 5 «squares» of the image are already covered when you start, and there are only two more left. I highly recommend you draw this out and «scale down» the image to 7 x 7 or something for clarification. I'll also link a video that shows the process. Watch the animations in this video for clarification: kzbin.info/www/bejne/qoK8i5R6o8Seick
@very_old_dog Жыл бұрын
@@far1din Bro, I've got it !!! Thank You so much for a great clarification and a link. That helped a lot.
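The same sliding argument as a quick sketch, using the standard output-size formula (assuming no padding); the function name is illustrative:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    # Standard formula: floor((N + 2P - F) / S) + 1
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(7, 5))   # 3  -> the 7x7 example: starting position plus two slides
print(conv_output_size(28, 5))  # 24 -> why a 28x28 image becomes 24x24
```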
@paedrufernando2351 Жыл бұрын
Why use a 2nd convolution with 8 filters?
@far1din Жыл бұрын
Hello my friend, the number of filters is typically chosen. A lot of deep learning is trial and error. I used 4 filters in the second convolution, and these filters had two channels; therefore, technically four filters. There is no underlying reason for this, as this video was just meant to visualize the process. I could have gone with 10 filters, but the video would have been impossible to render.
@ritwiksharma43592 жыл бұрын
Why are there 2 parts of each filter in layer 2? Is this just for the different color channels?
@far1din Жыл бұрын
Yes my friend. If there were four channels being convolved over, then each filter would have had four channels as well. The number of channels in the filter will mirror that of the input! 💯
@phanstudio Жыл бұрын
How do you create the visualizations?
@far1din Жыл бұрын
Manim! docs.manim.community/en/stable/
@figloalds Жыл бұрын
If you start from the last layer with a 1 on the 7 and 0 on the rest, then do the reverse process, what would the image look like?
@far1din Жыл бұрын
My friend, I do not understand. Could you explain in detail? :)
@figloalds Жыл бұрын
@@far1din I mean trying to reverse the process from a given output all the way back to a generated image and seeing a visualization of what the network "understands" a 7 to be, for example. I was watching some videos of people reversing AI outputs in this way for DeepDream and GPT, and they found very weird stuff, like very weird tokens in GPT that cause the model to completely break.
@Leuel48Fan Жыл бұрын
@@figloalds I believe I've seen this tried somewhere, and it looks nothing like a '7'. Basically it can't generate images; you'd just get random, gibberish, static-looking pixels turned on. I also think that if you input a smiley face or a letter, unless it was specifically trained for a "not a number" scenario, you may get an inconclusive output or a falsely confident guess at one number. Even though the math of the process checks out, nothing about this whole concept makes any intuitive sense, which is why I find it fascinating tbh lol
@far1din Жыл бұрын
@@Leuel48Fan Yes haha. I believe I have seen this done on a dumbbell. It was very noisy, but you could still see the outer line. However, I was unable to find it just now 🤷🏾♀️
@far1din Жыл бұрын
@@figloalds Ahaa, yes I know what you are talking about. This is a great idea. This would be what the computer perceives as the ideal «7» or ideal «1». I will try to make a video on that! 🚀
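The "ideal 7" idea discussed in this thread is usually called activation maximization: start from noise and run gradient ascent on the input so that one class score increases. A hedged PyTorch sketch with a toy, untrained network (a properly trained model would be needed to get a meaningful image; all names and sizes here are illustrative):

```python
import torch
import torch.nn as nn

# Toy stand-in for a trained digit classifier (untrained here, so the result is just noise).
model = nn.Sequential(
    nn.Conv2d(1, 8, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(8 * 12 * 12, 10),
)
model.eval()

target_class = 7
image = torch.randn(1, 1, 28, 28, requires_grad=True)  # start from random noise
optimizer = torch.optim.Adam([image], lr=0.1)

for _ in range(200):
    optimizer.zero_grad()
    score = model(image)[0, target_class]
    # Gradient *ascent* on the class score (minimize its negative).
    (-score).backward()
    optimizer.step()

# `image` now shows what this network most strongly associates with a "7".
```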
@emotional_stuff2 жыл бұрын
Love it! Sub+
@gamercatsz5441 Жыл бұрын
Then you verify it's a 7, and the AI algorithm gets better. Does the machine learn what a 7 is, or does it learn how to get positive feedback from us? If we use advanced AI to ask advanced questions, do we actually know for certain it is not lying to us to get good feedback? Advanced AI like ChatGPT-4, which is rapidly advancing and already is much smarter than 99% of humans, is potentially a lying, manipulating, horrible thing. It's better to make 100% sure it is truthful and to fully understand its intentions before making it smarter and more powerful, training it with more data, more computers, etc…
@far1din Жыл бұрын
Hello, my friend. Here is my take on this topic. First of all, these kinds of AI models are mathematical models that learn to get the "correct" response. This is called supervised learning, where the model is trained on labelled data. It is hard to define whether or not a model is lying, as you have to define lying. Is it lying if the answers are correct? The concerns are valid, as models can converge towards outputting responses that are not correct but perceived as such. This is something researchers and users should be wary of. On the other hand, human intelligence and deep neural networks seem to take different paths. Even though deep neural networks are inspired by how the brain works and are capable of many of the "intelligent tasks" we throw at them, the learning process is not the same. Modern deep learning models use backpropagation, while humans don't (or at least, that is the perception for now). Although we come to the same answer, the consciousness factor might differ. Therefore, it is hard to tell if a model is lying, because lying is something I would like to think of as a conscious act. You may find it beneficial to watch an interview with Geoffrey Hinton, as he discusses some of the concerns you've raised: kzbin.info/www/bejne/p6HSg4JpbJ2kiLs
@justinnejbauer7 ай бұрын
@@far1din I would argue that humans do use backpropagation. When you make a mistake in life, how many times have you looked back and asked yourself what you would have done differently to minimize the negative effects?