Why do Convolutional Neural Networks work so well?

39,743 views

Algorithmic Simplicity

1 day ago

While deep learning has existed since the 1970s, it wasn't until 2010 that deep learning exploded in popularity, to the point that deep neural networks are now used ubiquitously for all machine learning tasks. The reason for this explosion is the invention of the convolutional neural network. This remarkably simple architecture allowed neural networks to be trained on new kinds of data which were previously thought impossible.
In this video I discuss what a convolutional neural network is, why it is needed, what it can and cannot do, and why it works so damn well.
00:00 Intro
01:18 The curse of dimensionality
06:39 Convolutional neural networks
13:09 The spatial structure of images
15:06 Conclusion

Comments: 90
@algorithmicsimplicity 1 year ago
Transformer video coming next! I'm still getting the hang of animating, but the transformer video probably won't take as long to make as this one. I haven't decided what I will do after that, so if you have any suggestions/requests for computer science, mathematics or physics topics let me know.
@bassemmansour3163 1 year ago
what program are you using for animation? thanks!
@algorithmicsimplicity 1 year ago
I'm using the python package manim: github.com/ManimCommunity/manim
@davidmurphy563 1 year ago
I'd say probably RNNs would flow nicely from this [excellent] video. GANs too I guess. Autoencoders for sure. Oh, LSTMs, the memory problem is a fascinating one. Oh and Deep Q-Networks. Meh, the field is so broad you can't help but hit. I'd say RNNs first as going from images to text seems a natural progression.
@wissemrouin4814 1 year ago
@@davidmurphy563 Yes please, I guess RNNs need to be presented even before transformers.
@davidmurphy563 1 year ago
@@wissemrouin4814 Yeah, I would agree with you there. RNNs serve as a good introduction to a lot of the approaches you'll see for sequence-to-vector problems, and their drawbacks explain the development of transformers. I'd suggest RNNs, then LSTMs, then transformers. That said, this channel has done sterling work explaining everything so far, so I'm sure he'll do a great job even if he dives straight into the deep end.
@dradic9452 1 year ago
Please make more videos. I've been watching countless neural networks videos and until I saw your two videos I was still lost. You explained it so clearly and concisely. I hope you make more videos.
@ozachar 9 months ago
As a physicist, I recognize this process as "real space renormalization group" procedure in statistical mechanics. So each layer is equivalent to a renormalization step (a coarse graining). The renormalization flows are then the gradual flow towards a resolution decision of the neural net. It makes the whole "magic" very clear conceptually, and also automatically points the way for less trivial renormalization procedures known in theoretical physics (not just simple real space coarse graining). The clarity of videos like yours is so stimulating! Thanks
@warpdrive9229 7 months ago
Bingo!
@IllIl 1 year ago
Dude, your teaching style is absolutely superb! Thank you so much for these. This surpasses any of the explanations I've come across in online courses. Please make more! The way you demystify these concepts is just in a league of its own!
@Number_Cruncher 1 year ago
This was a very cool twist in the end with the rearranged pixels. Thx, for this nice experiment.
@nananou1687 1 month ago
This is genuinely one of the best videos I have ever seen, no matter the type of content. You have somehow taken one of the most complicated topics and simply distilled it to this. Brilliant!
@rohithpokala 6 months ago
Bro, you are a real superman. This video gave so many deep insights in just 15 minutes, providing such a strong foundation. I can confidently say this video single-handedly beat thousands of neural network videos on the internet. You raised the bar so high for others to compete. Thanks.
@j.j.maverick9252 1 year ago
another superb summary and visualisation, thank you!
@bassemmansour3163 1 year ago
best illustrations in the subject. thank you for your work!
@Embassy_of_Jupiter 1 year ago
Your videos are extremely good, especially for such a small channel
@djenning90 9 months ago
Both this and the transformers video are outstanding. I find your teaching style very interesting to learn from. And the visuals and animations you include are very descriptive and illustrative! I’m your newest fan. Thank you!
@jollyrogererVF84 9 months ago
A brilliant introduction to the subject. Very clear and informative. A good base for further investigation.👍
@jcorey333 4 months ago
This is one of the best explanations I've seen! Thanks for making videos
@neilosborne8682 9 months ago
@4:17 Why is it 9^N points required to densely fill N dimensions? Where is 9 being derived from? Is it for the purpose of the example given - or a more general constraint?
@algorithmicsimplicity 9 months ago
It is a completely arbitrary number, just for demonstration purposes. In general, in order to fill a 1d interval of length 1 to a desired density d you need d evenly spaced points. To maintain that density in an n-dimensional volume you need d^n points. I just chose d=9 for the example. And the more densely filled the input space is with training examples, the lower the test error of a model will be.
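The density scaling in this reply can be checked with a short Python sketch (d=9 and the dimension counts below are just the example values from this thread):

```python
def points_for_density(d, n):
    """Points needed to fill the unit n-dimensional cube with
    evenly spaced samples at density d points per unit length."""
    return d ** n

print(points_for_density(9, 1))     # 9 points for a 1d interval
print(points_for_density(9, 2))     # 81 points for the unit square
# A small 32x32 grayscale image is already a 1024-dimensional input;
# 9^1024 is a number with 978 digits, far beyond any real dataset size.
print(len(str(points_for_density(9, 1024))))
```

This is the curse of dimensionality in one line: the exponent, not the base, dominates.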
@senurahansaja3287 9 months ago
@@algorithmicsimplicity Thank you for your explanation, but here kzbin.info/www/bejne/bpqslYp-n9GYf9U, dimensional points mean the input dimension, right?
@algorithmicsimplicity 9 months ago
@@senurahansaja3287 Yes that's correct.
@illeto 1 month ago
Fantastic videos. Here before you inevitably hit 100k subscribers.
@khoakirokun217 1 month ago
I love that you point out that we have "superhuman capability" because we are pre-trained with assumptions about spatial information :D TLDR: "we are sucked" :D
@connorgoosen2468 1 year ago
How has the YouTube algorithm not suggested you sooner? This is such a great video; just subscribed and keen to see how the channel explodes!
@manthanpatki146 5 months ago
Man, keep making more videos, this is a brilliant video
@sergiysergiy8875 8 months ago
This was great. Please, continue your content
@PotatoMan1491 15 days ago
Best video I found for explaining this topic
@user-sg4lw7cb6k 9 months ago
Your videos are extremely good, especially for such a small channel. Great video! Can you do one on Recurrent Neural Networks please?
@thomassynths 7 months ago
This is by far the best explanation of CNNs I have ever come across. The motivational examples and the presentation are superb.
@escesc1 1 month ago
This channel is top notch quality. Congratulations!
@GaryBernstein 9 months ago
Can you explain how the NN produces the important-word-pair information-scores method described after 12:15 from the sentence problem raised at 10:17? Can you recommend any tg groups for this Q & topic?
@5_inchc594 1 year ago
amazing content thanks for sharing!
@benjamindilorenzo 3 months ago
The best video on CNNs. Please make a video about V-JEPA, the SSL architecture proposed by Yann LeCun. It would also be nice to have a deeper look at Diffusion Transformers, or diffusion in general. Really, really good work man!
@jorgesolorio620 1 year ago
Great video! Can you do one on Recurrent Neural Networks please 🙏🏽
@reubenkuhnert6870 9 months ago
Excellent content!
@joshmouch 8 months ago
Yeah. Jaw dropped. This is an amazing explanation. More please.
@jamespogg 9 months ago
amazing vid good job man
@anangelsdiaries 4 months ago
Fam, your videos are absolutely amazing. I finally understand what the heck a CNN is. Thanks a lot!
@justchary 9 months ago
I do not know who you are, but please continue! You definitely have vast knowledge of the subject, because you can explain complex things simply.
@nadaelnokaly4950 2 months ago
wow!! ur channel is a treasure
@terjeoseberg990 10 months ago
I believe that the main advantages of convolutional neural networks over fully connected neural networks are the computational savings and the increased training data. A convolutional neural network is basically a tiny fully connected network that's being trained on every NxN square in every image imaginable. This means that a 256x256 image is effectively turned into 254x254 = 64,516 tiny images. If you start with 1 million images in your training data, you now have 64.5 billion 3x3 images that you're going to train the tiny neural network on. You can then create 100 of these tiny neural networks for the first layer, another 100 for the second layer, another 100 for the third, and so on for 10 to 20 layers.
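The patch arithmetic in this comment can be sketched with NumPy (a minimal illustration of a single shared 3x3 filter, not code from the video):

```python
import numpy as np

image = np.random.rand(256, 256)   # one grayscale training image
k = 3                              # patch size seen by the "tiny network"

# Every k x k patch, exactly as a stride-1 convolution with no padding sees them.
patches = np.lib.stride_tricks.sliding_window_view(image, (k, k))
n_patches = patches.shape[0] * patches.shape[1]
print(patches.shape[:2], n_patches)          # (254, 254) 64516

# One convolutional "neuron": a single k x k weight vector shared by all patches.
w = np.random.rand(k, k)
feature_map = np.einsum('ijkl,kl->ij', patches, w)
print(feature_map.shape)                     # (254, 254)
```

The weight sharing is the point: one 3x3 weight vector receives a gradient from all 64,516 patches of every image.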
@algorithmicsimplicity 10 months ago
I think these two reasons are the most commonly cited for the success of CNNs (along with translation invariance, which is absolutely incorrect), but I don't think they are sufficient to explain it. It is true that a CNN uses much less computation than a fully connected neural network, but there are other ways to make deep neural networks which are just as computationally efficient as CNNs. For example, using an MLP-Mixer style architecture in which a linear transform is first applied independently across channels to all spatial locations, and then a linear transform is applied independently across spatial locations to all channels. In fact, this is exactly what I used when making this video! The "Deep Neural Network" I used was precisely this; it would have taken too long to train a deep fully connected neural network. This MLP-Mixer variant uses the same computation as a CNN, but allows each layer to see the entire input, which is why it achieves less accuracy than a CNN. As for the increased training data size, it is possible this helps, but even if you multiply your dataset size by 100,000, it is still nowhere near the amount of data you would expect to need to learn in 256*256 dimensional space. Also, if it were merely the increased training data, then I would expect CNNs to perform better than DNNs even on shuffled data (after all, having more data should still help in this case). But in fact we observe the opposite: CNNs perform worse than DNNs when the spatial structure is destroyed. For these reasons I believe that the fact that each layer sees a low effective dimensional input is necessary and sufficient to explain the success of CNNs.
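The MLP-Mixer-style layer described in this reply can be sketched as follows (the 64 locations and 32 channels are illustrative assumptions, not the video's actual sizes):

```python
import numpy as np

x = np.random.rand(64, 32)          # 64 spatial locations, 32 channels each

# Step 1: mix channels, applied independently at every spatial location.
W_channels = np.random.rand(32, 32)
x = x @ W_channels                  # shape stays (64, 32)

# Step 2: mix spatial locations, applied independently to every channel.
W_tokens = np.random.rand(64, 64)
x = W_tokens @ x                    # shape stays (64, 32)

# Same cost profile as a convolution, but every output now depends on
# the entire input rather than a local neighbourhood.
print(x.shape)
```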
@terjeoseberg990 10 months ago
@@algorithmicsimplicity, It’s a combination of multiplying the dataset size by 64,500 and reducing the network size from 256x256 to 3x3. In fact it’s the reduction of the network size to 3x3 that’s allowing the effective 64,500 times increase in dataset size. It’s not one or the other, but both. Each weight gets a whole lot more training/gradient following. You should do a video on the MLP-Mixer, and how it compares to CNN.
@user-to4hq2nm1m 8 months ago
Really nice! What tool did you use to do those awesome animations?
@algorithmicsimplicity 8 months ago
This was done using Manim ( www.manim.community/ )
@Isaacmellojr 5 months ago
More videos please! You have a gift!!
@pedromartins9889 4 months ago
Great video. You explain things really well. My only complaint is that you don't cite references. Citing references (which can be done simply as a list in the description) makes your less obvious statements more sound, like the claim that the number of significant outputs of a layer is more or less constant and small. I understand it would be very hard to explain that while maintaining the flow of the video, but if the description linked to such an explanation, or at least to a practical demonstration, the viewer could understand it better, or at least be more confident that it is really true. Citing references also helps a lot if the viewer wants to study the topic further (and this is fair, since you already did the research for the video, so it costs you far less to show your sources than it costs the viewer to rediscover them). In summary: citing references gives you more credibility (in a digital world filled with so much bullshit) and gives interested viewers a great deal of help in going deeper into the topic. Don't be mistaken, I really like your channel.
@HD-Grand-Scheme-Unfolds 1 year ago
@AlgorithmicSimplicity Greetings, may I ask: in your video, could you please specify in what sense you mean "randomly re-order the pixels" (13:55)? Let me explain my question. I know you mean reshuffling the permutation order of the set of input pixels, but by "in what sense" I meant: is it (A) a unique random re-order seed for each training example (as in, for every picture), OR (B) the same random re-order seed for every training example? If you meant sense A, I would be amazed the convolutional net can get the 62.9% accuracy you mentioned earlier. That 62.9% would be more believable to me if you meant sense B.
@algorithmicsimplicity 1 year ago
I meant in the B sense, same shuffle applied to every image in the dataset (training and test). If it was a different random shuffle for each input then no machine learning model (or human) would ever get above 10% accuracy. If you have some experience with machine learning, this operation is equivalent to shuffling the columns of a tabular dataset which of course all standard machine learning algorithms are invariant to.
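The "sense B" shuffle described here can be sketched as a single fixed permutation shared by the whole dataset (a minimal sketch; the 32x32 image size is an assumption based on CIFAR10 being mentioned elsewhere in the thread):

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed: one permutation, chosen once
perm = rng.permutation(32 * 32)     # the same re-ordering for every image

def shuffle_pixels(image):
    """Apply the single fixed permutation to a flattened 32x32 image."""
    return image.reshape(-1)[perm].reshape(32, 32)

a = np.arange(32 * 32, dtype=float).reshape(32, 32)
# Every image is scrambled the same way, so each pixel's new position is
# consistent across the whole dataset -- exactly like permuting the
# columns of a tabular dataset.
assert np.array_equal(shuffle_pixels(a), shuffle_pixels(a.copy()))
```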
@HD-Grand-Scheme-Unfolds 1 year ago
@@algorithmicsimplicity Lol, speaking in hindsight, your point is now taken 😄🤣. Which human or person... but let me play devil's advocate for entertainment and curiosity's sake: if it were somehow sense A, then I'd imagine that would imply a phenomenon we might all call pure memorization at its finest. But to get back on track, I love that you went out of your way to make that clear in your presentation; yours is the second video that mentioned it, but you were the first to settle the big question (which I already asked you, thanks again). By the way, while I have the opportunity, I would like to ask: do you know where a non-programmer may find an intuitive, interactive, GUI-based executable program that simulates recurrent neural networks (especially simple RNNs; I'd prefer to avoid LSTMs or GRUs but will accept them)? GitHub, for example, mostly accommodates those who meet a coding-knowledge prerequisite. "MemBrain" fits the concept, but its RNN is still puzzling for me to figure out, train, and test (though it's the most promising one to work with so far); "Neuroph Studio" fits the concept but has no RNN support; and "Knime Analytics Platform" amounts to coding skills in disguise as a GUI with clicks and parameter controls, and its rules for arrangements are too complex and counter-intuitive. IBM Watson Studio seems similar, and MATLAB is a puzzle box too.
@algorithmicsimplicity 1 year ago
I'm afraid I don't know of any GUI programs that simulate RNNs explicitly, but I do know that RNNs are a subset of feedforward NNs. That is, it should be possible to implement an RNN in any of those programs you suggested. All you would need to do is have a bunch of neurons in each layer that copy the input directly (i.e. the i'th copy neuron should have a weight of 1 connected to the i'th input and 0 for all other connections), and then force all neuron weights to be the same in every layer. That will be equivalent to an RNN. I would also recommend you just try and program such an app yourself. Even if you have no experience programming, you can just ask ChatGPT to write the code for you 😄.
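The tied-weights construction in this reply can be sketched directly (the hidden size and inputs below are arbitrary illustrative choices):

```python
import numpy as np

# The same weight matrices are reused at every timestep -- this reuse is
# what "force all neuron weights to be the same in every layer" achieves.
rng = np.random.default_rng(0)
W_h = rng.standard_normal((4, 4)) * 0.1   # hidden-to-hidden (shared)
W_x = rng.standard_normal((4, 1)) * 0.1   # input-to-hidden (shared)

def rnn(inputs):
    h = np.zeros((4, 1))                  # initial hidden state
    for x in inputs:                      # one unrolled "layer" per timestep
        h = np.tanh(W_h @ h + W_x @ x)
    return h

out = rnn([np.array([[1.0]]), np.array([[0.5]]), np.array([[-1.0]])])
print(out.shape)    # (4, 1)
```

Unrolling the loop gives exactly the feedforward network described above: one layer per timestep, all layers sharing the same weights.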
@neithanm 9 months ago
I feel like I missed a step. The layers on top of the horse looked like a homogeneous color. Where's the information? I was expecting to see features, from small parts up to recognizing the horse, but ...
@Emma2-cg5jh 1 month ago
Where does the performance value for the rearranged images come from? Did you measure it yourself, or is there a paper for that?
@algorithmicsimplicity 1 month ago
All of the accuracy scores in this video are from models I trained myself on CIFAR10.
@scarletsence 1 year ago
These god-like visualizations, thanks.
@pypypy4228 8 months ago
It's brilliant!
@bobuilder4444 1 month ago
13:09 How would you know which numbers to remove?
@algorithmicsimplicity 1 month ago
You can simply order the weights by absolute value and remove the smallest weights (the ones closest to 0). This probably isn't the best way to prune weights, but it already allows you to prune about 90% of them without any loss in accuracy: arxiv.org/abs/1803.03635
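Magnitude pruning as described in this reply can be sketched in a few lines (the ~90% figure comes from the linked paper, not from this snippet):

```python
import numpy as np

def prune_by_magnitude(weights, fraction=0.9):
    """Zero out the given fraction of weights with the smallest absolute value."""
    threshold = np.quantile(np.abs(weights), fraction)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

w = np.random.randn(100, 100)       # stand-in for one layer's weight matrix
pruned = prune_by_magnitude(w, 0.9)
print(np.mean(pruned == 0.0))       # about 0.9: ~90% of weights removed
```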
@bobuilder4444 1 month ago
@@algorithmicsimplicity Thank you
@Walczyk 4 months ago
7:04 this is just like the boost library from microsoft
@ThankYouESM 8 months ago
Seems like the bag-of-words algorithm can do a faster job at image recognition since it doesn't need to read a pixel more than once.
@yourfutureself4327 9 months ago
💚
@uplink-on-yt 9 months ago
12:58 Wait a minute... Did you just describe neural pruning, which has been observed in young human brains?
@nageswarkv 8 months ago
Definitely a good video, not a fluff video.
@peki_ooooooo 1 year ago
Hi, how's the next video?
@solaokusanya955 1 year ago
So technically, what the computer sees or not is highly dependent on whatever we as humans dictate it to be...
@blonkasnootch7850 8 months ago
Thank you for the video. I am not sure if it is right to say that humans have knowledge about how the world works built into the brain from birth. Accepting vision input for data processing, detecting objects, or separating regions of interest is something every baby clearly has to learn. I have seen that with my children; it is remarkable, but not there from the beginning.
@algorithmicsimplicity 8 months ago
Of course children still need to learn how to do visual processing, but the fact that children can learn to do visual processing implies that the brain already has some structure about the physical world built into it. It is quite literally impossible to learn from visual inputs alone, without any prior knowledge.
@aydink7739 1 year ago
This is wow, finally understand the „magic“ behind CNNs. Bravo, please continue 👍🏽
@lolikobob 6 months ago
Make more good videos!
@thechoosen4240 9 months ago
Good job bro, JESUS IS COMING BACK VERY SOON; WATCH AND PREPARE