What a masterpiece, mind-blowing! I have been waiting for a video like this for a long time.
@andyandurkar7814 2 days ago
Amazing topic and much-needed clarity. You are a real teacher, and you have a passion for teaching!
@coffeezealot8535 8 days ago
Wow, what a clear and concise way to present this topic!
@KumR 12 days ago
I have always been looking for an answer to this. Thank you, Luis. The Lego analogy made it so easy to understand.
@SerranoAcademy 12 days ago
Thank you so much @KumR, I'm glad you liked it!
@TotallyNotARobot__ 3 days ago
Excellent. Thank you.
@dr.mikeybee 12 days ago
You always do a really fine job of explaining difficult material in an easy-to-understand way. The universal approximation theorem is absolutely key to understanding why neural networks are not stochastic parrots; therefore, it is the key to understanding how neural networks learn. Might I suggest that you follow this up with an episode on holistic training?
@SerranoAcademy 12 days ago
Thank you so much! Holistic training, that sounds awesome! I don't know much about it, do you know any good resources?
@neeloor2004able 12 days ago
Absolutely new information to me, and thanks for explaining it in such a simple and detailed way.
@SerranoAcademy 12 days ago
@@neeloor2004able thank you! I’m glad you liked it :)
@MrProgrammer-yr1ed 10 days ago
Hey Luis, please keep it up and make more videos on why neural networks work. 👍
@asv5769 10 days ago
Very interesting points at 11:40 about non-polynomial functions. Yes, at university we always learned that we can use Taylor series to approximate non-polynomial functions with a sum of polynomial terms. How beautiful is that?
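For anyone curious, here is a minimal Python sketch (my own illustration, not from the video) of that Taylor-series idea: partial sums of the series for sin(x) get closer to the true value as more polynomial terms are added.

import math

def sin_taylor(x, n_terms):
    # Partial sum of the Taylor series of sin(x) around 0:
    # sin(x) ≈ x - x^3/3! + x^5/5! - ...
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
    return total

x = 1.2
for n in (1, 3, 5):
    print(n, "terms:", sin_taylor(x, n), "vs math.sin:", math.sin(x))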
@asv5769 7 days ago
I hope you will soon make a video about DeepSeek. There are already plenty, with clickbait titles, but it would be nice to have at least one from a professional. I have enjoyed your course about probability in the Machine Learning specialization on Coursera. Keep up the good work. All the best.
@MrProgrammer-yr1ed 9 days ago
Please make a video on how ReLU makes patterns in neural networks.
@AravindUkrd 1 day ago
Please do a video that discusses how the DeepSeek model is different from other LLMs.
@diemilio 12 days ago
Great video. Thank you!
@SerranoAcademy 12 days ago
Thank you so much, I'm glad you liked it!
@AJoe-ze6go 11 days ago
This sounds functionally identical to a Fourier series: by adding a sufficient number of the right kinds of simple wave functions, you can approximate any continuous curve.
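To make the analogy concrete, here is a small Python sketch (my own illustration, not from the video) that sums a few sine waves to approximate a square wave; the more odd harmonics you add, the closer the partial sum gets.

import math

def square_wave_fourier(x, n_terms):
    # Partial Fourier series of a square wave: odd sine harmonics only,
    # each weighted by 4 / (pi * n).
    total = 0.0
    for k in range(n_terms):
        n = 2 * k + 1
        total += (4 / math.pi) * math.sin(n * x) / n
    return total

for n_terms in (1, 5, 50):
    # The true square wave equals 1.0 at x = pi/2.
    print(n_terms, "harmonics:", round(square_wave_fourier(math.pi / 2, n_terms), 4))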
@chessfighter-r5g 11 days ago
How does it map inputs and outputs it has never seen before? My answer is that it splits the input into chunks, produces output for the first chunk, then the continuation is the second chunk plus the first output, then the third chunk plus the second output; and it splits into chunks by cosine similarity, cutting off when a chunk gets too big, and that's how the chunks happen. What do you think about this?
@ocamlmail 6 days ago
Thank you so much, fantastic! But what is wrong with polynomials if we can approximate any differentiable (continuous) function with a Taylor series, which is a polynomial?
@Pedritox0953 12 days ago
Great video! Peace out
@SerranoAcademy 12 days ago
Thank you!
@sanjayshekhar7 12 days ago
Wow! Just wow!!
@SerranoAcademy 12 days ago
:) Thank you!
@neelkamal3357 12 days ago
crazy video
@serhatakay8351 7 days ago
Is this going to link to Kolmogorov-Arnold networks in the following videos?
@SerranoAcademy 7 days ago
@@serhatakay8351 Good question! Not really. It's in the same spirit as the Kolmogorov-Arnold theorem of universal approximation with only two layers, but other than that there's no relation.
@tomoki-v6o 12 days ago
What happens in 2D? 7:02
@chessfighter-r5g 12 days ago
Hi, do you explain what the difference is between o1 and normal transformers like 4o, and why it waits about 30 seconds? What does it do in that amount of time?
@SerranoAcademy 12 days ago
That's a great question! Two things that o1 does are RAG and chain of prompting. RAG means that before answering, it searches for the answer, either on Google or in other databases. Chain of prompting means that it first generates an answer, then reads it and elaborates on it by expanding it, and may do this a few times. These methods make it more consistent and reduce hallucinations.
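As a rough picture of that control flow, here is a toy Python sketch; retrieve_documents and generate_answer are hypothetical stubs for illustration, not any real API.

def retrieve_documents(question):
    # Hypothetical stand-in for the RAG step: search the web or a database first.
    return ["(a retrieved passage relevant to the question)"]

def generate_answer(question, context, draft=None):
    # Hypothetical stand-in for a language-model call that drafts or expands an answer.
    base = draft if draft is not None else "an initial draft"
    return f"answer to '{question}' using {len(context)} passage(s), expanding on {base}"

def answer_with_rag_and_chain_of_prompting(question, rounds=3):
    context = retrieve_documents(question)      # look things up before answering
    draft = None
    for _ in range(rounds):                     # re-read and expand the answer a few times
        draft = generate_answer(question, context, draft)
    return draft

print(answer_with_rag_and_chain_of_prompting("Why do neural networks work?"))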
@chessfighter-r5g 12 days ago
@@SerranoAcademy Thank you so much! How does it map inputs and outputs it has never seen before? My answer is that it splits the input into chunks, produces output for the first chunk, then the continuation is the second chunk plus the first output, then the third chunk plus the second output; and it splits into chunks by cosine similarity, cutting off when a chunk gets too big, and that's how the chunks happen. What do you think about this?
@neelkamal3357 12 days ago
I love how it's 2024 and we still don't know what "values in neural networks" actually represent.
@SerranoAcademy 12 days ago
Great point! It can mean a lot of things; it could be the outputs, the weights, etc. I'm probably using that term for several things... :)
@dmip9884 12 days ago
Waiting for a description of the Kolmogorov-Arnold representation theorem as a stronger theoretical basis for KANs.
@SerranoAcademy 12 days ago
Thanks! Here's a video on the Kolmogorov-Arnold theorem! kzbin.info/www/bejne/pISVmaGjZa-FeM0 (in there, there's a link to one on Kolmogorov-Arnold networks too)
@waelreda5412 2 days ago
Has anyone ever told you that your voice sounds very similar to Steve Buscemi's?
@SerranoAcademy 2 days ago
@@waelreda5412 Lol! You're the second one. I also got Joe Pesci in My Cousin Vinny 🤣