Can you really use ANY activation function? (Universal Approximation Theorem)

28,813 views

Premature Abstraction

1 day ago

Comments: 119
@GrinddMaster 14 days ago
Can't believe I watched this video for free.
@oleonardohn 14 days ago
As long as it is not a linear activation, it should approximate arbitrary functions when you increase the number of parameters. The reason why linear activations do not work is that the system becomes a linear transformation, so it can only approximate linear functions.
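A minimal PyTorch sketch of the collapse described here (added for illustration, not code from the video): stacking any number of affine layers with identity activation is exactly one affine map.

```python
import torch

torch.manual_seed(0)
W1, W2, W3 = (torch.randn(4, 4) for _ in range(3))
b1, b2, b3 = (torch.randn(4) for _ in range(3))
x = torch.randn(4)

# Three "layers" with identity (i.e. no) activation.
h = W1 @ x + b1
h = W2 @ h + b2
y_deep = W3 @ h + b3

# The same map written as a single affine transformation W x + b.
W = W3 @ W2 @ W1
b = W3 @ (W2 @ b1 + b2) + b3
y_flat = W @ x + b

print(torch.allclose(y_deep, y_flat, atol=1e-5))  # True
```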
@PrematureAbstraction 13 days ago
I made a short about this! kzbin.info/www/bejne/m4nHh3R3e6iefrc
@FunctionallyLiteratePerson 13 days ago
It can be linear for digital systems, given floating point inaccuracies. They won't be the most effective, but they do work to some extent! (see a video called GradIEEEnt half decent on YouTube)
@fnytnqsladcgqlefzcqxlzlcgj9220 13 days ago
God I love that video so much @@FunctionallyLiteratePerson
@PrematureAbstraction 13 days ago
@@FunctionallyLiteratePerson All of Suckerpinch's videos come highly recommended!
@BillPark-ey6ih 9 days ago
I am not a math expert, but is there a function-like object that a linear activation can approximate? I feel like there could be a pattern.
@Kokurorokuko 12 days ago
Interesting take-home message. I would never have thought that just the non-linearity itself is so important.
@dkapur17 12 days ago
So any neural network with an arbitrary number of layers but no non-linearities can be reduced to a single matrix multiplication. Non-linearity is what gives these networks so much power.
@goatfishplays 12 days ago
Anyone else expect to hear like lecture hall applause at the end of the video lmaooo, that was really good
@samevans4834 12 days ago
The clip of the lecture hall might have primed your brain a bit lol. Very good content nonetheless! I didn't notice the AI voice until some weird inflection about halfway through.
@ProgrammingWithJulius 13 days ago
Stellar video! As another YouTuber who recently started, I wish you all the best :) I know now how much effort it takes to make these videos. Great use of manim, too.
@PrematureAbstraction 13 days ago
@ProgrammingWithJulius Thank you! Your videos also sound fun, subbed. :) Yeah getting started with manim was a pain at first, but after two or three videos you're really picking up speed.
@eliasbouhout1 12 days ago
Didn't even notice it was an AI voice, great video
@jartho3996 10 days ago
wait are you srs?? I was just about to comment saying he had a nice voice ;-;
@DeclanMBrennan 9 days ago
That was great. You distilled the essence of a complex topic and presented it in a crystal clear and entertaining way.
@milandavid7223 13 days ago
Honestly, the most surprising result was the performance of sine & square
@peppermint13me 12 days ago
What if you used a neural network to approximate the optimal activation function for another neural network?
@PrematureAbstraction 12 days ago
This is more or less how they came up with Swish: arxiv.org/abs/1710.05941
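For reference, the activation that search produced is Swish, x · sigmoid(βx); a minimal PyTorch sketch (β fixed at 1.0 by default here, though the paper also considers a trainable β):

```python
import torch

def swish(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    # Swish (arxiv.org/abs/1710.05941): x * sigmoid(beta * x).
    # With beta = 1 this is the same function as torch.nn.SiLU.
    return x * torch.sigmoid(beta * x)
```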
@peppermint13me 11 days ago
@@PrematureAbstraction Cool!
@The_JPo 13 days ago
keep it up. i watch a lot of these sorts of videos, and normally they don't pull me in. this one did
@mmmusa2576 13 days ago
This is really good. If this is really an AI voice, it's so natural lol
@sodiumfluoridel 5 days ago
I was literally just thinking about this, awesome video
@doormango 13 days ago
Hang on, don't we need non-polynomial activation functions for the Universal Approximation Theorem? You gave x^2 as an example activation function...
@schmeitz_ 12 days ago
I was wondering the same..
@Kokurorokuko 12 days ago
I'll wait for the reply here
@PrematureAbstraction 12 days ago
@doormango You are correct, I oversimplified this a bit. When the activation function is polynomial, the model can only approximate polynomial functions. In my experiment, this seems to be enough to achieve reasonable accuracy, but of course it's too restrictive for real-world problems.
@alperakyuz9702 8 days ago
@@PrematureAbstraction But aren't polynomials dense in C(D), where D is compact? By the Weierstrass approximation theorem, any continuous function on a compact domain can be approximated uniformly by a polynomial (it's 5 am and I'm drunk out of my mind, I may have misremembered things, so I'm not sure)
@zacklee5787 8 days ago
@@alperakyuz9702 In the universal approximation theorem the activation is fixed, meaning you don't get to choose a new polynomial during training that fits your function better. You're only modifying the linear transformation.
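A short illustrative note on how both of these points fit together: Weierstrass needs polynomial approximants of unbounded degree, while a network with a fixed polynomial activation is stuck at bounded degree no matter how wide it gets. For one hidden layer with σ(t) = t²:

```latex
f(x) = \sum_{i=1}^{N} c_i \, \sigma(a_i x + b_i)
     = \sum_{i=1}^{N} c_i \, (a_i x + b_i)^2,
\qquad \deg f \le 2 \quad \text{for every width } N.
```

Training only moves the a_i, b_i, c_i, never the degree, so such networks are confined to polynomials of bounded degree and cannot be dense in C(D). That is why the universal approximation theorem (Leshno et al., 1993) requires a non-polynomial activation.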
@RaaynML 14 days ago
AI Generated voices teaching people how to do machine learning, what a time to be alive
@SpinyDisk 14 days ago
What a time to be alive!📄📄
@4thpdespanolo 13 days ago
This is not a generated voice
@RegularGod 12 days ago
@@4thpdespanolo it 100% is, and the uploader confirmed this in the comments section of the video preceding this one in their library. Someone asked, and they replied something like 'Yes, it is synthesized (ElevenLabs)'
@lu_ck 3 days ago
@@RegularGod Yep, I officially can't tell AI voices apart anymore, it's joever for humanity
@PrematureAbstraction 1 day ago
@lu_ck Not sure about that. For one, people can now make decent-sounding educational content without having to buy expensive gear or try to get rid of their accent. :)
@benwilcox1192 12 days ago
Great educational video! I expected this to have a lot more views; keep it up and you'll grow quickly!
@quadmasterXLII 13 days ago
Did the minecraft activation have zero gradient everywhere? (b/c cubes lol)
@PrematureAbstraction 13 days ago
@@quadmasterXLII Interesting question! I used PyTorch for the implementation, and if you don't explicitly define the gradient, it will use its autograd feature. You can read in the docs what the applicable rules for this are, but to make it short, it will estimate/interpolate a reasonable continuous gradient from the sampled values.
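For anyone who wants an explicit version: the reply above says the video relies on autograd's defaults, but a common explicit alternative for step-like activations is a straight-through estimator. The sketch below is an illustration under that assumption; the `heights` table and the quantization scheme are hypothetical, not the video's actual implementation.

```python
import torch

class MinecraftActivation(torch.autograd.Function):
    # Hypothetical terrain height map: one block height per unit interval.
    heights = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.5, 2.0])

    @staticmethod
    def forward(ctx, x):
        # Quantize the input to a table index and look up the block height.
        idx = x.long().clamp(0, MinecraftActivation.heights.numel() - 1)
        return MinecraftActivation.heights[idx]

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient unchanged, since the
        # true derivative is zero almost everywhere and would stall training.
        return grad_output

x = torch.linspace(0.0, 5.0, 6, requires_grad=True)
y = MinecraftActivation.apply(x)
y.sum().backward()
print(x.grad)  # all ones, from the straight-through backward
```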
@Speed001 7 days ago
I watched this video backwards, you did a good job.
@jasperneo1 13 days ago
Just watched the video, and I am shocked this does not have thousands of views
@SpinyDisk 14 days ago
How does this only have 700 views?!
@rukascool 12 days ago
This looks insane on an OLED monitor
@LinkLaine 9 days ago
This video should be in my university classes
@SpicyMelonYT 10 days ago
I was not ready for the height map statement at the beginning 😂
@sutsuj6437 13 days ago
How did you define the derivative of the Minecraft activation function to use in backprop?
@UQuark0 13 days ago
Maybe numerical differentiation? Literally taking a neighboring height and subtracting
@PrematureAbstraction 13 days ago
UQuark0 is correct, I just let PyTorch autograd do its thing.
@gokusaiyan1128 14 days ago
Subbed!! I think I like this channel, hope it grows
@londek 10 days ago
Yessssssssss, i love this
@sebaitor 12 days ago
Would have been nice to comment on why some non-linear functions are better than others, and not only by a marginal amount; it shows that non-linearity, while necessary, is not sufficient for good models.
@youtube_fantastic 11 days ago
Fantastic video omg!!! Instant sub
@NoenD_io 11 days ago
Can we run Doom on a function?
@EternalCelestialChambers 12 days ago
Can you please give me the source of the lecture at this timestamp? 0:32 Prof. Thomas Garrity?
@PrematureAbstraction 12 days ago
It's "On Mathematical Maturity", I can really recommend watching it. kzbin.info/www/bejne/sHm4Yqt-a7SaqZY
@benjaminhogan9669 12 days ago
Since pooling is already nonlinear, what would happen if you omitted the activation layer?
@PrematureAbstraction 12 days ago
In this limited experiment, using max pooling and no activations actually worked about as well as using ReLU.
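A quick numerical check of why pooling already supplies non-linearity (illustrative, not from the video): max pooling does not commute with addition.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[[1.0, -2.0, 3.0, 0.0]]])   # shape (batch, channels, length)
b = torch.tensor([[[-1.0, 4.0, -3.0, 2.0]]])

# A linear map L would satisfy L(a + b) == L(a) + L(b). Max pooling does not:
print(F.max_pool1d(a + b, kernel_size=2))                               # [[[2., 2.]]]
print(F.max_pool1d(a, kernel_size=2) + F.max_pool1d(b, kernel_size=2))  # [[[5., 5.]]]
```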
@Mega-wt9do 10 days ago
Only problem I see with this video is that it ended xd
@andersonm.5157 11 days ago
That's gold.
@Garfield_Minecraft 11 days ago
I think the most difficult parts of machine learning are training, choosing the activation function, and data preparation, because they are very difficult to control and take a lot of time and resources
@Pramerios 9 days ago
Didn't realize until over 3 minutes in that you were using an AI voice to read your scripts. My, what a world.
@qfurgie 9 days ago
I don't think it is, is it?
@qfurgie 9 days ago
oh wait, the sigmoid and tanh part was kinda weird, but it might just be a weird cut
@Pramerios 9 days ago
@ Nahhh, it's definitely AI. I've heard the voice before, but I didn't want to pay for it!
@Blesdfa 8 days ago
@@Pramerios What website do you find them on?
@sidneyw.mathiasdeoliveira8621 4 days ago
The blocks at 2:40 are unappealing because they're tilted to the side. Otherwise, excellent video 👋🏼👋🏼👋🏼
@arcadesmasher 8 days ago
I can't tell if this is a TTS or not...
@IchHabeGerufen 13 days ago
bro this is such a good video.. Nice voice, nice animations, and overall style... I wish you all the best. Keep it up!
@giacomodovidio2871 9 days ago
I was hoping there would be more Minecraft involved. Anyway, useful video 😊
@sssssemper 14 days ago
ah great video! new sub
@novantha1 12 days ago
🤔 Makes me wonder how the performance would be if there were some sort of gating mechanism for choosing the most appropriate activation function for any given situation.
@foebelboo 13 days ago
underrated
@sidneyw.mathiasdeoliveira8621 4 days ago
Excellent content 👏🏼👏🏼👏🏼
@Blooper1980 14 days ago
Neat.. More please
@skeleton_craftGaming 12 days ago
No, the random inputs I use are based on the count of alpha particles
@aidanmokalla7601 3 days ago
It's affine, but 2x+1 technically isn't a linear function, right? I figure it's not an extremely meaningful difference in this context but I'm not confident it doesn't affect your analysis
@eli_steiner 14 days ago
how do you have so few subs 😶
@anguswetty 10 days ago
Took me 2 mins before I realized the voice was ai lol
@AIShipped 13 days ago
This is a great video! Worth the effort. I would love to see more on different activation functions and their performance if that is the direction you would like to go
@purplenanite 13 days ago
I wonder if you could use this to evolve a good activation function
@PrematureAbstraction 12 days ago
Good idea, this is actually an active area of research! This is how they came up with Swish: arxiv.org/abs/1710.05941
@purplenanite 10 days ago
@@PrematureAbstraction huh, i did not know that was how they derived it!
@spamspamer3679 12 days ago
Computer scientists: "Anything in the world is described by functions"
Physicists not believing in superdeterminism: Am I a joke to you?
@airman122469 12 days ago
Answer to the physicists: yes.
@spamspamer3679 12 days ago
@airman122469 hahah
@jonathanquang2117 13 days ago
I'd also be interested in a half-formal proof of the universal approximation theorem instead of just empirical results. Nice video though!
@PrematureAbstraction 13 days ago
I thought about including it. Sadly, it's very technical and often limited in its "direct" applicability. E.g., in the theorem itself it matters more that you have enough neurons than which activation function is used. In practice you mainly experiment with the number of layers and see what sticks, instead of doing a theoretical derivation.
@stephaneduhamel7706 13 days ago
So, max pooling with no further activation function would probably work just as well?
@PrematureAbstraction 12 days ago
Yes, in my limited experiment it performed about as well as ReLU.
@stephaneduhamel7706 12 days ago
@@PrematureAbstraction Very interesting stuff.
@language-qq8xv 13 days ago
I thought this was a Minecraft 100 days challenge video. I'm too brain rotted.
@MrEliteXXL 13 days ago
I wonder how minecraft + max pooling would perform
@PrematureAbstraction 13 days ago
In my experiment, it worked about as well as minecraft+avg pooling (a few percent better).
@starship9874 12 days ago
Took me a while to realize the voice was AI
@john.dough. 14 days ago
this is great! :0
@brummi9869 13 days ago
How did you train the minecraft network? Doesn't it have the same issue as the step function, with a derivative of 0 everywhere?
@markusa3803 13 days ago
Pretty sure he used each block's height as a single datapoint, connected linearly.
@PrematureAbstraction 13 days ago
Almost, I implemented it as a step function, but did not explicitly define the backwards routine. So PyTorch autograd takes over with subgradients and continuous interpolation (see their docs for the rules).
@danielr3177 11 days ago
Nice
@zacklee5787 8 days ago
You actually can't use the square function as an activation, or any polynomial for that matter.
@mtalons3202 10 days ago
Hey, what software do you use to make videos?? Would love to know that
@PrematureAbstraction 10 days ago
Mainly the manim Python library from 3blue1brown. Then some editing in DaVinci Resolve.
@paulwaller3587 13 days ago
obviously not any function will work, the functions have to form a unital, point-separating subalgebra
@jacobwilson8275 13 days ago
Which is a very lax restriction. It feels a little pedantic to be so clear.
@Galinaceo0 13 days ago
@@jacobwilson8275 i think it's important to be clear when explaining these things to new people as they might get misconceptions otherwise. Maybe you don't need to be as precise as this, but just saying "nice enough functions" might get the idea across.
@jacobwilson8275 13 days ago
@@Galinaceo0 agreed
@coolplay20 14 days ago
High quality educational vid 🎉 Subscribed, thanks for it
@aeghohloechu5022 13 days ago
Nvidia 6090 rushing to implement this as DLSS 6 instead of adding 2 more gigabytes of VRAM:
@Aiken-kosh 12 days ago
Fire
@birdbrid9391 14 days ago
Approximation*
@archonicmakes 12 days ago
subbed :)
@codexed-i 7 days ago
You should choose a different area of the world for that...
@MercuriusCh 9 days ago
Oh shit, here we go again... broadening "continuous functions on a compact domain" to all functions is inaccurate af. Even the real world is not always continuous wtf
@Neil001 13 days ago
Your videos are really great, but I'd really rather listen to your real voice; the AI one is just too jarring
@thecoldlemonade3532 12 days ago
great video but AI voice :(
@sleeptalkenthusiast 2 days ago
this would've been 5x cooler if you weren't pretending to be 3blue1brown