Can you really use ANY activation function? (Universal Approximation Theorem)

28,813 views

Premature Abstraction

1 day ago

Comments: 119
@GrinddMaster 14 days ago
Can't believe I watched this video for free.
@oleonardohn 14 days ago
As long as it is not a linear activation, it should approximate arbitrary functions when you increase the number of parameters. The reason why linear activations do not work is that the system becomes a linear transformation, so it can only approximate linear functions.
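A minimal PyTorch sketch of the collapse described here (added for illustration, not code from the video): stacking any number of affine layers with identity activation is exactly one affine map.

```python
import torch

torch.manual_seed(0)
W1, W2, W3 = (torch.randn(4, 4) for _ in range(3))
b1, b2, b3 = (torch.randn(4) for _ in range(3))
x = torch.randn(4)

# Three "layers" with identity (i.e. no) activation.
h = W1 @ x + b1
h = W2 @ h + b2
y_deep = W3 @ h + b3

# The same map written as a single affine transformation W x + b.
W = W3 @ W2 @ W1
b = W3 @ (W2 @ b1 + b2) + b3
y_flat = W @ x + b

print(torch.allclose(y_deep, y_flat, atol=1e-5))  # True
```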
@PrematureAbstraction 13 days ago
I made a short about this! kzbin.info/www/bejne/m4nHh3R3e6iefrc
@FunctionallyLiteratePerson 13 days ago
It can be linear for digital systems, given floating point inaccuracies. They won't be the most effective, but they do work to some extent! (see a video called GradIEEEnt half decent on YouTube)
@fnytnqsladcgqlefzcqxlzlcgj9220 13 days ago
God I love that video so much @@FunctionallyLiteratePerson
@PrematureAbstraction 13 days ago
@@FunctionallyLiteratePerson All of Suckerpinch's videos come highly recommended!
@BillPark-ey6ih 9 days ago
I am not a math expert, but is there a function-like object that a linear activation can approximate? I feel like there could be a pattern.
@Kokurorokuko 12 days ago
Interesting take-home message. I would never have thought that just the non-linearity itself is so important.
@dkapur17 12 days ago
So any neural network with an arbitrary number of layers but no non-linearities can be reduced to a single matrix multiplication. Non-linearity is what gives these networks so much power.
@goatfishplays 12 days ago
Anyone else expect to hear like lecture hall applause at the end of the video lmaooo, that was really good
@samevans4834 12 days ago
The clip of the lecture hall might have primed your brain a bit lol. Very good content nonetheless! I didn't notice the AI voice until some weird inflection about halfway through.
@ProgrammingWithJulius 13 days ago
Stellar video! As another YouTuber who recently started, I wish you all the best :) I know now how much effort it takes to make these videos. Great use of manim, too.
@PrematureAbstraction 13 days ago
@ProgrammingWithJulius Thank you! Your videos also sound fun, subbed. :) Yeah getting started with manim was a pain at first, but after two or three videos you're really picking up speed.
@eliasbouhout1 12 days ago
Didn't even notice it was an AI voice, great video
@jartho3996 10 days ago
wait are you srs?? I was just about to comment saying he had a nice voice ;-;
@DeclanMBrennan 9 days ago
That was great. You distilled the essence of a complex topic and presented it in a crystal clear and entertaining way.
@milandavid7223 13 days ago
Honestly, the most surprising result was the performance of sine & square
@peppermint13me 12 days ago
What if you used a neural network to approximate the optimal activation function for another neural network?
@PrematureAbstraction 12 days ago
This is more or less how they came up with Swish: arxiv.org/abs/1710.05941
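For reference, the activation that search produced is Swish, x · sigmoid(βx); a minimal PyTorch sketch (β fixed at 1.0 by default here, though the paper also considers a trainable β):

```python
import torch

def swish(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    # Swish (arxiv.org/abs/1710.05941): x * sigmoid(beta * x).
    # With beta = 1 this is the same function as torch.nn.SiLU.
    return x * torch.sigmoid(beta * x)
```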
@peppermint13me 11 days ago
@@PrematureAbstraction Cool!
@The_JPo 13 days ago
keep it up. i watch a lot of these sorts of videos, and normally they don't pull me in. this one did
@mmmusa2576 13 days ago
This is really good. If this is really an AI voice, it's so natural lol
@sodiumfluoridel 5 days ago
I was literally just thinking about this, awesome video
@doormango 13 days ago
Hang on, don't we need non-polynomial activation functions for the Universal Approximation Theorem? You gave x^2 as an example activation function...
@schmeitz_ 12 days ago
I was wondering the same..
@Kokurorokuko 12 days ago
I'll wait for the reply here
@PrematureAbstraction 12 days ago
@doormango You are correct, I oversimplified this a bit. When the activation function is polynomial, the model can only approximate polynomial functions. In my experiment, this seems to be enough to achieve reasonable accuracy, but of course it's too restrictive for real-world problems.
@alperakyuz9702 8 days ago
@@PrematureAbstraction But aren't polynomials dense in C(D), where D is compact? By the Weierstrass approximation theorem, any continuous function on a compact domain can be approximated uniformly by a polynomial (it's 5 am and I'm drunk out of my mind, I may have misremembered things, so I'm not sure)
@zacklee5787 8 days ago
@@alperakyuz9702 In the universal approximation theorem the activation is fixed, meaning you don't get to choose a new polynomial during training that fits your function better. You're only modifying the linear transformation.
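A short illustrative note on how both of these points fit together: Weierstrass needs polynomial approximants of unbounded degree, while a network with a fixed polynomial activation is stuck at bounded degree no matter how wide it gets. For one hidden layer with σ(t) = t²:

```latex
f(x) = \sum_{i=1}^{N} c_i \, \sigma(a_i x + b_i)
     = \sum_{i=1}^{N} c_i \, (a_i x + b_i)^2,
\qquad \deg f \le 2 \quad \text{for every width } N.
```

Training only moves the a_i, b_i, c_i, never the degree, so such networks are confined to polynomials of bounded degree and cannot be dense in C(D). That is why the universal approximation theorem (Leshno et al., 1993) requires a non-polynomial activation.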
@RaaynML 14 days ago
AI Generated voices teaching people how to do machine learning, what a time to be alive
@SpinyDisk 14 days ago
What a time to be alive!📄📄
@4thpdespanolo 13 days ago
This is not a generated voice
@RegularGod 12 days ago
@@4thpdespanolo it 100% is, and the uploader confirmed this in the comments section of the video preceding this one in their library. Someone asked, and they replied something like 'Yes, it is synthesized (ElevenLabs)'
@lu_ck 3 days ago
@@RegularGod Yep, I officially can't tell AI voices apart anymore, it's joever for humanity
@PrematureAbstraction 1 day ago
@lu_ck Not sure about that. For one, people can now make decent-sounding educational content without having to buy expensive gear or try to get rid of their accent. :)
@benwilcox1192 12 days ago
Great educational video! I expected this to have a lot more views; keep it up and you'll grow quickly!
@quadmasterXLII 13 days ago
Did the minecraft activation have zero gradient everywhere? (b/c cubes lol)
@PrematureAbstraction 13 days ago
@@quadmasterXLII Interesting question! I used PyTorch for the implementation, and if you don't explicitly define the gradient, it will use its autograd feature. You can read in the docs what the applicable rules for this are, but to make it short, it will estimate/interpolate a reasonable continuous gradient from the sampled values.
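For anyone who wants an explicit version: the reply above says the video relies on autograd's defaults, but a common explicit alternative for step-like activations is a straight-through estimator. The sketch below is an illustration under that assumption; the `heights` table and the quantization scheme are hypothetical, not the video's actual implementation.

```python
import torch

class MinecraftActivation(torch.autograd.Function):
    # Hypothetical terrain height map: one block height per unit interval.
    heights = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.5, 2.0])

    @staticmethod
    def forward(ctx, x):
        # Quantize the input to a table index and look up the block height.
        idx = x.long().clamp(0, MinecraftActivation.heights.numel() - 1)
        return MinecraftActivation.heights[idx]

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient unchanged, since the
        # true derivative is zero almost everywhere and would stall training.
        return grad_output

x = torch.linspace(0.0, 5.0, 6, requires_grad=True)
y = MinecraftActivation.apply(x)
y.sum().backward()
print(x.grad)  # all ones, from the straight-through backward
```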
@Speed001 7 days ago
I watched this video backwards, you did a good job.
@jasperneo1 13 days ago
Just watched the video, and I am shocked this does not have thousands of views
@SpinyDisk 14 days ago
How does this only have 700 views?!
@rukascool 12 days ago
This looks insane on an OLED monitor
@LinkLaine 9 days ago
This video should be in my university classes
@SpicyMelonYT 10 days ago
I was not ready for the height map statement at the beginning 😂
@sutsuj6437 13 days ago
How did you define the derivative of the Minecraft activation function to use in backprop?
@UQuark0 13 days ago
Maybe numerical differentiation? Literally taking a neighboring height and subtracting
@PrematureAbstraction 13 days ago
UQuark0 is correct, I just let PyTorch autograd do its thing.
@gokusaiyan1128 14 days ago
Subbed!! I think I like this channel, hope it grows
@londek 10 days ago
Yessssssssss, i love this
@sebaitor 12 days ago
Would have been nice to comment on why some non-linear functions are better than others, and not only by a marginal amount; it shows that non-linearity, while necessary, is not sufficient for good models.
@youtube_fantastic 11 days ago
Fantastic video omg!!! Instant sub
@NoenD_io 11 days ago
Can we run Doom on a function?
@EternalCelestialChambers 12 days ago
Can you please give me the source of the lecture at this timestamp? 0:32 Prof. Thomas Garrity?
@PrematureAbstraction 12 days ago
It's "On Mathematical Maturity", I can really recommend watching it. kzbin.info/www/bejne/sHm4Yqt-a7SaqZY
@benjaminhogan9669 12 days ago
Since pooling is already nonlinear, what would happen if you omitted the activation layer?
@PrematureAbstraction 12 days ago
In this limited experiment, using max pooling and no activations actually worked about as well as using ReLU.
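A quick numerical check of why pooling already supplies non-linearity (illustrative, not from the video): max pooling does not commute with addition.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[[1.0, -2.0, 3.0, 0.0]]])   # shape (batch, channels, length)
b = torch.tensor([[[-1.0, 4.0, -3.0, 2.0]]])

# A linear map L would satisfy L(a + b) == L(a) + L(b). Max pooling does not:
print(F.max_pool1d(a + b, kernel_size=2))                               # [[[2., 2.]]]
print(F.max_pool1d(a, kernel_size=2) + F.max_pool1d(b, kernel_size=2))  # [[[5., 5.]]]
```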
@Mega-wt9do 10 days ago
Only problem I see with this video is that it ended xd
@andersonm.5157 11 days ago
That's gold.
@Garfield_Minecraft 11 days ago
I think the most difficult parts of machine learning are training, choosing the activation function, and data preparation, because they are very difficult to control and take a lot of time and resources
@Pramerios 9 days ago
Didn't realize until over 3 minutes in that you were using an AI voice to read your scripts. My, what a world.
@qfurgie 9 days ago
I don't think it is, is it?
@qfurgie 9 days ago
oh wait, the sigmoid and tanh part was kinda weird, but it might just be a weird cut
@Pramerios 9 days ago
@ Nahhh, it's definitely AI. I've heard the voice before, but I didn't want to pay for it!
@Blesdfa 8 days ago
@@Pramerios What website do you find them on?
@sidneyw.mathiasdeoliveira8621 4 days ago
The blocks at 2:40 are unappealing because they're tilted to the side. Otherwise, excellent video 👋🏼👋🏼👋🏼
@arcadesmasher 8 days ago
I can't tell if this is a TTS or not...
@IchHabeGerufen 13 days ago
bro this is such a good video.. Nice voice, nice animations, and overall style... I wish you all the best. Keep it up!
@giacomodovidio2871 9 days ago
I was hoping there would be more Minecraft involved. Anyway, useful video 😊
@sssssemper 14 days ago
ah great video! new sub
@novantha1 12 days ago
🤔 Makes me wonder how the performance would be if there were some sort of gating mechanism for choosing the most appropriate activation function for any given situation.
@foebelboo 13 days ago
underrated
@sidneyw.mathiasdeoliveira8621 4 days ago
Excellent content 👏🏼👏🏼👏🏼
@Blooper1980 14 days ago
Neat.. More please
@skeleton_craftGaming 12 days ago
No, the random inputs I use are based on the count of alpha particles
@aidanmokalla7601 3 days ago
It's affine, but 2x+1 technically isn't a linear function, right? I figure it's not an extremely meaningful difference in this context but I'm not confident it doesn't affect your analysis
@eli_steiner 14 days ago
how do you have so few subs 😶
@anguswetty 10 days ago
Took me 2 mins before I realized the voice was ai lol
@AIShipped 13 days ago
This is a great video! Worth the effort. I would love to see more on different activation functions and their performance if that is the direction you would like to go
@purplenanite 13 days ago
I wonder if you could use this to evolve a good activation function
@PrematureAbstraction 12 days ago
Good idea, this is actually an active area of research! This is how they came up with Swish: arxiv.org/abs/1710.05941
@purplenanite 10 days ago
@@PrematureAbstraction huh, i did not know that was how they derived it!
@spamspamer3679 12 days ago
Computer scientists: "Anything in the world is described by functions"
Physicists not believing in superdeterminism: Am I a joke to you?
@airman122469 12 days ago
Answer to the physicists: yes.
@spamspamer3679 12 days ago
@airman122469 hahah
@jonathanquang2117 13 days ago
I'd also be interested in a half-formal proof of the universal approximation theorem instead of just empirical results. Nice video though!
@PrematureAbstraction 13 days ago
I thought about including it. Sadly, it's very technical and often limited in its "direct" applicability. E.g., in the theorem itself it matters more that you have enough neurons than which activation function is used. In practice you mainly experiment with the number of layers and see what sticks, instead of doing a theoretical derivation.
@stephaneduhamel7706 13 days ago
So, max pooling with no further activation function would probably work just as well?
@PrematureAbstraction 12 days ago
Yes, in my limited experiment it performed about as well as ReLU.
@stephaneduhamel7706 12 days ago
@@PrematureAbstraction Very interesting stuff.
@language-qq8xv 13 days ago
I thought this was a Minecraft 100 days challenge video. I'm too brain rotted.
@MrEliteXXL 13 days ago
I wonder how minecraft + max pooling would perform
@PrematureAbstraction 13 days ago
In my experiment, it worked about as well as minecraft+avg pooling (a few percent better).
@starship9874 12 days ago
Took me a while to realize the voice was AI
@john.dough. 14 days ago
this is great! :0
@brummi9869 13 days ago
How did you train the minecraft network? Doesn't it have the same issue as the step function, with a derivative of 0 everywhere?
@markusa3803 13 days ago
Pretty sure he used each block's height as a single datapoint, connected linearly.
@PrematureAbstraction 13 days ago
Almost, I implemented it as a step function, but did not explicitly define the backwards routine. So PyTorch autograd takes over with subgradients and continuous interpolation (see their docs for the rules).
@danielr3177 11 days ago
Nice
@zacklee5787 8 days ago
You actually can't use the square function as an activation, or any polynomial for that matter.
@mtalons3202 10 days ago
Hey, what software do you use to make videos?? Would love to know that
@PrematureAbstraction 10 days ago
Mainly the manim Python library from 3blue1brown. Then some editing in DaVinci Resolve.
@paulwaller3587 13 days ago
obviously not any function will work, the functions have to form a unital, point-separating subalgebra
@jacobwilson8275 13 days ago
Which is a very lax restriction. It feels a little pedantic to be so clear.
@Galinaceo0 13 days ago
@@jacobwilson8275 i think it's important to be clear when explaining these things to new people as they might get misconceptions otherwise. Maybe you don't need to be as precise as this, but just saying "nice enough functions" might get the idea across.
@jacobwilson8275 13 days ago
@@Galinaceo0 agreed
@coolplay20 14 days ago
High quality educational vid 🎉 Subscribed, thanks for it
@aeghohloechu5022 13 days ago
Nvidia 6090 rushing to implement this as DLSS 6 instead of adding 2 more gigabytes of VRAM:
@Aiken-kosh 12 days ago
Fire
@birdbrid9391 14 days ago
Approximation*
@archonicmakes 12 days ago
subbed :)
@codexed-i 7 days ago
You should choose a different area of the world for that...
@MercuriusCh 9 days ago
Oh shit, here we go again... broadening "continuous functions on a compact domain" to all functions is inaccurate af. Even the real world is not always continuous wtf
@Neil001 13 days ago
Your videos are really great, but I'd really rather listen to your real voice; the AI one is just too jarring
@thecoldlemonade3532 12 days ago
great video but AI voice :(
@sleeptalkenthusiast 2 days ago
this would've been 5x cooler if you weren't pretending to be 3blue1brown