As long as the activation is not linear, the network can approximate arbitrary functions as you increase the number of parameters. The reason linear activations do not work is that the whole network collapses into a single linear transformation, so it can only approximate linear functions.
@PrematureAbstraction · 13 days ago
I made a short about this! kzbin.info/www/bejne/m4nHh3R3e6iefrc
@FunctionallyLiteratePerson · 13 days ago
It can be linear for digital systems, given floating-point inaccuracies. They won't be the most effective, but they do work to some extent! (See the video "GradIEEEnt half decent" on YouTube.)
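A toy demonstration of that floating-point effect (a minimal sketch, not from the video or the comment; the constant 1e8 is just an illustrative choice): in float32, adding and then subtracting a large constant quantizes small inputs, so even the "identity" becomes a step function.

```python
import numpy as np

# Minimal sketch: float32 addition is not truly linear. Near 1e8 the
# spacing between representable float32 values is 8, so adding and then
# subtracting 1e8 rounds small inputs into a step function.
def float_step(x):
    big = np.float32(1e8)
    return (np.float32(x) + big) - big

for x in [1.0, 3.0, 5.0, 7.0, 9.0]:
    print(x, "->", float_step(x))  # 0.0, 0.0, 8.0, 8.0, 8.0
```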
@fnytnqsladcgqlefzcqxlzlcgj9220 · 13 days ago
@@FunctionallyLiteratePerson God, I love that video so much
@PrematureAbstraction · 13 days ago
@@FunctionallyLiteratePerson All of Suckerpinch's videos come highly recommended!
@BillPark-ey6ih · 9 days ago
I am not a math expert, but is there a class of function-like objects that linear activations can still approximate? I feel like there could be a pattern.
@Kokurorokuko · 12 days ago
Interesting take-home message. I would never have thought that the non-linearity itself is so important.
@dkapur17 · 12 days ago
So any neural network with an arbitrary number of layers but no non-linearities can be reduced to a single matrix multiplication. Non-linearity is what gives these networks so much power.
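A quick sanity check of that reduction (a minimal NumPy sketch; the shapes and seed are arbitrary): stacking two linear layers gives exactly one affine map.

```python
import numpy as np

# Two linear layers with no activation in between...
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
two_layers = W2 @ (W1 @ x + b1) + b2

# ...equal one collapsed affine map: W = W2 @ W1, b = W2 @ b1 + b2.
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_layers, collapsed))  # True
```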
@goatfishplays · 12 days ago
Anyone else expect to hear lecture-hall applause at the end of the video? lmaooo, that was really good
@samevans4834 · 12 days ago
The clip of the lecture hall might have primed your brain a bit lol. Very good content nonetheless! I didn't notice the AI voice until about halfway through, when there was some weird inflection.
@ProgrammingWithJulius · 13 days ago
Stellar video! As another YouTuber who recently started, I wish you all the best :) I now know how much effort it takes to make these videos. Great use of manim, too.
@PrematureAbstraction · 13 days ago
@ProgrammingWithJulius Thank you! Your videos also sound fun, subbed. :) Yeah, getting started with manim was a pain at first, but after two or three videos you really pick up speed.
@eliasbouhout1 · 12 days ago
Didn't even notice it was an AI voice, great video
@jartho3996 · 10 days ago
wait, are you srs?? I was just about to comment saying he had a nice voice ;-;
@DeclanMBrennan · 9 days ago
That was great. You distilled the essence of a complex topic and presented it in a crystal-clear and entertaining way.
@milandavid7223 · 13 days ago
Honestly, the most surprising result was the performance of sine & square
@peppermint13me · 12 days ago
What if you used a neural network to approximate the optimal activation function for another neural network?
@PrematureAbstraction · 12 days ago
This is more or less how they came up with Swish: arxiv.org/abs/1710.05941
@peppermint13me · 11 days ago
@@PrematureAbstraction Cool!
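For reference, the Swish activation from the linked paper is x · sigmoid(βx), with β either fixed or learned; a minimal sketch:

```python
import torch

# Minimal sketch of Swish (arxiv.org/abs/1710.05941): x * sigmoid(beta * x).
# beta = 1 recovers the common SiLU variant; beta can also be learned.
def swish(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    return x * torch.sigmoid(beta * x)

print(swish(torch.linspace(-3.0, 3.0, 5)))
```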
@The_JPo · 13 days ago
keep it up. i watch a lot of these sorts of videos, and normally they don't pull me in. this one did
@mmmusa2576 · 13 days ago
This is really good. If this is really an AI voice, it's so natural lol
@sodiumfluoridel · 5 days ago
I was literally just thinking about this, awesome video
@doormango · 13 days ago
Hang on, don't we need non-polynomial activation functions for the Universal Approximation Theorem? You gave x^2 as an example activation function...
@schmeitz_ · 12 days ago
I was wondering the same...
@Kokurorokuko · 12 days ago
I'll wait for the reply here
@PrematureAbstraction · 12 days ago
@doormango You are correct, I oversimplified this a bit. When the activation function is polynomial, the model can only approximate polynomial functions. In my experiment, this seems to be enough to achieve a reasonable accuracy, but of course, it's too restrictive for real-world problems.
@alperakyuz9702 · 8 days ago
@@PrematureAbstraction But aren't polynomials dense in C(D), where D is compact? By the Weierstrass approximation theorem, any continuous function on a compact domain can be approximated uniformly by a polynomial. (It's 5 am and I'm drunk out of my mind, I may have misremembered things, so I'm not sure.)
@zacklee5787 · 8 days ago
@@alperakyuz9702 In the universal approximation theorem the activation is fixed, meaning you don't get to choose a new polynomial during training that fits your function better. You're only modifying the linear transformations.
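To spell out how these two comments fit together (a sketch of the standard argument, not from the thread): a fixed polynomial activation bounds the degree of everything a fixed-depth network can compute, while the Weierstrass theorem needs polynomials of unbounded degree.

```latex
% With \sigma(x) = x^2, a one-hidden-layer network computes
%   f(x) = \sum_i c_i \,(w_i x + b_i)^2,
% a polynomial of degree at most 2; depth k bounds the degree by 2^k.
% Polynomials of degree \le d form a finite-dimensional, hence closed,
% subspace of C(D), so they are not dense: a fixed-depth network with a
% polynomial activation cannot approximate all continuous functions.
```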
@RaaynML · 14 days ago
AI-generated voices teaching people how to do machine learning, what a time to be alive
@SpinyDisk · 14 days ago
What a time to be alive!📄📄
@4thpdespanolo · 13 days ago
This is not a generated voice
@RegularGod · 12 days ago
@@4thpdespanolo It 100% is, and the uploader confirmed this in the comments section of the video preceding this one in their library. Someone asked, and they replied something like "Yes, it is synthesized (ElevenLabs)".
@lu_ck · 3 days ago
@@RegularGod Yep, I officially can't tell AI voices apart anymore, it's joever for humanity
@PrematureAbstraction · 1 day ago
@lu_ck Not sure about that: for the first time, people can make decent-sounding educational content without having to buy expensive gear or try to get rid of their accent. :)
@benwilcox1192 · 12 days ago
Great educational video! I expected this to have a lot more views; keep it up and you'll grow quickly!
@quadmasterXLII · 13 days ago
Did the Minecraft activation have zero gradient everywhere? (b/c cubes lol)
@PrematureAbstraction · 13 days ago
@@quadmasterXLII Interesting question! I used PyTorch for the implementation, and if you don't explicitly define the gradient, it will use its autograd feature. You can read the applicable rules in the docs, but to make it short: it will estimate/interpolate a reasonable continuous gradient from the sampled values.
@Speed001 · 7 days ago
I watched this video backwards, you did a good job.
@jasperneo1 · 13 days ago
Just watched the video, and I am shocked this does not have thousands of views
@SpinyDisk · 14 days ago
How does this only have 700 views?!
@rukascool · 12 days ago
This looks insane on an OLED monitor
@LinkLaine · 9 days ago
this video should be in my university classes
@SpicyMelonYT · 10 days ago
I was not ready for the height map statement at the beginning 😂
@sutsuj6437 · 13 days ago
How did you define the derivative of the Minecraft activation function to use in backprop?
@UQuark0 · 13 days ago
Maybe numerical differentiation? Literally taking a neighboring height and subtracting
@PrematureAbstraction · 13 days ago
UQuark0 is correct, I just let PyTorch autograd do its thing.
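For anyone who wants an explicit backward pass instead of relying on autograd's defaults, one common trick is a straight-through estimator; this is a minimal sketch of that idea, not necessarily what the video did:

```python
import torch

# Minimal sketch (not necessarily the video's approach): a blocky,
# piecewise-constant activation whose backward pass pretends the forward
# was the identity, i.e. a straight-through estimator.
class StepSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.floor(x)   # zero gradient almost everywhere

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output      # pass the upstream gradient straight through

x = torch.linspace(-2.0, 2.0, 5, requires_grad=True)
StepSTE.apply(x).sum().backward()
print(x.grad)                   # tensor of ones, thanks to the STE
```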
@gokusaiyan1128 · 14 days ago
Subbed!! I think I like this channel, hope it grows
@londek · 10 days ago
Yessssssssss, I love this
@sebaitor · 12 days ago
Would have been nice to comment on why some non-linear functions are better than others, and not only by a marginal amount; it shows that non-linearity, while necessary, is not sufficient for good models.
@youtube_fantastic · 11 days ago
Fantastic video omg!!! Instant sub
@NoenD_io · 11 days ago
Can we run Doom on a function
@EternalCelestialChambers · 12 days ago
Can you please give me the source of the lecture at this timestamp? 0:32 Prof. Thomas Garrity?
@PrematureAbstraction · 12 days ago
It's "On Mathematical Maturity", I can really recommend watching it. kzbin.info/www/bejne/sHm4Yqt-a7SaqZY
@benjaminhogan9669 · 12 days ago
Since pooling is already non-linear, what would happen if you omitted the activation layer?
@PrematureAbstraction · 12 days ago
In this limited experiment, using max pooling and no activations actually worked about as well as using ReLU.
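A quick way to see that max pooling is itself non-linear (a toy check, not from the video): it violates additivity.

```python
import torch
import torch.nn.functional as F

# Toy check: max pooling is non-linear, since in general
# pool(a + b) != pool(a) + pool(b).
a = torch.tensor([[[1.0, 0.0]]])  # shape (batch, channels, length)
b = torch.tensor([[[0.0, 1.0]]])

lhs = F.max_pool1d(a + b, kernel_size=2)                               # max(1, 1) = 1
rhs = F.max_pool1d(a, kernel_size=2) + F.max_pool1d(b, kernel_size=2)  # 1 + 1 = 2
print(lhs.item(), rhs.item())  # 1.0 2.0
```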
@Mega-wt9do · 10 days ago
Only problem I see with this video is that it ended xd
@andersonm.5157 · 11 days ago
That's gold.
@Garfield_Minecraft · 11 days ago
I think the most difficult parts of machine learning are training, choosing the activation function, and data preparation: they are very difficult to control and take a lot of time and resources.
@Pramerios · 9 days ago
Didn't realize until over 3 minutes in that you were using an AI voice to read your scripts. My, what a world.
@qfurgie · 9 days ago
I don't think it is, is it?
@qfurgie · 9 days ago
oh wait, the sigma and tanh part was kinda weird, but it might just be a weird cut
@Pramerios · 9 days ago
@@qfurgie Nahhh, it's definitely AI. I've heard the voice before, but I didn't want to pay for it!
@Blesdfa · 8 days ago
@@Pramerios What website do you find them on?
@sidneyw.mathiasdeoliveira862 · 14 days ago
The blocks at 2:40 are unappealing for being tilted to the side. Otherwise, excellent video 👋🏼👋🏼👋🏼
@arcadesmasher · 8 days ago
I can't tell if this is a TTS or not...
@IchHabeGerufen · 13 days ago
bro, this is such a good video... Nice voice, nice animations, and overall style... I wish you all the best. Keep it up!
@giacomodovidio2871 · 9 days ago
I was hoping there would be more Minecraft involved. Anyway, useful video 😊
@sssssemper · 14 days ago
ah, great video! new sub
@novantha1 · 12 days ago
🤔 Makes me wonder how the performance would be if there were some sort of gating mechanism for choosing the most appropriate activation function for any given situation.
@foebelboo · 13 days ago
underrated
@sidneyw.mathiasdeoliveira862 · 14 days ago
Excellent content 👏🏼👏🏼👏🏼
@Blooper1980 · 14 days ago
Neat... More please
@skeleton_craftGaming · 12 days ago
No, the random inputs I use use the count of alpha particles
@aidanmokalla760 · 13 days ago
It's affine, but 2x+1 technically isn't a linear function, right? I figure it's not an extremely meaningful difference in this context, but I'm not confident it doesn't affect your analysis
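A quick note on the distinction raised here (standard definitions, added for clarity): it does not affect the analysis, because compositions of affine maps are still affine.

```latex
% Linearity requires f(\alpha x + \beta y) = \alpha f(x) + \beta f(y),
% which forces f(0) = 0. For f(x) = 2x + 1 we have f(0) = 1, so f is
% affine, not linear. For the collapse argument it makes no difference:
% compositions of affine maps are still affine, hence equally limited.
```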
@eli_steiner · 14 days ago
how do you have so few subs 😶
@anguswetty · 10 days ago
Took me 2 mins before I realized the voice was AI lol
@AIShipped · 13 days ago
This is a great video! Worth the effort. I would love to see more on different activation functions and their performance, if that is the direction you would like to go.
@purplenanite · 13 days ago
I wonder if you could use this to evolve a good activation function
@PrematureAbstraction · 12 days ago
Good idea, this is actually an active area of research! This is how they came up with Swish: arxiv.org/abs/1710.05941
@purplenanite · 10 days ago
@@PrematureAbstraction huh, I did not know that was how they derived it!
@spamspamer3679 · 12 days ago
Computer scientists: "Anything in the world is described by functions"
Physicists not believing in superdeterminism: Am I a joke to you?
@airman122469 · 12 days ago
Answer to the physicists: yes.
@spamspamer3679 · 12 days ago
@airman122469 hahah
@jonathanquang2117 · 13 days ago
I'd also be interested in a half-formal proof of the universal approximation theorem instead of just empirical results. Nice video though!
@PrematureAbstraction · 13 days ago
I thought about including it. Sadly, it's very technical and often limited in its "direct" applicability. E.g. in the theorem itself, what matters most is that you have enough neurons, not which activation function is used. In practice you mainly experiment with the number of layers and see what sticks, instead of doing a theoretical derivation.
@stephaneduhamel7706 · 13 days ago
So, max pooling with no further activation function would probably work just as well?
@PrematureAbstraction · 12 days ago
Yes, in my limited experiment it performed about as well as ReLU.
@stephaneduhamel7706 · 12 days ago
@@PrematureAbstraction Very interesting stuff.
@language-qq8xv · 13 days ago
i thought this was a Minecraft 100 days challenge video. i'm too brain-rotted
@MrEliteXXL · 13 days ago
I wonder how Minecraft + max pooling would perform
@PrematureAbstraction · 13 days ago
In my experiment, it worked about as well as Minecraft + avg pooling (a few percent better).
@starship9874 · 12 days ago
Took me a while to realize the voice was AI
@john.dough. · 14 days ago
this is great! :0
@brummi9869 · 13 days ago
How did you train the Minecraft network? Doesn't it have the same issue as the step function, with a derivative of 0 everywhere?
@markusa3803 · 13 days ago
Pretty sure he used each block's height as a single data point, connected linearly.
@PrematureAbstraction · 13 days ago
Almost, I implemented it as a step function, but did not explicitly define the backward routine. So PyTorch autograd takes over with subgradients and continuous interpolation (see their docs for the rules).
@danielr3177 · 11 days ago
Nice
@zacklee5787 · 8 days ago
You actually can't use the square function as an activation, or any polynomial for that matter.
@mtalons3202 · 10 days ago
Hey, what software do you use to make the videos?? Would love to know that
@PrematureAbstraction · 10 days ago
Mainly the manim Python library from 3blue1brown. Then some editing in DaVinci Resolve.
@paulwaller3587 · 13 days ago
Obviously not just any function will work; the functions have to form a unital, point-separating subalgebra
@jacobwilson8275 · 13 days ago
Which is a very lax restriction. It feels a little pedantic to be that precise.
@Galinaceo0 · 13 days ago
@@jacobwilson8275 I think it's important to be clear when explaining these things to new people, as they might get misconceptions otherwise. Maybe you don't need to be as precise as this, but just saying "nice enough functions" might get the idea across.
@jacobwilson8275 · 13 days ago
@@Galinaceo0 agreed
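For context, the condition mentioned at the top of this thread comes from the Stone-Weierstrass theorem (a paraphrase of the standard statement, added for reference):

```latex
% Stone-Weierstrass: if A \subseteq C(K), for K compact Hausdorff, is a
% subalgebra that contains the constants (unital) and separates points
% (for x \neq y there exists f \in A with f(x) \neq f(y)), then A is
% dense in C(K) with respect to the uniform norm.
```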
@coolplay20 · 14 days ago
High-quality educational vid. 🎉 Subscribed, thanks for it
@aeghohloechu5022 · 13 days ago
Nvidia 6090 rushing to implement this as DLSS 6 instead of adding 2 more gigabytes of VRAM:
@Aiken-kosh · 12 days ago
Fire
@birdbrid9391 · 14 days ago
Approximation*
@archonicmakes · 12 days ago
subbed :)
@codexed-i · 7 days ago
You should choose a different area of the world for that...
@MercuriusCh · 9 days ago
Oh shit, here we go again... Broadening "continuous functions on a compact domain" to all functions is inaccurate af. Even the real world is not always continuous wtf
@Neil001 · 13 days ago
Your videos are really great, but I'd really rather listen to your real voice; the AI one is just too jarring
@thecoldlemonade3532 · 12 days ago
great video but AI voice :(
@sleeptalkenthusiast · 2 days ago
this would've been 5x cooler if you weren't pretending to be 3blue1brown