
The Function That Changed Everything

66,794 views

Underfitted

1 year ago

This is a story about the unreasonable effectiveness of the function that made deep learning possible.
Citations: gist.github.com/svpino/8c34ec...
🔔 Subscribe for more stories: www.youtube.com/@underfitted?...
📚 My 3 favorite Machine Learning books:
• Deep Learning With Python, Second Edition - amzn.to/3xA3bVI
• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow - amzn.to/3BOX3LP
• Machine Learning with PyTorch and Scikit-Learn - amzn.to/3f7dAC8
Twitter: / svpino
Disclaimer: Some of the links included in this description are affiliate links where I'll earn a small commission if you purchase something. There's no cost to you.

Comments: 228
@InfiniteQuest86
@InfiniteQuest86 Жыл бұрын
This is so good. My colleague and I were joking about how when we were first learning NNs (around 2008), they sucked. They couldn't solve any problems. It was a complete joke compared to anything else like random forests. And now pretty much everyone throws a NN at every problem as a first try no matter what the data or problems are.
@underfitted
@underfitted Жыл бұрын
Thanks!
@johanlarsson9805
@johanlarsson9805 Жыл бұрын
That is not true. It was about 2008 or 2009 when I got into ANNs as well, and back then YouTube was already flooded with videos about ANNs and the MNIST digit recognition set. Around that time they also started to win competitions for speech recognition and performed very well in image recognition. You knew you were riding the wave to the future if you were learning about ANNs back then; it was obvious.
@InfiniteQuest86
@InfiniteQuest86 Жыл бұрын
@@johanlarsson9805 Sorry, we were working on real problems where they still sucked. Plus ReLU didn't really come out until 2010, so I'm not sure how you can say that. It was still very unclear pre-2010 whether they were going to be better than the state of the art.
@johanlarsson9805
@johanlarsson9805 Жыл бұрын
@@InfiniteQuest86 Not sure, but I think you misunderstood. Of course I can say that ANNs didn't suck in 2008 and that they weren't a joke compared to other machine learning, and that statement has nothing at all to do with ReLU, so it doesn't matter that ReLU wasn't widely used yet. I mean, if you extrapolated and figured out where things were going, it was clear in 2008 that ANNs would be the next big thing. You don't need to mix any of that up with the ReLU activation function.
@LeoAr37
@LeoAr37 11 ай бұрын
Pretty sure random forests and gradient boosting are still better at tabular problems
@parlor3115
@parlor3115 Жыл бұрын
That ending sent chills down my spine. As a data scientist and an engineer, it's the things that you don't know that you don't know that are the scariest concepts in our lives.
@underfitted
@underfitted Жыл бұрын
True!
@kodfkdleepd2876
@kodfkdleepd2876 Жыл бұрын
Um, you do realize that as humans we are born in ignorance and remain so? I don't find any concepts scary.
@HesderOleh
@HesderOleh Жыл бұрын
There are quite a few problems that would be literally world-changing if we could solve them. People generally think what we lack is the hardware or engineering to solve them, but even with current technology we could probably solve them if we figured out new ways of analyzing data mathematically.
@kodfkdleepd2876
@kodfkdleepd2876 Жыл бұрын
@@HesderOleh So what type of new mathematics does it take?
@atlasflare7824
@atlasflare7824 Жыл бұрын
I think many people just learn what ReLU is, but they mostly miss why it matters and why it was invented. The end of the video hints at why we should learn not only what it is but also why it was invented and why it was so important in opening new doors in DL. Thanks for sharing.
@simonemariani8663
@simonemariani8663 Жыл бұрын
I think this has to be the greatest channel I've discovered in a while. Keep up the good work!
@underfitted
@underfitted Жыл бұрын
Thank you, Simone!
@kurosakishusuke1748
@kurosakishusuke1748 Жыл бұрын
So many stories delivered in this video, and your explanation of the functions and how they contribute to the vanishing gradient problem really ties these ideas together.
@underfitted
@underfitted Жыл бұрын
Thanks! Glad it was informative!
@anondude6361
@anondude6361 Жыл бұрын
For my first neural network project I used this function. I did not know much about neural networks so I did not know that there were other activation functions like tanh and sigmoid
@sn2001
@sn2001 Жыл бұрын
I love this channel. The content I get here as an aspiring machine learning engineer is unbeatable.
@underfitted
@underfitted Жыл бұрын
Thank you! Really appreciate the comment!
@Elias_IV
@Elias_IV Жыл бұрын
Why is it so good? The narrative and the editing, together with the dense information, feel soooooo good to watch. Thank you very much!
@underfitted
@underfitted Жыл бұрын
Thanks, Elias!
@ioannischristou2362
@ioannischristou2362 Жыл бұрын
To be more exact, the ReLU definitely saturates (zero derivative leading to no updates in the weights) whenever the weighted sum of the inputs is less than zero. This is why people have proposed "leaky ReLU", which "leaks" a bit for negative numbers, so that its derivative is never zero (except maybe at the origin). Also, to be more precise, the ReLU has worked great for image and audio processing problems, but when working with simple tabular data (e.g. CSV files), it often fails to outperform the old activation functions. Finally, it was the same guy, J. Hinton, who together with Rosenblatt in the 80s was advocating the use of the sigmoid and the tangent function for activations...
@underfitted
@underfitted Жыл бұрын
Frank Rosenblatt died in 1971. You probably meant someone else. The big problem with ReLU is the dying neurons (they stop updating). Yes, this is because ReLU saturates, but only for negative values. In practice, however, we have shown this is not a big problem, and that's why Leaky ReLU is not as popular as ReLU itself.
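For readers who want to see this concretely, here is a minimal NumPy sketch (not from the video) of ReLU and leaky ReLU and their derivatives; the 0.01 slope is just a common default, and the value of the derivative at exactly zero is a convention.

import numpy as np

def relu(x):
    # max(0, x): passes positive values through, zeroes out negatives
    return np.maximum(0.0, x)

def relu_grad(x):
    # derivative is 1 for x > 0 and 0 for x <= 0 (a convention at x == 0)
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    # "leaks" a small slope alpha for negative inputs so the gradient never dies
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), relu_grad(x))              # negatives become 0; gradient is 0 there
print(leaky_relu(x), leaky_relu_grad(x))  # negatives keep a small slope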
@greyhat_gaming
@greyhat_gaming Жыл бұрын
You have a gift for explaining these subjects! I look forward to getting notifications when you’ve uploaded more content. 🎉
@underfitted
@underfitted Жыл бұрын
Thank you!
@roboltamy
@roboltamy Жыл бұрын
This feels like a well-produced video from a multi-million subscriber channel. I'm sure if your future content is anything like this, you have a bright future on youtube. Great video, kept my interest all the way through, learned something new about a topic I'm already somewhat knowledgeable in.
@underfitted
@underfitted Жыл бұрын
Thanks very much! Really appreciate it!
@mareged9978
@mareged9978 Жыл бұрын
Great job man! I followed you on Twitter, but since I only use it occasionally I wasn't aware of your YouTube channel. What a pleasant surprise :) Everything is explained so clearly and without additional complications, such a great way to get into the meaty maths.
@underfitted
@underfitted Жыл бұрын
Thanks!
@frankseidinger2112
@frankseidinger2112 Жыл бұрын
Thanks for sharing your insights. There are at least two things that come to my mind about your ReLU function. 1.) The graph is similar to the ones that were used in fuzzy logic for (non-linear) control units. 2.) Decision making in Gaussian distributions works better with non-linear functions, in order to decide whether to stay with a decision or stop it when the outcome is not predictable. This is useful e.g. in agile methods like Scrum (fail fast with ideas instead of waiting until the end of the project) or with futures/options in the stock market, to either draw or bury an option. Both heavily influence your cost over time.
@ful7481
@ful7481 Жыл бұрын
This guy is so underrated, that's sad.
@underfitted
@underfitted Жыл бұрын
Thanks!
@riteshpanditi3635
@riteshpanditi3635 Жыл бұрын
He'll grow exponentially, like data in our world
@ful7481
@ful7481 Жыл бұрын
@@riteshpanditi3635 yes lol
@sinfinite7516
@sinfinite7516 Жыл бұрын
Don’t worry, he’s on the come up right now. There’s no doubt the channel will grow bigger.
@underfitted
@underfitted Жыл бұрын
Thanks! Yeah, I'm still learning this YouTube thing. I hope I'll get better with more practice.
@gazbriel
@gazbriel Жыл бұрын
Recently discovered your channel, great videos! I fell in love with the way you explain machine learning concepts. Keep it up.
@underfitted
@underfitted Жыл бұрын
Thanks, Gabriel! Really appreciate your comment!
@shivanshusharma8154
@shivanshusharma8154 Жыл бұрын
I never thought about this concept that much when I studied it... I think I have to look at it again now at a deeper level... Very good video, cinematics on point as always.
@persiancarpet5234
@persiancarpet5234 Жыл бұрын
Are there other activation functions that solve the vanishing gradient problem (without exploding either ofc)? Great video btw, I didn't know relu had such a big impact!
@vtrandal
@vtrandal Жыл бұрын
I have the same question. Excellent video.
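There are several activations designed to keep gradients alive for negative inputs: leaky ReLU, ELU, SELU, and GELU, among others. A minimal sketch (layer sizes are arbitrary; these activations ship with recent Keras releases):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(8, activation="elu", input_shape=(2,)),  # smooth, saturates at -1 for negatives
    keras.layers.Dense(8, activation="selu"),                   # self-normalizing variant of ELU
    keras.layers.Dense(8, activation="gelu"),                   # smooth ReLU-like curve used in transformers
    keras.layers.Dense(8),
    keras.layers.LeakyReLU(),                                   # small non-zero slope for negative inputs
    keras.layers.Dense(1, activation="sigmoid"),
])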
@iheartalgebra
@iheartalgebra Жыл бұрын
Do you have a reference or link for the clip at 1:03? I would love to see the full video
@nahkaimurrao4966
@nahkaimurrao4966 Жыл бұрын
If you integrate sigmoid you get a differentiable approximation of max(0,x)
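As a quick numerical check of that observation (a sketch, not part of the original comment): integrating the sigmoid gives softplus, log(1 + e^x), a smooth approximation of max(0, x) whose derivative is exactly the sigmoid.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    # antiderivative of the sigmoid: log(1 + e^x), a smooth version of max(0, x)
    return np.log1p(np.exp(x))

x = np.linspace(-5, 5, 11)
print(np.round(softplus(x) - np.maximum(0, x), 3))  # small everywhere, largest near 0

# a numerical derivative of softplus matches the sigmoid
h = 1e-5
print(np.allclose((softplus(x + h) - softplus(x - h)) / (2 * h), sigmoid(x), atol=1e-6))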
@SanKum7
@SanKum7 Жыл бұрын
" Mr. Underfitted ", you are Underrated for your explainations. Great Job man, keep them coming. Goto as past decades as you can. Great Stuff.🎖🏆
@underfitted
@underfitted Жыл бұрын
Much appreciated!
@elevated_minds09
@elevated_minds09 Жыл бұрын
Hello Santiago, one quick question -- the first derivative of ReLU is 1 if x > 0 and 0 if x < 0...
@hamdaniyusuf_dani
@hamdaniyusuf_dani Жыл бұрын
This video deserves millions more views. One of the best explanations of machine learning I've ever found.
@underfitted
@underfitted Жыл бұрын
Thanks!
@ichwanrasidi5934
@ichwanrasidi5934 Жыл бұрын
You can get a differentiable ReLU by looking at its derivative, the step function. It looks like a sigmoid function. Integrate the sigmoid function, and you get a differentiable ReLU. Shift and scale the sigmoid vertically and you can get a differentiable leaky ReLU. Another seemingly simple solution to a seemingly complex problem in ML is the residual connection.
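For the residual connection part, here is a minimal Keras sketch (an illustration with made-up layer sizes): the skip path adds the block's input back to its output, giving gradients a short route around the nonlinearities.

import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(32,))
x = keras.layers.Dense(32, activation="relu")(inputs)
x = keras.layers.Dense(32)(x)
# residual (skip) connection: add the block's input back to its output
x = keras.layers.Add()([x, inputs])
outputs = keras.layers.Activation("relu")(x)

model = keras.Model(inputs, outputs)
model.summary()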
@devorein
@devorein Жыл бұрын
The production quality and the significance of the content are hands down the best in the machine learning education field.
@underfitted
@underfitted Жыл бұрын
Thanks so much!
@watchthe1369
@watchthe1369 Жыл бұрын
That explains why memory is associative. The trigger function happens from THAT POINT in the memory as a link. That steep, deep peak also allows you to skip layers of analysis once you notice there are 3 triggers that verify this may be an entirely different class layer. What do lemons, limes, and oranges have in common, and does that mean a grapefruit with similar properties is part of the same......
@merbst
@merbst Жыл бұрын
Something about that rectified linear function makes me think of the tropics! (Tropical Geometry, invented in Brazil, has lots of fun with Max+ & Min-)
@55mikeburns
@55mikeburns Жыл бұрын
When you make a network of mixed Relu and Tanh functions, does it eventually make the weighting of all of the Tanh outputs zero, or does it settle on a mix? Maybe Tanh is better for certain things but not others.
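One way to poke at that question yourself (a sketch with made-up layer sizes, not from the video): build two hidden branches, one tanh and one ReLU, concatenate them, and inspect the weights of the layer that mixes them after training.

import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(2,))
tanh_branch = keras.layers.Dense(8, activation="tanh", name="tanh_units")(inputs)
relu_branch = keras.layers.Dense(8, activation="relu", name="relu_units")(inputs)
merged = keras.layers.Concatenate()([tanh_branch, relu_branch])
outputs = keras.layers.Dense(1, activation="sigmoid", name="mixer")(merged)
model = keras.Model(inputs, outputs)

# after model.fit(...), compare the magnitude of the weights feeding from each branch:
# w, b = model.get_layer("mixer").get_weights()
# w has shape (16, 1): the first 8 rows come from the tanh units, the last 8 from the relu units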
@ugochukwuonyeka1735
@ugochukwuonyeka1735 Жыл бұрын
Wow!! Just wow!! @svpino, I could watch your videos all day long!
@underfitted
@underfitted Жыл бұрын
Thanks!
@michaelduffy5309
@michaelduffy5309 Жыл бұрын
Love the videos, love the daily questions from binomial. I learn so much. Thank you!
@underfitted
@underfitted Жыл бұрын
Thanks!
@tedarcher9120
@tedarcher9120 Жыл бұрын
Actually, people wrote neural networks that worked in the 1970s; they just took decades to train with the tech of that time.
@kikoulikikouli
@kikoulikikouli Жыл бұрын
Your videos keep getting better and better. Keep up the good work! 👏
@underfitted
@underfitted Жыл бұрын
Thanks man!
@josephhansen1598
@josephhansen1598 Жыл бұрын
Another really good video!! I love your style of explaining - it's new and refreshing on a topic that is usually monotonous and academic
@underfitted
@underfitted Жыл бұрын
Thank you! Really appreciate it!
@benmcreynolds8581
@benmcreynolds8581 Жыл бұрын
This got me so curious about all the possibilities that we could be overlooking. You never know. Sometimes 100 people walk by the same place, but the 101st person ends up seeing something everyone else overlooked. Certain discoveries in science happen like this too. It keeps me endlessly fascinated and curious about the world around me.
@TruthOfZ0
@TruthOfZ0 Жыл бұрын
I solved the problem years ago by combining these two ideas: 1. Get rid of activation functions and use f(x) = x (it looks similar to what he uses... funny). 2. Compare the errors of two consecutive epochs: when Error2 >= Error1, set the new learning rate r = r/2, otherwise r = r*1.3. This solves the exploding gradient and other weird things that occur, like fluctuating between two values because you have a fixed r of r = 0.01. Now it learns a new learning rate from the error!!
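For what it's worth, a small sketch of the learning-rate rule described in that comment (my interpretation of it, not the commenter's code): halve the rate when the epoch error got worse, grow it by 1.3x when it improved.

def update_learning_rate(lr, prev_error, curr_error):
    # rule as described above: shrink when the error got worse, grow when it improved
    if curr_error >= prev_error:
        return lr / 2.0
    return lr * 1.3

lr = 0.01
errors = [1.00, 0.80, 0.85, 0.60, 0.55]
for prev, curr in zip(errors, errors[1:]):
    lr = update_learning_rate(lr, prev, curr)
    print(round(lr, 5))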
@vtrandal
@vtrandal Жыл бұрын
Great video, but did you actually show why RELU solves the vanishing gradient? If so then I missed it. I'm looking for a mathematical explanation.
@underfitted
@underfitted Жыл бұрын
Although not formally, I tried to explain why it solves it: gradients don't vanish because the result of the activation function is never small (like it is with Sigmoid and Tanh).
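A rough numerical illustration of the usual argument (a sketch, not the video's derivation): backpropagation multiplies one activation derivative per layer; the sigmoid's derivative never exceeds 0.25, so the product shrinks geometrically with depth, while ReLU's derivative is exactly 1 for positive inputs.

import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)          # peaks at 0.25 when x == 0

layers = 20
x = 0.0                           # the best case for the sigmoid
print(sigmoid_grad(x) ** layers)  # 0.25**20 ≈ 9.1e-13: the gradient vanishes
print(1.0 ** layers)              # ReLU derivative for positive inputs: stays 1.0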
@CZghost
@CZghost Жыл бұрын
It should be stated that the function IS differentiable everywhere except at zero. However, the resulting derivative isn't a continuous function. If you look at the single value zero, you can't determine whether the derivative is zero or one. This is the point of discontinuity. That point separates the two parts of the function and excludes itself from the domain of the derivative. However, once you get to discrete functions (which computers can work with), the discontinuity suddenly doesn't really matter: now you have separated values on the X axis that are not continuous; you've got samples instead. That means the derivative of such a function is stupidly easy to compute: you just subtract the current sample from the next one. That's cool. Ironically, the computers we know today can work with these kinds of functions that break the rules of continuous functions. That is something analog computers are not capable of. The reason is that today's computers only work with digital signals, which are discrete by nature; each value is a sample and every sample is separated by some amount of time. That time is determined by the sample rate, and that's what makes non-trivial functions work. You can define arbitrary functions this way, and the derivative is then trivial to compute, since you can just subtract the current sample from the next one. If there is no next sample (you're on the last one), either the derivative is one sample shorter, or you can make an assumption based on previous values. And that's relatively easy on discrete systems. If it were on an analog system, doing that kind of computation would be impossible.
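That sample-to-sample subtraction is essentially one line in practice; a small sketch (my example) using NumPy's diff on a sampled ReLU:

import numpy as np

x = np.linspace(-2, 2, 9)
relu = np.maximum(0.0, x)
discrete_derivative = np.diff(relu) / np.diff(x)  # next sample minus current sample
print(discrete_derivative)  # 0 for the negative side, 1 for the positive side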
@patrickjdarrow
@patrickjdarrow Жыл бұрын
It feels a bit incomplete to talk about vanishing gradients without a mention of batch norm. For those wondering how this is handled in practice, check out Andrej Karpathy’s series on NNs from scratch. Good stuff here nonetheless.
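For anyone who wants to see what that looks like in code, a minimal sketch (layer sizes are arbitrary): a BatchNormalization layer between the linear transform and the activation keeps the pre-activations in a range where gradients behave.

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(64, use_bias=False, input_shape=(20,)),
    keras.layers.BatchNormalization(),   # re-centers and re-scales the pre-activations
    keras.layers.Activation("relu"),
    keras.layers.Dense(64, use_bias=False),
    keras.layers.BatchNormalization(),
    keras.layers.Activation("relu"),
    keras.layers.Dense(1),
])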
@ThankYouESM
@ThankYouESM Жыл бұрын
I now feel like I'm a genius, having figured that out by myself in 2014, wondering what would happen if I tried to simplify down to such a simple function, especially by avoiding decimal places. Yes... integers only... because floating-point numbers are far too unstable... and division is much slower than addition and subtraction. The main reason I made the attempt to find out if there was such a simplistic solution was that I figured it might be the fastest way for me to become a billionaire... it also seemed like a really fun challenge... and... I was curious to finally figure out how it all really works together. But... it still seems like magic to me to see it find patterns in a dataset of 2 million pairs of six-letter words originally placed in a randomized order, where if the index of the first 6 is greater than the last 6, it's masked as one, otherwise zero. It was shocking to see it found the answer in 3 full cycles without any layers. Oh... and I was also inspired by the bag-of-words algorithm, basically starting from there. However... I'm not claiming to be the first to discover that as the solution, but... I do have the last-modified verification if that's worth a huge fortune.
@paulbizard3493
@paulbizard3493 Жыл бұрын
Keep cool, don't let yourself get carried away.
@raptoress6131
@raptoress6131 Жыл бұрын
"Machine that will be able to walk, talk, see and write." Great, then it only needs to be awkward and sit in weird positions, and it can impersonate me at work.
@digxx
@digxx Жыл бұрын
It's still interesting, though, that nature uses sigmoid-type functions in neurotransmitter signal propagation, which is probably the reason these functions were used in the first place...
@openroomxyz
@openroomxyz Жыл бұрын
Do you think there is an opportunity for a single person to train neural networks on a single GPU for use in games, visualizations, or video?
@underfitted
@underfitted Жыл бұрын
Yes, of course
@sid-rs
@sid-rs Жыл бұрын
This guy doesn’t have a clue how awesome he is and how awesome his selection of topics are!
@underfitted
@underfitted Жыл бұрын
Thanks!
@paxdriver
@paxdriver Жыл бұрын
Fantastic content man, thanks for the hard work!
@underfitted
@underfitted Жыл бұрын
Thank you!
@JimNichols
@JimNichols Жыл бұрын
Maybe it isn't missing but being withheld. And thank you for this video, as I understand the time in research, scripting, editing, and uploading that goes into these vids.
@azzyfreeman
@azzyfreeman Жыл бұрын
Thank you soo much, you bring so much enthusiasm and passion
@underfitted
@underfitted Жыл бұрын
I appreciate that!
@teamtom
@teamtom Жыл бұрын
Thank you, it is a great video, again! It's not clear to me, though, how the decision boundary images are calculated and created. Any hint?
@underfitted
@underfitted Жыл бұрын
Do you mean the plots of the sigmoid and tanh functions?
@teamtom
@teamtom Жыл бұрын
​@@underfitted no, i mean the orange/blue "maps" that plot how classes are separated
@swarnodipnag
@swarnodipnag Жыл бұрын
@@teamtom watch the video again and implement that on the tf playground 😀
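Since the question in this thread is how those orange/blue maps are made: typically you evaluate the trained model on a dense grid of points and color each point by its prediction. A rough sketch (assuming a 2-input binary classifier already trained and stored in a variable called model, like the playground-style examples):

import numpy as np
import matplotlib.pyplot as plt

# grid covering the input space of a 2-feature classifier
xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

# model is assumed to be an already-trained Keras binary classifier with 2 inputs
probs = model.predict(grid).reshape(xx.shape)

plt.contourf(xx, yy, probs, levels=20, cmap="coolwarm")  # each grid point colored by predicted probability
plt.show()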
@TheNaturalLawInstitute
@TheNaturalLawInstitute Жыл бұрын
Entertaining. True. And... a wonderful insight into the problem of trying to describe the neurological system of information processing with mathematics. And worse, how we often spend years without comprehending what we're doing, only to realize someone solved the problem decades before in another context. BTW: the presenter is awesome. ;)
@underfitted
@underfitted Жыл бұрын
Thanks!
@TheNaturalLawInstitute
@TheNaturalLawInstitute Жыл бұрын
@@underfitted FWIW: Constructive feedback from a long-term tech marketing and advertising guy: You do a fabulous job of overcoming the language and pronunciation barrier with the right pace, lots of inflection, body language, and organizing the content in digestible segments. Plus, you're personally charming - and that helps a bit. -Cheers
@peterfireflylund
@peterfireflylund Жыл бұрын
That's a sensationalist, tabloid-like take. I do agree with your last sentence, though... what else are we missing? (We had also missed figuring out the proper distributions to draw from when initializing neuron weights -- and we had missed skip connections. Aren't they more important for allowing the "deep" in deep learning to work? Inserting normalization layers every now and then also protects against vanishing/exploding gradients but they are much less necessary with skip connections and proper initialization.)
@underfitted
@underfitted Жыл бұрын
There have been many, many ideas and transformations that got us to where we are today. I focused my video on one specific idea: the simplest one, yet fundamental.
@KitagumaIgen
@KitagumaIgen Жыл бұрын
Perhaps something with a similar shape that is differentiable would allow fancier optimisation algorithms.
@DarkTobias7
@DarkTobias7 Жыл бұрын
This channel is what I needed!!
@underfitted
@underfitted Жыл бұрын
Thanks!
@RonPaul20082012
@RonPaul20082012 Жыл бұрын
There are lots of things we are missing, partly because of video transitions like the one at 6:21.
@CorbinSimpson
@CorbinSimpson Жыл бұрын
Oh, also, since you like to quote folks who talk about building brains, it's worth remembering that ReLU has fuckall to do with neurons. In general, machine learning is not neuroscience, and we shouldn't confuse the two fields of study.
@Someonner
@Someonner Жыл бұрын
Amazingly simple explanation of a very complex concept.
@underfitted
@underfitted Жыл бұрын
Thanks
@LouisDuran
@LouisDuran 7 ай бұрын
Before I even started watching this, I was thinking ReLU... On the surface, it seems like it should not work. It seems pretty much linear, so why would it behave better than any other simple regression analysis? But it works! It's almost magic. Almost.
@RageForSeven
@RageForSeven Жыл бұрын
So... what about an AI to work on the "what are we missing today" related to ReLU? Or is that too close for comfort? Having an AI working on improving itself? Wouldn't that take us too close to the singularity?
@RadenVijaya
@RadenVijaya Жыл бұрын
Good thing I have something new to learn! Thank you!
@75hilmar
@75hilmar Жыл бұрын
I hope this is not stupid but wouldn't it suffice to multiply the activation function by the number of layers to correct it?
@TheNettforce
@TheNettforce Жыл бұрын
Another gem, thanks underfitted
@underfitted
@underfitted Жыл бұрын
Thanks, Brian!
@jonmichaelgalindo
@jonmichaelgalindo Жыл бұрын
LOL when I started learning NNs I read about other "activation functions" and was just like... But why??? You can't add information to a signal by messing with it like that. You just need a line.
@aaronprindle385
@aaronprindle385 Жыл бұрын
Incredible, thanks for this. Subscribed
@underfitted
@underfitted Жыл бұрын
Awesome, thank you!
@0omarhamdy
@0omarhamdy Жыл бұрын
your style is really amusing I love it
@underfitted
@underfitted Жыл бұрын
Thanks, Omar!
@CassianLore
@CassianLore Жыл бұрын
Excellent video. Subscribed !
@underfitted
@underfitted Жыл бұрын
Thanks!
@maximevanlaer3570
@maximevanlaer3570 Жыл бұрын
Your videos are awesome
@freydawg56
@freydawg56 Жыл бұрын
Ty for the video.
@johanlarsson9805
@johanlarsson9805 Жыл бұрын
I just want to point out that neural nets worked way before that, and multiple layers deep as well. It was backpropagation that didn't work! I personally used a genetic algorithm, and the neurons simply had threshold values. If their input was greater than that value, they were active and signaled at their full strength, like in nature. Then each connection of course has weights, like in nature. So everything was more "nature-like", and that could support multiple layers... the only issue is that it was hundreds/thousands of times slower than backpropagation.
@jiwujang3508
@jiwujang3508 Жыл бұрын
Nice! I was pretty surprised that such a simple function as ReLU solved the diminishing values problem of neural networks.
@underfitted
@underfitted Жыл бұрын
I know, right?
@elmoreglidingclub3030
@elmoreglidingclub3030 Жыл бұрын
But ReLU looks to be linear, right? It doesn’t provide the bump or nudge of nonlinearity needed for complex learning, or does it?
@underfitted
@underfitted Жыл бұрын
It does. It sets any negative values to 0. That's enough.
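A tiny check of that point (my example, not the video's): max(0, x) is piecewise linear but not linear, because it doesn't satisfy additivity, and that's what lets stacked layers bend decision boundaries.

def relu(x):
    return max(0.0, x)

a, b = 3.0, -2.0
print(relu(a + b))        # 1.0
print(relu(a) + relu(b))  # 3.0 -> relu(a + b) != relu(a) + relu(b), so it's not a linear function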
@rogue1413
@rogue1413 Жыл бұрын
Every video is a gem!
@underfitted
@underfitted Жыл бұрын
Thanks!
@tedarcher9120
@tedarcher9120 Жыл бұрын
Why not just multiply the gradients for each step through the layers by some amount like 10 so it doesn't vanish?
@AnyVideo999
@AnyVideo999 Жыл бұрын
The original designers were trying to mimic neuron activation, which behaves much more like an on/off cycle. So the functions they were looking for were ones that smoothly joined 0 and 1 together. Changing the gradient so it does not shrink quickly is tantamount to changing the activation function to fall outside of the 0-to-1 window. ReLU also throws all of that out the window and just focuses on the gradients, since artificial neural networks share virtually nothing in common with literal (biological) neural networks, but it's better than multiplying through since it's much more efficient to compute.
@tristanwegner
@tristanwegner Жыл бұрын
Great video!
@underfitted
@underfitted Жыл бұрын
Thanks!
@ChronicleContent
@ChronicleContent Жыл бұрын
What if I create a model like this?

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(4, input_shape=(2,), activation=tf.nn.tanh),
    keras.layers.Dense(2, activation=tf.nn.tanh),
    keras.layers.Dense(1, activation=tf.nn.tanh)
])
@underfitted
@underfitted Жыл бұрын
Sounds like… well, a model.
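A tanh-only stack like that will train on small problems; deeper versions are where you'd likely start to see the slow-down discussed in the video. A hedged sketch of how you might compare the two activations on the same architecture (make_moons is just a stand-in dataset, and the output layer is kept sigmoid for binary labels):

from sklearn.datasets import make_moons
from tensorflow import keras

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)

def build(activation):
    return keras.Sequential([
        keras.layers.Dense(4, input_shape=(2,), activation=activation),
        keras.layers.Dense(2, activation=activation),
        keras.layers.Dense(1, activation="sigmoid"),  # sigmoid output for 0/1 labels
    ])

for act in ("tanh", "relu"):
    model = build(act)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(X, y, epochs=20, verbose=0)
    print(act, round(history.history["accuracy"][-1], 3))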
@sonnydey
@sonnydey Жыл бұрын
A lot of stuff in this real world is made so complex that we don't have many amazing techs. Most of the things I build are so simple, but they are really amazing at the same time. I struggle with this world because of how complex it really is, and I mean almost everything in this world is so complex that I hate it.
@midnightwatchman1
@midnightwatchman1 Жыл бұрын
I am not sure I am so impressed by the current state of AI. Since my days in college I have been hearing the hype behind deep learning and neural networks. With all the effort we are putting in, why is it not doing more?
@yapdog
@yapdog Жыл бұрын
Well explained. *SUBSCRIBED*
@underfitted
@underfitted Жыл бұрын
Welcome and thanks!
@JxH
@JxH Жыл бұрын
Puh. Could have used an arbitrary lookup table "function" since Day 1. Could have used ML to optimize the table since Week 2.
@Simulera
@Simulera 6 ай бұрын
So you asked if there are simple things lying around left to try. Well, here are two very simple things to consider that I don't see much (or at all) about now, although in the mid-to-late 1980s, in a couple of places, some bright people did look at related sorts of things. First, the ReLU function is a degenerate Viterbi algorithm, blending the hypothesis of no signal with a time series of non-zero data points of unknown quality. Namely this: activation = a*max(0, X) + (1 - a)*X = max(0, X) when a = 1. So that is the first thing to try: mess with the Viterbi and thus generalize ReLU very simply. A second thing would be to just use simple radial basis functions as the activation function. So, lots of possible functions, but one sort of like: activation = K*erf(-ax). It doesn't need to be erf; in fact that is just to point out that it has to be some sort of activation function that goes up and then goes back down as a function of increasing x. The width and other shape characteristics are controlled by the parameters K and a above. You can create a spectrum of these activation functions. Point: you can learn XOR without a hidden layer with such an activation function. Another point, maybe making this 2 1/2 suggestions: it can be used to reduce the hidden layer depth at the cost of an increased input layer size; the activation width spectrum idea can sample features of different "spatial frequencies" in the sampled process data time series and combine them analogously to a Fourier-style (but it's not Fourier, obviously) spectrum. Moreover, you can in various ways combine the Viterbi with the radial basis function / width spectrum. I mean, as long as you can just jam stuff into the test bed tool and mess around to see what happens, you can mess around with this. Try learning non-monotonic relations like XOR, for example. There are so many simple things to try, to "mathematicalize". Green-field days won't last forever; might as well take some leaps of intuition as well as serious mathematical motivations. Too much slavish, blind MM/ML data-learning programming going on in the world right now, in my opinion.
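A quick sketch of the blend described above (just the commenter's formula, with a made-up mixing parameter): activation(x) = a*max(0, x) + (1 - a)*x recovers plain ReLU at a = 1 and the identity at a = 0.

import numpy as np

def blended_activation(x, a=0.7):
    # a = 1.0 -> ReLU, a = 0.0 -> identity; anything in between "leaks" part of the raw signal
    return a * np.maximum(0.0, x) + (1.0 - a) * x

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(blended_activation(x, a=1.0))  # [0. 0. 0. 1. 2.]  (plain ReLU)
print(blended_activation(x, a=0.7))  # negative inputs keep 30% of their value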
Жыл бұрын
The whole thing was built on heuristics, and that held it back. Sigmoids were good in logistic regression, but they were a hassle for modeling anything else.
@cisimon7
@cisimon7 Жыл бұрын
What an ending 🤣
@andrewrobison581
@andrewrobison581 Жыл бұрын
Biology has permeable membranes. Passing data through without storing it, and handing off partially completed equations to higher or lower functions, will be the breakthrough. Combining, comparing, and differentiating is not enough; the ability to create defines understanding.
@Hyperlooper
@Hyperlooper Жыл бұрын
What a fantastic channel name
@robertsteinbach7325
@robertsteinbach7325 Жыл бұрын
You mean to tell me that this function was the only thing standing between me and a Ph.D. in Computer Science with studies in Artificial Intelligence in the 90s? The more I look at this function and my past neural network code from back then, the more the answer is YES! f(x) = max(0,x)
@SmokeShadowStories
@SmokeShadowStories Жыл бұрын
It may not be elegant, but can't brute-force problem solving come up with the best functions for solving future, similar problems?
@magamindplanet8930
@magamindplanet8930 Жыл бұрын
great video :)
@underfitted
@underfitted Жыл бұрын
Thanks!
@gtt9894
@gtt9894 Жыл бұрын
I can't help but wonder why it took so long for neural network researchers to try this function, which is not novel at all.
@claudiosebastiancastillo3115
@claudiosebastiancastillo3115 Жыл бұрын
Excellent as always, Santiago.
@underfitted
@underfitted Жыл бұрын
Thanks, Claudio!
@TheWorldBelow360
@TheWorldBelow360 Жыл бұрын
Do you think a NN would have a positive return if you asked it to review Occam’s Razor? Or is that too sharp? Snark snark.
@s.v.discussion8665
@s.v.discussion8665 Жыл бұрын
An exciting video.
@thedailyepochs338
@thedailyepochs338 Жыл бұрын
Awesome Video
@underfitted
@underfitted Жыл бұрын
Thanks!
@wilhelmmeyer89
@wilhelmmeyer89 Жыл бұрын
Unusual ideas sometimes help. Here is a usual one: Let AI create, test and improve AI.
@_John_Sean_Walker
@_John_Sean_Walker Жыл бұрын
Put the origin in the centre of the paper.
@sinfinite7516
@sinfinite7516 Жыл бұрын
Bro, this was an amazing video.
@underfitted
@underfitted Жыл бұрын
Glad you think so!
@2k10clarky
@2k10clarky Жыл бұрын
I knew it would be ReLU, but great presentation.
@BananaLassi
@BananaLassi Жыл бұрын
great video
@underfitted
@underfitted Жыл бұрын
Glad you enjoyed it!
@shpensive
@shpensive Жыл бұрын
I really can't take the production; what is this, reality TV on cable? Cool telling of the story of solving the vanishing gradient, though.
@underfitted
@underfitted Жыл бұрын
Sorry you don't like this style of videos.
@ThankYouESM
@ThankYouESM Жыл бұрын
Been playing this video over 100 times feeling like I'm being given the ultimate compliment, although indirectly, despite my many other awesome achievements of quite a variety, which is fine because I don't want to ever be famous (again).
@underfitted
@underfitted Жыл бұрын
I’m glad you feel like that :)
@Tabu11211
@Tabu11211 Жыл бұрын
What about leaky relu?
@underfitted
@underfitted Жыл бұрын
Leaky ReLU solves the dying ReLU problem. In practice, however, ReLU works very well for most applications, so Leaky ReLU isn't as popular as ReLU.
@ominollo
@ominollo Жыл бұрын
That was interesting 🤔
@williammouncey9198
@williammouncey9198 Жыл бұрын
Couldn't someone just multiply by the inverse of the highest possible output, to make sure the output would reach 1? Seems simple to me.
@underfitted
@underfitted Жыл бұрын
Most things look simple in retrospect
@goldenknowledge5914
@goldenknowledge5914 Жыл бұрын
Awesome