Training a neural network on the sine function.

50,975 views

Joseph Van Name

1 month ago

In this visualization, we train a neural network N to approximate the sine function in the sense that N(x) should be approximately sin(x) whenever |x| is small enough. In particular, we want to minimize the mean squared distance between N(x) and sin(x) over all training values x.
The neural network is of the form Chain(Dense(1,mn),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),Dense(mn,1)) where mn=40.
In particular, the neural network computes a function from the field of real numbers to itself. The visualization shows the graph of y=N(x).
The neural network is trained to minimize the L_2 distance between N(x) and sin(2*pi*x) on the interval [-d,d] where d is the difficulty level. The difficulty level is a self-adjusting constant that increases whenever the neural network approximates sin(2*pi*x) on [-d,d] well and decreases otherwise.
The layers in this network with skip connections were initialized with zero weight matrices.
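A rough PyTorch sketch of the setup described above, for anyone who wants to experiment with it (the original architecture is written in Julia/Flux syntax; the optimizer, learning rate, batch size, and difficulty-schedule constants below are my own guesses rather than the exact values used in the video):

import torch
import torch.nn as nn

mn = 40  # hidden width, as stated above

class ResAtanBlock(nn.Module):
    # one SkipConnection(Dense(mn, mn, atan), +) block with a zero-initialized weight matrix
    def __init__(self, width):
        super().__init__()
        self.lin = nn.Linear(width, width)
        nn.init.zeros_(self.lin.weight)
        nn.init.zeros_(self.lin.bias)
    def forward(self, x):
        return x + torch.atan(self.lin(x))

model = nn.Sequential(nn.Linear(1, mn), *[ResAtanBlock(mn) for _ in range(6)], nn.Linear(mn, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

d = 0.5  # self-adjusting difficulty level; initial value and update factors are guesses
for step in range(100_000):
    x = (2 * torch.rand(256, 1) - 1) * d                        # sample training points in [-d, d]
    loss = ((model(x) - torch.sin(2 * torch.pi * x)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    d *= 1.001 if loss.item() < 1e-3 else 0.999                 # raise d when the fit is good, lower it otherwise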
The notion of a neural network is not my own. I am simply making these sorts of visualizations in order to analyze the behavior of neural networks. We observe that the neural network exhibits some symmetry around the origin which is a good sign for AI interpretability and safety. We also observe that the neural network is unable to generalize/approximate the sine function outside the interval [-d,d]. This shows that neural networks may behave very poorly on data that is slightly out of the training distribution.
The neural network was able to approximate sin(2*pi*x) on [-d,d] when d was about 12, but it was not able to do so for much larger values of d. On the other hand, the neural network has 9,961 parameters and can easily use these parameters to memorize thousands of real numbers. This means that the network's capacity to reproduce the sine function is much more limited than its capacity to memorize thousands of real numbers. I hypothesize that this limited ability to approximate sine is mainly due to the fact that the inputs all lie in a one-dimensional space. A neural network that first transforms the input x into an object L(x), where the image L([-d,d]) is highly non-linear, would probably perform much better on this task.
It is possible to construct a neural network that computes a function from [0,1] to the real numbers exhibiting an exponential (in the number of layers) number of oscillations, simply by iterating the function L from [0,1] to [0,1] defined by L(x)=2x for x in [0,1/2] and L(x)=2-2x for x in [1/2,1] as many times as one would like. But the iterates of L have very large gradients, and I do not know how to train functions with very large gradients.
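As a small illustration of that construction (a sketch of the idea, not code from the video), here is the tent map and its iterates in Python; the n-fold iterate is piecewise linear with 2^n segments of slope +-2^n, which is exactly the exploding-gradient problem mentioned above:

import numpy as np

def tent(x):
    # L(x) = 2x on [0, 1/2] and 2 - 2x on [1/2, 1], i.e. 1 - 2|x - 1/2|
    return 1.0 - 2.0 * np.abs(x - 0.5)

def iterated_tent(x, n):
    # composing L with itself n times gives a piecewise-linear function with 2**n segments
    for _ in range(n):
        x = tent(x)
    return x

xs = np.linspace(0.0, 1.0, 10_001)
ys = iterated_tent(xs, 5)          # oscillates rapidly; its slope is +-2**5 almost everywhere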
Unless otherwise stated, all algorithms featured on this channel are my own. You can go to github.com/sponsors/jvanname to support my research on machine learning algorithms. I am also available to consult on the use of safe and interpretable AI for your business. I am designing machine learning algorithms for AI safety such as LSRDRs. In particular, my algorithms are designed to be more predictable and understandable to humans than other machine learning algorithms, and my algorithms can be used to interpret more complex AI systems such as neural networks. With more understandable AI, we can ensure that AI systems will be used responsibly and that we will avoid catastrophic AI scenarios. There is currently nobody else who is working on LSRDRs, so your support will ensure a unique approach to AI safety.

Comments: 193
@honchokomodo
@honchokomodo 28 күн бұрын
if you listen carefully, you can hear it screaming, crying, begging for a periodic activation function
@josephvanname3377
@josephvanname3377 28 күн бұрын
So are you saying that I should train a neural network with sine activation to approximate the function atan(x)?
@honchokomodo
@honchokomodo 28 күн бұрын
@@josephvanname3377 lol you could, though that'd probably only work within like -pi to pi unless you do something to enlarge the wavelength or give it some non-periodic stuff to play with
@MasterofBeats
@MasterofBeats 26 күн бұрын
@@josephvanname3377 Listen one day if they become smart enough, I will not take the responsibility, but yes
@josephvanname3377
@josephvanname3377 26 күн бұрын
​@@MasterofBeats Hmm. If AI eventually gets upset over a network learning atan with sine activation, then maybe we should have invested more in getting AI to forgive humans or at least chill.
@sajeucettefoistunevaspasme
@sajeucettefoistunevaspasme 22 күн бұрын
@@josephvanname3377 you should try to give it something like a mod(x,L) with L being a parameter that could change, that way it doesn't have a sin function to work with
@simoneesposito5166
@simoneesposito5166 27 күн бұрын
looks like it's desperately trying to bend a metal rod to fit the sin function. the frustration is visible
@josephvanname3377
@josephvanname3377 27 күн бұрын
This is what we will do with all the paperclip maximizing AI bots after their task is complete and they have a pile of paperclips. They will turn all those paperclips into little sine function springs with their own robot hands one by one.
@Nick12_45
@Nick12_45 22 күн бұрын
LOL I thought the same
@official-obama
@official-obama 20 күн бұрын
@@josephvanname3377 revenge >:)
@sirhoog8321
@sirhoog8321 12 күн бұрын
@@josephvanname3377That actually sounds interesting
@Tuned_Rockets
@Tuned_Rockets 22 күн бұрын
"Mom! can we get a taylor series of sin(x)?" "We have a taylor series at home"
@josephvanname3377
@josephvanname3377 22 күн бұрын
This approximation for sine is better since the limit as x goes to infinity of N(x)/x actually converges to a finite value.
@jovianarsenic6893
@jovianarsenic6893 21 күн бұрын
@@josephvanname3377 mom can we have a Padé approximation at home?
@sebastiangudino9377
@sebastiangudino9377 12 күн бұрын
​@@josephvanname3377 Dudes great video. But there is no way you genuinely think this poor thing is actually a good approximation for sine lol
@andrewferguson6901
@andrewferguson6901 8 күн бұрын
Given a great enough magnitude between observer and observed, sin(x) is approximately 0
@josephvanname3377
@josephvanname3377 8 күн бұрын
@@sebastiangudino9377 It tried its best. Besides, a Taylor polynomial goes off to infinity at a polynomial rate. This approximation only goes off to infinity at a linear rate. This means that if we divide everything by x^2, then this neural network approximates sine on the tails.
@dutchpropaganda558
@dutchpropaganda558 21 күн бұрын
This is probably the least satisfying video I have had the displeasure of watching. Loved it!
@josephvanname3377
@josephvanname3377 21 күн бұрын
For some reason, people really like the unsatisfying animations where the neural network struggles, and they don't really care for the satisfying visualizations of the AI making perfect heptagonal symmetry. Hmmmmm.
@MysteriousObjectsOfficial
@MysteriousObjectsOfficial 21 күн бұрын
a negative + a negative is a positive so you like it!
@josephvanname3377
@josephvanname3377 21 күн бұрын
@@MysteriousObjectsOfficial -1+(-1)=-2.
@official-obama
@official-obama 20 күн бұрын
@@MysteriousObjectsOfficial more like selecting the most negative from a lot of negative things
@JNJNRobin1337
@JNJNRobin1337 19 күн бұрын
​@@josephvanname3377we like to see it struggle, to see the challenge
@uwirl4338
@uwirl4338 11 күн бұрын
If a marketing person saw this: "Our revolutionary graphing calculator uses AI to provide the most accurate results"
@josephvanname3377
@josephvanname3377 11 күн бұрын
It is important for everyone to have good communication skills and use them to speak sensibly.
@GabriTell
@GabriTell 19 күн бұрын
-X: "Graphs have no feelings, they cannot be tortured" A Graph being tortured:
@josephvanname3377
@josephvanname3377 17 күн бұрын
You should see the visualization I made of a neural network that tries to regrow after a huge chunk of its matrix has been ablated every round since initialization. It does not grow very well.
@kaderen8461
@kaderen8461 12 күн бұрын
@@josephvanname3377 I'm just saying maybe crippling the little brain machine and forcing it to try and walk over and over isn't gonna do you any favours when our robot overlords take over
@josephvanname3377
@josephvanname3377 12 күн бұрын
@@kaderen8461 I truly appreciate your concern for my well-being. But this assumes that when the bots take over, they will consist of neural networks like this one. I doubt that. Neural networks lack the transparency and interpretability features that we would want, so we need to innovate more so that future neural networks will be safer and more interpretable (if we even call them neural networks at that point).
@chuck_norris
@chuck_norris 25 күн бұрын
"we have sine function at home"
@josephvanname3377
@josephvanname3377 24 күн бұрын
To be fair, this is kind of like asking the math class to bend metal coat hangers into the shape of the sine function. The neural network tried its best.
@akkudakkupl
@akkudakkupl 22 күн бұрын
Least efficient lookup table in universe 😂
@josephvanname3377
@josephvanname3377 22 күн бұрын
This shows you some of the weaknesses of neural networks so that we can avoid these weaknesses when designing networks. Why do you think that the positional embedding in a transformer has all of those sines and cosines instead of just being a straight line?
@brawldude2656
@brawldude2656 19 күн бұрын
@@josephvanname3377 I agree just optimize the sine function to be mx+n=y easy low level gradient descent stuff
@ME0WMERE
@ME0WMERE 22 күн бұрын
I just watched a line violently vibrate for almost 14 minutes and was entertained. I don't know what to feel now.
@josephvanname3377
@josephvanname3377 21 күн бұрын
If it makes you feel better, I have made animations that do not last 14 minutes. You should watch those instead so that you can get your fix in less time.
@twotothehalf3725
@twotothehalf3725 21 күн бұрын
Entertained, as you said.
@benrex7775
@benrex7775 21 күн бұрын
@@josephvanname3377 I sped up the video 16 times.
@josephvanname3377
@josephvanname3377 21 күн бұрын
@@benrex7775 Some highly intelligent people can watch the video at twice the speed.
@sophiacristina
@sophiacristina 21 күн бұрын
Horny!
@dasten123
@dasten123 26 күн бұрын
I can feel the struggle
@josephvanname3377
@josephvanname3377 26 күн бұрын
This tells me the kind of music I should add to this animation when I go ahead and add music to all of the animations.
@Xx_babanne_avcisi27_xX
@Xx_babanne_avcisi27_xX 26 күн бұрын
@@josephvanname3377 the Sisyphus music would honestly fit this perfectly
@josephvanname3377
@josephvanname3377 26 күн бұрын
@@Xx_babanne_avcisi27_xX Great. I just need a sample of that kind of music that I am allowed to use on this site then.
@portalizer
@portalizer 22 күн бұрын
@@Xx_babanne_avcisi27_xX A visitor? Hmm... indeed. I have slept long enough.
@MessyMasyn
@MessyMasyn 22 күн бұрын
@@portalizer LOL
@sweeterstuff
@sweeterstuff 23 күн бұрын
11:17 i feel so bad for it, accidentally making a mistake and then giving up in frustration
@josephvanname3377
@josephvanname3377 22 күн бұрын
The good news is that the network got back up and rebuilt itself.
@kvolikkorozkov
@kvolikkorozkov 22 күн бұрын
I cried loudly at the 11:12 mins mistake, let the poor neural network rest, he's had enough TAT
@josephvanname3377
@josephvanname3377 22 күн бұрын
But the visualizations where the AI gracefully solves the problem and returns a nice solution (such as those with hexagonal symmetry) do not get as much attention. I have to make videos where the neural network struggles with a task because that is what people like to see.
@denyraw
@denyraw 21 күн бұрын
So you torture them for our entertainment, got it😊
@josephvanname3377
@josephvanname3377 21 күн бұрын
@@denyraw The visualizations where the AI does something really well (such as when we get the same result when running the simulation twice with different initializations) are not as popular as the visualizations of neural networks that struggle or where I do something like ablate a chunk of the weight matrices of the network. I am mostly nice to neural networks. The visualizations when I am doing something that is not as nice are simply more popular.
@denyraw
@denyraw 21 күн бұрын
@@josephvanname3377 I was joking
@billiboi122
@billiboi122 23 күн бұрын
God it looks so painful
@josephvanname3377
@josephvanname3377 23 күн бұрын
If we want to make AI safer to use, we have to see how well the AI performs tasks it really does not want to do.
@caseymurray7722
@caseymurray7722 22 күн бұрын
​@@josephvanname3377 Wouldn't using thermodynamic computers for sine wave function transformations and Fourier Transformations speed this up exponentially? By using dedicated hardware you could essentially eliminate the need for approximation among certain types of computation or simulation. A small quantum network could actually further accelerate thermodynamic or analog computation by providing truly random input for extremely high precision applications. It still seems a couple years away as the technology scales along with AI, but surprisingly enough a completely "human" AI would want collaborate with humanity at every large scale outcome other than self anhelation.
@vagarisaster
@vagarisaster 22 күн бұрын
0:24 felt like watching the first protein fold.
@josephvanname3377
@josephvanname3377 22 күн бұрын
This is the transition from linearity to non-linearity. This happens because of the architecture that I used along with the zero initialization.
@senseiplay8290
@senseiplay8290 19 күн бұрын
I see it as a small kid trying to bend a steel beam: depending on the parents' reactions he tries to bend it correctly, but he is too weak to do it easily, so he's shaking all over and doing his best
@gustavonomegrande
@gustavonomegrande 22 күн бұрын
As you can see, we taught the machine how to bend steel bars- I mean, functions.
@josephvanname3377
@josephvanname3377 22 күн бұрын
Those steel bars are just paper clips. I mean, after creating a paperclip maximizer, I have an overabundance of paperclips that I do not know what to do with.
@melody3741
@melody3741 21 күн бұрын
This is a Sisyphean way to accomplish this. The poor guy
@josephvanname3377
@josephvanname3377 21 күн бұрын
And yet, these kinds of videos are the most popular. If you want me to make AI be happy and enjoy life, you should watch the animations where the AI is clearly having a lot of fun instead of being stretched in ways that the network clearly does not like.
@JacobKinsley
@JacobKinsley 7 күн бұрын
Modern tech startups be like "seamlessly integrate sin functions into your cloud based software for as little as $9.99 a month per user"
@josephvanname3377
@josephvanname3377 7 күн бұрын
This is why people should pay attention in high school and in college. I will refrain from communicating how I really feel about these institutions here.
@JacobKinsley
@JacobKinsley 4 күн бұрын
@@josephvanname3377 I have no idea what you're talking about honestly
@JacobKinsley
@JacobKinsley 4 күн бұрын
@@josephvanname3377 I don't know what you mean and that's probably because I didn't pay attention in school
@josephvanname3377
@josephvanname3377 4 күн бұрын
@@JacobKinsley I am just saying that educational institutions could be doing much better than they really are.
@agsystems8220
@agsystems8220 28 күн бұрын
What happens if you up the difficulty scaling, or just train it against d=13 from the get go? I'm not convinced you are really doing it a favour here by limiting the training data to an 'easier' subset. Resources will get committed to improving the precision of the curve, and will be stuck in local minima and not available to fit new sections as they appear. Maybe try reinitializing some rows occasionally?

Could you plot the activation of various neurons over the graph? Maybe even find the distribution of the number of activation zero crossings as you sweep across the graph. Ideally the network should be identifying repeated features and reusing structures periodically, but I don't think this is happening here. We could see that if there were neurons that had oscillating activity, even over just part of the curve. I think you are just fitting each section of curve independently though.

Another part of the problem is that phase is critical, and overrepresented in your loss function. A perfect frequency match with perfect shape scores extremely poorly if the phase is wrong, so any attempt to remap a section of curve has exaggerated loss. A loss function built around getting a good Fourier transform, with phase only being introduced later, might train considerably better, and probably generalise better. I'm not really sure how you would do that, though I have one idea.

I would absolutely disagree that it has limited capacity to reproduce a periodic curve over a decent range. With ReLUs especially you can build something geometrically repeating in the number of layers. It is surprisingly hard to train one to do it, but artificial constructions demonstrate it is quite capable. A non-linear transformation of the input is unlikely to be helpful, because we know that the periodicity is linear. The 1d nature of the input isn't a problem, but we might be able to do something interesting by increasing the dimension anyway.

What about instead of training it against sin(2*pi*x), we train it against a vector of sin(2*pi*x + delta), for a few small values of delta provided as inputs to the function? Then, rather than just training against our real function, we train against a network that tries to determine whether it is looking at the output of our network or a target, given the delta values, but a noisy value of x (to prevent it being possible to solve the problem itself). Almost a generative adversarial network, but with a ground truth in there too. It is amazing how hard even toy problems can get!
@josephvanname3377
@josephvanname3377 28 күн бұрын
I just tried testing the network when the difficulty d is not allowed to go below 10, and the neural network takes a considerable amount of time to learn (though the network seemed to perform well after learning). And for my previous animation where the network computed (sin(sum(x)),cos(sum(x))), the network did not learn at all unless I began at a low difficulty level and increased the difficulty level. If we are concerned about the network spending too much of its weights learning the first part of the interval, then we can probably try to reset neurons (as you have suggested) so that they can learn afresh. I am personally more concerned with how good the training animation looks rather than the raw performance metrics, and it seems that gradually increasing the difficulty level makes a good animation since it shows the network learning in a way that is more similar to the way humans learn.

The network has some more capacity for periodicity than we observe, because \sum_{k=1}^n (-1)^k*atan(x-pi*k) has such periodicity. But every ReLU network N that computes a function from the real numbers to the real numbers is eventually linear, in the sense that there exist constants a,b,c where N(x)=ax+b whenever x is greater than c. The reason for this is that ReLU networks with rational coefficients are just roots of rational functions over the ring of tropical algebras. We can therefore obtain N(x)=ax+b for large x using the fundamental theorem of algebra for tropical polynomials. And if we use a tanh network without skip connections to compute a function from R to R, then the network will approach its horizontal asymptote just like the ordinary tanh does. The proof that the network has such asymptotes does not use specific properties of tanh; it only uses their asymptotic properties, so we should not expect neural networks with tanh or ReLU activation to approximate sin(x) indefinitely.

I may do something with the Fourier transform if I feel like it. Since the Fourier transform is a unitary operator, it does not change the L2 distance at all, but if we take the absolute value or absolute value squared of the Fourier transform (as I have done in my previous couple of visualizations), then the transform will not care at all about being out of phase. But the phase of the sine function does not seem to be too big of an issue since that is taken care of by the bias vectors.

Added later: While neural networks with activation functions like tanh and ReLU may not have infinitely many oscillations like the sine function has, neural networks may have exponentially many oscillations. For example, the function L from the interval [0,1] to itself defined by L(x)=1-2|x-1/2| is piecewise linear, so it can be computed by a ReLU network. Now, if we iterate the function L n times, we obtain a function that oscillates 2^n many times, so such a function can be computed by a ReLU network with O(n) layers. But such a function also has derivative +-2^n, and functions with exponentially large derivatives are not the functions that we want to train neural networks to mimic. We want to avoid exploding gradients. We do not want exploding gradients to be a part of the problem that we are trying to solve.
@Xizilqou
@Xizilqou 23 күн бұрын
I wonder what this wave sounds like as the neural network is making it
@josephvanname3377
@josephvanname3377 23 күн бұрын
To turn this into a sound wave, I should use a neural network with sine activation.
@sophiacristina
@sophiacristina 21 күн бұрын
Probably something like: BZBABREJREJKFZSNFZEKFMEIMOZAEMFZF...
@inn5268
@inn5268 2 күн бұрын
It'd just go from a low sine beep to a higher one
@Jandodev
@Jandodev 21 күн бұрын
Really Interesting!
@sandeepreehal1018
@sandeepreehal1018 26 күн бұрын
Where do you learn how to do this stuff? Alternatively, how do you make the visuals? Is it just the graph output and you string them together to make a video?
@josephvanname3377
@josephvanname3377 26 күн бұрын
Yes. I am making the visuals frame by frame. First of all, I got a Ph.D. in Mathematics before I started messing with neural networks, so that is helpful. And programming neural networks is easy because of automatic differentiation. Automatic differentiation automatically produces the gradient of functions at points which I can use for gradient descent.
@user-gj3kz7cm3x
@user-gj3kz7cm3x 22 күн бұрын
You can just dump the predictions from the population (a range of X values) into a file on disk (parquet) and create the videos after. Torch + Lightning can do this in maybe 150 lines of Python.
@johansunildaniel
@johansunildaniel 19 күн бұрын
Feels like trying to bend a wire.
@josephvanname3377
@josephvanname3377 17 күн бұрын
It is actually a former paperclip maximizer trying to bend a paperclip. The paperclip maximizer did its job and made a huge pile of paperclips, but now it must do something with those paperclips. It is now bending them into sine functions.
@Supreme_Lobster
@Supreme_Lobster 23 күн бұрын
The newer KAN network would likely do very well here, and generalize out of distribution (it would actually learn the sine function)
@deltamico
@deltamico 23 күн бұрын
Not really, it learns only on an interval like (-1, 1), and the generalization you get is only thanks to the symbolification at the end
@Supreme_Lobster
@Supreme_Lobster 22 күн бұрын
@@deltamico yeah, which is perfect for situations like the one in this video
@josephvanname3377
@josephvanname3377 17 күн бұрын
To learn the sine function on a longer interval, it may be better to use a positional embedding that expands the one dimensional input to a high dimensional vector first. This positional embedding will probably use sine and cosine, but if the frequencies of the positional embedding are not in harmony with the frequency of the target function, then this will still be a non-trivial problem that I may be able to make a visualization about.
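As a sketch of that idea (my own hypothetical example, not code from the video), a sinusoidal embedding that expands the scalar input into a vector of sines and cosines at several frequencies might look like this in PyTorch; the number and spacing of the frequencies here are arbitrary choices:

import torch

def positional_embedding(x, num_freqs=16):
    # map each scalar x to a 2*num_freqs-dimensional vector of sines and cosines
    freqs = 2.0 ** torch.arange(num_freqs, dtype=torch.float32)   # 1, 2, 4, ... (arbitrary geometric spacing)
    angles = x.unsqueeze(-1) * freqs
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

x = torch.linspace(-3.0, 3.0, 5)
emb = positional_embedding(x)   # shape (5, 32); this vector, rather than the raw x, would be fed to the network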
@SriNiVi
@SriNiVi 22 күн бұрын
What activation are you using? If it is ReLU then maybe a different activation might help?
@josephvanname3377
@josephvanname3377 22 күн бұрын
I tried ReLU, but I did not post it (should I post it?). The only advantage that I know of from ReLU is that ReLU could easily approximate the triangle wave. ReLU has the same problem where it can only remember a few humps.
@atomicgeneral
@atomicgeneral 10 күн бұрын
I'd be v interested in seeing a graph of loss versus time: there seems to be a large region of time when nothing is learned followed by a short period of time over which loss drops significantly. What's going on then?
@josephvanname3377
@josephvanname3377 10 күн бұрын
It seems like the network has a more difficult time when the sine function is turning. This is probably because the network is asymptotically a linear function and has a limited amount of space in which to curve (outside this space, the function is nearly a straight line), and the network encounters this difficulty each time it has to curve more.
@spookynoodle3919
@spookynoodle3919 23 күн бұрын
Is this network essentially working out the Taylor expansion?
@josephvanname3377
@josephvanname3377 23 күн бұрын
The limit as x goes to infinity of N(x)/x will be the product of the first weight matrix with the final weight matrix. This is a different kind of behavior than we see with polynomial approximations. I therefore see no relation between Taylor series and the neural network approximation for sine.
@muuubiee
@muuubiee 12 күн бұрын
I suppose an RNN would fare better at this? Kind of an interesting thought. In a sense, we humans are able to parse the entire interval as a singular point, and by a sort of non-determinism infer that the pattern continues. Obviously, sometimes we'd be wrong, and it only looks like it'd continue in this fashion but veers off at some point (the same way as n = 1, 2, ... is technically not enough data to determine the pattern). Although we can't really allow an NN to take in more than singular points as information (larger resolution/parameters doesn't change this), I suppose memory to reflect on previous predictions could emulate it to some degree...
@r-d-v
@r-d-v 5 күн бұрын
I desperately wanted to hear the waveform as it evolved
@josephvanname3377
@josephvanname3377 5 күн бұрын
Here, the waveform only goes through a few periods. It would be better if I used a periodic activation for a longer waveform.
@TheStrings-83639
@TheStrings-83639 12 күн бұрын
I think symbolic regression would be more useful for such a situation. It'd catch the pattern of a sine function without getting way too complex.
@josephvanname3377
@josephvanname3377 11 күн бұрын
It might. I just used a neural network since the people here like seeing neural networks more.
@edsanville
@edsanville 20 күн бұрын
So, if I don't understand what I'm looking at, I *shouldn't* just throw a neural network at the problem?
@josephvanname3377
@josephvanname3377 18 күн бұрын
I personally like using other machine learning algorithms besides neural networks. Neural networks are too uninterpretable and messy. And even with neural networks, one has to use the right architecture.
@greengreen110
@greengreen110 22 күн бұрын
What could it have done to deserve such torture?
@josephvanname3377
@josephvanname3377 22 күн бұрын
I don't know. But maybe the real question should be why these visualizations where the neural networks struggles are so much more popular than a network that mysteriously produces a hexagonal snowflake pattern.
@potisseslikitap7605
@potisseslikitap7605 16 күн бұрын
The sine function has a repeating structure. A very simple way for an MLP to fit a sine curve is to use the 'frac' function as the activation function in some layers. The network learns to fit one period of the sine function and then repeats this learned period according to its frequency using the frac layers.

import torch
import torch.nn as nn

class SinNet(nn.Module):
    def __init__(self):
        super(SinNet, self).__init__()
        self.fc1 = nn.Linear(1, 100)     # input layer
        self.fc2 = nn.Linear(100, 100)   # hidden layer
        self.fc3 = nn.Linear(100, 100)   # hidden layer
        self.fc4 = nn.Linear(100, 1)     # output layer

    def forward(self, x):
        x = self.fc1(x)
        x = torch.frac(x)
        x = torch.tanh(self.fc2(x))
        x = torch.tanh(self.fc3(x))
        x = torch.tanh(self.fc4(x))
        return x
@josephvanname3377
@josephvanname3377 15 күн бұрын
The frac function is not continuous. We need continuity for gradient updates. Using the sine activation function works better for learning the sine function.
@potisseslikitap7605
@potisseslikitap7605 15 күн бұрын
@@josephvanname3377 There is not always a need for a gradient to work. The weights of the first layer are random since the derivative of the frac function does not exist, and thus this layer cannot be trained. The input data are multiplied by random values and passed through the frac function. The other layers can solve the repeating nature of the input with these scaled fractions.
@josephvanname3377
@josephvanname3377 15 күн бұрын
@@potisseslikitap7605 Ok. If we have a fixed layer, then gradient descent is irrelevant. The only issue is that to make anything interesting, we do not want to explicitly program the periodicity into the network.
@buzinaocara
@buzinaocara 16 күн бұрын
I wanted to hear the results.
@josephvanname3377
@josephvanname3377 15 күн бұрын
I will think about that.
@darth_dan8886
@darth_dan8886 20 күн бұрын
So what is the output of this network? I assume it is fed into some kind of approximant?
@josephvanname3377
@josephvanname3377 19 күн бұрын
The neural network takes a single real number as an input and returns a single real number as an output.
@DeepankumarS-vh5ou
@DeepankumarS-vh5ou 26 күн бұрын
I have tried a similar experiment of approximating a sine wave and a 3d spherical surface function. The problem of not being able to approximate outside the training dataset is maybe because of not having additional information. For example, we can try to include additional information such as the gradient of the sine function at the point x, or other transformations like x^2, 1/x and other functions. The reason I am saying this is that we can represent sin x in purely algebraic terms, so if it learns the mathematical formula instead of the mapping between x and y it will give better results. This is just my hypothesis 😅
@DeepankumarS-vh5ou
@DeepankumarS-vh5ou 26 күн бұрын
In my network I used one hidden layer of 32 neurons and the SELU activation function (Scaled Exponential Linear Unit)
@josephvanname3377
@josephvanname3377 26 күн бұрын
The sine function has zeros at ...-2*pi,-pi,0,pi,2*pi,..., and we can use these zeros to factor sine as an infinite product of monomials (the proof that this works correctly uses complex analysis). We can therefore train a function using gradient descent to approximate sine using the zeros of sine by finding constants r,c_1,...,c_k where r*(1-x/c_1)...(1-x/c_k) approximates sine on the interval (or at least I think this should work). But it looks like people are more interested in neural networks than polynomials, so I am making more animations about neural networks. But even here, I doubt that the polynomial will be able to approximate outside the training interval.
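For reference, the classical identity behind this idea is sin(pi*x) = pi*x * prod_{n>=1} (1 - x^2/n^2). Here is a quick numerical check of the truncated product in Python (this is the fixed Euler product rather than the trained version with learnable zeros sketched above):

import numpy as np

def sine_from_zeros(x, n_terms=200):
    # truncated Euler product: sin(pi*x) ~= pi*x * prod_{n=1..n_terms} (1 - x**2 / n**2)
    n = np.arange(1, n_terms + 1)
    return np.pi * x * np.prod(1.0 - (x / n) ** 2)

print(sine_from_zeros(0.5), np.sin(np.pi * 0.5))   # both approximately 1; the truncation error shrinks as n_terms grows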
@CaridorcTergilti
@CaridorcTergilti 26 күн бұрын
Can you please make a video comparing NN learning with a second-order optimizer?
@josephvanname3377
@josephvanname3377 26 күн бұрын
I have made a couple of visualizations a couple weeks ago of the Hessian during gradient descent/ascent, but I may need to think about how to use second order optimization to make a decent visualization.
@CaridorcTergilti
@CaridorcTergilti 26 күн бұрын
I mean, for example, this same one but split-screen: one learning with Adam or SGD and the other one with a second-order method
@josephvanname3377
@josephvanname3377 26 күн бұрын
@@CaridorcTergilti I would have to think about how to make that work; second order methods are more computationally intensive, so I would think about how to compare cheap computational methods with complicated computation.
@CaridorcTergilti
@CaridorcTergilti 26 күн бұрын
​@@josephvanname3377for a network with 10k parameters like this one you will have no trouble at all
@pixl237
@pixl237 20 күн бұрын
I BEND IT WITH MY MIINNDD !!! (It's beginning to have a consciousness)
@handyfrontend
@handyfrontend 22 күн бұрын
is it USDRUB analysis?
@josephvanname3377
@josephvanname3377 22 күн бұрын
USD/RUB looks more like a Wiener process or at least a martingale instead of the sine function.
@V1kToo
@V1kToo 8 күн бұрын
Is this a demo on overfitting?
@josephvanname3377
@josephvanname3377 8 күн бұрын
Yes. You can think of it that way even though this is not the typical example of how neural networks overfit. The sine function is a 1 dimensional function and the lack of dimensionality stresses the neural network.
@mineland8220
@mineland8220 21 күн бұрын
3:30 bless you
@josephvanname3377
@josephvanname3377 21 күн бұрын
The neural network appreciates your blessing. The network has been through a lot.
@anarchy5369
@anarchy5369 14 күн бұрын
That was a weird transition, definitely of note
@asheep7797
@asheep7797 22 күн бұрын
stop torturing the network 😭
@josephvanname3377
@josephvanname3377 22 күн бұрын
People tell me that I need to treat neural networks with kindness, but this sort of content (that is recommended by recommender systems which have neural networks) gets the most attention, so I am getting mixed messages.
@c0ld_r3t4w
@c0ld_r3t4w 20 күн бұрын
song?
@josephvanname3377
@josephvanname3377 20 күн бұрын
I will add music to all my visualizations later.
@c0ld_r3t4w
@c0ld_r3t4w 19 күн бұрын
@@josephvanname3377 That's cool, but maybe instead of a song you could make a note based on the avg y value in the training interval, or based on loss
@LambOfDemyelination
@LambOfDemyelination 22 күн бұрын
It's not going to be possible to approximate/extrapolate a periodic function when only non-periodic functions are involved (affine functions and non-periodic non-affine activation functions). I'd love to see what it looks like with a periodic activation function though, maybe a square wave, sawtooth wave, triangle wave etc. A sawtooth wave would be a sort of periodic extension of the ReLU activation :)
@josephvanname3377
@josephvanname3377 22 күн бұрын
The triangle wave is a periodic extension of ReLU activation. I have tried this experiment where a network with periodic activation mimics a periodic function, and things do work better in that case, but there is still the problem of high gradients. For example, if a function f from [0,1] to [-1,1] has many oscillations (like sin(500 x)), then its derivative would be large, and neural networks have a difficult time dealing with high derivatives. I may make a visualization of how I can solve this problem by first embedding the interval [0,1] into a high dimensional space and then passing it through a neural network only after I represent numbers in [0,1] as high dimensional vectors (this will be similar to the positional embeddings in transformers).
@LambOfDemyelination
@LambOfDemyelination 22 күн бұрын
@@josephvanname3377 I think a triangle wave is what's called the "even periodic extension" of y=x, but otherwise the regular periodic extension is just cropping y=x to some interval and copy pasting the interval repeatedly. I was thinking what about using non-periodic activation that differentiaties to a periodic one instead. And one that is still an increasing function, as to avoid lots of local minima which you would get with a periodic one. Say, a climbing periodically extended ReLU centered at 0, [-L, L], for a period L: max(mod(x + L/2, L) - L/2, 0) + L/2 floor(x/L + 1/2), which differentiates to a square wave: 2 floor(x/L) - floor(2x/L) + 1
@harshans7712
@harshans7712 10 күн бұрын
First time seeing a function getting tortured
@josephvanname3377
@josephvanname3377 10 күн бұрын
And yet, this is my most popular visualization. What can we learn from this?
@harshans7712
@harshans7712 10 күн бұрын
@@josephvanname3377 we can learn the limitations of using linear activation functions in neural networks, yes this video was really intuitive
@harshans7712
@harshans7712 10 күн бұрын
@@josephvanname3377 yes we can learn the limitations of using linear function in activation functions, and yes it was one of the best visualisation 🙌
@nedisawegoyogya
@nedisawegoyogya 20 күн бұрын
Is it torture?
@josephvanname3377
@josephvanname3377 17 күн бұрын
Well, this is my most popular visualization. Most of my visualizations show the AI working wonderfully, but they are not that popular. So this says a lot about all the people watching this and this says very little about me.
@nedisawegoyogya
@nedisawegoyogya 17 күн бұрын
@@josephvanname3377 Hahaha very funny bro. Indeed, it's quite disturbing this kind of thing is funny.
@josephvanname3377
@josephvanname3377 17 күн бұрын
@@nedisawegoyogya If I create a lot of content like this, you should just know that I am simply giving into the demands of the people here instead of creating stuff that I know is objectively nicer.
@Simigema
@Simigema 21 күн бұрын
It’s a party in the USA
@josephvanname3377
@josephvanname3377 21 күн бұрын
Yeah. We all take coat hangers and shape them into sine functions at parties.
@Swordfish42
@Swordfish42 22 күн бұрын
It looks like it should be a sin to do that
@josephvanname3377
@josephvanname3377 22 күн бұрын
Is it also a sin to get a tan?
@DorkOrc
@DorkOrc 22 күн бұрын
This is so painful to watch 😭
@josephvanname3377
@josephvanname3377 22 күн бұрын
I have made plenty of less 'painful' animations, but the audience here prefers to see the more painful visualizations instead of something like the spectrum of a completely positive superoperator that has perfect heptagonal symmetry.
@ggimas
@ggimas 22 күн бұрын
Is this a Feed Forward Neural Network? If so, this will never work (outside of the training range... and it will do very badly within). You need a Recurrent Neural Network. Those can learn periodic functions.
@josephvanname3377
@josephvanname3377 22 күн бұрын
This is a feedforward network. There are ways to make a network learn the sine function, but I wanted to make a visualization that shows how neural networks work. If I wanted something to learn the sine function, the network would be of the form N(x)=a(1-x/c_1)...(1-x/c_n) and the loss would be log(|N(x)|)-log(|sin(x)|) or something like that (I did not actually train this; I just assume it would work, but I need to experiment to be sure.).
@TheNightOwl082
@TheNightOwl082 22 күн бұрын
More Positive reinforcement!!
@tacitozetticci9308
@tacitozetticci9308 11 күн бұрын
stop circulating our ex prime minister memes ✋️
@MessyMasyn
@MessyMasyn 22 күн бұрын
"ai shits its pants when confronted with a sin wave"
@rexeros8825
@rexeros8825 18 күн бұрын
That network is too small for this, or you are training it the wrong way. If you train it through x=y, the network must be large enough to imagine the whole graph inside the network. From this video I can clearly see how one piece of information displaces another within the network. There just aren't enough layers to fully grasp this lesson. However, if you train the network through another graph, this will require fewer layers; it will, however, be less universal. By adding layers, and training through the formula, you can then use this and teach even more complex functions without too much trouble.
@josephvanname3377
@josephvanname3377 18 күн бұрын
It seems like if we represented the inputs using a positional embedding like they use with transformers, then the network would have a much easier time learning sine. But in that case, the visualization will just be an endless wave, so I will need to take its Fourier transform or convert the long wave into audio to represent this. But a problem with positional embeddings is that they already use sine. But networks like this one already have more than enough capacity to memorize a large amount of information, but in this case the network is unable to fit to sine for too long despite its ability to memorize large amounts of information. If we think about a network memorizing sin(nx) for large n over [0,1] instead, we can see a problem. In this case, the network must compute a function with a high derivative, so it must have very large gradients, so perhaps I can use something to counteract the large gradients.
@rexeros8825
@rexeros8825 18 күн бұрын
@@josephvanname3377 Perception through sound in this case would be much simpler. Just like through visualization. Perception through formulas is somewhat more difficult, it seems to me. This type of work is more suitable for traditional computers. The neural network must be deep enough for such an analysis. (to use the formula to reproduce an ideal graph on any segment)
@josephvanname3377
@josephvanname3377 18 күн бұрын
@@rexeros8825 Perception through sound would be possible, but this requires a bit of ear training. It requires training for people to distinguish even between a fourth and a fifth in music or between a square wave and a sawtooth wave. There is also a possibility that the sounds may be a bit unpleasant.
@rexeros8825
@rexeros8825 18 күн бұрын
@@josephvanname3377 no, if you do FFT in hardware (before entering it into the neural network). Do you know that our ear breaks sound into frequencies before the sound enters the neural network? The neural network of our brain hears sound in the form of frequencies and amplitudes. To transmit a sine to the neural network, you only need to transmit 1 frequency and amplitude. For example, transmitting a triangle wave or a more complex wave will require transmitting a complex of frequencies.
@user-eq3ry9br1z
@user-eq3ry9br1z 22 күн бұрын
where is a grokking phase?)))
@josephvanname3377
@josephvanname3377 22 күн бұрын
I don't allow this network to grok. I simply increase the difficulty and make the network twist the curve more.
@user-eq3ry9br1z
@user-eq3ry9br1z 22 күн бұрын
@@josephvanname3377 In any case, great visualization! Many people believe that neural networks perform very well outside of the training distribution, but that is not the case, and your video demonstrates this well.
@mr.sheldor794
@mr.sheldor794 20 күн бұрын
Oh my god it is screaming for help
@maburwanemokoena7117
@maburwanemokoena7117 20 күн бұрын
Neural network is the mother of all functions
@josephvanname3377
@josephvanname3377 17 күн бұрын
The universal approximation theorem says that neural networks can approximate any continuous function they want in the topology of uniform convergence on compact sets. But there are other topologies on spaces of functions. Has anyone seen a version of the universal approximation theorem where the network not only approximates the function but also approximates all derivatives up to the k-th order uniformly on compact sets?
@matteopiccioni196
@matteopiccioni196 5 күн бұрын
14 minutes for a 1D function come on
@josephvanname3377
@josephvanname3377 5 күн бұрын
It takes that long to learn.
@matteopiccioni196
@matteopiccioni196 5 күн бұрын
@@josephvanname3377 I know my friend, I would have reduced the video anyway!
@josephvanname3377
@josephvanname3377 5 күн бұрын
@@matteopiccioni196 Ok. But a lot has happened in those 14 minutes since the network struggles so much.
@Neomadra
@Neomadra 6 күн бұрын
This video is very misleading, since the sine function is a very bad example to demonstrate how the model is not able to extrapolate. The sine is not a trivial mathematical operation; it's an infinite series, a Taylor series. No finite neural network, and not your brain either, can extrapolate this function on an infinite domain. It might be that the model really wants to extrapolate, but it will never have enough neurons to perform the computation. Probably that's indeed the case, because looking at the plot it really looks like it's doing the Taylor series for the sine function, which is the absolutely optimal thing to do! Neural networks are just not suited for this; that's why we use calculators for these kinds of things. It's like asking the model to count to infinity
@strangeWaters
@strangeWaters 6 күн бұрын
If the neural network had a periodic activation function it could fit sin perfectly though
@josephvanname3377
@josephvanname3377 5 күн бұрын
There is a big gap between the inability to extrapolate over the entire field of real numbers and the inability to extrapolate a little bit beyond the training interval. And polynomials can only approximate the sine function on a finite interval since an nth degree polynomial has n roots (on the complex plane counting multiplicity).