Thank you for this. I hadn't noticed this paradigm shift, but upon hearing the magic names Kolmogorov & Arnold you surely got my attention. 👀👀👀
@JoelRosenfeld 8 months ago
It’s a pretty neat approach to learning theory. It’s still in the early adoption stage, but exciting nonetheless!
@lahirurasnayake5285 8 months ago
I think the pun "got my attention" was intended 😄
@JoelRosenfeld 8 months ago
@@lahirurasnayake5285 really, it’s all you need
@TranquilSeaOfMath 8 months ago
Great video and interesting topic. I like that you had part of the paper visible for viewers to read. You have a nice library there.
@JoelRosenfeld 8 months ago
Thank you! I’m glad you enjoyed the video
@esantirulo721 8 months ago
With some Marvel books! (to the right of the MATLAB book). Anyway, nobody is perfect. Thanks for the video!!!
@foreignconta 6 months ago
Got recommended and immediately subscribed.
@JoelRosenfeld 5 months ago
Welcome!
@mohdherwansulaiman5131 7 months ago
I have tried KAN in 2 prediction problems: SOC estimation and chiller energy consumption prediction. It works.
@avyuktamanjunathavummintal8810 8 months ago
Great video! Eagerly awaiting your next one! :)
@JoelRosenfeld 8 months ago
Me too lol! I’ve been digging through the proofs to find one that is digestible for a KZbin video. I actually had a whole recording done only to realize it wouldn’t work too well. Trying to give the best video possible :)
@avyuktamanjunathavummintal8810 7 months ago
@@JoelRosenfeld , (I understand your desire for perfection, but) you know, I'd rather you upload that recorded video. 😅:)
@JoelRosenfeld 7 months ago
@@avyuktamanjunathavummintal8810 I appreciate that. I should have a new video up this weekend. Still need time to make that video breaking down the theory, but I have something that’ll hopefully bridge the gap a little.
@peterhall6656 8 months ago
This sort of high level discussion is always seductive. I look forward to the nuts and bolts to see whether the first date performance can be sustained.....
@JoelRosenfeld 8 months ago
Absolutely, we are still seeing the beginnings of this method. I’m optimistic, but we will see!
@Jay-di7nl 8 months ago
I have doubts about KAN’s ability to solve real-world or complex problems. One concern is that, although this method shows its capability with simple known functions, real-world problems might involve more complex equations or may not have a clearly defined equation at all.
@JoelRosenfeld 8 months ago
I think the interpretability claims of the paper are largely bogus for this exact reason. That’s why I didn’t mention it here in this video. However, the real contribution of this method is towards resolving the curse of dimensionality, and since every continuous function has a representation like this, there is a real chance it can work out.
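For reference, the representation being invoked here is the Kolmogorov-Arnold representation theorem, as it is usually stated for a continuous f on the unit cube [0,1]^n:

f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),

where every \Phi_q and \phi_{q,p} is a continuous function of a single variable.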
@rjbaw 8 months ago
@@JoelRosenfeld Why not just use kernels instead?
@JoelRosenfeld 8 months ago
@@rjbaw Kernels work great for approximations using small to moderately sized data sets. For instance, interpolation using Gaussian RBFs converges extremely fast as the data points become more and more concentrated. When it comes to truly large data sets, training kernel-based networks can get prohibitively expensive. That's why machine learning engineers have turned to deep neural networks with ReLU activation functions, where good weights can be identified much faster. KANs promise to be even faster to train than neural networks, by orders of magnitude. They aren't yet, but we have just started using them.
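A minimal sketch of the kind of Gaussian RBF interpolation described above (the data, kernel width, and grid sizes are illustrative choices, not anything from the video): fit s(x) = sum_j w_j exp(-eps^2 (x - x_j)^2) by solving the Gram system.

```python
import numpy as np

eps = 10.0                                           # Gaussian shape parameter (illustrative)
x = np.linspace(0.0, 1.0, 20)                        # data sites
y = np.sin(2 * np.pi * x)                            # values to interpolate

G = np.exp(-(eps * (x[:, None] - x[None, :])) ** 2)  # Gram matrix K(x_i, x_j)
w = np.linalg.solve(G, y)                            # interpolation weights

x_new = np.linspace(0.0, 1.0, 200)
K_new = np.exp(-(eps * (x_new[:, None] - x[None, :])) ** 2)
s = K_new @ w                                        # interpolant evaluated off the data sites
print("max error on the fine grid:", np.max(np.abs(s - np.sin(2 * np.pi * x_new))))
```

The N x N solve is the step that becomes expensive, and as discussed below, ill-conditioned, as N grows.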
@rjbaw 8 months ago
@@JoelRosenfeld I am curious what the problem with very large datasets is. Doesn't this largely depend on the approximation methods used to compute the kernel? Especially as the dimension increases, the data concentrates. Although it is true that ReLU activation functions are easy to compute, the main reason people switched to ReLU is the vanishing gradient problem.
@JoelRosenfeld 8 months ago
@@rjbaw Computing the kernel functions themselves is often not really an issue, unless you are working directly with features. That's actually their advantage: the computation of a kernel itself is simple, you just evaluate a function. The complication is when you are trying to do approximations in kernel spaces (using linear combinations of kernel functions), like for SVMs. Ultimately, you will need to invert a Gram matrix to get your weights, and that matrix inversion is costly. Moreover, as your data gets denser, you are going to get some ill-conditioning in your linear algebra problem, and that is going to make it difficult to find accurate weights. So for really big data problems, you have to invert really large, dense, and possibly ill-conditioned matrices. Large-scale problems use a combination of neural networks and sparsity to find good solutions. Those aren't always available in the kernel setting. The linear systems you get from kernels are often not sparse, and that leads to difficulties. It could be that I'm overlooking something right now; I'm just trying to get you a response right before a meeting. But that's my quick two cents on it.
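A quick numerical sketch of the ill-conditioning point (the shape parameter and grid sizes are arbitrary illustrative choices): the condition number of the Gaussian Gram matrix blows up as the data sites get denser.

```python
import numpy as np

eps = 10.0                                   # fixed Gaussian shape parameter
for n in (10, 20, 40, 80):                   # progressively denser data on [0, 1]
    x = np.linspace(0.0, 1.0, n)
    G = np.exp(-(eps * (x[:, None] - x[None, :])) ** 2)
    print(f"n = {n:3d}   cond(G) = {np.linalg.cond(G):.2e}")
```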
@agranero6 5 months ago
I have read Arnold's book a few (a lot of) times. It seems trivial in the first pages, and then you notice that it is not. It is dense. It makes you think. And I realize I must read it again.
@qusaikhaled9657 3 months ago
Would you consider KAN similar to ANFIS?
@BonheurMaison 3 months ago
I may be wrong, but KART seems to work for functions taking inputs in the [0,1] range; have I understood correctly?
@wlatol6512 3 months ago
Can anyone give me some insight into how to use these for image classification?
@Lilina3456 7 months ago
Hi, thank you so much for the video. Could you please do an implementation of KANs?
@JoelRosenfeld 7 months ago
That's the plan! Grant writing and travel have slowed me down this summer, but it's coming. I have three videos in the pipeline, and I think that one is the third.
@kev2582 8 months ago
"May not be learnable" is the core problem, as you seem to be well aware. Function approximation in a collapsed dimension with nice properties is very, very hard. There are many problems beyond this level as well.
@Morrning_Group 7 months ago
Thank you for this incredibly informative video on Kolmogorov Arnold Networks! 🤯💻 It's such a deep dive into machine learning concepts, and I appreciate how you break down complex ideas into understandable explanations. 🌟 I'm curious about the future direction of your channel. 🚀 Are there plans to delve deeper into specific machine learning architectures like Kolmogorov Arnold Networks, or will you explore a broader range of topics within the field? 🤔📈 Additionally, do you have any upcoming collaborations or special projects in the pipeline that your viewers can look forward to? 🌐🔍 Keep up the fantastic work!
@naninano8813 8 months ago
4:19 I have the exact same book in my library, but tbh I never made the connection that the current KAN hype in the AI world is connected to the same Arnold.
@JoelRosenfeld 8 months ago
It's really a great textbook. Arnold did a lot of great work and was nominated for a Fields Medal back in 1974. I'll talk more about it in my next video.
@nias2631 8 months ago
Maybe I am somehow pattern matching, but KAN has a strong similarity to the encoding layer of a convolutional conditional neural process (CCNP), which is built upon the reproducing kernel theorem. In particular, I have the sense that it is related to the Lorentz variant of the K-A theorem. I'd be curious about your thoughts on this. Good channel btw!
@JoelRosenfeld 8 months ago
I’ll have to take some time to look into it. I haven’t encountered CCNP, but I do love kernels!
@beaverbuoy3011 8 months ago
Awesome, thank you.
@JoelRosenfeld 8 months ago
You’re welcome! Thanks for watching!
@pierce8308 8 months ago
Hi, what prevents us from using a linear ReLU layer (deep/wide) instead of splines? That seems much simpler and more efficient, right? Specifically, why did the authors choose splines over simple linear layers (ReLU-activated), given that the authors themselves were concerned about computational efficiency? It seems to me that this simply adds feature expansion to deep nets.
@JoelRosenfeld 8 months ago
To be honest, nothing stops you there. Splines are just really good one-dimensional approximators, have clear convergence rates, and form a partition of unity. Another reason is probably a concern about messaging. If they included a neural network within the approximation scheme of KAN, it could have led to a lot of confusion between the NN layers and the KAN layers.
@JoelRosenfeld 8 months ago
Also, if you are working with ReLU activation functions in a single layer, then that is just a first-order spline.
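A small check of that claim on an illustrative uniform knot grid: a hat function (a linear B-spline) is an exact combination of three shifted ReLUs, so a one-variable ReLU layer spans exactly the first-order splines on its knots.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)
t = np.linspace(0.0, 1.0, 11)          # uniform knots with spacing h
h = t[1] - t[0]
x = np.linspace(0.0, 1.0, 500)

k = 5                                   # build the hat function centered at knot t[k]
hat_from_relu = (relu(x - t[k - 1]) - 2 * relu(x - t[k]) + relu(x - t[k + 1])) / h
hat_direct = np.interp(x, [t[k - 1], t[k], t[k + 1]], [0.0, 1.0, 0.0])

print("max difference:", np.max(np.abs(hat_from_relu - hat_direct)))   # ~ machine precision
```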
@pierce8308 8 months ago
@@JoelRosenfeld I see, thanks!
@agranero6 5 months ago
Splines make it more numerically stable. With other functions, like polynomials, a small change in the x's causes large changes; if you try to apply backpropagation to that, you probably won't get it to converge. The spline choice was not arbitrary: this is a well-known property of splines, and numerical stability and splines are intimately connected. B-splines are differentiable, and so are their derivatives. More importantly, B-splines can be controlled locally: you can change the curve only locally (can you see why they are stable?) by changing just the control point related to that part. This matters because it avoids the problem of catastrophic forgetting, where global parameter changes can propagate explosively to far regions of the network. They can also be easily written in a parametric form, which is ideal here, because the parameters change the curves and can be trained.

Using splines also allows the networks to be sparse, which is a very important property, closer to biological neural networks, and very important for saving resources and for retraining parts of the network. MLPs are a big monolith, and it is difficult to make pointwise local changes and find internal representations (it is all in the article, with a few additions by myself). Understand that the activation functions are learnable through their parameters, not the weights.

I learned many of these things because I was writing a video game and fell into a rabbit hole of numerical stability, splines, Catmull-Rom, etc., but that is beyond the scope of my answer. Just a thought experiment to make you think: take an angle, mark equally spaced points on each line of the angle, and draw lines between all the points on each line. Why do you get that smooth curve, a parabola for instance, by doing that? Ignore the cynical trolls. The main problem was ignored in the video and the comments (including by the trolls): you can't run this on GPUs.
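A minimal sketch of the two properties doing the work in that argument, local support and partition of unity, using a hand-rolled Cox-de Boor recursion (the knot grid and degree are illustrative; this is not the KAN authors' implementation):

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Cox-de Boor recursion; returns B with B[i, j] = B_j(x[i])."""
    x = np.asarray(x, dtype=float)
    n_basis = len(knots) - degree - 1
    B = np.zeros((len(x), len(knots) - 1))
    for j in range(len(knots) - 1):                     # degree-0 indicator functions
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    for d in range(1, degree + 1):                      # raise the degree step by step
        for j in range(len(knots) - d - 1):
            denom_l = knots[j + d] - knots[j]
            denom_r = knots[j + d + 1] - knots[j + 1]
            left = (x - knots[j]) / denom_l if denom_l > 0 else 0.0
            right = (knots[j + d + 1] - x) / denom_r if denom_r > 0 else 0.0
            B[:, j] = left * B[:, j] + right * B[:, j + 1]
    return B[:, :n_basis]

degree = 3
knots = np.concatenate(([0.0] * degree, np.linspace(0.0, 1.0, 8), [1.0] * degree))
x = np.linspace(0.0, 0.999, 201)      # stay inside the last half-open knot interval
B = bspline_basis(x, knots, degree)

print("number of basis functions:", B.shape[1])
print("partition of unity, max |sum - 1|:", np.max(np.abs(B.sum(axis=1) - 1.0)))
print("basis functions active near x = 0.5:", int(np.count_nonzero(B[100] > 1e-12)))
```

Changing one coefficient of a curve expanded in this basis only moves the curve on the few knot spans where that basis function is nonzero, which is the local-control property described above.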
@agranero6 5 months ago
@elirane85 Please read the article and understand the relation between B-splines and stability before posting statements without any basis. Just because he didn't explain it in the video doesn't mean they didn't address the issue. The B-spline choice was far from arbitrary. Try to keep this in mind.
@johnfinlayson7559 8 months ago
Awesome video man
@JoelRosenfeld 8 months ago
Thank you!
@kabuda1949 8 months ago
Great video. Can you do an example of solving, let's say, PDEs using kernel methods? For example, with the Gaussian function as the kernel?
@JoelRosenfeld 8 months ago
Collocation methods for PDEs are in the plans. First we should build up convergence rates for kernel approximations too.
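As a hedged preview of what such an example might look like (a generic Kansa-style collocation sketch, not Joel's planned treatment; the shape parameter, grid, and test problem are arbitrary choices): solve u''(x) = f(x) on [0,1] with u(0) = u(1) = 0 using a Gaussian-kernel expansion.

```python
import numpy as np

eps = 10.0                                    # Gaussian shape parameter (illustrative)
x = np.linspace(0.0, 1.0, 20)                 # collocation points = kernel centers
f = -np.pi**2 * np.sin(np.pi * x)             # right-hand side; exact solution is sin(pi x)

phi    = lambda r: np.exp(-(eps * r) ** 2)                            # Gaussian kernel
phi_xx = lambda r: (4 * eps**4 * r**2 - 2 * eps**2) * np.exp(-(eps * r) ** 2)

R = x[:, None] - x[None, :]                   # pairwise differences x_i - x_j
A = phi_xx(R)                                 # interior rows enforce u''(x_i) = f(x_i)
A[0, :]  = phi(R[0, :])                       # boundary row: u(0) = 0
A[-1, :] = phi(R[-1, :])                      # boundary row: u(1) = 0
b = f.copy()
b[0] = b[-1] = 0.0

c = np.linalg.solve(A, b)                     # coefficients of u(x) = sum_j c_j phi(x - x_j)
u = phi(R) @ c                                # recovered solution at the collocation points

print("max error vs sin(pi x):", np.max(np.abs(u - np.sin(np.pi * x))))
```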
@kabuda1949 8 months ago
@@JoelRosenfeld makes sense. Can't wait
@tablen2896 8 months ago
Hey, great video. If you take constructive(?) criticism (and you may have already noticed when editing), you should reduce the sensitivity of the focus adjustment on your recording device, or set it to a fixed value. It tries to focus on the background (computer screen) and foreground (your hands) instead of on you.
@JoelRosenfeld 8 months ago
lol yeah, I noticed. I actually put B-roll over some especially bad spots. I almost recorded it again. I'll look into it; it's something I struggle with.
@agranero6 5 months ago
Oh, that's what was making the background blink and the focus continuously change. I was puzzled by it.
@chanjohn5466 8 months ago
Do you have a playlist for a machine learning series?
@JoelRosenfeld 8 months ago
There is an older playlist for Data Driven Methods in Dynamical Systems. There I talk about how to use machine learning and operator theory to understand time series. I have a newer playlist focused on the Theory of Machine Learning. So far, it has a bunch of videos going into various aspects of Hilbert spaces and Fourier Series. But it's the one I'm working on right now. kzbin.info/aero/PLldiDnQu2phtAR82SxYoEB46_3BUuvJKe&si=yiyvY-BZydP3tUAg
@DJWESG1 8 months ago
I thought the point was to embed a core logic to reduce size.
@Charliethephysicist 7 months ago
KAN draws inspiration from the Kolmogorov-Arnold representation theorem, though it diverges significantly from, and falls short of, the theorem's original intent and content. It confines its form to compositions of sums of single-variable smooth functions, representing only a tiny subset of all possible smooth functions. This confinement eliminates, by design, the so-called curse of dimensionality. However, there is no free lunch. It is seriously doubtful that this subset is dense within the entire set of smooth functions, though I have not come up with an example yet. If it is indeed not dense, KAN will not serve as a universal function approximator, unlike the multilayer perceptron. Nonetheless, it may prove valuable in fields such as scientific research, where many explicitly defined functions tend to be simple, even if it cannot approximate all possible smooth functions.
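For concreteness, the layered form being debated can be written (loosely, following the KAN paper's notation up to indexing) as

x_{l+1,j} \;=\; \sum_{i=1}^{n_l} \phi_{l,j,i}(x_{l,i}), \qquad \mathrm{KAN}(x) \;=\; \left( \Phi_{L-1} \circ \cdots \circ \Phi_0 \right)(x),

so every multivariate interaction must be expressed through compositions of sums of univariate functions, which is the restriction at issue here.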
@JoelRosenfeld 7 months ago
Since I saw your message last night, I have been thinking about this. I think universality is going to be fine. The inner functions in the Kolmogorov-Arnold representation are each continuous. Splines can approximate continuous functions arbitrarily well over a compact set, so we could approximate each one to within some epsilon > 0. The triangle inequality then tells us that the overall error from those approximations is bounded by n times epsilon, where n is the dimension of the space. So the approximation of the inner functions is fine. The only twist comes with the outer functions. The inner functions map compact sets to compact images, so if we look at the outer functions restricted to those images, we are looking for an approximation on the compact image of the inner functions. We can get a spline approximation of the outer functions like that as well, within some prescribed epsilon. To make sure everything meshes together, you need something like Lipschitz continuity on the outer functions. That has never been included in the description of them, because the theorems are for general continuous functions rather than being restricted to smooth functions or other classes. Picking through the proofs, I think it would be straightforward to get Lipschitz conditions on the outer functions when the function you are representing is also Lipschitz. With all of that together, I think that basically takes care of what you would need for universality.
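A sketch of the bound being described, under the assumptions stated above: each inner \phi_{q,p} is approximated by a spline \hat\phi_{q,p} to within \delta, and each outer \Phi_q is L-Lipschitz and approximated by \hat\Phi_q to within \varepsilon on a set containing the image of the approximate inner sums. Then

\left| f(x) - \sum_{q=0}^{2n} \hat\Phi_q\!\left( \sum_{p=1}^{n} \hat\phi_{q,p}(x_p) \right) \right|
\;\le\; \sum_{q=0}^{2n} \left[ L \sum_{p=1}^{n} \left| \phi_{q,p}(x_p) - \hat\phi_{q,p}(x_p) \right| + \varepsilon \right]
\;\le\; (2n+1)\left( L n \delta + \varepsilon \right).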
@Charliethephysicist 7 months ago
@@JoelRosenfeld Your rationale is based on multidimensional smooth function approximation. This is precisely what does not apply in this situation. Each function in KAN, no matter which layer, is one-variable and smooth. The latter property prevents the original proof of the Kolmogorov-Arnold theorem from going through, and the former prevents the Taylor-expansion proof for approximation of Sobolev-space functions, which I think is what you are talking about, from going through. Moreover, there is no essential distinction in KAN between the inner and outer functions, unlike in the KA theorem; the layers are simply stacked recursively. You are trading off between the curse of dimensionality from universality and simplicity. There is no free lunch.
@JoelRosenfeld 7 months ago
@@Charliethephysicist OK, I'll give it some more thought. I personally think there is a good chance of universality pulling through here. But you never know until you have a proof or a counterexample.
@Charliethephysicist 7 months ago
@@JoelRosenfeld Examine Theorem 2.1, which is the crux theorem of the KAN paper, and look at its premise as well as its proof steps: universality is nowhere to be found. To be honest, the authors of the paper should have made this point much clearer instead of letting only the experts decipher their claim.
@jmw1500 8 months ago
Interesting. What are your thoughts on neural operators and operator learning?
@JoelRosenfeld 8 months ago
I haven't looked too deeply into Neural Operators, so it's hard for me to really say much in that direction. As far as Operator Learning goes, that's pretty much what I have been doing for the past 6 years or so. I think there are a lot of advantages to working operators into an approximation framework, where they can give you more levers to pull.
@mrpocock 8 months ago
I don't think there's anything in principle preventing a KAN layer or layers from being put inside a normal deep network. So there may be a space of interesting hybrids that do interesting things. For example, the time-wise sum of a time-wise convolution with a (2-layer?) KAN can learn to perform attention, without needing all those horrible Fourier features.
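A toy forward-pass sketch of one such hybrid (the hat-function basis, shapes, and random initialization are all illustrative assumptions, not an implementation from this thread): a KAN-style layer, with one learnable univariate function per edge, feeding an ordinary dense layer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_knots = 3, 4, 8
knots = np.linspace(-1.0, 1.0, n_knots)
coef = rng.normal(size=(n_out, n_in, n_knots))   # learnable coefficients, one 1D function per edge
W = rng.normal(size=(5, n_out))                  # ordinary dense layer stacked on top
b = np.zeros(5)

def hat_basis(t):
    """Values of the n_knots hat (first-order spline) basis functions at scalar t."""
    return np.array([np.interp(t, knots, np.eye(n_knots)[k]) for k in range(n_knots)])

def kan_layer(x):
    """y_j = sum_i phi_{j,i}(x_i), with phi_{j,i}(t) = sum_k coef[j, i, k] * hat_k(t)."""
    B = np.stack([hat_basis(xi) for xi in x])    # shape (n_in, n_knots)
    return np.einsum('jik,ik->j', coef, B)

x = rng.uniform(-1.0, 1.0, n_in)
hidden = kan_layer(x)                            # KAN-style layer output, shape (n_out,)
out = W @ hidden + b                             # followed by a standard linear layer
print(out)
```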
@agranero6 5 months ago
They say that in the article.
@tamineabderrahmane248 8 months ago
I think that KAN will be stronger than the MLP in physics-informed neural networks.
@RSLT 8 months ago
Very Cool!
@sashayakubov6924 8 months ago
Hold on, I'm reading Wikipedia, and it turns out that Karatsuba (the one who invented his multiplication algorithm) was a student of Kolmogorov!!!
@JoelRosenfeld 8 months ago
That’s really cool!
@InstaKane 8 months ago
Right…..
@petersuvara 8 months ago
Who needs maths when AI does it all for us. Oh... yeah... it's not AI. :) It's MATHS! :)
@Adventure1844 8 months ago
It's an old method from 2021: kzbin.info/www/bejne/m4TCnGmCa5hroZI
@JoelRosenfeld 8 months ago
The use of Kolmogorov-Arnold representations as a foundation for neural networks goes back at least as far as the 1980s and 1990s. In fact, I think the Kolmogorov-Arnold representation as a two-layer neural network appeared in the first volume of the journal Neural Networks. You can find more if you look into David Sprecher's work; he has worked on this problem for 60 years. The innovation in this work comes in the form of layering, which positions it as an alternative to deep neural networks, but with learnable activation functions.
@sashayakubov6924 8 months ago
Both Russian scientists? Kolmogorov is OK, but "Arnold" does not sound like a Russian surname at all.
@JoelRosenfeld 8 months ago
Vladimir Arnold was indeed a citizen of the USSR. The Soviet government actually interceded to prevent him from getting a Fields Medal because Arnold spoke out against their treatment of dissidents.
@mishaerementchouk 4 months ago
@@sashayakubov6924 It's of Prussian origin. They had been citizens of the Russian Empire since (at least) the first half of the 19th century.