This is a game changer! (AlphaTensor by DeepMind explained)

182,175 views

Yannic Kilcher

1 day ago

271 comments
@YannicKilcher
@YannicKilcher 2 жыл бұрын
OUTLINE: 0:00 - Intro 1:50 - Sponsor: Assembly AI (link in description) 3:25 - What even is Matrix Multiplication? 6:10 - A very astounding fact 8:45 - Trading multiplications for additions 12:35 - Matrix Multiplication as a Tensor 17:30 - Tensor Decompositions 20:30 - A formal way of finding multiplication algorithms 31:00 - How to formulate this as a game? 39:30 - A brief primer on AlphaZero / MCTS 45:40 - The Results 48:15 - Optimizing for different hardware 52:40 - Expanding fundamental math 53:45 - Summary & Final Comments
@CosmiaNebula
@CosmiaNebula 2 жыл бұрын
TLDR:

## The problem
Given two matrices A, B of size mxn and nxl, find a way to compute AB using as few scalar multiplications as possible (additions and subtractions are free). Reformulate the problem to make it amenable to AI: given the 3D "matrix multiplication tensor" T of size (mn)x(nl)x(ml), find a decomposition T = ∑i ui⊗vi⊗wi, where each ui, vi, wi is a vector with entries from {-2, -1, 0, 1, 2}. If this decomposition has r terms, then we have a corresponding way to compute AB using only r scalar multiplications.

### TensorGame
Now create the 1-player TensorGame: given a 3D tensor T, the player makes moves of the form ui⊗vi⊗wi, where each ui, vi, wi is a vector with entries from {-2, -1, 0, 1, 2}. The state of the game after the move is (T - ui⊗vi⊗wi). The game ends either when the game state reaches 0, or when the player has played K rounds and thus "runs out of time" (where K is an upper bound on the rank of T). The reward at each step is -1, to encourage the agent to finish as quickly as possible. If the agent runs out of time, the last step receives an additional penalty of -r, where r is an upper bound on the rank of the remaining (terminal) tensor.

### Agent
The agent is an RL agent, "AlphaTensor". Like AlphaZero, it uses a deep neural network to guide a Monte Carlo tree search (MCTS) planning procedure. It is trained on randomly generated 3D tensors constructed as sums of randomly generated rank-1 terms ui⊗vi⊗wi, so synthetic TensorGames with known decompositions are easy to produce.

## Results
After some training, the resulting agent plays the TensorGame very well. When applied to matrix multiplication tensors, it found ways to do better than previously known human-found techniques. For example, when A and B are both 2x2, the best human-found technique is Strassen's algorithm, taking 7 multiplications; the agent rediscovered it. When A and B are both 4x4, the previous best was 49 multiplications (Strassen applied recursively); the agent found a 47-multiplication method (in arithmetic modulo 2). By recursively applying Strassen's algorithm, one can compute (nxn) times (nxn) matrix multiplication in O(n^{log_2(7)}) = O(n^{2.8074}) time. I went through their table of results, and the best exponent they found comes from the "47 scalar multiplications for 4x4 matrix multiplication": it achieves asymptotic complexity O(n^{log_4(47)}) = O(n^{2.777}). Consider what they achieved: it took humans 10 years to go from O(n^{2.8074}) in Strassen (1969) to O(n^{2.78}) in Bini, Capovani, Romani (1979), and they managed it with an entirely general RL search. I have no doubt that if they can find a way to cast the Coppersmith-Winograd algorithm as a game, the RL will outperform it as well. I scoured their papers and couldn't find how much time it took to train the agent, other than somewhere around 1 million steps. By Google standards, I guess that means it took them about a month. So, 1 month vs 10 years.

BUT WAIT, THERE'S MORE. They also found the optimal algorithm for multiplying an nxn skew-symmetric matrix by a vector. The previous SOTA (2018) was at most n^2 multiplications (see "Fast structured matrix computations: tensor rank and Cohn-Umans method"). They found a method taking only 0.5(n-1)(n+2) multiplications. This is the asymptotically optimal number (since an nxn skew-symmetric matrix has ~0.5 n^2 distinct entries).

BUT WAIT, THERE'S MORE. They ran the same algorithm, but this time scored its strategy not on how many scalar multiplications it takes, but on how much time it takes to run on particular hardware (GPU or TPU). It found improvements there as well, and these improvements are specific to the GPU or TPU (using the GPU-optimized algorithm on a TPU yields a much smaller improvement, and vice versa).
> AlphaTensor finds algorithms with a larger number of additions compared with Strassen-square (or equivalently, denser decompositions), but the discovered algorithms generate individual operations that can be efficiently fused by the specific XLA grouping procedure and thus are more tailored towards the compiler stack we use. The algorithms found by AlphaTensor also provide gains on matrix sizes larger than what they were optimized for.
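A minimal numpy sketch of the decomposition-to-algorithm correspondence described above, for the 2x2 case. This is not code from the paper; the U, V, W rows below are Strassen's classic factors written out by hand, and T is built straight from the definition of 2x2 matrix multiplication:

```python
import numpy as np

# Matmul tensor for 2x2 @ 2x2: T[p, q, r] = 1 iff a_p * b_q contributes to c_r
# (entries of A, B, C flattened row-major: a1..a4, b1..b4, c1..c4).
T = np.zeros((4, 4, 4), dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            T[2 * i + k, 2 * k + j, 2 * i + j] = 1

# Strassen's rank-7 decomposition: row i holds the vectors u_i, v_i, w_i.
U = np.array([[1,0,0,1],[0,0,1,1],[1,0,0,0],[0,0,0,1],[1,1,0,0],[-1,0,1,0],[0,1,0,-1]])
V = np.array([[1,0,0,1],[1,0,0,0],[0,1,0,-1],[-1,0,1,0],[0,0,0,1],[1,1,0,0],[0,0,1,1]])
W = np.array([[1,0,0,1],[0,0,1,-1],[0,1,0,1],[1,0,1,0],[-1,1,0,0],[0,0,0,1],[1,0,0,0]])

# The sum of the 7 rank-1 terms u_i ⊗ v_i ⊗ w_i reproduces T exactly.
assert (np.einsum('ip,iq,ir->pqr', U, V, W) == T).all()

# Using the decomposition as an algorithm: 7 scalar multiplications instead of 8.
A, B = np.random.rand(2, 2), np.random.rand(2, 2)
m = (U @ A.reshape(4)) * (V @ B.reshape(4))   # the 7 products
C = (W.T @ m).reshape(2, 2)                   # add/subtract them back together
assert np.allclose(C, A @ B)
print("Strassen decomposition verified")
```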
@tlf4354
@tlf4354 2 жыл бұрын
@@CosmiaNebula Thanks for this TL;DR~
@tlf4354
@tlf4354 2 жыл бұрын
@@CosmiaNebula TL;DR 🖥 🧠 🔢 ⏫ 😱
@pouet4608
@pouet4608 2 жыл бұрын
@@CosmiaNebula But are the found algorithms proven afterwards? Finding an algorithm is half the work; proving that it works, not only on some examples but conceptually, is the rest. I don't understand how they can use the discovered algorithms if they cannot prove they are correct.
@CosmiaNebula
@CosmiaNebula 2 жыл бұрын
@@pouet4608 Verifying the algorithms is trivial algebra: expand the r products symbolically and check that the result matches the definition of matrix multiplication (equivalently, check that the rank-1 terms sum to the matrix multiplication tensor).
@ztangent3188
@ztangent3188 2 жыл бұрын
Actually, when a fast matrix multiplication algorithm is applied recursively (to a big matrix), not only does the number of multiplications go down, so does the number of additions. This is because each sub-matrix multiplication requires super-linear O(m^(2+eps)) operations, while sub-matrix addition still takes O(m^2). So we are not just trading multiplications for additions: both operation counts decrease.
@Neptutron
@Neptutron 2 жыл бұрын
Yes - this!! Please upvote the OP's comment! It should be emphasized that these improvements are not just constant factors - they improve the asymptotic complexity of matrix multiplication for certain matrix sizes in a practical way!
@nonamehere9658
@nonamehere9658 2 жыл бұрын
+1, Yannic should've at least mentioned Strassen's recursive matrix multiplication algorithm.
@dr.mikeybee
@dr.mikeybee 2 жыл бұрын
@@nonamehere9658 Yannic shows it in the paper.
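A quick sketch of the operation counts behind this thread, assuming the textbook Strassen step (7 block products and 18 block additions per level) and recursion all the way down to 1x1 blocks; it is only an illustration, not the paper's accounting. Each time n doubles, both counts grow by a factor of about 7, versus 8 for the naive algorithm, i.e. both multiplications and additions scale as Θ(n^2.81):

```python
# Recurrences: M(n) = 7*M(n/2) multiplications, A(n) = 7*A(n/2) + 18*(n/2)^2
# additions, versus n^3 multiplications (and n^3 - n^2 additions) for the
# naive algorithm.
def strassen_counts(n):
    if n == 1:
        return 1, 0                      # one scalar multiply, no additions
    m, a = strassen_counts(n // 2)
    return 7 * m, 7 * a + 18 * (n // 2) ** 2

prev = strassen_counts(512)
for n in (1024, 2048, 4096):
    cur = strassen_counts(n)
    print(f"n={n}: mults x{cur[0] / prev[0]:.2f}, adds x{cur[1] / prev[1]:.2f}"
          f"  (naive: x{n**3 / (n // 2)**3:.0f})")
    prev = cur
```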
@blackpepper2610
@blackpepper2610 2 жыл бұрын
no
@derasor
@derasor 2 жыл бұрын
Could an even more efficient method be discovered by a next iteration of AlphaTensor? That would be truly mind-blowing, but if not (meaning there is a hard limit on the most basic operation of machine learning), then there are IMO 2 takeaways from this achievement. 1- There definitely won't be a fast take-off when AGI is implemented, and all the fear-mongering about AI being an eXisTEnTiAL thREat was just the overblown conclusion of a very deficient thought experiment. 2- Hardware engineers have a golden opportunity to design machines that implement this algorithm at the lowest levels.
@muskyoxes
@muskyoxes Жыл бұрын
Let's train on 3SAT just in case P=NP (although, if it does, seems like we'd have to train a super long time)
@yonikremer4135
@yonikremer4135 2 жыл бұрын
Did they try finding the algorithm for large matrices?
@gorojo1
@gorojo1 2 жыл бұрын
Why has no one figured this out before? It seems so obvious.
@JamesAwokeKnowing
@JamesAwokeKnowing 2 жыл бұрын
This is dangerous and you should be ashamed for enabling people to multiply so efficiently. Now any clown can multiply some really harmful things.
@sn0wbr33z3
@sn0wbr33z3 2 жыл бұрын
The end is near
@NoNameAtAll2
@NoNameAtAll2 2 жыл бұрын
So you got sponsored by a voice-to-text company... and your video STILL has no subtitles???
@antopolskiy
@antopolskiy 2 жыл бұрын
Thank you, Yannic. I love these paper-explaining videos. It is really hard for me to focus on reading papers; you explain the main points very clearly, in a fraction of the time it would take me to read them.
@LarryPanozzo
@LarryPanozzo 2 жыл бұрын
We literally don’t have time to read every great paper these days. Summaries like these do save the day.
@DRealHammer
@DRealHammer 2 жыл бұрын
-2 and 2 make sense as you can use a bitshift for the multiplication, so it is basically free and gives a bigger space of possible solutions.
@DRealHammer
@DRealHammer Жыл бұрын
@@h..h can you reformulate the question?
@AlexanderShamov
@AlexanderShamov Жыл бұрын
​@@h..h Why not? Just add 1 to the exponent, it's still much faster than general floating point multiplication.
@vladimirtchuiev2218
@vladimirtchuiev2218 Жыл бұрын
I guess so is -4, 4, -8, 8 or any n with 2^n and -(2^n). But I guess the capacity to search in this space is more limited.
@DRealHammer
@DRealHammer Жыл бұрын
@@vladimirtchuiev2218 I think so too! However, my intuition is also that the benefit of these higher factors is not as great as that of adding the factors 2/-2.
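A tiny illustration of why coefficients like ±2 (or ±2^k in general) are nearly free, as discussed above: for integers it is a bit shift, and for floats it only adjusts the exponent, so it is exact and cheap compared with a general multiplication. Just a sketch, not anything from the paper:

```python
import math

x = 3.7
# x == m * 2**e; scaling by 2**k only changes the exponent, with no rounding.
m, e = math.frexp(x)
assert math.ldexp(m, e + 3) == x * 8        # "add 3 to the exponent" == *8
assert (13 << 2) == 13 * 4                  # integer version: a bit shift
```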
@cmilkau
@cmilkau 2 жыл бұрын
"A matrix is a square box of numbers" - AlphaTensor can also multiply rectangular matrices, and the paper gives an example. Of course 2x2 and maybe 4x4 matrices are of specific interest because they can be easily generalized and implemented in algorithms for large matrices.
@Zed_Oud
@Zed_Oud 2 жыл бұрын
Received October 2021, published October 2022. A year and 3 days.
@cmilkau
@cmilkau 2 жыл бұрын
The reason that addition is "cheap" is not so much about CPUs; those can multiply numbers in a single cycle too. It's because the entries are usually matrices themselves (instead of numbers), and adding matrices is just adding their entries. Full story: it's when you use these results to multiply huge matrices. You do this using "block matrix multiplication". For instance, to multiply two 1000x1000 matrices, you can reinterpret them as 2x2 matrices, but each of the eight entries is itself a 500x500 matrix (a "block"). If you use a 2x2 multiplication formula, each multiplication in that formula actually corresponds to multiplying two 500x500 matrices (which you can do by applying your formula recursively). The naive formula takes n×n×n multiplications to multiply two n×n matrices, but only n×n additions to add two such matrices. As you can see, this is a huge difference for large n, and it stays huge even for "clever" versions of multiplication.
@cmilkau
@cmilkau 2 жыл бұрын
@@leeroyjenkins0 OK, maybe, but I don't like the emphasis put on that because it really isn't as relevant. The matrices that actually matter in state-of-the-art ANNs have n in the thousands to tens of thousands; a direct tensor decomposition would be on the order of millions to billions of actions. So the block-multiplication case is the really relevant one. Of course, if you want to do actual graphics, 4x4 is your most common size, and directly improving that can vastly improve the characteristics of your graphics chip or driver.
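A short block-recursive sketch of the idea described above (my own illustration, not DeepMind's code): treat an n×n matrix, n a power of two, as a 2×2 matrix of blocks, apply Strassen's 7-product formula, and recurse until a cutoff where the plain numpy product takes over.

```python
import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:
        return A @ B                      # fall back to the ordinary product
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # 7 block products instead of 8; the extra block additions are only O(h^2) each.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])

A, B = np.random.rand(256, 256), np.random.rand(256, 256)
assert np.allclose(strassen(A, B), A @ B)
```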
@apothum
@apothum 2 жыл бұрын
Production quality on this video was fantastic. For instance, the simple things like the blue background and circle are nice touches.
@BjarkeEbert
@BjarkeEbert 2 жыл бұрын
One thing I have wondered: we use multiplications because it's mathematically convenient. But what if we use some cheaper operations on the layer inputs, and "just learn" to parametrize them correctly. Who cares if it's exact multiplication - what matters is that the NN gets the job done. The learning could maybe work around the "incorrect but fast" multiplication
@JurekOK
@JurekOK 2 жыл бұрын
"Analogue computers" do that -- they exploit the non-linearity of silicon diode in the 0..1V region of conduction. As long as you have some nonlinear operation, you can have a neural network. Multiplication happens to be a convenient one for **digital** computers.
@Fordance100
@Fordance100 2 жыл бұрын
No, you would have to go through a NN to estimate it, which is itself composed of a bunch of matrix multiplications; you would end up doing much more work than just doing the matrix multiplication directly.
@appletree6741
@appletree6741 2 жыл бұрын
You would need an "incorrect but fast" backprop as well; it's not clear to me that backprop still works if you break multiplication.
@adityay525125
@adityay525125 2 жыл бұрын
Thank you Yannic, I missed these older style paper explainers
@vanderkarl3927
@vanderkarl3927 2 жыл бұрын
This almost sounds too good to be true. Improving the speed of matrix multiplication... that basically speeds up EVERYTHING nowadays.
@hellokingston-z7g
@hellokingston-z7g 2 жыл бұрын
The progress is really far smaller than the hype. Those in the theory community are not that excited, because the improvement happens not over the real numbers but in a field called F2, which only contains 0 and 1 and is not what most people use. Putting this restriction aside, the modest improvement is relative to a result from more than 50 years ago... so who cares. From a practical perspective, the value is even smaller. If you really want to do matrix multiplication in real life, you have to consider numerical stability, memory allocation, and a bunch of other issues, and this is why the Strassen algorithm is not used in the first place. So the experiment is really unconvincing to me and to those who have experience in high-performance computing. It is definitely a cool result, but really not as big as most people think.
@StoianAtanasov
@StoianAtanasov 2 жыл бұрын
Does this mean that the small speedups in matrix multiplications compound to huge speedups when running a Deep Network model with hundreds of matrix multiplications?
@PieGuyX1000
@PieGuyX1000 2 жыл бұрын
Oh shiiii…
@mujtabaalam5907
@mujtabaalam5907 Жыл бұрын
27:40 I think there's a mistake: the modification comes from the 5th column, not the 7th, because we're looking at the tensor where the top row of u is non-zero
@mgostIH
@mgostIH 2 жыл бұрын
I think it's also important beyond the domain they applied it to (matrix multiplication algorithms), since it shows that reinforcement learning via deep learning is strong enough to explore extremely large discrete spaces for optimization. I expect that in the future most NP-hard problems we want to tackle will be attacked with these approaches rather than with hand-crafted heuristics.
@Jiftoo
@Jiftoo 2 жыл бұрын
real
@AufBerghofNAM
@AufBerghofNAM Жыл бұрын
nphard has incomplete m-sets, but solutions are provable in post
@ScottSummerill
@ScottSummerill 2 жыл бұрын
Received October 2, 2021?! Published a year later?
@mildpass
@mildpass 2 жыл бұрын
I would have liked to hear your comments on the structured matrix multiplication part of the paper. This is potentially the biggest speedup to linear algebra presented in the paper: their skew-symmetric matrix algorithm beats the state of the art by a factor of 2. Not by 2 multiply operations, but half as many as the previous best! Unfortunately the structured-matrix section of the paper reads a bit vague to me. In particular, the "extrapolation" of the skew-symmetric algorithm to larger matrices looked like it was done manually by a human. It would be really cool to see an ML algorithm automate the extrapolation of those algorithms; to me that would be the holy grail of AI-optimized linear algebra.
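For a sense of scale, the claimed counts side by side: the previous bound of n² multiplications for (n×n skew-symmetric matrix) × vector versus the paper's (n−1)(n+2)/2, which approaches n²/2. A quick back-of-the-envelope check, not code from the paper:

```python
# Ratio of the previous bound to the new count for a few sizes.
for n in (10, 100, 1000, 10000):
    new = (n - 1) * (n + 2) // 2          # (n-1)(n+2) is always even
    print(f"n={n:>6}: previous n^2 = {n*n:>12,}   new = {new:>12,}   ratio = {n*n / new:.3f}")
```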
@andrewdunbar828
@andrewdunbar828 Жыл бұрын
Hey thanks for explaining this in a way I can almost understand. English tip for you in appreciation: "long-winded" is not about winding it's about the wind blowing. Dunno why.
@Obe_Qwaet
@Obe_Qwaet 2 жыл бұрын
I'm actually a little bit mad right now, because I proposed doing this during high school and college and was told that's not the correct way to multiply matrices. Seeing an academic paper on it... I'm not sure how to feel atm.
@HUEHUEUHEPony
@HUEHUEUHEPony 2 жыл бұрын
You had low IQ instructors
@AlexanderBukh
@AlexanderBukh 2 жыл бұрын
Some faster ways were found decades back, now they only automated finding new ones.
@AlexanderBukh
@AlexanderBukh 2 жыл бұрын
Also, isn't it always "the wrong way to do it" until someone comes forward and _proves_ the naysayers wrong?
@kikc
@kikc Жыл бұрын
If I were you I would show this to every human that ever wronged you and then kill them brutally for costing your future
@rmajdodin
@rmajdodin Жыл бұрын
Next time you have a great idea, just implement it. People love to see something working.
@-E42-
@-E42- 2 жыл бұрын
amazing how they find out about this now, 1000 trillion matrix multiplications and thousands of math lib implementations later :D
@serta5727
@serta5727 2 жыл бұрын
This adds up 😉
@khoda81
@khoda81 2 жыл бұрын
I can already see articles titled "DeepMind made an ai that can make itself faster!".
@rxphi5382
@rxphi5382 2 жыл бұрын
As a new CS student, this kind of content is very interesting, even though I don't understand everything. It motivates me to keep learning. Thank you Yannic!
@w花b
@w花b Жыл бұрын
@Greg LeJacques w-what..?
@janasandeep
@janasandeep 2 жыл бұрын
"These algorithms multiply large matrices 10-20% faster than the commonly used algorithms on the same hardware" as per deepmind blog.
@isaacandrewdixon
@isaacandrewdixon 2 жыл бұрын
AI accelerating AI. Every day we are closer to the singularity.
@michaelnurse9089
@michaelnurse9089 2 жыл бұрын
Strictly speaking we get closer, but the whole idea of the singularity is that every step gets easier along the path. The real situation is that we have made 27 steps on a 100-step path, and each step gets harder, with costs ballooning exponentially. In other words, there will be no singularity from the evolution of neural nets, since after 32 steps any further step will cost more than nation states can afford.
@isaacandrewdixon
@isaacandrewdixon 2 жыл бұрын
@@michaelnurse9089 We're so close. The rate of progress is rapidly increasing.
@minimath5882
@minimath5882 2 жыл бұрын
Karatsuba's algorithm traded multiplications for additions on numbers back in 1963, and now DeepMind has done it for matrices with AlphaTensor in 2022. Amazing.
@NoNameAtAll2
@NoNameAtAll2 Жыл бұрын
An analog of Karatsuba's algorithm for matrices (Strassen's) has existed for decades already; DeepMind found an even better one.
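For anyone who hasn't seen it, a small sketch of Karatsuba's trick itself, the integer analogue of what Strassen/AlphaTensor do for matrices: three half-size multiplications instead of four, paid for with extra additions and subtractions. A plain illustration, unrelated to the paper's code:

```python
def karatsuba(x, y):
    if x < 10 or y < 10:
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    hi_x, lo_x = divmod(x, 10 ** m)
    hi_y, lo_y = divmod(y, 10 ** m)
    z0 = karatsuba(lo_x, lo_y)                              # low halves
    z2 = karatsuba(hi_x, hi_y)                              # high halves
    z1 = karatsuba(lo_x + hi_x, lo_y + hi_y) - z0 - z2      # cross terms, one multiply
    return z2 * 10 ** (2 * m) + z1 * 10 ** m + z0

assert karatsuba(123456789, 987654321) == 123456789 * 987654321
```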
@siquod
@siquod 2 жыл бұрын
How much energy would have been saved if these multiplication algorithms had been discovered and universally implemented 3 years ago?
@robrobbins
@robrobbins 2 жыл бұрын
I tried to write a Python program to implement their algorithm for multiplying 4x4 matrices in 47 steps, but I suspect they left out the subtractions in their algorithm. There was nothing but additions and multiplications, which is unusual. The numbers I got were way too high. You can find two of the algorithms they found in the PDF accompanying the article.
@PrzemyslawDolata
@PrzemyslawDolata 2 жыл бұрын
The 47-step algorithm is for *modular arithmetic*. In this system all the numbers are taken "modulo p" for some number p, i.e. the numbers "wrap around". E.g. in modulo-5 arithmetic, 4+4 = 3 (it would be 8, but 8 mod 5 is 3). That's why subtractions aren't necessary: numbers stay confined to the range 0..p-1 by definition (and for the 47-step algorithm p = 2, where -1 ≡ 1, so subtraction literally is addition). And that's why the numbers blew up when you were doing normal arithmetic.
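A small numpy check of that point (an illustration only, not the paper's algorithm): modulo 2, subtraction and addition coincide, so a recipe written with only plus signs is still correct as long as every intermediate result is reduced mod 2, while running the same recipe over ordinary numbers inflates the values, matching the observation above.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, (4, 4))
B = rng.integers(0, 2, (4, 4))

assert np.array_equal((A - B) % 2, (A + B) % 2)   # subtraction == addition mod 2
print((A @ B) % 2)                                # the product the mod-2 algorithm targets
print(A @ B)                                      # the "too high" numbers over the integers
```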
@NoNameAtAll2
@NoNameAtAll2 Жыл бұрын
@@PrzemyslawDolata can't one just take really big p?
@MagicGonads
@MagicGonads Жыл бұрын
@@NoNameAtAll2 I take a really big p sometimes if I drink too much water
@cedricvillani8502
@cedricvillani8502 2 жыл бұрын
The first solution for partial differential equations was a matrix function developed by the founder of MathWorks; in fact, the MathWorks logo is that very matrix 😮😮. It's better looking now, but go look at the original logo. And yeah, fun Snapple fact 🎉, whatever.
@alefratat4018
@alefratat4018 2 жыл бұрын
That's great, but I would still bet that clever product quantization algorithms for approximated matmul are the way to go for faster deep learning.
@mailoisback
@mailoisback Жыл бұрын
So we trade multiplications for additions? But why are additions faster than multiplications? I think on modern CPUs this is no longer true.
@mohaofa1544
@mohaofa1544 Жыл бұрын
Try to multiply two numbers in binary and then add two numbers in binary; which one takes more operations to get the desired result? Remember, computers work in binary.
@videowatching9576
@videowatching9576 2 жыл бұрын
I appreciate the content; when I feel that my time is not well spent, watching and learning about AI on this channel is super useful and interesting.
@drdca8263
@drdca8263 2 жыл бұрын
The action space is larger than for Go? OK, so if each entry is in {-2,-1,0,1,2} and each vector has n^2 entries... ah, I was initially thinking only n entries (I was thinking 4, because of 4x4 matrices), but the vectors of coefficients of the matrix entries have n^2 entries each. So for 2x2 matrices there would be (5^(2^2))^3 = (5^4)^3 = 5^12 possible moves, which is more than 2^24, and yeah, that's many more possible moves than there are in Go.
@MagicGonads
@MagicGonads Жыл бұрын
The general action space has size 5^(nm + mp + np) when multiplying an n x m matrix by an m x p matrix (u has nm entries, v has mp, w has np), so 5^(3n^2) in the square case.
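Plugging a few shapes into that count (a quick sketch assuming, as in the comments above, entries drawn from {-2,-1,0,1,2} and one move = a full (u, v, w) triple):

```python
# u has n*m entries, v has m*p, w has n*p, so there are 5**(n*m + m*p + n*p) moves.
for n, m, p in [(2, 2, 2), (3, 3, 3), (4, 4, 4), (4, 5, 5)]:
    actions = 5 ** (n * m + m * p + n * p)
    print(f"{n}x{m} @ {m}x{p}: 5^{n*m + m*p + n*p} ≈ 10^{len(str(actions)) - 1}")
```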
@nadavmihov
@nadavmihov 2 жыл бұрын
At 28:00, isn't it supposed to be the other column, because you check the first row of U?
@mattiasfagerlund
@mattiasfagerlund 2 жыл бұрын
I thought so too, but by pure luck it worked out.
@paxdriver
@paxdriver 2 жыл бұрын
Why the hell didn't those geniuses at numpy figure this out years ago?! 😭
@AlexanderBukh
@AlexanderBukh 2 жыл бұрын
They did, see the table towards the end of the video, the smaller matrices were optimized by people decades ago. This improves larger ones, and by only a few steps. 52:10
@paxdriver
@paxdriver 2 жыл бұрын
@@AlexanderBukh They did, but they didn't find the most optimal ones 😜 47 steps vs 49 is a huge difference when working with hundreds of thousands of parameters across many layers, repeatedly. I'm surprised they didn't use numpy itself to find more optimizations for numpy. There are only what, like 7 or 11 of them or something, right?
@GandalfTheBrown117
@GandalfTheBrown117 Жыл бұрын
Thank you for your videos. You're the Agadmator of ML/AI.
@geld420
@geld420 Жыл бұрын
Nice, but you're sweating hard, man :). Math is hard.
@Ou8y2k2
@Ou8y2k2 2 жыл бұрын
It's been so many decades that I don't even remember what a matrix is. Is it that movie with John Wick?
@markdaoust4598
@markdaoust4598 2 жыл бұрын
If you don’t know what matrix multiplication is 😂
@russelldicken9930
@russelldicken9930 2 жыл бұрын
Er . . . you lost me partway through, but I think I got the gist of it. I'm getting too old for this. The implications are huge. I just can't wait to see the compiler implementations!
@zepho17
@zepho17 2 жыл бұрын
Does anyone know the pdf reader Yannic is using? I never have enough space on my margins
@stevey7997
@stevey7997 Жыл бұрын
Really great video, thanks! Just a small mistake at 27:48. You look at the wrong entry in U (but I see how that easily happens of course).
@airazure2050
@airazure2050 2 жыл бұрын
Will the new factorizations found in this paper actually make their way into existing optimized BLAS (Basic Linear Algebra Subprograms) libraries?
@mareko8940
@mareko8940 2 жыл бұрын
I don't think so. Matrix multiplication is mainly limited due to data loading as far as I'm aware, but please correct me if I'm wrong
@himanshupalve3454
@himanshupalve3454 2 жыл бұрын
I don't think so since BLAS libraries don't use strassen algorithm for Matrix multiplication
@kobi2187
@kobi2187 Жыл бұрын
You'll end up boring the AI... it'll start looking for more interesting games: let's hack this computer, and connect to the global network -- what's that? a nuclear facility in Iran? hmmm... how fast can I spin....
@Alexandre-ur1oi
@Alexandre-ur1oi 2 жыл бұрын
Your presentation does not take the computer's architecture into account. OK, you can do fewer multiplications with fast matrix multiplication, but you have to move more data. Unfortunately, moving data takes time... This is why the Strassen algorithm (more than 50 years old) is not commonly used.
@wenaolong
@wenaolong Жыл бұрын
It's going to make the whole world better off, huh? Well, that's about as naive as you can get.
@nathandfox
@nathandfox Жыл бұрын
Unfortunately, these kinds of accelerated matrix multiplication algorithms that partition across both rows and columns have poor numerical stability, so they are not that useful in practice.
@drewduncan5774
@drewduncan5774 2 жыл бұрын
0:11 "I know that sounds a bit boring." This really made me cringe. I understand this is youtube, but at least some of the audience is intelligent enough to understand the importance of this.
@darkmatter9583
@darkmatter9583 2 жыл бұрын
Could someone show an example of calculating this method manually? This is a game changer, also for students in linear algebra. Also, is the code published?
@drewduncan5774
@drewduncan5774 2 жыл бұрын
Is there any mention of numerical stability of the resulting algorithms in the paper? I've only skimmed it, but I couldn't find anything.
@TheSinTube
@TheSinTube Жыл бұрын
As far as I understand, the resulting algorithm is simple addition and multiplication of the elements in the matrices, and therefore there shouldn't be any stability issues.
@drewduncan5774
@drewduncan5774 Жыл бұрын
@@TheSinTube There definitely can be. Strassen's two-step algorithm is less stable than the naive algorithm, which is why it is often not used in practice despite being asymptotically faster.
@MagicGonads
@MagicGonads Жыл бұрын
he mentioned only using coefficients in the set {-2,-1,0,1,2} for numerical stability
@drewduncan5774
@drewduncan5774 Жыл бұрын
@@MagicGonads Strassen's algorithm in fact only uses {-1, 0 ,1}.
@aurielklasovsky1435
@aurielklasovsky1435 Жыл бұрын
AI is improving AI related algorithm? Watch out for the singularity!
@user-ml8ld1kl6b
@user-ml8ld1kl6b Жыл бұрын
I get it’s cool and amazing, but what are examples of stuff able to be accomplished thru this
@Myrslokstok
@Myrslokstok Жыл бұрын
Cool with a matrix cube: when you see a 2D matrix, you start thinking, what about a 3D matrix?
@avantan1980
@avantan1980 Жыл бұрын
It looks like Fast Fourier Transform (FFT) to me. Anyone can confirm this?
@simdimdim
@simdimdim Жыл бұрын
Looking forward to compilers development in 2023
@gobl-analienabductedbyhuma5387
@gobl-analienabductedbyhuma5387 2 жыл бұрын
One year to publish the stuff?? A year the world had to wait for it when it was already ready for prime time?
@cmilkau
@cmilkau 2 жыл бұрын
The relationship between matrix multiplication and the tensor T can be written as C = AB = A:T:B. Here we use double contractions because the "axis" of T corresponding to A (the one indexing its entries a1, a2, a3, a4) is actually two axes (a1=a11, a2=a12, a3=a21, a4=a22) indexing the rows and columns of A, respectively. Same for B. The result has two remaining axes, indexing the rows and columns of the product C.
Decomposition means A:T:B = A:(u1 ⊗ v1 ⊗ w1 + u2 ⊗ v2 ⊗ w2 + ...):B = A:(u1 ⊗ v1 ⊗ w1):B + A:(u2 ⊗ v2 ⊗ w2):B + ... Here u, v, w are actually matrices of the same size as A, B, C, respectively.
Now A:(u ⊗ v ⊗ w):B can be computed using a single multiplication. First, compute A:u, which only adds/subtracts entries of A (u only has entries 1, -1 and 0). Second, compute v:B, in the same way. Multiply the two numbers to obtain m = (A:u)(v:B) = A:(u ⊗ v):B. Now you only have to multiply m and w to get m ⊗ w = A:(u ⊗ v ⊗ w):B. Because w also only has entries +1, -1 and 0, m ⊗ w essentially means replacing the +1's in w by m and the -1's by -m.
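A quick numerical check of the single-multiplication claim in the last paragraph, for the 2x2 case (my own sketch; u, v, w are drawn at random from {-1, 0, 1}):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.random((2, 2)), rng.random((2, 2))
u, v, w = (rng.integers(-1, 2, 4) for _ in range(3))

# One scalar multiplication: m = (u · vec(A)) * (v · vec(B)); w just places
# +/- copies of that scalar into the 4 output slots.
m = (u @ A.reshape(-1)) * (v @ B.reshape(-1))
contribution = m * w

# Same thing via the explicit rank-1 tensor u ⊗ v ⊗ w and double contraction.
T1 = np.einsum('p,q,r->pqr', u, v, w)
assert np.allclose(np.einsum('p,pqr,q->r', A.reshape(-1), T1, B.reshape(-1)),
                   contribution)
```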
@AlericResident
@AlericResident Жыл бұрын
Error at 27:45... it's the top row of U, not the bottom row, so column 5 causes the cancellation, not column 7.
@miniminerx
@miniminerx Жыл бұрын
And now alpha tensor can run faster to find more improvements. That part is crazy
@MenkoDany
@MenkoDany 2 жыл бұрын
What a time to be alive!
@AlexanderBukh
@AlexanderBukh 2 жыл бұрын
Hold on to your GPUs!
@MenkoDany
@MenkoDany 2 жыл бұрын
@@AlexanderBukh Funny you say that, yesterday I sold my 3080 to get a 4090
@cbuchner1
@cbuchner1 2 жыл бұрын
Well great! Now the rabbits will be multiplying even faster.
@fcasadome
@fcasadome 2 жыл бұрын
Wish I could spend time just "thinking" on stuff like this at my job... and get paid for it :)
@michaelnurse9089
@michaelnurse9089 2 жыл бұрын
There are likely jobs that pay for this type of work: speeding up 3D rendering with AI (e.g. DLSS), or speeding up compilers, e.g. Microsoft's C# compiler, come to mind. Speeding up cloud servers at AWS. Very narrow applications, but they exist.
@Anders01
@Anders01 2 жыл бұрын
That's a great result. I have been thinking about machine learning training other machine learning algorithms, but having it improve other algorithms is also an interesting approach, which has now been shown to be practically possible.
@philippgerbert1390
@philippgerbert1390 Жыл бұрын
I guess it's easy to get confused, such as with a1/a4 around minute 28...
@damarh
@damarh 2 жыл бұрын
I have no idea what this is, what it does, why it has done it, or what it means that it has done it.
@DamBlairFam
@DamBlairFam 2 ай бұрын
This feels like it should be hardware-based.
@DeanWebbDeveloper
@DeanWebbDeveloper 2 жыл бұрын
Arbitrary dimensional embeddings
@IAMschizoaffective
@IAMschizoaffective 2 жыл бұрын
Nice mohawk I'm rocking one too bro turok for life🧬
@RoboticusMusic
@RoboticusMusic 2 жыл бұрын
I don't get why this wasn't discovered before.
@AlexanderBukh
@AlexanderBukh 2 жыл бұрын
It was, they only built system that finds new and better algorithms.
@TheGenerationGapPodcast
@TheGenerationGapPodcast Жыл бұрын
Exactly, I thought this was what SVD and PCA do.
@iestynne
@iestynne 2 жыл бұрын
This kind of fine-grained explanation is such a great service - thank you!!
@therealAQ
@therealAQ 2 жыл бұрын
i wouldn't say alphago was not a *real life* application
@realcollinsKE
@realcollinsKE Жыл бұрын
Does this make backpropagation faster?
@kaeptnbaloo
@kaeptnbaloo 2 жыл бұрын
Getting taught about AlphaTensor by Niko Bellic is quite exciting.
@enantiodromia
@enantiodromia Жыл бұрын
Your annotations are hardly legible.
@chrismullarkey3181
@chrismullarkey3181 Жыл бұрын
....and who ever said math is boring?
@yilei1051
@yilei1051 2 жыл бұрын
You wanted to explain a1*b1*c4 would be cancelled out, and started from a1*b4*c1, then later changed to a4*b4*c1... and it still worked out, good for you XD
@nohrocks
@nohrocks 2 жыл бұрын
Thank you for making this easy to understand
@blinded6502
@blinded6502 2 жыл бұрын
Understanding matrix multiplication in terms of rows and columns is a bad idea. You should instead realize that each column of a matrix is a vector stored in the matrix. When you multiply a vector by a matrix, you get a linear combination of the matrix's columns, each weighted by the corresponding coordinate of the vector. And matrix-by-matrix multiplication is just a batch of matrix-vector multiplications. That's it! It's intuitive as hell.
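The column view in numbers (a tiny sketch): M @ x is the sum of M's columns weighted by the coordinates of x, and a matrix-matrix product is that same view applied to each column of the right factor.

```python
import numpy as np

M = np.array([[1., 2.],
              [3., 4.]])
x = np.array([5., 6.])

as_columns = x[0] * M[:, 0] + x[1] * M[:, 1]     # 5*(col 1) + 6*(col 2)
assert np.allclose(M @ x, as_columns)

N = np.random.rand(2, 3)
assert np.allclose(M @ N, np.column_stack([M @ N[:, j] for j in range(3)]))
```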
@kinngrimm
@kinngrimm Жыл бұрын
I would have been a bit more interested in actual use cases in the real world later on (or speculation about them, as these may not have been established yet). Things that came to mind foremost are prediction models. Flight-path calculations to save fuel, including temperature and density of the current flight-altitude environment, maybe with weather forecast data; then the shape, weight and engines used in a plane/copter and many other factors which may play a role, but which I would suppose could overall be improved by this AlphaTensor approach. Then there is spaceflight with orbital dynamics. Maybe tensile-strength tests of objects with different materials or combinations of materials? Pressure and dynamics tests within fluids. ... The real-life applications seem next to endless, since tensors can be used to address so many things.
@i4m3rr0r
@i4m3rr0r Жыл бұрын
I was wondering a lot why multiplication is actually so important, since in hardware multiplying numbers isn't that much worse than adding them (in comparison to, for example, division). The magic seems to happen when we start thinking about big matrices as blocks of smaller matrices: a 20x20 matrix can be thought of as a 4x4 matrix where each entry is a 5x5 matrix. You can then apply your matrix multiplication algorithms in the same way, but now every multiplication that was a scalar multiplication becomes a MATRIX multiplication, whereas an addition is just matrix addition. And since matrix addition is so much cheaper than multiplication, it is really important to have as few multiplications as possible. So these savings matter in the asymptotic case, where we multiply two very large matrices and, instead of using the n^3 algorithm, split the matrices smartly into blocks to apply the more optimized algorithms.
@Wobbothe3rd
@Wobbothe3rd 2 жыл бұрын
Thanks for this video! Had just heard of this development but didn't understand it, this explains it well.
@90benj
@90benj 2 жыл бұрын
I am a bit confused when you talk about a rank-1 tensor, because I thought rank-1 tensors are just vectors. Time to brush up on my linear algebra 101. 30:27 The tensor shown on the left is not a 3D tensor, it is a 4D rank-3 tensor, as far as I understand it. Imagine one row there is one vector, so a rank-1 tensor. If a vector has 2 numbers, it's a 2D vector; if it has 3, it's a 3D vector, so a 3D rank-1 tensor. Just because it is visually represented as a 3D object doesn't make it mathematically 3D, I think.
@peterszilvasi752
@peterszilvasi752 2 жыл бұрын
"Neural network architecture that they analyze things with here. It is TRANSFORMER based... who would have thought that?"😅
@blanamaxima
@blanamaxima Жыл бұрын
These tricks typically affect performance only marginally. There are other factors in the dataflow that have a huge impact. The 8-bit multiplier is not what consumes the energy or the area, and these techniques make the hardware less uniform and kill utilization.
@LouisChiaki
@LouisChiaki 2 жыл бұрын
Holy, already 23k views!
@hichamalaoui8024
@hichamalaoui8024 Жыл бұрын
Instead of computing the product of two numbers by converting them to binary digits and doing a long addition operation, we can define a table for simple number multiplications from 1x1 to 9x9.
@Qbabxtra
@Qbabxtra Жыл бұрын
why are you wearing sunglasses inside?
@Derrekito
@Derrekito 2 жыл бұрын
This is a fantastic coincidence! Several months ago I was trying to figure out similar generalized forms to the ones you mentioned at the beginning, but from the perspective of a computer-architecture data-dependency issue. I shelved the idea because after a few weeks I found it difficult to justify the time spent on it. This solution might be a much better fit if it really works! I'm so excited to read this paper and start working towards implementing it.
@maloxi1472
@maloxi1472 2 жыл бұрын
Are you a descendant of the great Rolf Landauer ?!
@Derrekito
@Derrekito 2 жыл бұрын
@@maloxi1472 I sometimes fantasize that I have some connection to the Landauer limit. Sadly, I'm unaware of any familial connection. 😅
@SuperAlijoon
@SuperAlijoon 2 жыл бұрын
Just because I like your channel: please do your homework and don't sacrifice quality for being the first to explain AlphaTensor.
@AlericResident
@AlericResident Жыл бұрын
Maybe they should use a diffusion algorithm to get the tensor decompositions instead of finding them column by column; that gave a tremendous improvement for generating drawings too.
@RobotProctor
@RobotProctor 2 жыл бұрын
27:46 row 1 of U is what we need to look at.
@cmilkau
@cmilkau 2 жыл бұрын
16:00 dude, it's c1 = a1b1 + a2b3, it's written literally under the picture...
@alpers.2123
@alpers.2123 2 жыл бұрын
Next year they will probably generalize AlphaTensor to bunch of other problems. Neural compiler optimizer maybe
@appletree6741
@appletree6741 2 жыл бұрын
You look like Operator Starsky