OUTLINE: 0:00 - Intro 1:50 - Sponsor: Assembly AI (link in description) 3:25 - What even is Matrix Multiplication? 6:10 - A very astounding fact 8:45 - Trading multiplications for additions 12:35 - Matrix Multiplication as a Tensor 17:30 - Tensor Decompositions 20:30 - A formal way of finding multiplication algorithms 31:00 - How to formulate this as a game? 39:30 - A brief primer on AlphaZero / MCTS 45:40 - The Results 48:15 - Optimizing for different hardware 52:40 - Expanding fundamental math 53:45 - Summary & Final Comments
@CosmiaNebula2 жыл бұрын
TLDR:
## The problem
Given two matrices A, B of sizes mxn and nxl, find a way to compute AB using as few scalar multiplications as possible (additions and subtractions are free). Reformulate the problem to make it amenable to AI: given the 3D "matrix multiplication tensor" T of size (mn)x(nl)x(ml), find a decomposition T = ∑_i u_i⊗v_i⊗w_i, where each u_i, v_i, w_i is a vector with entries from {-1, 0, 1}. If this decomposition has r terms, then we have a corresponding way to compute AB using only r scalar multiplications.
### TensorGame
Now create the 1-player TensorGame: given a 3D tensor T, ask the player to make moves of the form u_i⊗v_i⊗w_i, where each u_i, v_i, w_i is a vector with entries from {-1, 0, 1}. The state of the game after the move is (T - u_i⊗v_i⊗w_i). The game ends either when the game state reaches 0, or when the player has played K rounds and thus "run out of time" (where K is an upper bound on the rank of T). The reward at each step is (-1), to encourage the agent to finish as fast as possible. If the agent runs out of time, the reward for the last step is (-r), where r is an upper bound on the rank of the terminal game state.
### Agent
The agent is an RL agent, "AlphaTensor". Like AlphaZero, it uses a deep neural network to guide a Monte Carlo tree search (MCTS) planning procedure. It is trained on randomly generated 3D tensors constructed as sums ∑_i u_i⊗v_i⊗w_i of random rank-1 terms; TensorGames with known solutions are easy to generate that way.
## Results
After some training, the resulting agent plays the TensorGame very well. When applied to matrix multiplication tensors, it found ways to beat previously known human-found techniques. For example, when A and B are both 2x2, the best human-found technique is Strassen's algorithm, taking 7 multiplications; the agent replicated it. When A and B are both 4x4, the previous best was 49 multiplications (recursive Strassen); the agent found a 47-multiplication method.
By recursively applying Strassen's algorithm, one can compute an (nxn)-times-(nxn) matrix multiplication in O(n^{log_2(7)}) = O(n^{2.8074}) time. I went through their table of results, and the best exponent they found comes from the "47 scalar multiplications for 4x4 matrix multiplication" entry (which holds in arithmetic modulo 2). It achieves asymptotic complexity O(n^{log_4(47)}) = O(n^{2.777}).
Consider what they achieved: it took humans 10 years to go from O(n^{2.8074}) in Strassen (1969) to O(n^{2.780}) in Bini, Capovani, Romani (1979), and they managed it with an entirely general RL search. I have no doubt that if they can find a way to cast the Coppersmith-Winograd algorithm as a game, the RL will outperform it as well.
I scoured their papers and couldn't find how much time it took to train the agent, other than that it was somewhere around 1 million steps. By Google standards, I guess that means it took them a month. So: 1 month vs 10 years.
BUT WAIT, THERE'S MORE. They also found the optimal algorithm for multiplying an nxn skew-symmetric matrix by a vector. The previous SOTA (2018) was at most n^2 multiplications (see "Fast structured matrix computations: tensor rank and Cohn-Umans method"). They found a method taking only 0.5(n-1)(n+2) multiplications. This is asymptotically optimal, since an nxn skew-symmetric matrix has ~0.5 n^2 distinct entries.
BUT WAIT, THERE'S MORE. They ran the same algorithm, but this time scored its strategy not by how many scalar multiplications it takes but by how long it takes to run on particular hardware (GPU or TPU). It found improvements here as well, and these improvements are hardware-specific: using the GPU-optimized algorithm on a TPU yields a much smaller improvement, and vice versa.
> AlphaTensor finds algorithms with a larger number of additions compared with Strassen-square (or equivalently, denser decompositions), but the discovered algorithms generate individual operations that can be efficiently fused by the specific XLA grouping procedure and thus are more tailored towards the compiler stack we use. The algorithms found by AlphaTensor also provide gains on matrix sizes larger than what they were optimized for.
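To make the tensor/decomposition correspondence in the TL;DR concrete, here is a small sketch (numpy assumed; the row-major indexing convention is my own choice) that builds the 2x2 matmul tensor T and plays the trivial 8-move TensorGame against it, subtracting one rank-1 term u⊗v⊗w per move until the state hits zero. Strassen's decomposition wins the same game in 7 moves.

```python
import numpy as np
from itertools import product

n = 2
idx = lambda r, c: r * n + c  # flat row-major index of entry (r, c)

# Matmul tensor: T[a, b, c] = 1 iff entry a of A times entry b of B
# contributes to entry c of C = A @ B.
T = np.zeros((n * n, n * n, n * n), dtype=int)
for i, j, k in product(range(n), repeat=3):
    T[idx(i, k), idx(k, j), idx(i, j)] = 1

# Play the trivial 8-move game: one move per scalar product a_ik * b_kj.
state = T.copy()
moves = 0
for i, j, k in product(range(n), repeat=3):
    u = np.eye(n * n, dtype=int)[idx(i, k)]
    v = np.eye(n * n, dtype=int)[idx(k, j)]
    w = np.eye(n * n, dtype=int)[idx(i, j)]
    state = state - np.einsum('a,b,c->abc', u, v, w)  # subtract u⊗v⊗w
    moves += 1

print(moves, np.all(state == 0))  # reaches the zero tensor in 8 moves
```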
@tlf43542 жыл бұрын
@@CosmiaNebula Thanks for this TL;DR~
@tlf43542 жыл бұрын
@@CosmiaNebula TL;DR 🖥 🧠 🔢 ⏫ 😱
@pouet46082 жыл бұрын
@@CosmiaNebula But are the found algorithms proven afterwards? Finding an algorithm is half the work; proving it works, not just on some test results but conceptually, is the rest. I don't understand how they can use the discovered algorithms if they cannot prove they are correct.
@CosmiaNebula2 жыл бұрын
@@pouet4608 verifying the algorithms is trivial algebra
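Indeed; for example, Strassen's 2x2 scheme can be checked mechanically. A minimal sketch (numpy assumed; integer matrices so the comparison is exact):

```python
import numpy as np

def strassen_2x2(A, B):
    """Strassen's 7-multiplication formula for a 2x2 product."""
    a, b, c, d = A.ravel()
    e, f, g, h = B.ravel()
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

rng = np.random.default_rng(0)
for _ in range(1000):
    A = rng.integers(-9, 10, (2, 2))
    B = rng.integers(-9, 10, (2, 2))
    assert np.array_equal(strassen_2x2(A, B), A @ B)
print("Strassen verified on 1000 random integer matrices")
```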
@ztangent31882 жыл бұрын
Actually, when a fast matrix multiplication algorithm is applied recursively (to a big matrix), it's not just the number of multiplications that shrinks; the number of additions shrinks too. This is because each sub-matrix multiplication requires super-linear O(m^(2+eps)) operations, while sub-matrix addition still takes O(m^2). So we are not trading multiplications for additions: asymptotically, both operations decrease.
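A rough operation-count sketch of this claim (textbook recursive Strassen with 18 block additions per level; exact constants vary by variant). Multiplications drop below the naive count immediately, while additions only drop below it past a crossover size, around n = 2^14 for this variant:

```python
def strassen_ops(n):
    """(scalar multiplications, scalar additions) for fully recursive
    Strassen on an n x n product, n a power of two."""
    if n == 1:
        return 1, 0
    m, a = strassen_ops(n // 2)
    # 7 half-size products, plus 18 additions/subtractions of (n/2 x n/2) blocks
    return 7 * m, 7 * a + 18 * (n // 2) ** 2

def naive_ops(n):
    # Schoolbook algorithm: n^3 multiplications, n^2 * (n - 1) additions
    return n ** 3, n ** 2 * (n - 1)

for n in (1024, 2 ** 14):
    (sm, sa), (nm, na) = strassen_ops(n), naive_ops(n)
    print(n, "fewer mults:", sm < nm, "fewer adds:", sa < na)
```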
@Neptutron2 жыл бұрын
Yes - this!! Please upvote the OP's comment! It should be emphasized that these improvements are not just constant factors - they improve the asymptotic complexity of matrix multiplication for certain matrix sizes in a practical way!
@nonamehere96582 жыл бұрын
+1, Yannic should've at least mentioned Strassen's recursive matrix multiplications algorithm.
@dr.mikeybee2 жыл бұрын
@@nonamehere9658 Yannic shows it in the paper.
@blackpepper26102 жыл бұрын
no
@derasor2 жыл бұрын
Could an even more efficient method be discovered by a next iteration of AlphaTensor? That would be truly mindblowing, but if not (meaning there is a hard limit on the most basic operation of machine learning), then there are IMO two takeaways from this achievement. 1- There definitely won't be a fast takeoff when AGI is implemented, and all the fear-mongering about AI being an eXisTEnTiAL thREat was just the overblown conclusion of a very deficient thought experiment. 2- Hardware engineers have a golden opportunity to design machines that implement this algorithm at the lowest levels.
@muskyoxes Жыл бұрын
Let's train on 3SAT just in case P=NP (although, if it does, seems like we'd have to train a super long time)
@yonikremer41352 жыл бұрын
Did they try finding the algorithm for large matrices?
@gorojo12 жыл бұрын
Why has no one figured this out before? It seems so obvious.
@JamesAwokeKnowing2 жыл бұрын
This is dangerous and you should be ashamed for enabling people to multiply so efficiently. Now any clown can multiply some really harmful things.
@sn0wbr33z32 жыл бұрын
The end is near
@NoNameAtAll22 жыл бұрын
so you got sponsored by voice-to-text company... and your video STILL has no subtitles???
@antopolskiy2 жыл бұрын
thank you, Yannic. I love these paper explaining videos. it is really hard for me to focus on reading papers. You explain the main points very clearly in a fraction of time it would take me to read it.
@LarryPanozzo2 жыл бұрын
We literally don’t have time to read every great paper these days. Summaries like these do save the day.
@DRealHammer2 жыл бұрын
-2 and 2 make sense, as you can use a bitshift for the multiplication, so it is basically free and enlarges the space of possible solutions.
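For illustration: in integer arithmetic, a factor of ±2^k is just a shift plus a cheap sign flip, and for floats the analogue is bumping the exponent:

```python
import math

x = 12345
assert 2 * x == x << 1        # multiply by 2 = shift left by 1
assert -2 * x == -(x << 1)    # sign flip is also cheap
assert 8 * x == x << 3        # any factor 2**k is a shift by k

# Float analogue: ldexp adds k to the exponent instead of multiplying
assert math.ldexp(1.5, 3) == 1.5 * 8
```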
@DRealHammer Жыл бұрын
@@h..h can you reformulate the question?
@AlexanderShamov Жыл бұрын
@@h..h Why not? Just add 1 to the exponent, it's still much faster than general floating point multiplication.
@vladimirtchuiev2218 Жыл бұрын
I guess so are -4, 4, -8, 8, or any ±2^n. But I guess the capacity to search in this larger space is more limited.
@DRealHammer Жыл бұрын
@@vladimirtchuiev2218 I think so too! However, my intuition is that the benefits of these higher factors are not as great as those of adding the factors 2/-2.
@cmilkau2 жыл бұрын
"A matrix is a square box of numbers" - AlphaTensor can also multiply rectangular matrices, and the paper gives an example. Of course 2x2 and maybe 4x4 matrices are of specific interest because they can be easily generalized and implemented in algorithms for large matrices.
@Zed_Oud2 жыл бұрын
Received October 2021 Published October 2022 A year and 3 days.
@cmilkau2 жыл бұрын
The reason that addition is "cheap" is not so much CPUs; these can do number multiplication in a single cycle, too. It's because the entries are usually matrices themselves (instead of numbers), and adding matrices is just adding their entries. Full story: it's when you use these results to multiply huge matrices. You do this using "block matrix multiplication". For instance, to multiply two 1000x1000 matrices, you can reinterpret each of them as a 2x2 matrix whose four entries are 500x500 matrices ("blocks"). Obviously, if you use a 2x2 multiplication formula, each multiplication in that formula actually corresponds to multiplying two 500x500 matrices (which you can do by recursively applying your formula). The naive formula takes n×n×n multiplications to multiply two n×n matrices, but only n×n additions to add two such matrices. As you can see, this is a huge difference for large n, and it stays huge even for "clever" versions of multiplication.
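The block reinterpretation is easy to see in code. A small sketch (numpy; an 8x8 example standing in for 1000x1000, since any even size works the same way):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
h = N // 2

# Reinterpret each matrix as a 2x2 grid of (N/2 x N/2) blocks
A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

# Naive 2x2 block formula: 8 block multiplications, 4 block additions
C = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
              [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])
assert np.allclose(C, A @ B)
```

Any 2x2 formula with fewer multiplications (e.g. Strassen's 7) can be substituted here, with each scalar multiplication becoming a block multiplication.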
@cmilkau2 жыл бұрын
@@leeroyjenkins0 OK, maybe, but I don't like the emphasis put on that because it really isn't as relevant. The matrices that are actually relevant in state-of-the-art ANNs have n in the thousands to tens of thousands; a direct tensor decomposition would be on the order of millions to billions of actions. So the block multiplication case is the really relevant one. Of course, if you want to do actual graphics, 4x4 is your most common size, and directly improving that can vastly improve the characteristics of your graphics chip or driver.
@apothum2 жыл бұрын
Production quality on this video was fantastic. For instance, the simple things like the blue background and circle are nice touches.
@BjarkeEbert2 жыл бұрын
One thing I have wondered: we use multiplications because it's mathematically convenient. But what if we use some cheaper operations on the layer inputs, and "just learn" to parametrize them correctly. Who cares if it's exact multiplication - what matters is that the NN gets the job done. The learning could maybe work around the "incorrect but fast" multiplication
@JurekOK2 жыл бұрын
"Analogue computers" do that -- they exploit the non-linearity of silicon diode in the 0..1V region of conduction. As long as you have some nonlinear operation, you can have a neural network. Multiplication happens to be a convenient one for **digital** computers.
@Fordance1002 жыл бұрын
No, you would have to go through a NN to estimate it, and the NN is itself composed of a bunch of matrix multiplications; you'd end up doing much more work than just doing the matrix multiplication straight.
@appletree67412 жыл бұрын
You would need an "incorrect but fast" backprop as well; it's not clear to me that backprop still works if you break multiplication.
@adityay5251252 жыл бұрын
Thank you Yannic, I missed these older style paper explainers
@vanderkarl39272 жыл бұрын
This almost sounds too good to be true. Improving the speed of matrix multiplication... that basically speeds up EVERYTHING nowadays.
@hellokingston-z7g2 жыл бұрын
The progress is really far smaller than the hype. Those in the theory community are not that excited, because the improvement happens not over the real numbers but over a field called F2, which only contains 0 and 1 and is not what most people use. Putting this restriction aside, the modest improvement is relative to a result from more than 50 years ago... so who cares. From a practical perspective, the value is even smaller. If you really want to do matrix multiplication in real life, you have to consider numerical stability, memory allocation, and a bunch of other issues, and this is why the Strassen algorithm is not used in the first place. So the experiment is really unconvincing to me and to those who have experience in high-performance computing. It is definitely a cool result, but really not as big as most people think.
@StoianAtanasov2 жыл бұрын
Does this mean that the small speedups in matrix multiplications compound to huge speedups when running a Deep Network model with hundreds of matrix multiplications?
@PieGuyX10002 жыл бұрын
Oh shiiii…
@mujtabaalam5907 Жыл бұрын
27:40 I think there's a mistake: the modification comes from the 5th column, not the 7th, because we're looking at the tensor where the top row of u is non-zero
@mgostIH2 жыл бұрын
I think it's also important beyond the domain they applied it to (matrix multiplication algorithms), since it shows that reinforcement learning via deep learning is strong enough to explore extremely large discrete optimization spaces. I expect that in the future, most NP-hard problems we want to tackle will be attacked by these approaches rather than by hand-crafted heuristics.
@Jiftoo2 жыл бұрын
real
@AufBerghofNAM Жыл бұрын
nphard has incomplete m-sets, but solutions are provable in post
@ScottSummerill2 жыл бұрын
Received October 2, 2021?! Published a year later?
@mildpass2 жыл бұрын
I would have liked to hear your comments on the structured-matrix-multiplication part of the paper. This is potentially the biggest speedup to linear algebra presented in the paper: their skew-symmetric matrix algorithm beats the state of the art by a factor of 2. Not 2 fewer multiply operations, half as many as the previous best! Unfortunately, the structured-matrix section of the paper reads a bit vague to me. In particular, the "extrapolation" of the skew-symmetric algorithm to larger matrices looked like it was done manually by a human. It would be really cool to see an ML algorithm automate the extrapolation of those algorithms; to me that would be the holy grail of AI-optimized linear algebra.
@andrewdunbar828 Жыл бұрын
Hey thanks for explaining this in a way I can almost understand. English tip for you in appreciation: "long-winded" is not about winding it's about the wind blowing. Dunno why.
@Obe_Qwaet2 жыл бұрын
I'm actually a little bit mad right now, because I proposed doing this during high school and college and was told that's not the correct way to multiply matrices. Seeing an academic paper on it... I'm not sure how to feel atm.
@HUEHUEUHEPony2 жыл бұрын
You had low IQ instructors
@AlexanderBukh2 жыл бұрын
Some faster ways were found decades back, now they only automated finding new ones.
@AlexanderBukh2 жыл бұрын
Also, isn't it always "the wrong way to do it" until someone comes forward and _proves_ them(sayers) wrong?
@kikc Жыл бұрын
If I were you I would show this to every human that ever wronged you and then kill them brutally for costing your future
@rmajdodin Жыл бұрын
Next time you have a great idea, just implement it. People love to see sth working.
@-E42-2 жыл бұрын
amazing how they find out about this now, 1000 trillion matrix multiplications and thousands of math lib implementations later :D
@serta57272 жыл бұрын
This adds up 😉
@khoda812 жыл бұрын
I can already see articles titled "DeepMind made an ai that can make itself faster!".
@rxphi53822 жыл бұрын
As a new CS student I find this kind of content very interesting, even though I don't understand everything. It motivates me to keep learning. Thank you Yannic!
@w花b Жыл бұрын
@Greg LeJacques w-what..?
@janasandeep2 жыл бұрын
"These algorithms multiply large matrices 10-20% faster than the commonly used algorithms on the same hardware" as per deepmind blog.
@isaacandrewdixon2 жыл бұрын
AI accelerating AI. Every day we are closer to the singularity.
@michaelnurse90892 жыл бұрын
Strictly speaking we get closer, but the whole idea of the singularity is that every step on the path gets easier. The real situation is rather that we have made 27 steps on a 100-step path, and each step gets harder, with costs ballooning exponentially in some way. In other words, there will be no singularity from the evolution of neural nets, as after 32 steps any further step costs more than nation states can afford.
@isaacandrewdixon2 жыл бұрын
@@michaelnurse9089 We're so close. The rate of progress is rapidly increasing.
@minimath58822 жыл бұрын
Karatsuba's algorithm traded multiplications for additions on numbers back in 1963, and now DeepMind has done it for matrices with AlphaTensor in 2022. Amazing
@NoNameAtAll2 Жыл бұрын
An analogue of Karatsuba's algorithm for matrices has existed for decades already; DeepMind found an even better one.
@siquod2 жыл бұрын
How much energy would have been saved if these multiplication algorithms had been discovered and universally implemented 3 years ago?
@robrobbins2 жыл бұрын
I tried to write a Python program to implement their algorithm for multiplying 4x4 matrices in 47 steps, but I suspect they left out the subtractions in their algorithm. There was nothing but additions and multiplications, which is unusual. The numbers I got were way too high. You can find two of the algorithms they found in the PDF accompanying the article.
@PrzemyslawDolata2 жыл бұрын
The 47-step algorithm is for *modular arithmetic*. In this system, all numbers are taken "modulo p" for some number p, i.e. the numbers "wrap around". E.g. in modulo-5 arithmetic, 4+4 = 3 (it would be 8, but 8 mod 5 is 3). That's why subtractions aren't necessary: numbers stay confined to the range [0, p-1] by definition. And that's why the numbers blew up when you were doing normal arithmetic.
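A small sketch of why no subtractions are needed modulo p (including the 4+4 = 3 example above, and the special case p = 2 where addition and subtraction coincide):

```python
p = 5
# Subtracting b is the same as adding (p - 1) * b modulo p
for a in range(p):
    for b in range(p):
        assert (a - b) % p == (a + (p - 1) * b) % p

# Modulo 5, 4 + 4 "wraps around" to 3
assert (4 + 4) % 5 == 3

# In F2 (p = 2), addition and subtraction are literally the same operation
assert all((a - b) % 2 == (a + b) % 2 for a in (0, 1) for b in (0, 1))
```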
@NoNameAtAll2 Жыл бұрын
@@PrzemyslawDolata can't one just take really big p?
@MagicGonads Жыл бұрын
@@NoNameAtAll2 I take a really big p sometimes if I drink too much water
@cedricvillani85022 жыл бұрын
The first solution for partial differential equations was a matrix function developed by the founder of MathWorks; in fact, the MathWorks logo is that very function 😮😮 (better looking 👀 now, but go look at the original logo). A fun Snapple Fact 🎉
@alefratat40182 жыл бұрын
That's great, but I would still bet that clever product-quantization algorithms for approximate matmul are the way to go for faster deep learning.
@mailoisback Жыл бұрын
So we trade multiplications for additions? But why are additions faster than multiplications? I think on modern CPUs this is no longer true.
@mohaofa1544 Жыл бұрын
Try multiplying two numbers in binary and adding two numbers in binary; which one takes more operations to get the desired result? Remember, computers work in binary.
@videowatching95762 жыл бұрын
I appreciate the content: when I feel that my time is not well spent, watching and learning about AI on this channel is super useful and interesting.
@drdca82632 жыл бұрын
The action space is larger than for Go? OK, so if each entry is in {-2,-1,0,1,2}... I was initially thinking each vector has n entries (4, since I was thinking of 4x4 matrices), but the vectors represent coefficients over the matrix entries, so they have n^2 entries each. So for 2x2 matrices there would be (5^(2^2))^3 = (5^4)^3 = 5^12 possible moves, which is more than 2^24; OK, yeah, that's many more possible moves than there are in Go.
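The arithmetic in the comment above checks out; as a quick sketch:

```python
# TensorGame move for 2x2 matmul: u, v, w each have 4 entries drawn from
# {-2, -1, 0, 1, 2} (5 choices per entry), giving (5**4)**3 moves.
moves = (5 ** 4) ** 3
assert moves == 5 ** 12 == 244_140_625

# Far more than 2**24, and vastly more than Go's at most 19*19 + 1 = 362
# legal moves per turn.
assert moves > 2 ** 24 > 362
print(moves)
```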
@MagicGonads Жыл бұрын
the general action space has size 5^(nm + mp + np) when multiplying an n x m matrix by an m x p matrix (u, v, w have nm, mp, and np entries, respectively)
@nadavmihov2 жыл бұрын
At 28:00, isn't it supposed to be the other column, because you check the first row of U?
@mattiasfagerlund2 жыл бұрын
I thought so too, but by pure luck it worked out.
@paxdriver2 жыл бұрын
Why the hell didn't those geniuses at numpy figure this out years ago?! 😭
@AlexanderBukh2 жыл бұрын
They did, see the table towards the end of the video, the smaller matrices were optimized by people decades ago. This improves larger ones, and by only a few steps. 52:10
@paxdriver2 жыл бұрын
@@AlexanderBukh They did, but they didn't find the most optimal ones 😜 47 steps vs 49 is a huge difference when working with hundreds of thousands of parameters across many layers, repeatedly. I'm surprised they didn't use numpy itself to find more optimizations for numpy. There are only what, like 7 or 11 of them or something, right?
@GandalfTheBrown117 Жыл бұрын
Thank you for your videos. You're the Agadmator of ML/AI.
@geld420 Жыл бұрын
nice, but sweating hard there, man :). Math is hard.
@Ou8y2k22 жыл бұрын
It's been so many decades that I don't even remember what a matrix is. Is it that movie with John Wick?
@markdaoust45982 жыл бұрын
If you don’t know what matrix multiplication is 😂
@russelldicken99302 жыл бұрын
Er . . . you lost me partway through, but I think I got the gist of it. I'm getting too old for this. The implications are huge. I just can't wait to see the compiler implementations!
@zepho172 жыл бұрын
Does anyone know the pdf reader Yannic is using? I never have enough space on my margins
@stevey7997 Жыл бұрын
Really great video, thanks! Just a small mistake at 27:48. You look at the wrong entry in U (but I see how that easily happens of course).
@airazure20502 жыл бұрын
Will new factorizations found in this paper actually make their own ways into existing optimized BLAS (Basic Linear Algebra Subprograms) libraries?
@mareko89402 жыл бұрын
I don't think so. Matrix multiplication is mainly limited due to data loading as far as I'm aware, but please correct me if I'm wrong
@himanshupalve34542 жыл бұрын
I don't think so since BLAS libraries don't use strassen algorithm for Matrix multiplication
@kobi2187 Жыл бұрын
You'll end up boring the AI... it'll start looking for more interesting games: let's hack this computer, and connect to the global network -- what's that? a nuclear facility in Iran? hmmm... how fast can I spin....
@Alexandre-ur1oi2 жыл бұрын
Your presentation does not take the computer's architecture into account. OK, you can do fewer multiplications with fast matrix multiplication, but you have to move more data. Unfortunately, moving data takes time... This is why the Strassen algorithm (more than 50 years old) is not currently used.
@wenaolong Жыл бұрын
It's going to make the whole world better off, huh? Well, that's about as naive as you can get.
@nathandfox Жыл бұрын
Unfortunately, this kind of accelerated matrix multiplications that partition across both rows and columns will have poor numerical stability, so they are not that useful in practice.
@drewduncan57742 жыл бұрын
0:11 "I know that sounds a bit boring." This really made me cringe. I understand this is youtube, but at least some of the audience is intelligent enough to understand the importance of this.
@darkmatter95832 жыл бұрын
Would it be possible to see an example of this method calculated manually? This is a game-changer for students in linear algebra too. Also, is the code published?
@drewduncan57742 жыл бұрын
Is there any mention of numerical stability of the resulting algorithms in the paper? I've only skimmed it, but I couldn't find anything.
@TheSinTube Жыл бұрын
As far as I understand, the resulting algorithm is simple addition and multiplication of the elements in the matrices, and therefore there shouldn't be any stability issues.
@drewduncan5774 Жыл бұрын
@@TheSinTube There definitely can be. Strassen's two-step algorithm is less stable than the naive algorithm, which is why it is often not used in practice despite being asymptotically faster.
@MagicGonads Жыл бұрын
he mentioned only using coefficients in the set {-2,-1,0,1,2} for numerical stability
@drewduncan5774 Жыл бұрын
@@MagicGonads Strassen's algorithm in fact only uses {-1, 0 ,1}.
@aurielklasovsky1435 Жыл бұрын
AI is improving AI related algorithm? Watch out for the singularity!
@user-ml8ld1kl6b Жыл бұрын
I get that it's cool and amazing, but what are examples of things that can be accomplished through this?
@Myrslokstok Жыл бұрын
Cool with a matrix cube, when you see 2D matrix, you think what about a 3D matrix.
@avantan1980 Жыл бұрын
It looks like Fast Fourier Transform (FFT) to me. Anyone can confirm this?
@simdimdim Жыл бұрын
Looking forward to compilers development in 2023
@gobl-analienabductedbyhuma53872 жыл бұрын
1 year to publish the stuff?? The world had to wait a year for it when it was already ready for prime time?
@cmilkau2 жыл бұрын
The relationship between matrix multiplication and the tensor T can be written as C = AB = A:T:B. Here, we use double contractions because the "axis" of T corresponding to A (the one indexing its entries a1,a2,a3,a4) is actually two axes (a1=a11, a2=a12, a3=a21, a4=a22) indexing the rows and columns of A, respectively. Same for B. The result has two remaining axes, indexing the rows and columns of the product C. Decomposition means A:T:B = A:(u1 ⊗ v1 ⊗ w1 + u2 ⊗ v2 ⊗ w2 + ...):B = A:(u1 ⊗ v1 ⊗ w1):B + A:(u2 ⊗ v2 ⊗ w2):B + ... Here u,v,w are actually matrices of the same sizes as A,B,C, respectively. Now A:(u ⊗ v ⊗ w):B can be computed using a single multiplication. First, compute A:u, which adds/subtracts entries of A (u only has entries 1, -1, and 0). Second, compute v:B, which acts the same way. Multiply the two numbers to obtain m = (A:u)(v:B) = A:(u ⊗ v):B. Now you only have to multiply m and w to get m ⊗ w = A:(u ⊗ v ⊗ w):B. Because w also only has entries +1, -1, and 0, forming m ⊗ w essentially means replacing the +1's in w by m and the -1's by -m.
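A sketch of that last step in code (numpy; as an illustrative choice, u, v, w here are the flattened first Strassen term, (a11+a22)(b11+b22), which contributes to c11 and c22):

```python
import numpy as np

n = 2
rng = np.random.default_rng(1)
A = rng.integers(-5, 6, (n, n))
B = rng.integers(-5, 6, (n, n))

# Flat vectors over the row-major entries of A, B, C
u = np.array([1, 0, 0, 1])   # picks a11 + a22
v = np.array([1, 0, 0, 1])   # picks b11 + b22
w = np.array([1, 0, 0, 1])   # term contributes to c11 and c22

m = (A.ravel() @ u) * (v @ B.ravel())   # ONE scalar multiplication
contrib = (m * w).reshape(n, n)         # this term's contribution to C

# Same contribution computed by contracting A and B with the rank-1 tensor
T1 = np.einsum('a,b,c->abc', u, v, w)
contrib2 = np.einsum('a,b,abc->c', A.ravel(), B.ravel(), T1).reshape(n, n)
assert np.array_equal(contrib, contrib2)
```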
@AlericResident Жыл бұрын
Error at 27:45... it's the top row of U, not the bottom row, so column 5 causes the cancellation, not column 7.
@miniminerx Жыл бұрын
And now alpha tensor can run faster to find more improvements. That part is crazy
@MenkoDany2 жыл бұрын
What a time to be alive!
@AlexanderBukh2 жыл бұрын
Hold on to your GPUs!
@MenkoDany2 жыл бұрын
@@AlexanderBukh Funny you say that, yesterday I sold my 3080 to get a 4090
@cbuchner12 жыл бұрын
Well great! Now the rabbits will be multiplying even faster.
@fcasadome2 жыл бұрын
Wish I could spend time just "thinking" on stuff like this at my job... and get paid for it :)
@michaelnurse90892 жыл бұрын
There are likely jobs that pay for this type of work: speeding up 3D rendering with AI (e.g. DLSS), speeding up compilers (e.g. Microsoft's C# compiler), speeding up cloud servers at AWS. Very narrow applications, but they exist.
@Anders012 жыл бұрын
That's a great result. I have been thinking of machine learning training other machine learning algorithms, but improving other algorithms is also an interesting approach, which has now been shown to be practically possible.
@philippgerbert1390 Жыл бұрын
I guess easy to be confused - such as a1/a4 in min 28...
@damarh2 жыл бұрын
I have no idea what this is what is does and why it has done it and what it means that it has done it.
@DamBlairFam2 ай бұрын
This feels like it should be hardware-based.
@DeanWebbDeveloper2 жыл бұрын
Arbitrary dimensional embeddings
@IAMschizoaffective2 жыл бұрын
Nice mohawk I'm rocking one too bro turok for life🧬
@RoboticusMusic2 жыл бұрын
I don't get why this wasn't discovered before.
@AlexanderBukh2 жыл бұрын
It was, they only built system that finds new and better algorithms.
@TheGenerationGapPodcast Жыл бұрын
Exactly, I thought this was what SVD is doing and PCA
@iestynne2 жыл бұрын
This kind of fine-grained explanation is such a great service - thank you!!
@therealAQ2 жыл бұрын
i wouldn't say alphago was not a *real life* application
@realcollinsKE Жыл бұрын
Does this make backpropagation faster?
@kaeptnbaloo2 жыл бұрын
Getting taught about AlphaTensor by Niko Bellic is quite exciting.
@enantiodromia Жыл бұрын
Your annotations are hardly legible.
@chrismullarkey3181 Жыл бұрын
....and who ever said math is boring?
@yilei10512 жыл бұрын
You wanted to explain a1*b1*c4 would be cancelled out, and started from a1*b4*c1, then later changed to a4*b4*c1... and it still worked out, good for you XD
@nohrocks2 жыл бұрын
Thank you for making this easy to understand
@blinded65022 жыл бұрын
Understanding matrix multiplication in terms of rows and columns is a bad idea. You should instead realize that each column of a matrix is a vector. When you multiply a vector by a matrix, you get a sum of the matrix's column vectors, each scaled by the corresponding coordinate of the multiplied vector. And matrix-by-matrix multiplication is just a batch of matrix-vector multiplications. And that's it! It's intuitive as hell
@kinngrimm Жыл бұрын
I would have been a bit more interested in actual real-world use cases later on (or speculations about them, as these may not be established yet). What came to mind foremost are prediction models: flight-path calculations to save fuel, including temperature and air density at the current flight altitude, maybe with weather-forecast data; then the shape, weight, and engines of a plane/copter and many other factors, which I would suppose could all be handled better by this AlphaTensor approach. Then there is spaceflight with orbital dynamics. Maybe tensile-strength tests of objects with different materials or combinations of materials? Pressure and dynamics tests within fluids... The real-life applications seem next to endless, as tensors can be used to address so many things.
@i4m3rr0r Жыл бұрын
I was wondering a lot why multiplication is actually so important, since in hardware multiplying numbers isn't that much worse than adding them (compared to, say, division). The magic happens when we start thinking of big matrices as blocks of smaller matrices: a 20x20 matrix can be thought of as a 4x4 matrix where each entry is a 5x5 matrix. You can then apply your matrix multiplication algorithms in the same way, but now every multiplication that was a scalar multiplication becomes a MATRIX multiplication, whereas an addition becomes a matrix addition. And since matrix addition is so much cheaper than matrix multiplication, it is really important to have as few multiplications as possible. So these savings matter in the asymptotic case, where we multiply two very large matrices and, instead of using the n^3 algorithm, split the matrices smartly into blocks to apply the more optimized algorithms.
@Wobbothe3rd2 жыл бұрын
Thanks for this video! Had just heard of this development but didn't understand it, this explains it well.
@90benj2 жыл бұрын
I am a bit confused when you talk about rank-1 tensors, because I thought rank-1 tensors are just vectors. Time to brush up on my linear algebra 101. 30:27 The tensor shown on the left is not a 3D tensor; it is a 4D rank-3 tensor, as far as I understand it. Imagine one row there is one vector, i.e. a rank-1 tensor. If a vector has 2 numbers it's a 2D vector; if it has 3 it's a 3D vector, so a 3D rank-1 tensor. Just because it is visually represented as a 3D object doesn't make it mathematically 3D, I think.
@peterszilvasi7522 жыл бұрын
"Neural network architecture that they analyze things with here. It is TRANSFORMER based... who would have thought that?"😅
@blanamaxima Жыл бұрын
These tricks typically affect performance only marginally. There are other factors in the data flow that have a huge impact. The 8-bit multiplier is not what consumes the energy or the area, and these techniques make the hardware less uniform and kill utilization.
@LouisChiaki2 жыл бұрын
Holy, already 23k views!
@hichamalaoui8024 Жыл бұрын
Instead of multiplying two numbers by converting them to binary digits and doing a long addition operation, we can define a table for simple number multiplications from 1x1 to 9x9.
@Qbabxtra Жыл бұрын
why are you wearing sunglasses inside?
@Derrekito2 жыл бұрын
This is a fantastic coincidence! I was trying to figure out similar generalized forms you mentioned in the beginning, but from the perspective of a computer architecture data dependency issue several months ago. I shelved the idea because after a few weeks I found it difficult to justify the time to spend on it. This solution might be a much better fit if it really works! I'm so excited to read this paper and start work towards implementing it.
@maloxi14722 жыл бұрын
Are you a descendant of the great Rolf Landauer ?!
@Derrekito2 жыл бұрын
@@maloxi1472 I sometimes fantasize that I have some connection to the Landauer limit. Sadly, I'm unaware of any familial connection. 😅
@SuperAlijoon2 жыл бұрын
Just because I like your channel: please do your homework and don't sacrifice quality for being the first to explain AlphaTensor.
@AlericResident Жыл бұрын
Maybe they should use a diffusion algorithm to get the tensor decompositions instead of finding them column by column; that gave a tremendous improvement for generating drawings too.
@RobotProctor2 жыл бұрын
27:46 row 1 of U is what we need to look at.
@cmilkau2 жыл бұрын
16:00 dude, it's c1 = a1b1 + a2b3, it's written literally under the picture...
@alpers.21232 жыл бұрын
Next year they will probably generalize AlphaTensor to a bunch of other problems. A neural compiler optimizer, maybe.