Is the Future of Linear Algebra.. Random?

358,680 views

Mutual Information

Comments: 524
@charilaosmylonas5046 7 months ago
Great video! I want to add a couple of references to what you mentioned in the video related to neural networks: 1. Ali Rahimi got the NeurIPS 2017 "test of time" award for a method called random kitchen sinks (a kernel method with random features). 2. Choromanski (from Google) made a variation of this idea to alleviate the quadratic memory cost of self-attention in transformers (which also works like a charm - I tried it myself, and I'm still perplexed how it didn't become one of the main efficiency improvements for transformers). Check "Rethinking Attention with Performers". Thank you for the great work on the video - keep them coming please! :)
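To make the random-features idea concrete, here is a minimal NumPy sketch in the spirit of random Fourier features (the kernel choice, bandwidth, and dimensions are illustrative assumptions, not anything from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 500, 10, 2000      # samples, input dimension, number of random features
gamma = 0.5                  # RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)

X = rng.normal(size=(n, d))

# Random Fourier features: z(x) = sqrt(2/D) * cos(W^T x + b),
# with W drawn from the kernel's spectral density and b ~ Uniform(0, 2*pi).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

K_exact = np.exp(-gamma * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
K_approx = Z @ Z.T           # plain dot products now approximate the kernel
print(np.abs(K_exact - K_approx).max())   # shrinks as D grows
```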
@howuhh8960 7 months ago
it didn't because all efficient variations have significantly worse performance on retrieval tasks (associative recall for example), as all recent papers demonstrated
@Arithryka 7 months ago
The Quadratic Memory Cost of Self-Attention in Transformers is my new band name
@theo1103 6 months ago
Is this a similar idea to the latent space in a transformer?
@hyperplano 6 months ago
Rahimi got the award for the "Random Features for Large-Scale Kernel Machines" paper, not the random kitchen sinks one
@rileyjohnmurray7568 6 months ago
@@howuhh8960 do you have specific references for this claim? I'm not doubting you, I'm just really interested in learning more, and the literature is vast.
@octavianova1300 7 months ago
reminds me of that episode of veggie tales when larry was like "in the future, linear algebra will be randomly generated!"
@NoNameAtAll2 7 months ago
W E E D E A T E R
@rileymurray7437 7 months ago
Reminds you of what???
@jedediahjehoshaphat 7 months ago
xD
@vyrsh0 7 months ago
I thought it would be some nice science show, but it turns out to be some kids show : (
@notsojharedtroll23 7 months ago
​@@rileymurray7437 he means this video: kzbin.info/www/bejne/oGWzmWNonN-ko7ssi=wb2atwfoSQaefrjL
@BJ52091 7 months ago
As a mathematician specializing in probability and random processes, I approve this message. N thumbs up where N ranges between 2.01 and 1.99 with 99% confidence!
@Mutual_Information 7 months ago
Great to have you here!
@purungo 7 months ago
So you're saying there's a 1 chance in roughly 10^16300 that you're giving him 3 thumbs up...
@frankjohnson123 7 months ago
My brother in Christ, use a discrete probability distribution.
@nile6076 7 months ago
Only if you assume a normal distribution! ​@@purungo
@sylv512 7 months ago
Is this just one big late april fool's? What the hell
@laurenwrubleski7204 7 months ago
As a developer at AMD I feel somewhat obligated to note we have an equivalent to cuBLAS called rocBLAS, as well as an interface layer hipBLAS designed to compile code to make use of either AMD or NVIDIA GPUs.
@sucim 7 months ago
but can your cards train imagenet without crashing?
@389martijn 7 months ago
​@@sucimsheeeeeeeeesh
@johnisdoe 7 months ago
Are you guys hiring?
@Zoragna 7 months ago
OP forgot about BLAS being a standard so most implementations have been forgotten, it's weird to point at Nvidia
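For readers following along: BLAS is indeed an interface with many implementations (OpenBLAS, MKL, BLIS, cuBLAS, rocBLAS, ...). A small sketch of poking at whichever implementation NumPy/SciPy happens to be linked against; the routine names are part of the BLAS standard, not any one vendor's library:

```python
import numpy as np
import scipy.linalg.blas as blas

np.show_config()                     # reports which BLAS/LAPACK build is linked

a = np.random.rand(512, 512)
b = np.random.rand(512, 512)
c = blas.dgemm(alpha=1.0, a=a, b=b)  # DGEMM: the level-3 general matrix-matrix multiply
assert np.allclose(c, a @ b)
```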
@cannaroe1213 7 months ago
As an AMD customer who recently bought a 6950XT for €600, I am disappointed to learn rocBLAS is not supported on my outdated 2 year old hardware.
@TimL_ 7 months ago
The part about matrix multiplication reminded me of studying cache hit and miss patterns in university. Interesting video.
@piedepew 3 months ago
Dynamic programming question
@charlesloeffler333 7 months ago
Another tidbit about LINPACK: one of its major strengths at the time it was written was that all of its double precision algorithms were truly double precision. At that time, other packages often had double precision calculations hidden within their single precision routines, whereas their double precision counterparts did not have quad-precision parts anywhere inside. The LINPACK folks were extraordinarily concerned about numerical precision in all routines. It was a great package. It also provided the basis for MATLAB.
@ILoveTinfoilHats 3 months ago
And it's so good at using CPU resources as optimally as possible that Intel used it for stress and stability testing their CPUs for years (and still do to some degree AFAIK)
@scottmiller2591 7 months ago
Brunton, Kutz et al. in the paper you mentioned here "Randomized Matrix Decompositions using R," recommended in their paper using Nathan Halko's algo, developed at the CU Math department. B&K give some timing data, but the time and memory complexity were already computed by Halko, and he had implemented it in MATLAB for his paper - B&K ported it to R. Halko's paper from 2009 "FINDING STRUCTURE WITH RANDOMNESS: STOCHASTIC ALGORITHMS FOR CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS" laid this all out 7 years before the first draft of the B&K paper you referenced. Halko's office was a mile down the road from me at that time, and I implemented Python and R code based on his work (it was used in medical products, and my employer didn't let us publish). It does work quite well.
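For anyone who hasn't seen a Halko-style randomized SVD, a bare-bones NumPy sketch of the idea (no power iterations or structured sketching operators; the rank, oversampling, and matrix sizes are arbitrary illustrative choices):

```python
import numpy as np

def randomized_svd(A, k, oversample=10, rng=None):
    """Approximate rank-k SVD via a Gaussian range finder (in the spirit of Halko et al., 2009)."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Omega = rng.normal(size=(n, k + oversample))   # random test matrix
    Y = A @ Omega                                  # samples the dominant column space w.h.p.
    Q, _ = np.linalg.qr(Y)                         # orthonormal basis for that subspace
    B = Q.T @ A                                    # small (k+p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

# Quick check on a noisy low-rank matrix
rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 50)) @ rng.normal(size=(50, 300)) + 1e-3 * rng.normal(size=(2000, 300))
U, s, Vt = randomized_svd(A, k=50)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))   # small relative error
```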
@Mutual_Information 7 months ago
Very cool! The more I researched this, the more I realized the subject was deeper (older, too) than I had realized from the first few papers I read. It's interesting to hear your on-the-ground experience of it, and I'm glad the video got your attention.
@ajarivas72 7 months ago
@@Mutual_Information Has anyone tried genetic algorithms instead of purely random approaches? In my experience, genetic algorithms are 100× faster than Monte Carlo simulations at finding an optimum.
@skn123 6 months ago
Halko's algorithm helped me start my understanding of Laplacian eigenmaps and other dimensionality reduction methods.
@danielsantiagoaguilatorres9973 7 months ago
I'm writing a paper on a related topic. Didn't know about many of these papers, thanks for sharing! I really enjoyed your video
@richardyim8914 7 months ago
Golub and Van Loan’s textbook is goated. I loved studying and learning numerical linear algebra for the first time in undergrad.
@pietheijn-vo1gt 7 months ago
I have seen a very similar idea in compressed sensing. In compressed sensing we also use a randomized sampling matrix, because the errors can be treated as white noise. We can then use a denoising algorithm to recover the original data. In fact, I know Philips MRI machines use this technique to speed up scans, because you have to take fewer pictures. Fascinating
@tamineabderrahmane248 7 months ago
random sampling to reconstruct the signal
@pietheijn-vo1gt 7 months ago
@@tamineabderrahmane248... what?
@MrLonelyrager 7 months ago
Compressed sensing is also useful for wireless communications. I studied its usage for sampling ultra-wideband signals and indoor positioning. It only works accurately under certain sparsity assumptions. In MRI scans, their "Fourier transform" can be considered sparse, so we can use l1 denoising algorithms to recover the original signal.
@pietheijn-vo1gt 7 months ago
@@MrLonelyrager yes correct, that's exactly what I used. In the form of ISTA (iterative shrinkage and thresholding) algorithms and its many (deep-learning) derivatives
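A minimal sketch of what ISTA does, for readers who haven't met it: solving min_x 0.5*||Ax - b||^2 + lam*||x||_1 with a random Gaussian sampling matrix, as in the compressed-sensing setting described above (sizes, sparsity level, and the regularization weight are made-up illustrative values):

```python
import numpy as np

def ista(A, b, lam, n_iter=500):
    """Iterative shrinkage-thresholding for 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient A^T(Ax - b)
    t = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - t * (A.T @ (A @ x - b))      # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)   # soft threshold
    return x

# Compressed-sensing style demo: random Gaussian sampling of a sparse signal
rng = np.random.default_rng(0)
n, m, k = 400, 120, 10                       # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)
b = A @ x_true
x_hat = ista(A, b, lam=0.01)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))   # small if recovery succeeded
```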
@pr0crastinatr 7 months ago
Another neat explanation for why the randomized least-squares problem works is the Johnson-Lindenstrauss lemma. That lemma states that most vectors don't change length a lot when you multiply them by a random gaussian matrix, so the norm of S(Ax - b) is within (1-eps) to (1+eps) of the norm of Ax-b with high probability.
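A quick numerical illustration of that concentration (the problem and sketch sizes are purely made up): the sketched residual norm ||S(Ax - b)|| stays within a few percent of ||Ax - b|| even though S collapses most of the rows.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 10_000, 20, 200                      # tall problem, sketch size s << m
A = rng.normal(size=(m, n))
x = rng.normal(size=n)
b = A @ x + rng.normal(size=m)
r = A @ x - b                                  # the residual whose norm we want preserved

ratios = []
for _ in range(20):                            # 20 independent Gaussian sketches
    S = rng.normal(size=(s, m)) / np.sqrt(s)   # scaling makes E[||S r||^2] = ||r||^2
    ratios.append(np.linalg.norm(S @ r) / np.linalg.norm(r))
print(min(ratios), max(ratios))                # typically all close to 1
```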
@makapaka8247 7 months ago
I'm finally far enough in education to see how well made your stuff is. Super excited to see a new one from you. Thanks for expanding people's horizons!
@Mutual_Information 7 months ago
Glad to have you watching!
@deltaranged 7 months ago
It feels like this video was made to match my exact interests LOL I've been interested in NLA for a while now, and I've recently studied more "traditional" randomized algorithms in uni for combinatorial tasks (e.g. Karger's Min-cut). It's interesting to see how they've recently made ways to combine the 2 paradigms. I'm excited to see where this field goes. Thanks for the video and for introducing me to the topic!
@Rockyzach88 7 months ago
YouTube has you in its palms. _laughs maniacally_
@Sino12 7 months ago
where do you study?
@noahgsolomon 7 months ago
You discussed all the priors incredibly well. I didn’t even understand the premise of random in this context and now I leave with a lot more. Keep it up man ur videos are the bomb
@mgostIH 7 months ago
I started reading this paper when you mentioned it on Twitter, forgot it was you who I got it from and was now so happy to see a video about it!
@Mutual_Information 7 months ago
Yes! And good to see you here mgost
@yeetmaster8050 24 days ago
As a published computer scientist, your videos are awesome. Appreciate the honesty that you caveat details on topics you're not 100% sure on
@Mutual_Information 23 days ago
Getting the experts to approve is the standard I aim for, so it means a lot to hear this from you - thank you!
@bluearctik3980 7 months ago
My first thought was "this is like journal club with DJ"! Great stuff - well researched and crisply delivered. More of this, if you please.
@charlesity 7 months ago
As always this is BRILLIANT. I started following your videos since I saw the GP regression video. Great content! Thank you very much.
@firefoxmetzger9063 6 months ago
I realize that YT comments are not the best place to explain "complex" ideas, but here it goes anyway: the head-bending "relative difference" piece is "just" a coordinate transformation. At 29:45, you lay ellipses atop each other and show the absolute approximation difference between the full sample and the sketch. The "trick" is to realize that this happens in the common (base) coordinate system and that nothing stops you from changing this coordinate system. For example, you can (a) move the origin to the centroid of the sketch, (b) rotate so that X and Y align with the semi-axes of the sketch, and (c) scale (asymmetrically) so that the sketch's semi-axes have length 1. What happens to the ellipsoid of the full sample in this "sketch space"? Two things happen when plotting in the new coordinate system: (1) the ellipsoid of the sketch becomes a circle around the origin (semi-axes are all 1) by construction. (2) the ellipsoid of the full sample becomes an "almost" circle depending on the quality of the approximation of the full sample by the sketch. As sample size increases, centroids converge, semi-axes start aligning, and (importantly) semi-axes get stretched/squashed until they reach length 1. Again, this is for the full sample - the sketch is already "a perfect circle by construction". In other words, as we increase the sample size of the sketch, the full sample looks more and more like a unit circle in "sketch space". We can now quantify the quality of the approximation using the ratio of the full sample's semi-axes in "sketch space". If there are no relative errors (perfect approximation), this becomes the ratio of radii of a circle, which is always 1. Any other number is due to (relative) approximation error; lower is better, and it can't be less than 1. The claim now is that, even for small samples, this ratio is already low enough for practical use, i.e., sketches with just 10 points already yield good results.
@firefoxmetzger9063 6 months ago
If you understand the above, then the high-dimensional part becomes clear as well: in N dimensions a "hyper-ellipsoid" has N semi-axes, and the claim is that for real (aka sparse) problems some of these semi-axes are really large and some are really small when measured in "problem space". This relationship, applied to the 2D ellipse you show at 29:45, means that the primary axis becomes really large (stretches beyond the screen size) and the secondary axis becomes really small (squished until the border lines touch each other due to line thickness). This will make the ellipse plot "degenerate", and it will look like a line - which is boring to visualize.
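If it helps, the coordinate-change idea above can be written in a few lines of NumPy (the covariance, sample sizes, and subsample size here are invented, just to show how the ratio quantifies sketch quality):

```python
import numpy as np

rng = np.random.default_rng(0)
full = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 0.7]], size=100_000)
sketch = full[rng.choice(len(full), size=10, replace=False)]   # a tiny random subsample

C_full = np.cov(full, rowvar=False)
C_sketch = np.cov(sketch, rowvar=False)

# Whitening map that turns the sketch ellipsoid into the unit circle ("sketch space")
vals, vecs = np.linalg.eigh(C_sketch)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T

# Semi-axes of the full-sample ellipsoid measured in sketch space
axes = np.sqrt(np.linalg.eigvalsh(W @ C_full @ W.T))
print(axes, axes.max() / axes.min())   # ratio >= 1; closer to 1 means a better sketch
```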
@technoguyx 3 months ago
Thanks for taking the time to type this, it's clear(er) to me now.
@Apophlegmatis 6 months ago
The nice thing is, with continuous systems (and everything in experienced life is continuous) the question is not "is it linear," but "on what scale is it functionally linear," which makes calculations of highly complex situations much simpler.
@Mutual_Information 6 months ago
YES!
@woosix7735 5 months ago
what about the Weierstrass function that isn't linear on any scale?
@Apophlegmatis 5 months ago
That is an excellent example as to why engineering uses approximations - we can closely model on a given scale any system using known functions. I did not know about that function before though, that's super interesting!
@pavlopanasiuk7297 1 month ago
​@@woosix7735 that isn't practical so far. What is practical though is phase transition evaluation, there you cannot approximate linearly
@grewech 3 months ago
Was looking for a nerdy video to fall asleep to, but couldn’t take my eyes off the screen. Excellent presentation and very well done video!
@AntonTkachuk-s1s 6 months ago
I used one time random matrices for eigenvalue counts on intervals and it was amazing! Di Napoli, E., Polizzi, E., & Saad, Y. (2016). Efficient estimation of eigenvalue counts in an interval. Numerical Linear Algebra with Applications, 23(4), 674-692.
@marcegger7411 7 months ago
Damn... your videos are getting beyond excellent!
@ernestoherreralegorreta137 7 months ago
Amazing explanation of a complex topic! You've got yourself a new subscriber.
@Mutual_Information 7 months ago
Glad to have you!
@GeorgeDole 1 month ago
Bravo! ! As an LA math teacher and Linear Algebra student in college, you confirmed why children need to learn Algebra-1 with the infamous Quadratic formula to understand how Linear Algebra works and is necessary to understand A.I. Please do more Linear Algebra videos for high school and college students and other interested lay people. Many Thanks.
@zyansheep 7 months ago
Dang, I absolutely love videos and articles that summarize the latest in a field of research and explain the concepts well!
@scottmiller2591 7 months ago
This was a nice walk down memory lane for me, and a good introduction to the beginner. It's nice to see SWE getting interested in these techniques, which have a very long history (like solving finite elements with diffusion decades ago, and compressed sensing). I enjoyed your video. A few notes:

It's useful to note that "random" projections started out as Gaussian, but it turns out very simple, in-memory transformations let you use binary random numbers at high speed with little to no loss of accuracy. I think you had this in mind when talking about the random matrix S in sketch-and-solve.

BLAS sounds like blast, but without the t. I'm sure there's people who pronounce it like blahs. Software engineers mangle the pronunciation of everything, including other SWE packages (looking at you, Ubuntu users). However, the first pronunciation is the one I have always heard in the applied linear algebra field. FORTRAN doesn't end like "fortune," but rather ends with "tran" - but maybe people pronounce "fortran" (uncapitalized) that way these days, IDK (see note above re: mangling; FORTRAN has been decapitalized since I started working with it). Cholesky starts with a hard "K" sound, which is the only pronunciation you'll ever hear in NLA and linear algebra. It certainly is the way Cholesky pronounced it. Me, I always pronounce Numpy to sound like lumpy just to tweak people, even though I know better ☺. I've always pronounced CQRRPT as "corrupt," too, but because that's what the acronym looks like (my eyes are bad).

One way to explain how these work intuitively is to look at a PCA, similar to what you touched on with the illustration of covariance. If you know the rank is low, then there will be, say, k large PCA directions, and the rest will be small. If you perform random projection on the data, those large directions will almost certainly show up in your projections, with the remaining PCA directions certainly being no bigger than they were originally (projection is always non-expanding). This means the random projections will still contain large components of the strong PCA directions, and you only need to make sure you took enough random projections to avoid being unlucky enough to accidentally be very nearly normal to the strong PCA directions every time. The odds of you being unlucky go down with every random projection you add. You'd have to be very unlucky to take a photo of a stick from random directions, and have every photo of the stick be taken end-on. In most photos, it will look like a stick, not a point. Similarly, taking a photo of a piece of paper from random directions will look like a distorted rectangle, not a line segment. It's one case where the curse of dimensionality is actually working in your favor - several random projections almost guarantee they won't all be projections to an object that's the thickness of the paper.

I've been writing randomized algos for a long time (I have had arguments with engineers about how random SVD couldn't possibly work!), and love seeing random linear algebra libraries that are open and unit tested. I agree with your summary - a good algorithm is worth far more than good hardware. Looking forward to you tracking new developments in the future.
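The stick-photo intuition is easy to check numerically. Here is a small sketch using the cheap ±1 "binary" projections mentioned above (all sizes and the noise level are arbitrary assumptions): the dominant singular values of a low-rank-plus-noise matrix survive a handful of random projections almost untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 2000, 500, 5
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n)) + 0.01 * rng.normal(size=(m, n))

# A handful of random +-1 projections ("photos from random angles")
Omega = rng.choice([-1.0, 1.0], size=(n, k + 5))
Q, _ = np.linalg.qr(A @ Omega)

# The dominant singular values are preserved almost exactly after projecting
print(np.linalg.svd(A, compute_uv=False)[:k])
print(np.linalg.svd(Q.T @ A, compute_uv=False)[:k])
```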
@Mutual_Information 7 months ago
This is the real test of a video: when an expert watches it and, with some small corrections, agrees that it gets the bulk of the message right. It's a reason I try to bring in a subject matter expert where I can. So I'm quite happy to have covered the topic appropriately in your view. (It's also a relief!) And I also wish I had thought of the analogy: "You'd have to be very unlucky to take a photo of a stick from random directions, and have every photo of the stick be taken end-on. In most photos, it will look like a stick, not a point." I would have included that if I had thought of it!
@scottmiller2591 7 months ago
@@Mutual_Information Agree absolutely!
@rileyjohnmurray7568 7 months ago
Jim Demmel and Jack Dongarra pronounced it "blahs" the last time I spoke with each of them. (~This morning and one month ago, respectively.) 😉
@Mutual_Information 7 months ago
@@rileyjohnmurray7568 lol
@scottmiller2591 7 months ago
@@rileyjohnmurray7568 I hope they perk up ☺
@aleksszukovskis2074 7 months ago
its always a pleasure to watch this channel
@wiktorzdrojewski890 7 months ago
this feels like a good presentation topic for numerical methods seminar
@nikita_x44 7 months ago
The linearity @ 4:43 is a different linearity. Linear functions in the sense of linear algebra must always pass through (0,0).
@sufyanali3992 7 months ago
I thought so too, the 2D line shown on the right is an affine function, not a linear function in the rigorous sense.
@KepleroGT 7 months ago
Yep, otherwise the linearity of addition and multiplication which he just skipped over wouldn't apply and thus wouldn't be linear functions, or rather the correct term is linear map/transformation. Example: F(x,y,z) = (2x+y, 3y, z+5), (0,0,0) = F(0,0,0) is incorrect because F(0,0,0) = (0,0,5). The intent is to preserve the linearity of these operations so they can be applied similarly. If I want 2+2 or 2*2 I can't have 5
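In symbols, the distinction this thread is drawing (stated as a quick reference, not as a correction of the video's broader point):

```latex
\text{linear: } f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)\ \text{for all } \alpha, \beta
\;\Longrightarrow\; f(0) = 0,
\qquad
\text{affine: } f(x) = Ax + c,\ c \neq 0\ \text{(not linear in this sense).}
```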
@Dagobah359 7 months ago
3:03 Linear algebra professor, here. Please stop teaching that it's the rows of matrices which are vectors. Yes, both rows and columns of matrices correspond to vectors in separate vector spaces, but when they don't have the full picture yet, beginning students should be thinking of the columns of the matrix as 'the' vectors. I've had to spend so much work fixing the perspective of engineers in their masters program who only think of the rows as vectors. It's much easier to broaden a student's perspective from columns to also rows, than it is to broaden their perspective from rows to also columns.
@rileyjohnmurray7568 6 months ago
Thanks for sharing this perspective! I've heard something similar from a professor when I did my PhD, and I generally agree with it. That said, I think introducing row-wise is not so bad *in the specific context of this video.* It seems like the natural thing to do if we want to compare scalar-valued nonlinear functions to scalar-valued linear functions. So if you're in a time crunch and you need to explain the concept of linearity in one minute (and with few equations), then this approach seems not so bad.
@glad_asg 5 months ago
idk, seems like skill issue.
@user-vr3ex1fj7c 1 month ago
Being a comp sci undergrad, i felt like this vid was exceptionally explained. Really gives me interest in diving deeper in the topic. Thanks!
@StratosFair 7 months ago
As a grad student in theoretical machine learning, I have to say i'm blown away by the quality of your content, please keep videos like these coming !
@plfreeman111 5 months ago
"And if you aren't, you're probably doing something wrong." So very very true. Don't roll your own NLA code. You won't get it right and it certainly won't be faster. The corollary is "If you're inverting a matrix, you're probably doing something wrong." But that's a different problem I have to solve with newbies.
@mohamedalmahditantaoui8422 5 months ago
I think you made the best numerical linear algebra video in the world; we really need more content like this. Keep up the good work.
@DawnOfTheComputer 7 months ago
The math presentation and explanation alone was worth a sub, let alone the interesting topic.
@austincarter2177 3 months ago
I love how you explain things with no assumptions, but also don’t assume we know nothing. You let us walk in the field without getting lost in the weeds🤙
@ЕрмаханСабыржан 7 months ago
it's really mind-blowing how random numbers can achieve something this fast
@lbgstzockt8493 7 months ago
Very good video on a very interesting topic. Who would have thought that there is this much to gain in such a commonly used piece of mathematics.
@damondanieli 7 months ago
Great video! One thing: “processor registers” not “registries”
@Mutual_Information 7 months ago
I know.. lol damn it
@jamesedwards6173 7 months ago
lol, I caught that same thing.
@Otakutaru 7 months ago
Adequate density of new information, and sublime narrative. Also, on point visuals
@JoeBurnett 7 months ago
You are an amazing teacher! Thank you for explaining the topic in this manner. It really motivates me to continue learning about all things linear algebra!
@chazhovnanian6897 5 months ago
you've GOT to post more, this stuff is amazing, im still in high school but learning about so-called 'mature' processes which become completely revolutionised really inspires me, thanks for this :)
@oceannuclear 7 months ago
Oh my god, this forms a small part of my PhD thesis, where I've been trying to understand LAPACK's advantages/disadvantages when it comes to inverting matrices. Having this video really helps me put things into context! Thank you very much for making this!
@AjaniTea 6 months ago
This is a world class video. Thanks for posting this and keep it up!
@WhiteGandalfs 7 months ago
Let me try to phrase it for people who have no math degree education, but rather engineering level: You effectively select the best fitting equations of the linear problem which is originally highly overdefined for your x vector to sufficiently represent the complete system with a small subset of the original equations. - correct? That's not directly "inducing random noise" but rather a simplification by omission of probably irrelevant equations. This reminds me of how we did such a scheme for a "bundle block adjustment" application: We used the drastic performance boost from simplification to do multiple simple bba within each reaction step of the system with different drastically simplified subsets from the data, to then compare the results with the expected outcome (low rest error, good alignment with the continuation of the coordinates of our x vector from the previous step), then performing a final selection based on those outcomes and then performing a final error minimizing solving with those perfectly selected equations. That gives the best from both worlds: Speed up but without sacrificing correctness. And there is no magic at all (and no "introduced random noise"). Just a "try simple" first iteration, then based on that a selected final iteration. Basically engineering optimization based on working standard linear algebra systems.
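Roughly, yes, though the "sketch-and-solve" variant discussed in the video mixes all the equations together with a random matrix rather than only dropping some of them. A minimal sketch of that version (the sizes and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 20_000, 50, 500                 # heavily overdetermined system, sketch size s << m
A = rng.normal(size=(m, n))
b = A @ rng.normal(size=n) + 0.1 * rng.normal(size=m)

x_full, *_ = np.linalg.lstsq(A, b, rcond=None)        # classical solve using all m equations

S = rng.normal(size=(s, m)) / np.sqrt(s)               # dense Gaussian sketch (faster sketches exist)
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)   # solve the much smaller sketched problem

# Residual of the sketched solution relative to the optimal residual: close to 1
print(np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_full - b))
```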
@peterhall6656 1 month ago
It is decades since I delved into this stuff. You have excited me to crawl out from under the rock and read the papers.
@hozaifas4811 7 months ago
We need more content creators like you ❤
@Mutual_Information 7 months ago
Thank you. These videos take awhile, so I wish I could upload more. But I'm confident I'll be doing KZbin for a long time.
@hozaifas4811 7 months ago
@@Mutual_Information Well ,This news made my day !
@AMA_RILDO 1 month ago
I don’t know how long it took for you to create this video with fancy slides and examples, but everything is so well explained that I wish I had you as my teacher
@h.b.1285 7 months ago
Excellent video! This topic is not easy for the layperson (admittedly, the layperson that likes Linear Algebra), but it was clearly and very well structured.
@tanithrosenbaum 7 months ago
"They're quite good" - Understatement of the decade 😄
@antiguarocks 6 months ago
Reminds me of what my high school maths teacher said about being able to assess product quality on a production line with high accuracy by only sampling a few percent of the product items.
@bn8ws 7 months ago
Outstanding content, instant sub. Keep up the good work!
@aminelahlou1272 7 months ago
As a hobbyist mathematician, please don't say that f(x) is a function or, worse, a linear function. f(x) is a number in most cases you described. f, on the other hand, is a function, and f can be linear.
@aminelahlou1272 7 months ago
f(x) can be a matrix or even a function (that is what we call in computer programming higher order function) but I don’t think that was the intended message
@the_master_of_cramp 7 months ago
Great and clear video! Makes me wanna study more numerical LA... combined with probability theory, because it shows how inefficient many of the algorithms currently in use likely are, and that randomized algorithms are usually insanely much faster while being approximately correct. So those randomized algorithms can basically be used anywhere we don't need to be 100% sure about the result (which is basically always, because our mathematical models are only approximations of what's going on in the world and thus are inaccurate anyway - and, as you mentioned, if data is used, it's noisy).
@mohammedbelgoumri 7 months ago
No better way to start the day than with an MI upload 🥳
@Mutual_Information 7 months ago
Thank you, love hearing that!
@gaussology 7 months ago
Wow, so much research went into this! It makes me even more motivated to read papers and produce videos 😀
@MachineLearningStreetTalk 7 months ago
Great video brother! 😍
@Mutual_Information 7 months ago
Thank you MLST! You're among a rare bunch providing non-hyped or otherwise crazy takes on AI/ML, so it means a lot coming from you.
@jondor654 7 months ago
Lovely type, great clarity .
@robharwood3538 7 months ago
Even just the history section of this video is *incredibly* valuable, IMHO. Thank you so much!
@ShivaTD420 7 months ago
If you take white noise and put a filter on it, you can produce every note, because every tone and semitone is in the noise.
@Francis-gg4rn 3 months ago
this channel is GOLD, please keep it up we love you
@Ohmriginal722 7 months ago
Whenever randomness is involved you got me wanting to use Analogue processors for fast and low-power processing
@braineaterzombie3981 7 months ago
This is exactly what i needed. Subscribed
@DocM221 7 months ago
I've been through some basic linear algebra courses, but really the covariance problem struck me as obvious to a statistician. A statistician would never go and sample everybody; they would first determine how accurate they needed to be in their certainty, and then go about sampling exactly the number of people that satisfies that equation. I actually had to do this in my job! I can totally see how this will be a great tool used with data prediction and maybe hardware accelerators to make MASSIVE gains. We are in for a huge wild ride! Thanks for the video!
@cupidonihack 1 month ago
Why did I not find your channel before! Some channels steal my view time, but yours makes me 10 years younger, discovering a new domain :) I feel I'm going to buy some books and papers again! Thank you!
@CamiKite 1 month ago
I'm really impressed to see a real game-changer in such an old and mature domain. I guess it won't take long before we have randomization-optimized NPUs on our devices.
@jliby1708 4 months ago
Masterful explanation, going through the math and providing high-level abstractions of concepts. Really helps to see how someone could arrive at a major discovery. Thanks!
@Stephen_Kelley 7 months ago
Excellent video, really well paced.
@piyushkumbhare5969 7 months ago
This is a really well made video, nice!
@Duviann 7 months ago
the quality on this editing is top notch, congratulations!!!
@robmorgan1214 7 months ago
Of course. This isn't a surprise. I've been using these techniques for optimization for a long time. Simulated annealing was proven (decades ago) to scale better than many optimization algorithms. If your big O is bigger than sim annealing's, use sim annealing! Always calculate your big O and THEN measure your implementation to make sure you hit it. Same thing goes for your error... and controlling that can blow out your big O, and that's data dependent, not algorithm dependent! ALWAYS MEASURE! If you have to pre-sort before accumulating to minimize error, you are not going to hit your scaling numbers and you're going to murder your cache and memory pipelining.

The key with that 1/e term is to recall that floating point math is going to accumulate rounding errors at a precision of about 0.1-1.0 in 1M. This sets your floor and the sensitivity of your eigenvalues (if they vary by more than about one part in 1M, your answers will be dominated by errors, so you take the hit and use doubles).

This kind of stuff used to be explicitly covered in scientific computing classes when resources were limited and the hardware was MUCH less complex. It's interesting that this complexity has managed to hide potential optimizations of order 20-1000x. But it makes sense: in order to use the HW efficiently you need to be an expert in so many things that the problems you're actually trying to solve become something of an afterthought, and resource allocation in universities and other organizations focused on numerical methods faces the pressures of silos and hyperspecialization. Conway's law strikes again, as our software matches the organizational structures that create it.
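On the rounding-error point, a tiny illustration of why accumulation order matters in single precision (the array size and distribution are arbitrary assumptions; exact numbers will vary by machine and seed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=100_000).astype(np.float32)

exact = float(np.sum(x.astype(np.float64)))        # double-precision reference

def naive_sum(a):
    s = np.float32(0.0)
    for v in a:                                    # left-to-right float32 accumulation
        s = np.float32(s + v)
    return float(s)

print(abs(naive_sum(x) - exact) / exact)           # error grows with the number of terms
print(abs(naive_sum(np.sort(x)) - exact) / exact)  # often smaller: small values added first
print(abs(float(np.sum(x)) - exact) / exact)       # NumPy's pairwise summation does better still
```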
@modernsolutions6631 6 months ago
simulated annealing is about something else entirely as it's a black box optimisation problem. You sound a bit unhinged. 😢
@robmorgan1214 6 months ago
@@modernsolutions6631 I've got a PhD and have been using this technique to solve or accelerate various problems like this since I was a student. The ORIGIN of simulated annealing is Metropolis-Hastings, where you try accelerating the integration of a stiff differential equation by adjusting the range of the rejection interval in a rapidly changing zone of the equation. If you adjust this on the fly algorithmically and familiarize yourself with the mathematical properties of the logistic distribution, you get simulated annealing. This is similar to how they approach solving many problems in courses on convex optimization by reframing the form of the problem. This is a useful but unnecessary step. In this case they are exploiting their ability to do a "fast" step along with NlogN scaling instead of doing N^3 calculations, where the mismatch in the scale of various eigenvalues can lead to error accumulation. In the guess-and-check approach you don't accumulate error at the same rate, so it can lead to faster solutions at higher precision with less polishing. Long story short, it's the same stuff as sim annealing... just seen from a different vantage, like solving a problem using duality.
@AlexGarel-xr9ri 7 months ago
Incredible video with very good animations and script. Thank you !
@pedroteran5885 5 months ago
I love how Volker Strassen did things so different from each other.
@maxheadrom3088 6 months ago
Nice video! Nice channel! The complicated part isn't multiplying ... it's inverting!
@wafikiri_ 7 months ago
The first program I fed a computer was one I wrote in FORTRAN IV. It almost exhausted the memory capacity of the IBM machine, which was about 30 KBytes for the user (it used memory overlays, which we'd call banked memory today, in order to not exceed the available memory for programs).
@pythonguytube 7 months ago
Worth pointing out that there is a modern sparse linear algebra package called GraphBLAS, which can be used not just for graphs (which map naturally onto sparse matrices) but also for any sparse matrix multiplication operation.
@tiwiatg2186 6 months ago
Loving it loving it loving it!! Amazing video, amazing topic 👏
@razeo7068 5 months ago
Amazing video. Had me hooked from start to finish. You gained a new subscriber
@jonmichaelgalindo 7 months ago
"Rasterizing triangles to pixels--gone." I was like, "Unreal's not using triangles???" LOL but it was just a very confusingly worded statement.
@RepChris 6 months ago
Of course i get this in my recommended a few days after my first numerical analysis lecture
@RepChris 6 months ago
Which is a course i picked up (its semi-required) since it seems like a very useful thing to understand properly, even though i am not the best at advanced linear algebra and have PTSD from a previous professor and get a visceral reaction every time i see an epsilon, both of which are integral to most of the course
@Mutual_Information 6 months ago
Well, I hope math YouTube serves as a bit of PTSD therapy. I hope a shit professor doesn't get in the way of you enjoying a good thing.
@cannaroe1213 7 months ago
Nearly 7 years ago when I was still a practicing geneticist, sequenced DNA would usually only be a few nucleotides long, maybe 50, and it would have to get mapped to a genome with billions of possible locations to test. The fastest algorithms ended up being used in the most published papers, so competition was pretty fierce to be the fastest. The gold standard was a deterministic program called BWA/Bowtie, but just before I left the field a new breed of non-deterministic aligners with mapping times orders of magnitude faster were developed, and it really split opinions. Different deterministic programs would give different results (i.e. they had noise/error too, even if they're consistent about it), so in many ways who cared if a program gave different results every time you ran it, particularly if you only intend to run it once... But there were other problems. You couldn't create definitive analyses anymore, you couldn't retrace someone else's steps, you couldn't rely on checksums, total nightmare. The "hidden structures" aspect of the paper was interesting, the structures are in the data, and how the algorithm interacts with the data, which as the programmer you don't have access to by definition - but you also kinda know all you need to know about it too. It feels very similar to making a good meme.
@HelloWorlds__JTS 6 months ago
Phenomenal! But I have one correction for (25:33): Full rank isn't restricted to square [invertible] matrices, it just means rank = min(m,n) rather than rank = k < min(m,n).
@chunheichau7947 6 months ago
You brought up a good point though. Maybe the current limitation is not the speed of compute; rather, it is the speed of data transfer.
@moisesbessalle 7 months ago
Amazing video!
@johannguentherprzewalski 7 months ago
Very interesting content! I did find that the video felt longer than expected. I was intrigued by the thumbnail and the promise of at least 10x speed improvement. However, it took quite a while to get to the papers and even longer to get to the explanation. The history definitely deserves its own video and most chapters could be much shorter.
@from_my_desk 7 months ago
thanks a ton! this was eye-opening 😊
@billbez7465 6 months ago
Amazing video with great presentation. Thank you
@culan_SCP 7 months ago
NEW MATH UPDATE JUST DROPPED
@Mutual_Information 7 months ago
lol
@Alexander-pk1tu 1 month ago
you are very talented. I like your videos a lot. Keep up the hard work man!
@KipIngram 7 months ago
Fascinating. Thanks very much for filling us then on this.
@dr.gordontaub1702 1 month ago
So weird when I see a video about such high level math, and start looking for and find reference to people I know. I TAed for Nathan Kutz when I was in grad school and am currently reading Nathan's and Steven's book.
@Mutual_Information 1 month ago
That must've been a cool experience. Excellent teachers, and their book is top quality. I was similarly surprised to see their names. Once you niche down on a technical topic, you start running into the same names frequently. I guess they both have a taste for randomization
@iamr0b0tx 7 months ago
This is a really good video 💯
@wcdcio 7 months ago
wow this is an amazingly well produced and scripted video and delivered perfectly, how long did it take you to plan and execute it?
@Mutual_Information 7 months ago
I'd been working on it since November, mostly on the weekends and sometimes in the evenings. I'd guess it took me over 150 hours. The stages are reading research, script writing, creating the on-screen animations, re-writing the script with feedback (e.g. from Riley here), shooting the video, editing it, adding music, cleaning it up, and sharing the video for feedback. It takes a lot longer than I like to admit.
@JHillMD 6 months ago
What a terrific video and channel. Great work! Subbed.
@JonathanPlasse 7 months ago
Awesome presentation, thank you!
@mohammadalaaelghamry8010 5 months ago
Great video, very useful and very well presented. Thank you.
@sherifffruitfly 7 months ago
That's cool as hell - thanks! 1) an interesting thing you didn't address/answer: why is data generally expected to contain so much redundancy, that a "small" subset suffices for 2) seems like LLM/NN would be the place randNLA evangelists would want to go. If they can convert the drivers of LLM/NN to randNLA, pretty much everyone else will likely follow
@Mutual_Information 7 months ago
Glad you like it. To your points: 1) Yea, good Q. If A is m-by-n, then solving Ax = b (special case: distance to b is zero) only requires n rows of A (assumes it's full rank). So you could say, the extra m - n rows are redundant. So necessarily, A has a lot of redundancy. 2) Yes! I'm sure the researchers are thinking of this. But things get tricky (it's hard to prove theorems) when you go from the pure linear algebra questions to the messy and wild west of NNs and LLMs.
@cahdoge 7 months ago
@@Mutual_Information The way I understand it now: the redundancy is a result of using the method of least squares to compute a function that describes the result of your matrix multiplication, and of using computers to calculate them. Since it's a type of regression, it is already an approximation. If you use a subset it gets faster but also becomes less precise. The next thing is, your computer has an upper limit for precision, so as long as you choose a subset that gives results within the limit you are fine. The tricky bit is finding a way to choose the subset and optimizing your error to be as close to the computer's "natural" one as possible.
@rr00676 7 months ago
I've been hoping some advances in probabilistic numerics and random matrix theory bring PGM's some love. Computing matmuls/inverses every iteration of MCMC makes me sad :(. As expected, great video!
@novadea1643 2 months ago
Very nice video and it's indeed a very interesting promising direction for many applications where it doesn't matter if it's not exact as long as the answer is correct to an acceptable error margin, I especially like the start with UE5 because games and especially graphics has been one area where using randomness and shortcuts to get "close enough fast enough" has always been a priority. It'd be absolutely amazing to have a RandNLA library with basically a "Speed Accuracy" slider.
@Mutual_Information 2 months ago
I'm rooting for it. It may be awhile but considering the gains, I suspect it must arrive eventually.