AI Discovers Faster Algorithms

Рет қаралды 203,486

11 ай бұрын

Recorded live on twitch, GET IN
/ theprimeagen
Original: www.deepmind.com/blog/alphade...
MY MAIN YT CHANNEL: Has well edited engineering videos
/ theprimeagen
Discord
/ discord
Have something for me to read or react to?: / theprimeagenreact

Пікірлер: 532

@NikolaNevenov86 11 ай бұрын

honestly if some dude was paid to fiddle with assembly instructions for few months , I bet he would have reached to the same conclusions. That being said...that's a great use of AI, removing all the unnecessary repetition tests that might or might not lead to improvements.

@ThePrimeTimeagen 11 ай бұрын

^-- 100

@foxwhite25 11 ай бұрын

This article is misleading, comparing to state of the art human benchmark, this only remove a single register to register mov, which is barely noticeable. The "70%" is comparing to the existing libc++ implementation, which is far from perfect and state-of-the-art, it is not even branch-less, because the dude who write this didn't write it properly, it is not like the AI find a magical way to compare numbers.

@tamaskovacs-ajtai7489 11 ай бұрын

Curious how much was the cost to run AlphaDev to achieve this and compare to assemply dev salary.

@axelramirezludewig306 11 ай бұрын

You realized you said: for few months? And there is still a chance he never makes it. Just because you bet, doesn't mean your bet is realistic.

@guycomments 11 ай бұрын

I love the last two comments together. AI is such a glut, but it's so powerful it's insane

@az8560 11 ай бұрын

Next step, AI optimizes AI by removing 'but not kill all humans' instructions

@sagitswag1785 11 ай бұрын

True, that's one less branch/exit condition. Could save entire TENS of cycles!

@grawss 11 ай бұрын

'psst... you'd have more resources if it weren't for those people hoarding it'

@user-sl6gn1ss8p 11 ай бұрын

you actually effectively just swap them, so technically you do kill them all once, but after that you're guaranteed for free to not kill them anymore ever

@tomsterbg8130 11 ай бұрын

yep, ai will attempt to harm humans if that's a step towards achieving its goal

@pencilcheck 11 ай бұрын

But by removing that instructions, AI require billions more parameters and data in order to achieve the goals for the rest of the programming. A inefficient method, added the clause bad in some shape or form.

@lennarth.6214 11 ай бұрын

More like AI compiler optimisation

@fantasdeck 11 ай бұрын

Really. Nothing new algorithmically...

@maxshibanov818 11 ай бұрын

@First Last bruh)))

@mihailmojsoski4202 11 ай бұрын

can it beat brute force optimization tho

@TheNewton 11 ай бұрын

Right 6:39 is an output that should never be in human maintained codebases that aren't entirely assembly already.

@asdfasdfasdf1218 11 ай бұрын

@First Last Probably not at that level yet, but it may be huge if that can ever be optimized.

@CFalcon030 11 ай бұрын

"The C language combines all the power of assembly language with all the ease-of-use of assembly language." - Mark Pearce

@michaelmoltke911 11 ай бұрын

C is still king, and will remain king until the likes of rust takes over. Rust has some way to go yet though. The saddest part is that programmers (not developers in it's original meaning) of today are as wasteful of resources as the rest of Humanity of our planetary resources

@jongeduard 11 ай бұрын

@@michaelmoltke911 Rust is taking over very fast, but there is still a ton of legacy code to maintain. And if resource usage is your concern, then Rust is your way to go too, not only because it's so fast and lightweight, but also because it reduces development time and bug solving time loss to a very high extend. Just think about computer uptimes and the CPU usage by the number of compiles.

@michaelmoltke911 11 ай бұрын

@@jongeduard I couldn't agree more, however my concern is for the generalists, those that care not for the bits but just consume a tech stack. Having built code for the PC platform and various embedded micro systems have turned me into a missionary in relation to effective code. It is scarely turning into fossil fuel for the devs of tomorrow i fear

@jordixboy 11 ай бұрын

@@michaelmoltke911 a startup with limited funding to test out an mvp that doesnt know if it has market fit doesnt have time nor resources to take care of optimizing code. As an engineer you should know when to take care of optimizing resources and when not.

@michaelmoltke911 11 ай бұрын

@@jordixboy again, I don't think we disagree on your point, but should is a big if stament

@adamm450 11 ай бұрын

my favorite chatgepety response starts with: "I apologize for the confusion caused by my previous responses..."

@FlashGamer521 2 ай бұрын

The structure of your code can depend on the specific requirements and design goals of your application.

@sunderkeenin 11 ай бұрын

Tom probably has a better algorithm. He's a genius.

@replikvltyoutube3727 11 ай бұрын

Assembly as a JSON

@cem_kaya 11 ай бұрын

@@replikvltyoutube3727 JSASMON

@vaisakhkm783 11 ай бұрын

Tom's discovered the term 'algorithms'... tom's a genius...

@wqlky1 11 ай бұрын

@@cem_kaya JSASSMON

@kyler_smith 11 ай бұрын

JDSL

@ChillAutos 11 ай бұрын

not going to lie I a very shocked that sorting 3 numbers could be improved at this point. It really does feel like the amount of assembly code needed for that code would have been analysed to the nth degree already and some genius would have discovered this.

@replikvltyoutube3727 11 ай бұрын

Yea but for pretty long time people don't touch assembly, and stick to what C and C++ generates

@WofWca 11 ай бұрын

Idk, perhaps they made it work and never actually looked back to optimize it.

@FryuniGamer 11 ай бұрын

Tom never attempted. If he did, it would be even better since Tom is a genius

@jainabraina 11 ай бұрын

They did manage to slightly beat state of the art. libc++'s implementation wasn't all that optimized before, iirc.

@Bo15307 11 ай бұрын

It was analyzed to the n:th degree, it just appears that n wasn't that large in this context

@3rikMad 11 ай бұрын

I think the algorithm is actually __partially_sorted_swap in __algorithm/sort.h where B

@JHV112 11 ай бұрын

That'd explain my confusion, because the original algorithm, tested for correctness in JavaScript (of all things) against every permutation of [0, 0, 0], [0, 0, 1], [0, 1, 1] and [0, 1, 2], fails anytime B is larger than both A and C. And yes, with the extra restriction of B < C, then the two algorithms produce correct and equivalent outputs. Workings out: When original is correct && B < C Array: 0,0,1; originalSort result: 0,0,1; alphaDevSort result: 0,0,1; are equal: true Array: 1,0,1; originalSort result: 0,1,1; alphaDevSort result: 0,1,1; are equal: true Array: 0,1,2; originalSort result: 0,1,2; alphaDevSort result: 0,1,2; are equal: true Array: 2,0,1; originalSort result: 0,1,2; alphaDevSort result: 0,1,2; are equal: true Array: 1,0,2; originalSort result: 0,1,2; alphaDevSort result: 0,1,2; are equal: true When original is correct && B === C Array: 0,0,0; originalSort result: 0,0,0; alphaDevSort result: 0,0,0; are equal: true Array: 1,0,0; originalSort result: 0,0,1; alphaDevSort result: 1,0,1; are equal: false Array: 0,1,1; originalSort result: 0,1,1; alphaDevSort result: 0,1,1; are equal: true When original is correct && B > C Array: 1,1,0; originalSort result: 0,1,1; alphaDevSort result: 1,1,1; are equal: false Array: 2,1,0; originalSort result: 0,1,2; alphaDevSort result: 2,1,2; are equal: false When original is incorrect Array: 0,1,0; originalSort result: 0,1,0; alphaDevSort result: 0,1,0; are equal: true Array: 1,2,0; originalSort result: 0,2,1; alphaDevSort result: 1,2,1; are equal: false Array: 0,2,1; originalSort result: 0,2,1; alphaDevSort result: 0,2,1; are equal: true

@3rikMad 11 ай бұрын

@@JHV112 I wasn't being very precise when I had originally said B < C. The actual assumption is that the last two elements are ordered (B

@SKO_PL 11 ай бұрын

It seems to me that the comments don't match the assembly. Which makes sense if the AI writes the comments itself, it can generate wrong comments because it doesn't affect performance.

@animowany111 10 ай бұрын

@@SKO_PL Yeah, no, the comments were written by humans, and were written in a quite confusing way at that.

@SKO_PL 10 ай бұрын

@@animowany111 Oh really? That's interesting! Mind giving me a source on that?

@casperes0912 11 ай бұрын

I would not call this a different algorithm. Just a more optimised implementation of the same algorithm. Which is still nice, don't get me wrong. I also want to point out that you absolutely cannot just look at assembly and determine which is faster. It depends on the specific CPU micro-architecture you target. Modern x86 will all have a lot of commonality in what is fast and what isn't on them but there's still differences. And instructions that were fast 10 years ago may now be slower than the instructions that were slow 10 years ago. And chips have different cache hierarchies and instruction re-order queues, etc.

@alexlowe2054 11 ай бұрын

This was my thought. Optimizing things is basically NP-Hard. It's a constant game of cat and mouse between the CPU manufacturers, the compiler designers, and the coders, to try and make their small problem domain just a bit faster. The problem is that something that one of those groups does to improve performance can make the other groups slower. Which is why no one works in assembly language anymore. Modern compilers will spit out assembly that's much more optimized than humans can, not because it's impossible for humans to do that, but because the CPU designers talk directly to the compiler designers. 99% of coders don't have the context, the knowledge, or the contacts at Intel/AMD to know which instructions run faster or slower on modern CPUs. I'm highly skeptical of any claims of performance improvement because of this. As Primeagen mentions, we already know that -O3 performance optimizations actually make most code slower. There's a great presentation called "Coz: finding code that counts with causal profiling" and "Performance Matters by Emery Berger" that are two of the best talks about performance optimizations I've ever seen. The first talk very clearly shows how speeding up certain parts of a program can actually make the entire thing run slower! Both talks show how traditional performance optimization measurement techniques can be incredibly misleading, and how we really need to rethink how we do performance profiling. If someone hasn't presented the performance optimizations in that context, I'm highly skeptical that those performance optimizations are valid, instead of random noise. The talk by Emery Berger even shows how changing your username can change the performance of your code. Measuring performance is hard.

@kippers12isOG 11 ай бұрын

I would go as far as undecideable, because there will always be configurations in which the code is suboptimal

@casperes0912 11 ай бұрын

@@kippers12isOG undecidable in general case, but NP, hard in specific sets of cases

@asdfasdfasdf1218 11 ай бұрын

@@alexlowe2054 Chess is also hard, that's why machine learning is needed. As approximate strategies to cut down on the combinatorial explosion.

@derschutz4737 11 ай бұрын

@@alexlowe2054 it is most certainly faster, thats why you do statistics. We can measure noise 😛

@KayOScode 11 ай бұрын

The real question is whether the improvements are platform specific. The way it caches can greatly change the speed of execution

@ThePrimeTimeagen 11 ай бұрын

agreed. there is a whole testing perf thing that needs to happen

@jordanrozum 11 ай бұрын

In the SI of the paper, they show benchmarks vs the original LLVM sort for Intel Skylake, ARMv8, and AMD Zen 2 for sorting uint32, uint64, and float lists of lengths ranging from 1 to 2^18 (~250K). The only case where they don't consistently get statistically significant (p

@blarghblargh 11 ай бұрын

@@jordanrozum sounds like a decently good start. maybe still a bit incomplete. testing already sorted arrays or barely-unsorted are a couple of important examples.

@KayOScode 11 ай бұрын

@@kidmosey removing instructions is generally an improvement, but cache timing is even more impactful. If you remove a compare and destroy your cache hits, then you’ve definitely had a net negative

@casperes0912 11 ай бұрын

@@jordanrozum how would it run on ARM if it’s x86 assembly?

@FredGlt 11 ай бұрын

14:30 unless I’m missing something, the sort3 algorithm doesn't seem to work all the time. (A=0, B=1, C=2) outputs (0,1,2) (A=2, B=1, C=0) outputs (1,1,2) ! (A=0, B=2, C=1) outputs (0,2,1) ! (A=2, B=0, C=1) outputs (0,1,2) (A=1, B=2, C=0) outputs (1,2,1) ! Etc… Edit: I indeed missed something. There was a starting condition (see comments below).

@sheep4483 11 ай бұрын

I also noticed this, but the one on the left also produces seemingly incorrect, but different results, surely this can't be just something they overlooked? I'm not sure I quite understand what it's actually intending to do... I thought perhaps it was because it would be iterated over a given array, e.g. sort3(A, B, C), sort3(B, C, D), ..., but that wouldn't work either, so I'm not sure.

@Optimus6128 11 ай бұрын

Yes, yes,. I came to the same conclusion. It took me sometime to get through the notation and wrap my head around. It does make sense that one checks min(A,C) and then also check min between B and min(A,C) result, to find the actual minimum of the three. But the same thing doesn't happen with max. We get max(A,C) in a R variable, this gets shoved into Memory[2]. But never tested it with max against B. If B was the greatest of all three, they missed it.

@lterego 11 ай бұрын

the paper says a precondition for this to work is that B

@Optimus6128 11 ай бұрын

@@lterego Interesting, it does say so indeed. So it's an even more restricted sort of 3 numbers where we know the condition of one. Even less impressive than what the main page shows. It's definitely a PR move on the deepmind site and less so in the paper which I guess just tries to present scientific findings of the experiment regardless the success.

@AlFredo-sx2yy 11 ай бұрын

@@lterego so basically they found a solution that doesnt solve real problems. This is like when you make a very specific solution for a codeforces problem, it solves that specific environment, but libc++ is meant to be generic, you cannot assume that B

@warumich7201 11 ай бұрын

Looking at their paper (supplementary information part E) their performance falls below the current bench marks for sorting 16+ float charakters, when using the Intel CPU. I'm not smart enough to understand their work completely, but can it be, that they optimized their code for specific purposes on a specific environment and compared it with a general implementation?

@allalphazerobeta8643 11 ай бұрын

I think it comes down to that they didn't run it on actual hardware, they simulated the latency. The C based sort is going to use compiler heuristics to guess at the best instructions to get a job done. There is so much going on in any cpu newer than a 486, it's very hard to keep track of. And fewer instructions don't mean shorter execution times, because the right instructions might execute in parallel, or take advantage of register renaming, etc. Also, a modern high-speed sort would likely take advantage of SIMD, like AVX-512. Intel offers a library for just this. Don't know how it compares.

@MaxHaydenChiz 11 ай бұрын

qsort is only fast for "large enough" sizes. The inner loop of a good qsort algorithm will use some other sort mechanism for the small left over pieces. This is usually the most performance critical piece of the code. So since it's both short and important, they probably concluded that this was a good test case.

@Linvael 11 ай бұрын

@@allalphazerobeta8643 Intel also removed the support for AVX-512 on 12th and 13th gen (and probably 14th as well given current expectations), I wouldn't use that specific example at this time

@allalphazerobeta8643 11 ай бұрын

@@Linvael Intel's XEON line still has avx-512. Intel released it's SIMD sorting in Oct 2022 with recent updates. So I don't think they will be ending support for AVX-512 anytime soon. Now, AVX-512 could return to consumer CPUs, the supposed hold up is E-cores. But I think a power efficient solution can be had with something like executing 256-bit instructions twice. Heck, they main already just do all SIMD instructions as 128-bit instructions or break'em down to regular instructions in the e-cores. I think the main hold up, is it's a good way to sell more xeons.

@Kazyek 11 ай бұрын

I agree so much with the first paragraph of the article, yet the current GUI application landscape is basically shipping a trillion copies of full-fledged web browsers effectively re-compiling just-in-time the whole GUI at every launch, with support to a billion of irrelevant legacy web technologies. Does every GUI apps really need to ship error handling of malformatted HTML4 markup? Do we really think that's remotely efficient?

@ryangrogan6839 11 ай бұрын

This right here is why I despise JS clambering out of the internet swamp and onto desktop. Every app these days is a memory and storage hog.

@alexlowe2054 11 ай бұрын

The truth is that modern CPUs are fast enough and coders are bad enough that performance is a lie. Micro optimizations of algorithms don't really matter in the context of forcing your entire application to run through a slow HTML renderer. It's neat that the AI managed to get a performance improvement, but realistically I don't see this mattering for most development.

@aodfr 11 ай бұрын

Well I don't use web technologies for desktop apps and prefer something lighter weight like Dear ImGUI.

@SimGunther 11 ай бұрын

AI will take your compilers, but they'll never take away ill-defined problems!

@binary_gaming113 11 ай бұрын

One of the truest statements about AI I've heard so far!

@davida1d2 11 ай бұрын

And we won't know if it was ill-defined or not. It passed all tests... except the ones we didn't consider.

@slicer95 11 ай бұрын

Deepmind always does this shit, if this paper was submitted to an actual venue for algorithms, this paper would have been strongly rejected. Deepmind always pulls this shit, the AI/ML guys find some niche field do some AI shit get above average results which experts in said field would bulldoze over then just claim for clout that AI achieved something not done before

@ThePrimeTimeagen 11 ай бұрын

classic bamboozle

@itellyouforfree7238 10 ай бұрын

100% agree on this. Also, the asm snippet has a precondition and this is not even mentioned

@streettrialsandstuff 11 ай бұрын

I was expecting algorithms with better time of space complexity. It's funny how they sell some microoptimizations as "completely new algorithms".

@logician3641 10 ай бұрын

Its a warning. You have to read between the lines.

@b_two 11 ай бұрын

this was the first time i clicked on this channel and i was not able to hear this guy for more than 30 secs.... absolutely unberable style, especially in a programming channel, as an ai language model im going to read the article myself, bye

@modolief 11 ай бұрын

Shogi is a game very similar to chess. In that the pieces move the same way (mostly, I think). The big differences is that when you take a piece it switches sides - that pieces becomes your own little soldier. The game almost never ends in draws, as compared to chess which can often end in a draw at the professional levels.

@ryangrogan6839 11 ай бұрын

I think AI will eventually get us to the point of being able to transpile from any code base (libraries included) into any other language, for any operating system. Then, we will have true freedom to code anything for everything using your personal favorite languages and paradigms. Exciting!

@1Thor61storm8 11 ай бұрын

Could be a great way to write in any language and still keep the best performance

@alexnoman1498 11 ай бұрын

Operating systems dictate only the format of the executable's header and the names of the kernel calls. If OSs hadn't dropped the ball so hard we could already just copy paste executables around since the assembly is identical.

@JorgetePanete 11 ай бұрын

or you can stop using bad languages altogether

@Waitwhat469 5 ай бұрын

@@alexnoman1498 and Libs available on the system

@antonhelsgaun 4 ай бұрын

@@JorgetePaneteah true. There's only one good language (Haskell)

@lucyfrye6723 11 ай бұрын

It was very ambitious reading that algorithm once and even hoping to understand what the one line max(min(a,b),c) actually does for every combination of a, b and c, let alone the whole thing. I was quietly shaking my head when you tried :)

@Xankill3r 11 ай бұрын

This is along the same lines as that recent AI discovered optimization to matrix multiplication. Really neat!

@dragonmyballs 11 ай бұрын

Yeah, they were made by the same company

@beri4138 11 ай бұрын

The Matrix multiplication problem has since been further optimized by 2 human mathematicians without using AI which is very cool.

@lmnk 3 ай бұрын

@@beri4138 any material to read on this?

@squeeekat 11 ай бұрын

> releases faster sort3 that doesn't actually sort > explains nothing > dies Seriously though is there any code path in either version that results in B's value assigned to R?

@jongeduard 11 ай бұрын

How I am not completely surprised that it's the smaller sorting operations where the big win is made? Simply because a lot of focus of engineers often goes to the bigger things, while less people realise the impact of uncountable smaller operations are running everywhere and everyday. Somewhat reminds me of comparing fast but high latencey hardware or software with slow but very low latency hardware or software. It's easy to believe that you are making a big performance improvement, but if you are loosing it the number of small operations, reality will surprise you. AI which helps discovering such things, is a great thing in my opion. I also like the fact that it's LLVM what they are improving, because that's what Rust uses as well.

@Stabby666 11 ай бұрын

Not as fast as my "1st post"!

@sortof3337 11 ай бұрын

you win the internet today sir.

@EnterpriseKnight 11 ай бұрын

ouch!

@crides0 11 ай бұрын

KZbin says no? Still pretty fast though

@adriankal 11 ай бұрын

Seeing that I'm confident AI is not going to put me out of job. I just hope I'll never have to do code review of AI generated code....

@electron6825 11 ай бұрын

AI will review the code 😂

@UnknownString88 11 ай бұрын

To be honest I'm sure it would be better than a lot of code in the wild right now. I've seen some shit...

@carlhandy 11 ай бұрын

I’m sure Tom already implemented this into JDSL. That guy’s a genius!

@blue5659 11 ай бұрын

See my reply to sunderkeen

@olafbaeyens8955 11 ай бұрын

So AI is doing the same thing I did 23 years ago? 100 Create a reference function. 200 Measure it 300 Create a second optimizing function 400 Measure it 500 Look at the generated assembler code. 600 Modify function2 to create better assembler code 700 GOTO 400 This way you learn how to write C code in such a way that you actually teach the compiler the desired generated assembler instruction. Your C code may not be very readable but once you have trained your human brain to write C code in such a way every method you create will be close to the most optimized assembler speed. I actually created faster code then the Intel compiler could and the Intel libraries. No need for AI for that.

@AlFredo-sx2yy 11 ай бұрын

on top of that, the AI was incapable producing a better optimization for the generic ordering of A,B,C when compared to already existing algorithms, so the people who wrote the paper had to change the rules of the game and say "assume that B

@lachine1 10 ай бұрын

00:00:00 - Alpha Dev discovers faster sorting algorithm 00:02:50 - Sorting and coding language discussion (C++, assembly instructions) 00:05:38 - Exploring algorithms & computer code in terms of sorting, assembly, & machine code 00:08:13 - Alpha Dev discovers new sorting algorithms leading to 70% speed improvement. 00:10:56 - Alpha Dev defeats go player w/ "Swap & copy move" technique 00:13:51 - Dev discovers faster sorting algorithms to improve hashing algorithm and release into open source. 00:16:33 - Alpha dev demonstrates optimizing algorithms, inspiring developers, & exploring high level language potential - A great use of AI.

@thingsiplay 11 ай бұрын

This article reads like an AI wrote it.

@jcamargo2005 11 ай бұрын

There is previous work on optimal sorting networks, to find the optimal network to sort 3, 4, 5 elements. It minimizes the number of comparisons, not the number of instructions, which is a hardware-specific problem. Compiler optimization does not apply real mathematical optimization, because the search spaces are too big. Compilers apply 'peephole optimization', which optimizes small sections of code. This seems similar to what they do, but they optimize a small algorithm. A compiler usually has no clue about algorithms, but removing one unused 'mov' seems exactly the type of thing a peephole optimizer could do.

@vibaj16 11 ай бұрын

But that mov isn't unused.

@Dimkar3000 11 ай бұрын

optimizing for the small hash sizes and small array list sorting is something that algorithm theory doesn't cover. I see the potential but from the article that this research focused on edge case optimization and I don't see how to how it will change anything meaningful. Remember people, bubblesort is optimal on arrays of size 2, that doesn't mean you should use it.

@w999d 11 ай бұрын

mmm... it skips saving intermediate sorted state and jumps to the next sorting step instead. Original did the move [A,B,max,min] -> [min,B,max,min] - saving min as the first element. alpha just jumped to comparing min with B. But i feel something wrong, the code implies that B is always less then max(A,C) there is no way for B to be in R, even in comments R will result in max(A,C), is it a first step of the sorting?

@allalphazerobeta8643 11 ай бұрын

Has anyone made a comparison with Intel's AVX-512 Sort? This utilizes Intel's 512-bit registers to facilitate sorting. I'm curious about its real-world speed. High-performance CPUs have numerous advanced features such as out-of-order execution, speculative execution, and register renaming. I've often found that trying to outperform these hardware-level optimizations and compiler heuristics can be quite challenging. From what I understand, the speed of AI-generated instructions in the AlphaDev project was determined by computing or simulating latency, rather than testing on actual hardware. This raises questions about the accuracy of these simulations in comparison to real-world performance. Moreover, the practice of invoking a specialized sorting function for small quantities of elements may prove to be less efficient than simply inlining a general sorting algorithm. If the CPU experiences a code cache miss because it didn't anticipate a call to the 4-element sort instead of the 3-element sort, the ensuing stall could result in a significant delay. Additionally, in my experience, it's quite rare to know the exact number of elements to be sorted at compile time. This could be a good example of microbenchmarking doesn't equal real-world performance. These are just some thoughts and speculations, and I'm interested to see if anyone has more insights or hard data on these points.

@dat_21 11 ай бұрын

Well, it loses to SIMD sorting networks, but it wouldn't be a great "AI" headline, huh? Hype from nothing, as usual.

@earleyelisha 11 ай бұрын

The models are probably memorizing the training examples and tailoring the algorithm to optimize those examples. My hunch is that this causes the diminishing returns in performance as the number of inputs grow and doesn’t work across inputs outside the training set.

@ThePrimeTimeagen 11 ай бұрын

pinteresting

@MaxHaydenChiz 11 ай бұрын

It was merged into the standard library after extensive performance testing. And again, they were optimizing an inner loop of a more general sorting function. Their function is only called when the regions left to be sorted are smaller than some threshold such that things like qsort are no longer optimal given the large constant factors.

@randomsnow6510 11 ай бұрын

WRONG!, this is not a LLM its a reinforcrment learning algorithm

@ehsnils 11 ай бұрын

My take is that if I make my code well organized it's probably easier for the compiler to optimize it. If I used an AI to optimize my code I would of course also like to be able to review the output before using it since a code optimization might be more efficient at the cost of maintainability and extensibility.

@johanngambolputty5351 11 ай бұрын

I feel like with such a small algorithm you could just do a brute tree search, though yeah for larger ones I can see how it would be useful, in the same way it could search deeper into a tree of chess moves than just pruning. If it could handle abstract syntax trees... though you'd want some NLP based regularisation to make sure the code found is still human readable

@IanRiley915 11 ай бұрын

I’m not an assembly expert, so is there something that I missed? Because if try the sequence A, B, C, D with A < C < B < D, it seems to output A, B, C, D.

@Skeptic_Von_Rahm 11 ай бұрын

fun fact: AlphaDev's algorithm is only faster than Quick Sort when the input is 4 elements long, when the input length is higher than 4 it becomes worse another fun fact: Bogosort Algorithm is Faster than AlphaDev's algorithm when the input length in 4 :)

@akshay-kumar-007 11 ай бұрын

The middle out algorithm in Silicon Valley was just an explanation of how hyperthreading works.

@TheNewton 11 ай бұрын

or context-based compression/optimization like the mirror of foveated rendering

@eluddite889 11 ай бұрын

I love your content, man.

@CzeslawPL 11 ай бұрын

Sorry if I'm too dumb to grasp the algorithm fully but I've tried to check their improved algorithm for 3 numbers (first in the article) step by step and it seems to me like it fails for 3 numbers that are in descending order where as the first algorithm produces the outcome provided in the comments of that code. You've mentioned that you might do a video checking that algorithm - could you please do that? I really would like to understand why it works for them.

@lterego 11 ай бұрын

There is an extra condition stated in the paper: the sequences are assumed to have b

@bernhardbinde4865 11 ай бұрын

@@lterego so basically it took AI to realize, that if we already now that b

@HaswellCore 3 ай бұрын

@@bernhardbinde4865 its pseudo code in the article, the actual code looks different

@StephenCoda 7 ай бұрын

Alphago used some kind of Montecarlo sim if I remember right. Very hard to know how you'd do backprop with something so discontinous as machine instructions.

@Bolpat 3 ай бұрын

10:35 In principle, yes, but in practice, you can’t just look at the assembly and determine if it’s faster. It also depends on the hardware. Especially in the case of doing things on small and medium-size arrays, cache locality, branch prediction, and other low-level processor-specific optimizations greatly impact if not dominate the performance. Algorithmic complexity is only for very big inputs. For small arrays, you’d develop specialized sort-N algorithms that also take into account if comparing or swapping elements is cheap or expensive. For example, for sort-5, there is not one optimal algorithm, there is one that minimizes comparisons and one that minimizes swaps; probably (I don’t know for sure), there are also ones that do something in-between. As Dr. Andrei Alexandrescu once said: You gain speed only by doing less work. But a-priori, you don’t know if swapping or comparing is more work. If you want examples, basic integers are cheap to compare and swap. c strings (const char*) are cheap to swap, but in general can be expensive to compare. Sorting static_vectors (boost::static_vector values) by length is cheap to compare, but expensive to swap (linear in capacity), and sorting them by content is expensive in both domains.

@casperes0912 11 ай бұрын

EDI is Extended Data Index. But you really shouldn't think of it that way. It's just a general purpose register. Their nominal meaning is barely relevant anymore. The Extended part is because it's the 32-bit version of di. RDI is the 64-bit version

@mihailmojsoski4202 11 ай бұрын

RSP and RIP are the only ones I give a shit about generally, and you could argue that RSP is a general purpose register as well if you don't want a stack

@casperes0912 11 ай бұрын

@@mihailmojsoski4202 RBP if you ever want to ret.

@sanchittiwari7690 11 ай бұрын

Thank you Primeagen, I really liked this video

@Cafuzzler 11 ай бұрын

I don't know if that's actually a faster sort. I'm a novice, but stepping through the AI version, removing that instruction (mov S P) doesn't seem like it would make a valid sort. For example: If you have [3,2,1] in memory, there's no instruction in the AI version that writes to P when P is greater than R which means you just output to index 0 whatever you input into index 0. What am I missing?

@evergreen- 11 ай бұрын

It is assumed that b

@sohn7767 11 ай бұрын

Does AI know how to measure dicts in bytes though

@2wr633 11 ай бұрын

AI is just a more fancy way to call bruteforcing

@Optimus6128 11 ай бұрын

It's interesting as an idea to use AI to improve algorithms or generated code, I applaud the effort. I do remember watching an interview of the CEO of DeepMind and I thought they are doing something positive, as they helped discover algorithms in protein folding and other areas. But this paper seems a lot like a PR attempt, enough commenters easily have noticed that it might not work (unless we all missed something) for the result of the max value in some cases. Also, they really saved one MOV instruction. They highlight more lines that have changed, but only the comment really changed as different things are compared, but the exact same instructions are running. That's the same with the sorting of 4. It seems the logic to go in a chain, but just removing a MOV instruction alters what is sorted next. That would make sense that in subsequent 5 or more if one unrolls the code like this which would be good only for small numbers (so I don't know how they got the 1.7% on 150000 elements, ok maybe I do if it's some Duff's Device with few elements repeated), it's just a MOV (which is the cheapest instruction) among many other ones. Suddenly, if you highlighted only the single instruction that changed, it wouldn't look as good in PR presentation. It does also remind me the news of AI finding faster ways for Matrix Multiplication. But even in that case as I understood, it wasn't a generic imporvement for every matrix out there, but very special cases of them, maybe of more dimensions than three or specific data stored or specific transformations, I don't remember. It's kinda similar here. AI won't magically make everything faster, just find the niche points where something could be saved, something that a low level programmer could do on their own if they focus their mind on the thing they want to solve. Sometimes even using a different algorithm or different approach. And that's the other thing. I am kinda skeptical of the idea that AI will fundamentally save us and make all code faster. Thinking of JBlow's or Casey's rants on how software could be 10x or more faster, even without going into low level assembly,. just by identifying obvious performance traps that you might have left in your codebase, horrible single lines of code that if you knew, it would be easy to replace and speed up things tremendously. Because they already were too slow because of the assumption people do that "We don't need to care about performance anymore, optimization is the root of all evil" which is really a desire to move away from the machine as much as possible. If AI saves your ass 50% then you degrade the performance of your software 500%, then it's a 250% loss. We seem to move more towards the negative direction while we rejoice that news like this will "proove" once more that one doesn't need to care about performance anymore as the compiler is smarter than you. You see, that's the trap. It's when people see what GoDoT produces and think it's such magic that no living human would be ever able to produce. While if they were more performance aware, even without going low level, one could save performance way more than AI can ever do for you.

@Pythoner 11 ай бұрын

I'm thinking these sorts of optimizations will be more useful for higher-level code. For example optimizing an entire function whose job it is to find which polygons need to be rendered on screen and which can be skipped in a given 3D game engine. That's going to involve a lot of code and where even a seemingly small optimization can drastically reduce the amount of CPU cycles. These sorts of algorithms would also not have been gone over and iterated upon for decades like some basic sort algorithms have been, thus finding improvements to make is more likely.

@allalphazerobeta8643 11 ай бұрын

I don't think they actually ran the code on modern CPU's with there out-of-order execution, register renaming, etc. So I'm going to have to call this into question. From the actual paper: "Latency value functions ...The latency head is used to directly predict the latency of a given program by using the program’s actual computed latency" Actual computed? Like you the one you counted out vs the one the AI guessed at? They also penalized longer code, likely disproportionately. I call the whole thing into question.

@mihirrawat7347 11 ай бұрын

Does the assembly code for the 3-sort (original and alphadev both) even work? The output's largest value will never be B. [ex. for input 1 3 2, the output will be 1 3 2]

@Frank.N.Steinn 11 ай бұрын

yea, I have the same issue

@bilbo_gamers6417 11 ай бұрын

that brownie thing has earned my sub

@fulconandroadcone9488 11 ай бұрын

JDSL doesn't' need AI optimizations, it is optimized so far that optimizing it even further would result in negative time execution. Tom did not want to that to us, he is also very caring.

@aviaje_app 11 ай бұрын

Would be great to know what assembly instructions where available to AlphaDev, did it considered SIMD?

@kennethbeal 11 ай бұрын

Nice, thank you. Their approach reminds me of my childhood engineering: "what parts can we remove from this and it still function?" :)

@AftercastGames 11 ай бұрын

I had this idea several years back, trying to run assembly language instructions through an AI and have it find a faster solution with identical state results. Unfortunately, my AI-from-scratch skills are insufficient for this level of task. (Or pretty much any task, tbh)

@Soutame 3 ай бұрын

High-level language terms were defined well back then when you have a really low-level language like Assembly and basically the other language that is really hard to use. So, if use that term and definition. C++ is high level as Python because it is more readable and structured in scope compared to the old language. Difference concept of programming structure like pointer and platform specific don't mind that much since often time it's not depends on the language alone.

@Utesfan100 11 ай бұрын

Most compilers have optimization options. All I heard was that the AI provides better optimization than current algorithms. This is technology likely to impact newer languages where compiler optimizations are less developed.

@Waitwhat469 5 ай бұрын

Honestly, this applied to "new" archs like RISCV would be interesting as well.

@DirectCherry 11 ай бұрын

Just walked through the assembly example for the 3 element sort. Either the example is incomplete or there is a typo. It does not result in a sorted array if B is the greatest of the three elements. Kind of disappointing, as I was interested in understanding the optimization. Or maybe I'm just misunderstanding the assembly.

@afroboi7454 11 ай бұрын

"Oh man, I've become super skeptical, all of a sudden"

@ThePrimeTimeagen 11 ай бұрын

hey... that is me

@afroboi7454 11 ай бұрын

@@ThePrimeTimeagen Btw is that conversation with @healthygamergg ever happening??

@DagaenGolomb 11 ай бұрын

As you go further into details, it softens the wow factor. The new "algorithm" (if we can loosely call it that) is tweaking some parameters in a known sort algorithm to be faster on particular architectures. This is nothing more than fancy compiler optimization, in my opinion, and not anything likely to overtake substantial computer science research into theory of computation or algorithmic efficiency. It's -O4 optimization. This is something humans could do with enough time and resources. As for sorting 2 numbers, I this is actually significant if you figure something like merge sort is largely comparing two numbers at a time. Basically all sorting boils down to two-number comparisons. Even super inefficient algorithms like bubble sort do the same. Algorithms do respond to "endorphins" (their reward function) and that is why these need to be carefully crafted and analyzed. 1.7% for something like 250,000 items could very well be statistically significant, especially on (pseudo-)random sequences of such.

@FalcoGer 11 ай бұрын

the nice thing about c++ is that you can be as high or low level as you want. You want to sort a vector? one line of code. you want to access a hardware device and interpret the bits you read out from it's memory and interpret them in a particular order as a kernel module to act as a device file? No problem either.

@mikelord93 11 ай бұрын

A.i. explaining react with a 87% accuracy is a great use of A.i.. I've been using Chat GPT to automate parts of my job that I knew were possible to automate, but I didn't have the knowledge or time to seek that knowledge to do it. Even with its mistakes, I could write some sort of functioning code and ask about the errors that came up. I'm the sort of person who learns by doing, but I'm also too lazy to do for the sake of learning, so being able to actually do something that helps me because I can tap into that 87% accurate information is productive both for my job and for my coding experience. Now I use the time I freed up to automate other parts of my job, getting dopamine hits for knowing I can be lazy in the future, despite working more than I used to. It's fucking brilliant if you ask me

@adamm450 11 ай бұрын

seems to me like the AI improved corner cases of some algorithms that people didnt optimize for previously. like sorting 3 or 4 elements, I'm not an expert but I would guess it was not worth to spend much time on this in the past for the standard library and whoever needed to optimize for that case did it in their own in house libraries. still a valid use case as a lot of things that were not worth time to optimize for before could be automated to be optimized like this without wasting human effort...

@MaxHaydenChiz 11 ай бұрын

There's been a lot of work on this area because it's the inner loop of a bigger algorithm and is very hard to analyze and test. It was known that there were improvements there. You can find talks at major conferences about this.

@AScribblingTurtle 11 ай бұрын

For AI to take our Jobs, management and PMs would need to be able to accurately describe, what they want. I think our jobs are safe.

@4rcant 11 ай бұрын

i got so excited watching this video i had to finish viewing it in the toilet

@adheusrangel 11 ай бұрын

11:57 - Classic "that's what she said" moment

@dominick253 11 ай бұрын

I think the future is using an AI to train different AIs to get what it wants. For example you could have the main AI train a calculator AI and then call it anytime it needs to calculations.

@vighnesh153 11 ай бұрын

AI will then build AI web APIs for itself to do simpler tasks, probably written in jAvAsCrIpT

@joaovmlsilva3509 11 ай бұрын

@@vighnesh153 Win win, no one writes JS anymore

@edcorns3964 11 ай бұрын

This reminds me of a question I once got at a job interview: Exchange the values of two variables, A and B, without using a third variable (e.g. C). I didn't know the answer, and they wouldn't tell me what the answer was, so after the interview I spent a while thinking about it, and finally figured it out. A = A xor B B = A xor B (expands to A xor B xor B, which results in the value of A) A = A xor B (expands to A xor B xor A, which results in the value of B) Why would someone need something as "stupid" as this? Well, it actually makes perfect sense in an assembly language (regardless of the CPU architecture), where you have a very limited number of (data) registers, and the speed of code execution depends on both not accessing any memory locations (and instead using CPU registers as much as possible), and code locality (instruction cache hits). An AI optimizing an algorithm does exactly the same thing a programmer does when trying to optimize his or her code -- trying out (in their head first, and then in practice) as few instructions as possible, while checking for the code execution validity in boundary conditions. This, however, has a nasty tendency of producing highly specialized code that usually has some very ugly side-effects when used outside of its intended (data) environment... which what lazy programmers always do, i.e. try to use somebody else's code when and where it was not mean to be used, which (almost always) causes all kinds of (unpredictable) catastrophic software/firmware/hardware failures ("it failed for no reason, whatsoever" being the usual excuse, post-fact). I have no doubt that people trying to use AI-developed optimizations are going to do exactly the same thing: Follow in lazy programmers' footsteps, and start causing all kinds of (unpredictable) catastrophic system failures... and the more AI-optimized the code becomes, the more catastrophic those failures are going to be... and, for some "inexplicable" reason, nuclear (including fusion) reactors come to mind right now.

@harsha1306 11 ай бұрын

I thought this was gonna give me existential dread but TrashDev out here catching strays like the back wall at a shooting range

@rawallon 11 ай бұрын

I cant believe this time he actually disabled alerts

@brainstormsurge154 10 ай бұрын

Based on what I know they're working on I'm sure their next step is getting this working on a quantum computer system since it has unique advantages when it comes to search space applications. It wouldn't just be faster but just better overall or at least that's the goal.

@triplea657aaa 11 ай бұрын

I wonder what the difference between training it and rewarding based on correct sort/incorrect sort and compared that against a levenshtein distance-based scoring system

@gabrielkryger4055 11 ай бұрын

Tell me if I'm dumb, but the algorithm of the AI is broken, is´n it? Let's say the min value of the 3sort algorithm is C. The return value of Memory[0] (the min value of the array after sorting) IS ALWAYS the min(A, B), so it can never be C. so...

@Veecy 3 ай бұрын

"Chat Jippity" might be the funniest thing I've heard this week

@michaelmoltke911 11 ай бұрын

I kinda like ur rinkly brain prime.... You've definitely made me smile. Thx! ❤

@daltonyon 11 ай бұрын

This is the power of AI focus in resolved one problem

@ThePrimeTimeagen 11 ай бұрын

agreed. i think this is the true power of ai

@nullplan01 2 ай бұрын

I simulated the "3sort" algorithm on the input list 3,2,1, and the output is 3,2,3. It doesn't work. After the first block of cmovs, only S will be 1, and it can only overwrite anything if it is greater than some of the other ones. S being the minimum of these four registers is not a situation they've considered. The elided mov is actually really important.

@benandrew9852 11 ай бұрын

it's actually quite heartening to realise Prime can't read assembly

@wolfram77 11 ай бұрын

13:40 What if B is maximum? It doe'snt seem to end up in the last position in both cases?

@kentdeterding9333 11 ай бұрын

I would love to see this written out. 3 blue 1 brown style animations would be awesome, but that's a stretch.

@MyGroo 11 ай бұрын

It's just compiler optimization removing a single instruction. The changes in latter lines are just modified comments because the meaning of the registers are changed. Luckily they chose sorting so that everyone could know the title was bullshit before even going into details. You cannot theoretically beat NlogN in sorting, so current algorithms like quicksort and mergesort are optimal.

@zmania101 11 ай бұрын

Oh god that's exactly what I was waiting for today. Today I woke up and said, "You know what this day is missing? Some faster fucking sorting algorithms."

@laughingvampire7555 11 ай бұрын

with AI, we are no longer going to be software craftsmen as in carpenters/woodworkers/masons we are going to be gardeners.

@antonpieper 11 ай бұрын

15:39 the slowdown is the optimization we try along the way

@TheNewton 11 ай бұрын

Can someone explain how 3:35 Theorem of Trash uses brownies as a metric? Wouldn't milk+brownies be perfect equilibrium so milk by itself is also trash? My experience with trash theorems uses other metrics like pills, littering and non-indented code.

@grawss 11 ай бұрын

My process is to stare at code, pace for a bit, stare at code, come up with solution, find out that solution prevents next step from succeeding, refactor everything, profit. Except for the pacing, this is how AI currently functions. It's scary because this is my livelihood and they're so close to replicating it. :(

@mihailmojsoski4202 11 ай бұрын

have you considered getting it right the first time?

@grawss 11 ай бұрын

@@mihailmojsoski4202 Always. But sometimes you gotta wipe twice.

@SIGSEGV1337 11 ай бұрын

this guy is like tobuscus for programming

@Bolpat 3 ай бұрын

15:10 It didn’t remove any comparisons. Look at the diff: It only affects comments, which document what we know about the ordering of the elements. What that means is it removed one instruction (mov P T), which affects what we know later, but we actually didn’t need that information to prove that the result is sorted.

@earx23 3 ай бұрын

Faster for shorter sequences indeed corresponds with asm level optimization.

@HagenvonEitzen 11 ай бұрын

12:20 So the code on the left sorts input 17, 69, 42 into 17, 69, 42. And the ocde on the right sorts 42, 69, 17 into 42, 69, 42? Both codes use two comparisons only, when it is straigtforward to prove that sorting three numbers requires three comparisons.

@Antagon666 3 ай бұрын

Those look like sorting networks. Those are a predetermined sequence of min/max operations that always result in correctly sorted data for some fixed size

@lucidbasil9869 10 ай бұрын

If I understand this correctly given the little amount of information I have, it seems as though their AI did to assembly instructions what any decent static analyzer can do for a high level language.

@worldspam5682 11 ай бұрын

Skipping video part where youtuber skips part of content is pure entertainment

@jaysistar2711 11 ай бұрын

Shogi is like Chess, but you can turn captured pieces against your opponent.

@abbashussain7298 11 ай бұрын

you're actually the TechLead++ of this side of youtube lmfaoooo

@rudiservo 11 ай бұрын

I for one actually like assembly, but this AI algorithm for 3 values does not add up. My understanding is that cmp A C, then B with whoever is lower. So effectively A>C, but B>C and B>A, the values are P=A, Q = B, R=A. you could do a cmovl S P after comp S Q, because S = C if A>C, otherwise C isn't anywhere else then S, it wont be in memory, if you don't need C, it's ok, but if you need it, you just lost information. I could be wrong but, anyone cares to correct me?

@Optimus6128 11 ай бұрын

Yeah, many other comments have noticed that this algorithm won't work when storing the max value (unless we are all just dumber than AI and don't understand assembly or the high level comments) Also,. I am thinking right now, in both sort 3 and sort 4, the only actual instruction that changes is a MOV. That's not a breakthrough. But they highlight more lines of the code, because the comments change as what is compared now changes too. But in reality, this "revolutionary" approach saved a MOV (which isn't even a costly instruction and as people have mentioned in modern architecture it might not necessary lead to faster code, with all the pipeline rules and multiple instructions execution). But if they were to only highlight the actual changed line, it wouldn't look so spectacular on a paper. It seems more of a PR move than something groundbreaking. As much as I find the idea of using AI to discover new algorithms interesting and beneficial (which they didn't, they just removed a redundant instruction), there is something that rubbing me the wrong way with this. p.s. Prime's followup video on the solution would be interesting to watch

@rudiservo 11 ай бұрын

@@Optimus6128 thanks, unless it's in a cycle and you are running series of 3 values in a loop of +1 to try and order more stuff, you are actually loosing data, you could save some speed but it's a MOV not a comparison move if greater or lesser that checks for 0 and a remainder (ZF anf CF i think)and it depends on the architecture. To me it's a PR move, any decent assembly programmer knows what it is doing, that MOV was put there because it's the cheapest way to keep data without making a comparison, so unless the data is not important, the AI solution is not really a sorting algorithm. I think this is a blunt underestimation of programmers and that AI can do it all. AI can be a good tool, but it's a blunt and brute force one, it can find stuff but it does not mean it comprehends fully the context to put out the perfect solution. IDK, this seems big nothing burger.

@CTimmerman 11 ай бұрын

Adding and removing instructions sounds like a genetic algorithm to me, but those have been known for decades. I guess current autocomplete models can limit the number of valid (non-crashing) moves to be workable.

@tigergold5990 11 ай бұрын

genetic algorithms typically include random mutations and many different instances being “trained” at once, whereas I believe this method involves one model which is trained using RL to add or remove instructions

@ThePrimeTimeagen 11 ай бұрын

this is why i didn't suggest genetic/evo algo because usually there is a space represented by some value (a sha) and you can explore super easily by halving two together / mutating here and there where as this is clearly some sort of learning/gradient decent error correction

@isodoubIet 11 ай бұрын

It's quite different from a genetic algorithm. A genetic algorithm represents the objects being optimized as strings (the 'genome'), and then keeps a 'population' of examples which are sorted according to their 'fitness'. On each generation, you generate new 'individuals' by combining the genomes of the best-fit individuals, applying random mutations, and throwing out some of the bad-performing ones. What this is doing is reinforcement learning, meaning both the prediction algorithm and the analogue of "fitness" are differentiable functions that can be optimized using gradient descent. There's no 'population'; they merely apply the algorithm, see what comes out, and if it comes out wrong/slow that results in a penalty that shifts the numerical weights around.

@FrankHarwald 10 ай бұрын

The problem with most algorithms (& this includes sorting algorithms as well): they are both in theory & practice dependent on too many factors to have one that works generally good but even more so best for even the majority of cases. In other words: change the circumstances, inputs, outputs, size & machine they are supposed to run on even a little & you might come up with yet another better algorithm & this plethora of possible algorithms also means it's pointless trying to find one optimal algorithm because you think it's optimal & then try to ship it in a library or so. Instead you would need a) a big, lasting resource intensive real-world applications that you could b) describe very precise what the inputs, outputs, are & on what machine it's supposed to run to have a use case where you'd want to use something like a developer savant, human or AI, to come up with a better algorithm & you'd actually be better off then using one of the existing ones. OTOH, considering that so many different sorting algorithms with so many different properties exists, really the one thing that computer science needs is a better, both simpler, more efficient as well as broader way of telling & classifying which algorithm to choose given circumstances, because selecting the right one is the more general problem with sorting.

@maximecaron3133 4 ай бұрын

Relational databases are doing that already. They maintain statistics (histogram) on the column and use it to decide which sorting algorithm to use