Sorting Algorithms: Speed Is Found In The Minds of People - Andrei Alexandrescu - CppCon 2019

CppCon

Comments: 89
@fritsgerms3565 2 years ago
Andrei is one of the original standard-bearers of C++. Always enjoy his talks. He is creative and unique in his thinking. I wish CppCon would become smaller and increase its quality, like in its golden age around 2014.
@oleksiimomot4040 5 years ago
Amazing talk with important conclusions and funny jokes from Andrei. Thanks a lot!
@movax20h 4 years ago
Excellent talk. There are so many unsolved problems in sorting (especially adaptive sorts, parallel and distributed sorting, in-place merge-style sorting, etc.). People often overlook the actual complexity (both theoretical and practical) of sorting. Many years ago I had an interesting sorting requirement, and, well, I found out that despite 60 years of research and a lot of progress, there is to this day no known optimal solution, nor a scheme that works well on average.
@MichaelPohoreski 5 years ago
_Design by Introspection_ is a subset of **Data-Oriented Design.** There are 3 types of optimizations (from slowest to fastest): 1. Bit twiddling 2. Algorithmic 3. Data-oriented (cache usage and branch prediction). The FIRST rule of optimization is: **KNOW THY DATA**
@kuhluhOG 5 years ago
@Ziggi Mon Data-oriented design means that you write your algorithms for your data, and don't abstract your data into something general but stay with the specifics. It comes down to: use different algorithms for different data; don't generalize (at least that's how I understand it; I don't use it myself, though). The first time I saw something about this topic was from a (professional) video game developer who also said: "If we had the time, we would write everything in assembly, but we don't."
@sethsora7963 5 years ago
Glad to know that someone else recognized this too.
@LEGOOOOOOOOOOOS 2 years ago
Is there a book or talk on data-oriented design?
@CedricMialaret 5 years ago
Must-watch presentation, as always with A.A.
@dengan699 4 years ago
Best C++ speaker, and I'm not even a C++ developer!
@andrewzmorris 5 years ago
Fantastic talk. Andrei is such a great speaker, and the content was very interesting. I'm not sure how to really improve our code in the ways Alexei highlights, but it certainly seems to be something that we should be talking about.
@boguscoder 5 years ago
*Andrei ;)
@6754bettkitty 3 years ago
42:20 That goto comment is one of the funniest things I have ever seen in programming!
@6754bettkitty a year ago
42:40
@JordiMontesSanabria 4 years ago
25:50 "I am almost sorry there is no rotation there" LOL
@MalcolmParsons 4 years ago
There's a rotate() in insertion_sort().
@greenfloatingtoad 4 years ago
Selecting an algorithm through specifics about the types is, I believe, compatible with (or part of) what Stepanov envisioned. The difference between a Concept and an Algebraic Object is that the operations of a concept come with a cost.
@ruroruro 5 years ago
I think it may be interesting to try a different intermediate structure than the heap. If I understood correctly, the key reason why constructing a heap improves performance is that a heap is, in some sense, already partially sorted, so the linear searches end up being shorter or more uniform, improving locality and maybe even branch prediction. I think there may exist some data structure that is simpler to construct than the heap (maybe in terms of big-O or maybe in terms of practical performance) but has the same or similar effect on the subsequent linear insertion sort. After all, as Alexandrescu himself said, "the heap is a delicate fine-tuned structure", so maybe constructing a heap is actually overkill: we are not actually using all the invariants the heap provides, so maybe we can get away with a structure with weaker invariants.
@michaelryabko885 4 years ago
It would be interesting to see performance with different types of heaps, even. Here A.A. only tried the basic 2-child heap (which makes sense, as it is a lot easier to reason about), but using something like a Fibonacci heap might have interesting results. Although that reasoning goes even further in the direction of strong invariants and may lead to worse performance overall.
@arirahikkala 4 years ago
@@michaelryabko885 Keep in mind that these heaps are still all under a size threshold that went up to, what, 255 elements at most. And that your heap has to live in the array that's getting sorted.
@yash1152 a year ago
related question at 1:25:53
@evikone 4 months ago
I have Andrei's book on the D language; an excellent and insightful work.
@oscarasterkrans9301 4 years ago
Fantastic talk as always from Andrei.
@chidude16 2 years ago
That was fantastic! Brilliant. Combine that with Carbon and we're rocking...
@srh80 a year ago
The guy from Somebody Feed Phil moonlights as a C++ guru. Amazing talk.
@ThanhNguyen-rz4tf 3 years ago
Note to myself (you should ignore it): 15:22 -> 16:43 compares linear optimistic insertion sort vs. binary insertion sort; a worse big-O for the search (linear) doesn't mean a worse runtime. 22:00 tries to fold the condition into arithmetic.
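(A tiny illustration of the "condition into arithmetic" point above — my own example, not code from the talk: the boolean result of a comparison is used directly as a 0/1 integer, which compilers typically lower to a conditional move or a plain add instead of a hard-to-predict branch.)

```cpp
#include <cstddef>

// Branchy version: count elements smaller than pivot, with an if.
std::size_t count_less_branchy(const double* a, std::size_t n, double pivot) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] < pivot) {      // hard-to-predict branch on random data
            ++count;
        }
    }
    return count;
}

// Branch-free version: the comparison result (0 or 1) is added directly.
std::size_t count_less_branchless(const double* a, std::size_t n, double pivot) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += static_cast<std::size_t>(a[i] < pivot);  // no branch needed
    }
    return count;
}
```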
@absurdengineering 4 years ago
I have a problem with this talk: it presumes a lot. It presumes branch prediction, caches of a certain depth, speculative execution, and who knows what else. Sure: modern desktop CPUs have all that, as do the large ARMs. Now take this code and start moving it backwards in history. Say, to a Raspberry Pi A. To a P-II. To a 486. To a 386. To AVR. Heck, rewrite it in Turbo Pascal for CP/M. I've tried: it all falls apart as you go back past ARMv7, and the simpler the architecture, the worse the regression. The standard library is supposed to be as universal as possible and may run on varied architectures, yet here we micro-optimize in ways that make things much worse as soon as you're outside the realm of the most modern CPUs. These improvements make the sort more than 2x slower on a 486, for example. Sure, that's not exactly a common target, but Arduino supports C++20, and if we try this trick there, we get the same performance regressions. So before anyone sticks this code into any standard library, they had better examine the supported platforms and make it optional, enabled only on modern architectures.

I also have a problem with sweeping statements like "they don't teach it in books". CS introductory texts all presume an uncached, in-order architecture where every memory access and comparison has equal cost. We all know this, even if the authors fail miserably at putting this assumption in big bold letters at the start of every chapter. But the standard analysis of algorithms doesn't somehow fail here: as soon as you make the costs take caching and branch prediction into account, you get somewhat more complex equations, but you recover a good approximation of the costs of running on real hardware. So, given that we all understand what assumptions the classic CS texts make, it's disingenuous to say "they don't teach it in the books". They don't because, even with the simplified logical architecture the texts presume, it's still hard enough, and there are plenty of theoretical algorithm treatments that do in fact take architectural details into account. Also, it's not as if the CS classics are completely detached from modernity. There's this thing that was all the rage back when cores were too small to fit all the business data: remember tape sorting? It takes excellent advantage of memory caching and prefetch, and you start seeing those advantages as soon as you "upgrade" from a 386 to a 486.

Also, about the general admonition against branches in favor of data-integrated condition tests: all is fine and dandy if you don't introduce data dependencies. Not a thing back on a 486, but these days short-acting data dependencies de-parallelize the CPU, and the worse they get, the more the billion-transistor CPU starts to work like a fast but "dumb" 486 with 1000x fewer transistors.

This talk is a wonderful show and it is full of technical insights, but it is high-grade engineering entertainment, not an engineering project report. Don't try to apply these things without a clear understanding of what's involved but unsaid, and for crying out loud: repeat all the benchmarks in your target application, with your target data.
@Sogartar 3 years ago
I think his point was exactly that. If you can introspect much of the relevant context (the type of the data, the hardware, etc.), you can compile an algorithm better adapted to the task.
@KafeinBE a year ago
Of course the talk should probably have mentioned this caveat, but I don't agree that the standard library should optimize for any CPUs other than the most modern ones, because the most modern ones are going to be the fastest, and that's naturally where speed and efficiency matter most. Additionally, caches, pipelining, and speculative execution have been a feature of architecture design for a while now, and are pretty pervasive. Chances are, if you're not coding bare-metal, you have to take these things into account. Sure, a very small IoT device won't have the best sort, but chances are the programmer ought to roll their own sort anyway.
@godDIEmanLIVE 24 days ago
That ties in directly with Mike Acton's talk. Taking the hardware and the data into account is part of the problem solving, not something you just ignore and abstract away. So what you're saying is exactly Andrei's point: hardware matters and data matters, so you tune your "generic" algorithms to them. The presumptions are exactly the point; that is how you write optimal code for your hardware and data.
@OMGclueless 4 years ago
How on earth do you write unguarded_insertion_sort for a generic type? It's one thing for std::sort to return elements in an arbitrary order if a programmer overlooks that their less-than predicate doesn't define a total order. It's another thing entirely for the insertion operation to underflow the buffer and start writing to arbitrary locations. Actually, I don't even know that you can do it for doubles. At least, his code has a bug: at 29:22, if the element at *begin is NaN, then make_heap will leave it there because greater() is false for all comparisons with that element. Then unguarded_insertion_sort will underflow below that element because all less() comparisons will be false as well. IEEE 754 floating point doesn't define a total order via the default comparison operators, so you can't use them the way Andrei does.
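(A quick check of the NaN claim above — my own snippet, not from the talk: with the built-in operators, every ordering comparison involving a NaN is false, so neither less() nor greater() ever moves it.)

```cpp
#include <cmath>
#include <iostream>

int main() {
    double nan = std::nan("");
    std::cout << (nan < 1.0) << ' '    // 0: NaN is not less than anything
              << (1.0 < nan) << ' '    // 0: nothing is less than NaN
              << (nan < nan) << '\n';  // 0: not even less than itself
}
```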
@mxBug 3 years ago
OK, so let the consumer of the sort function supply a comparison that is a total order. If you are concerned about how the compiler can ensure the correctness of the comparison, perhaps dependent typing is your thing?
@Kromaatikse 2 years ago
First you compare the element being inserted with the element at the very beginning of the array. If it goes before that, you insert it there and you're done. If it doesn't, then the unguarded insertion run is guaranteed to terminate at or before the first element, *regardless* of whether the elements are totally-ordered or not.
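(A minimal sketch of the scheme described above; the function name and structure are mine, not the talk's slides. Each element is first checked against *begin; only when it does not sort before the front does the unguarded inner scan run, and that scan then stops at or before begin even without a bounds check. If I recall correctly, this is also essentially how libstdc++ structures its small-range insertion sort: a guarded check against the first element, then an unguarded linear insert.)

```cpp
#include <algorithm>
#include <iterator>
#include <utility>

template <typename It, typename Compare>
void guarded_front_insertion_sort(It begin, It end, Compare less) {
    if (begin == end) return;
    for (It i = std::next(begin); i != end; ++i) {
        if (less(*i, *begin)) {
            // New minimum: rotate it to the front (the one guarded case).
            std::rotate(begin, i, std::next(i));
        } else {
            // *i is not less than *begin, so this scan cannot run past begin.
            auto v = std::move(*i);
            It j = i;
            while (less(v, *std::prev(j))) {
                *j = std::move(*std::prev(j));
                --j;
            }
            *j = std::move(v);
        }
    }
}
```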
@bluecircle56 3 years ago
Andrei Alexandrescu talk, wheee!...
@03Prashanth 3 years ago
22:50 That comment on the undergrads in India was hilarious. Quite true, but still... hahahaha
@stephenhowe4107 5 years ago
The slide at 26:56 is wrong. It should be unguarded_insertion_sort(begin + 1, end), because the 2nd smallest element can be at either begin + 1 or begin + 2 and it is still a valid heap.
@MikhailMatrosov 5 years ago
Exactly, but he insists it should be +3. Maybe we are missing something?
@jeffdovalovsky7259 5 years ago
The slides in the linked GitHub repo acknowledge that begin + 3 was wrong and correctly use begin + 2. Stepanov's unguarded_insertion_sort(first, last) accepts a range of unsorted elements, which it inserts as needed into a preceding pre-sorted subsequence. begin and begin + 1 are already sorted, begin + 2 is not, so it can be unguarded_insertion_sort(begin + 2, end).
5 years ago
@@MikhailMatrosov, Stephen Howe - in the comments on this talk on Reddit (link in the description of the video), Andrei says it should be +2, not +3 (because the first two elements are already sorted). The +3 was a mistake in the slides; he has +2 in the actual code.
@stephenhowe4107 4 years ago
@ : No, that is not a correct explanation. Andrei explained it to me. You're right about the +2. The first element is sorted. The second and third elements can be out of order, because that is still a valid heap: you can have 1, 7, 4 and it is a valid heap, since the root is less than its children, the 2nd and 3rd elements. The fact that the 2nd and 3rd elements are out of order relative to each other is neither here nor there; it is a valid heap. But it does not matter, because this is unguarded insertion sort, not plain insertion sort, so the 2nd and 3rd elements get corrected. That is why Andrei has +2 (which works thanks to the unguarded version) rather than +3 (which does not work). In correspondence with Andrei I was insisting on +1 because of this, but he pointed out that this is unguarded insertion sort, not plain insertion sort, and therefore +2 is the minimum value for which it is guaranteed to work.
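(My reconstruction of the combination being debated in this thread, not the slide's exact code. make_heap with greater<> puts the minimum at *begin, and a heap parent never sorts after its children, so [begin, begin + 2) is already sorted; the minimum at *begin then acts as a sentinel and the insertion loop starting at begin + 2 needs no bounds check. As noted elsewhere in the comments, this relies on the comparison being a strict weak ordering, which NaNs break.)

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <utility>

template <typename It>
void heap_then_unguarded_insertion_sort(It begin, It end) {
    if (end - begin < 3) {
        // Degenerate sizes: nothing clever to do.
        if (end - begin == 2 && *std::next(begin) < *begin)
            std::iter_swap(begin, std::next(begin));
        return;
    }
    std::make_heap(begin, end, std::greater<>{});  // minimum lands at *begin
    for (It i = begin + 2; i != end; ++i) {        // begin, begin+1 already sorted
        auto v = std::move(*i);
        It j = i;
        while (v < *std::prev(j)) {   // stops by begin: *begin is the minimum
            *j = std::move(*std::prev(j));
            --j;
        }
        *j = std::move(v);
    }
}
```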
@sakuraikamoto3699 5 years ago
Thank you, sir. This is the best talk for learning; great education from an engineer 👍👍
@Dante3085 5 years ago
1:16:58 Thank you
@maximilian19931 a year ago
ASM instructions have cycle costs, mostly defined in the CPU documentation booklet (usually a table of how many cycles each takes). It would be useful at compile time to produce code that uses the cheapest instruction sequence that gets the job done. The architecture definition for the compile target should include the instruction cycle costs, so the compiler can choose the least costly route available.
@saaah707 6 months ago
56:00 IIRC, a reverse-sorted array is the worst case for insertion sort. Why is the performance better than in the sorted case (55:40)?
@colinmaharaj 3 years ago
I need to start traveling to meet these giants
@joseville 3 years ago
36:12 I went the other way. I thought you'd add an arbitrarily large element (larger than any element in the list) to the list before heapify and then pop after sort. 39:45 Infinite loops FTW
@tyronewilson6463 a year ago
The arbitrarily large element idea is probably hard when function pointers are used to define the comparisons.
@antonpetrenko9176 4 years ago
fire man and great talk!
@hanyanglee9018 2 years ago
17:55 Fifteen seconds earlier I had come up with the idea of branch prediction. When he asked the question I answered at my screen: "Caching!" I'm definitely in poverty.
@peceed 2 years ago
For every set of parametric measures of disorder there exists an algorithm that is asymptotically optimal for all of them; unfortunately, the proof interleaves Turing machines, so we can expect a linear slowdown (in the size of the set). What is surprising is that there exist algorithms that are optimal and practically quick for many different measures of disorder.
@monad_tcp 3 years ago
48:57 I guess there's a way to improve more by exploiting addressing modes
@wessmall7957 5 years ago
This is earth shattering
@aretha6360 4 years ago
I thought infinite loops were a bad thing in C/C++, since the compiler has no easy way of unrolling the loop (it doesn't know when it stops), which would interfere with the compiler's ability to vectorise? Also, in my code, if I want an infinite loop I tend to write while(true), but he uses for(;;); is there a difference?
@lewistherin4096 4 years ago
I guess there is no difference between for and while, but perhaps check godbolt.org and test it yourself with the compiler you are using.
@darkengine5931 4 years ago
I'm curious about the first question as well. If the loop unrolling is applied at the IR level, it may make no difference as far as how we write them. But I'm not sure how optimizers typically approach loop unrolling optimizations. For the second question, 'for(;;)' tends to be what's accepted as the standard way to write infinite loops in C++. If you use 'while (true)', some compilers may complain about branching with a compile-time constant for its expression.
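(For the question above, a trivial sketch of my own to feed into godbolt.org as the earlier reply suggests; both spellings express the same loop, and current compilers treat them identically.)

```cpp
#include <cstdio>

// Reads characters until 'q'; written once with each infinite-loop spelling.
void spin_for() {
    for (;;) {                       // the traditional C/C++ idiom
        if (std::getchar() == 'q') break;
    }
}

void spin_while() {
    while (true) {                   // equivalent; some older compilers warned
        if (std::getchar() == 'q') break;  // about the constant condition
    }
}
```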
@6754bettkitty 3 years ago
C++ Insights might show a difference, if there is one...
@kylek.3689 2 years ago
Vectorization is undesirable for sorting an arbitrarily sized list of elements, since you can't predict how many elements you will need to sort ahead of time.
@Mage_Chartreux 2 years ago
Was this tested on many different computers? I just watched an excellent talk on how memory layout during runtime can account for plus or minus 40% of the variation in the time it takes to run something. I'm wondering if a lot of this isn't just because the data is getting restructured.
@yash1152 a year ago
1:24:35 > _"different Intel machines"_
@NXTangl a year ago
Anything that lays out doubles differently is going to be pretty damn exotic.
@Sebanisu 4 years ago
I wonder if GNU has fixed the issues he found.
@Attlanttizz 5 years ago
33:38 The underscore bonus... 🤣
@CT7ALW 4 years ago
All that "boolean arithmetic instead of ifs" goes a bit against the "you're not smarter than the compiler" rule, doesn't it? It's almost like that division example where you try to be smart with bit shifts and the compiler puts the division back...
@Theonewhowantsaname 3 years ago
That’s the first question he answers, at 1:17:25
@greenfloatingtoad 4 years ago
Until we have introspection, I think we can fake it with type_traits-style structs of constants.
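(A sketch of that idea; the trait names and the dispatch below are made up for illustration, not from the talk. User code describes its element type with a struct of compile-time constants, and the algorithm branches on them with if constexpr.)

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

template <typename T>
struct sort_traits {                     // defaults derived from <type_traits>
    static constexpr bool cheap_to_move    = std::is_trivially_copyable_v<T>;
    static constexpr bool cheap_to_compare = std::is_arithmetic_v<T>;
};

template <>                              // a user-supplied "introspection" result
struct sort_traits<std::string> {
    static constexpr bool cheap_to_move    = false;  // may allocate
    static constexpr bool cheap_to_compare = false;  // O(length) comparisons
};

template <typename It>
void adaptive_sort(It first, It last) {
    using T = typename std::iterator_traits<It>::value_type;
    if constexpr (sort_traits<T>::cheap_to_move && sort_traits<T>::cheap_to_compare) {
        std::sort(first, last);          // a move/compare-heavy strategy is fine
    } else {
        std::stable_sort(first, last);   // stand-in for a compare-minimizing strategy
    }
}

int main() {
    std::vector<double> d{3.0, 1.0, 2.0};
    std::vector<std::string> s{"b", "a", "c"};
    adaptive_sort(d.begin(), d.end());
    adaptive_sort(s.begin(), s.end());
}
```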
@Dante3085 5 years ago
Yeah, this guy is fun :D
@yash1152 a year ago
16:35 why the binary search insertion sort is bad
@JorgeLuis-ts6qp 2 years ago
I'll leave 1:03:00 here for future me.
@yash1152 a year ago
1:04:35 What's hot/cold code?
@saaah707 6 months ago
Hot code is code that is executed many times; cold code is code that is not executed often. I believe the goal is to minimize thrashing of the instruction and data caches.
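(A small illustration of the hot/cold split — my own example, not from the talk: the rare error path is marked cold/unlikely so the compiler lays it out away from the hot loop, keeping the loop body dense in the instruction cache. [[unlikely]] is standard C++20; [[gnu::cold]] is a GCC/Clang extension.)

```cpp
#include <cstdio>
#include <stdexcept>

[[gnu::cold]] [[noreturn]] void report_and_throw(const char* msg) {  // cold path
    std::fprintf(stderr, "fatal: %s\n", msg);
    throw std::runtime_error(msg);
}

int sum_positive(const int* data, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i) {
        if (data[i] < 0) [[unlikely]] {   // rarely taken: keep it off the hot path
            report_and_throw("negative input");
        }
        s += data[i];                     // hot loop body
    }
    return s;
}
```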
@KafeinBE a year ago
What happens if you repeat the silly idea? Do make_heap twice: once on the entire array as shown, then once again on the array minus the first three elements?
@rkalle66 2 years ago
The time complexity of doing silly things is O(n²), am I right? It's like gambling with 20/20 hindsight. It reminds me of talking about soccer in Germany and somebody saying "the game lasts 90 minutes, or until the referee whistles the end": the kind of remark for which you have to put a 5-Mark coin into the piggy bank.
@andralex 2 years ago
It is indeed quadratic, but not a gamble. Note that the quadratic algorithm is always invoked on small to medium input sizes (for larger inputs you'd use quicksort).
@HtS643KyS6555GxQ3edA 3 years ago
I wonder what the benchmarks would be on the Apple M1 chip?
@yash1152 a year ago
> _"am wondering about benchmarks on apple m1 chip"_ did you check?
@LaurentLaborde 3 years ago
from 1:09:34 I keep shouting "A DATABASE !!" every 20s but he's not answering :(
@LaurentLaborde 3 years ago
A database does A LOT of work in order to be as fast as possible when needed, which is 50% of his talk.
@qm3ster 2 years ago
Nah, generic programming should support this. Don't just statically introspect the size of the element. Compile the moving and comparison code just to analyze it, then use the results of that in the body of the sort implementation :v
@childhood1888 2 years ago
46:11
@NXTangl a year ago
PDQsort would have handled random01.
@akshitjangra2471 a year ago
What sorting algorithm was the speaker trying to improve?
@monad_tcp 3 years ago
1:20:27 Concepts don't work for this; the only solution is tagging types and dependent typing. You have to tag the AST directly, attribute types in it, and then forward them. You can't do that in C++, because you can't introspect non-constexpr code, so concepts are useless (for solving that problem), as are SFINAE and template metaprogramming. So in a way, generic programming (in C++) is why we can't have nice things. You could actually do this optimization in C#, though, because the JIT compiler has that information! Or you could manually tag your UDTs, as he said, but that's not optimal. It does work, but we can do better; it's possible. Perhaps C++ is why we can't have nice things, but we can have better languages.
@tomcheng2437 5 years ago
I thought quicksort was not stable...
@VioletGiraffe 4 years ago
Good point, there's std::sort and std::stable_sort.
@greenfloatingtoad 4 years ago
Idempotency is weaker than stability. If you put in perfectly sorted data, an idempotent sort will leave it alone. But if the data is even a little bit unsorted, it might change the order in which equivalent elements show up (which is what stability forbids).
@max0x7ba 5 years ago
The statement that radix sort only works for integers is not accurate. See Malte Skarupke's brilliant talk "Sorting in less than O(n log n): Generalizing and optimizing radix sort": kzbin.info/www/bejne/sKLWaWqXlJytrtk
@mishatsaritsyn4843 5 months ago
So this is how generic programming dies... with thunderous applause