man, Chandler makes the camera man work really hard for his money xD
@kamilziemian995 1 year ago
😂
@kamilziemian995 1 year ago
Chandler Carruth's talks are great. I try to watch every one of them available on YT.
@barrettellisadair 8 years ago
I always enjoy Chandler's talks - this one has great video/audio quality as well. I would like to emphasize that this pointer-packing stuff is implementation-defined, and generally pointless outside of a performance bottleneck. So, some of the code you don't see in the slides likely contains a lot of platform-specific configuration macros (but I haven't looked at those parts of the Clang code base yet). TL;DW: cater to the cache, pack your pointers.
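The low-bit packing being discussed can be sketched in a few lines. This is a hypothetical `TaggedPtr`, not LLVM's actual PointerIntPair: it relies on a 4-byte-aligned object having its low 2 address bits zero, which is exactly the implementation-defined assumption the comment above warns about.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch (not LLVM's actual PointerIntPair): an object
// aligned to at least 1<<TagBits bytes has that many low address bits
// zero, so a small integer tag can be stashed there. Relying on this
// is implementation-defined, hence the platform checks in real code.
template <typename T, unsigned TagBits = 2>
class TaggedPtr {
  static_assert(alignof(T) >= (1u << TagBits),
                "not enough spare low bits for the tag");
  static constexpr std::uintptr_t TagMask =
      (std::uintptr_t(1) << TagBits) - 1;
  std::uintptr_t Bits;

public:
  TaggedPtr(T *P, unsigned Tag)
      : Bits(reinterpret_cast<std::uintptr_t>(P) | Tag) {
    assert((Tag & ~TagMask) == 0 && "tag does not fit in the spare bits");
  }
  T *pointer() const { return reinterpret_cast<T *>(Bits & ~TagMask); }
  unsigned tag() const { return unsigned(Bits & TagMask); }
};
```

One word of storage now carries both a pointer and a small enum-like tag, which is the cache-footprint win the talk is after.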
@MrAbrazildo 8 years ago
I suppose you are talking about the 4-bit packing stuff, rather than the array of unique pointers for large objects, right?
@10se1ucgo 8 years ago
Agreed, he's definitely my favorite presenter.
@kewqie 7 years ago
Chandler is my favorite speaker after Herb, good talk!
@seditt5146 5 years ago
I agree. Another speaker whose talks are all worth catching is Ansel Sermersheim; he makes hard-to-understand subjects highly consumable. Timur Doumler also has good talks, but he is not the best public speaker - not the worst either; he is very good, but not at Herb's or Chandler's level.
@Cons-Cat 2 years ago
I'm quite sure that the parameterized-allocators problem is easily solvable with C++20's `concept` feature now.
@ivanzhechev4215 1 year ago
Slowly learning C++20, but how would you achieve that? The only way I can think of is rejecting incompatible allocators and giving a compiler error, which doesn't really achieve the result of having the SmallVector on API boundaries.
@davidforgeas2235 8 years ago
Is there a significant performance benefit to using the hybrid 32-bit-pointers/64-bit-registers ABI, namely the x32 ABI? I don't mean for the generated code, but specifically for the compiler itself?
@SianaGearz 7 years ago
So you get an expanded register file with 8 extra registers, but register scarcity has not been a major issue since about Pentium Pro, since top of stack is actually aliased to an unknown number of hidden registers and the corresponding operations get removed at the opcode decoding stage. You'd get a slight bottleneck at opcode decoding but it was pretty much removed in the later iterations of Pentium III. There also doesn't seem to be any difference whatsoever in compiler performance between pure 32-bit and 64-bit ABIs, at least from my experience with GCC - don't know whether it extends to Clang, but I strongly suspect so.
@dat_21 6 years ago
Can you elaborate on "top of stack" aliasing? I don't find any mentions of it in intel optimization guide or Agner Fog's guides. Maybe you are talking about FP register stack?
@wintermute701 8 years ago
The talk from 2 years ago he mentions in the beginning: kzbin.info/www/bejne/nHmxnoWhr917jdU
@enhex 8 years ago
12:00 Can alias templates be used to avoid those mistakes?
@ChandlerCarruth 8 years ago
Yes, but at the expense of having to come up with a new name for each different small-size-optimized container in a function's API boundary. =/ Doesn't seem to scale very well.
@xfreeman86 6 years ago
You can use the same technique you use to remove the size parameter from the type of SmallVectorImpl. Allocators are passed by reference, so no slicing worries, and you can use the pimpl idiom for polymorphic copying. My jury is still out on whether it's better, but it is possible.
@Shockszzbyyous 7 years ago
OK, I see - so the idea behind SmallVector is that you can allocate some space beforehand, instead of allocating when pushing something onto the vector?
@tomcheng3903 7 years ago
No, the point is that you allocate it on the stack, not the heap. If it's too large for the stack, then it falls back to the heap. You can allocate space in a vector beforehand with vector::reserve().
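The stack-first, heap-fallback behavior described in this reply can be sketched as a minimal small-buffer vector. This hypothetical `TinyVector` is far simpler than LLVM's real SmallVector and assumes a default-constructible, copyable element type.

```cpp
#include <cassert>
#include <cstddef>

// Minimal small-size-optimization sketch (nothing like as complete as
// LLVM's SmallVector): storage starts in a buffer embedded in the
// object itself -- so typically on the stack -- and moves to the heap
// only when it overflows.
template <typename T, unsigned N>
class TinyVector {
  T Inline[N];                       // in-object storage, no heap yet
  T *Begin = Inline;
  std::size_t Size = 0, Cap = N;

  bool isSmall() const { return Begin == Inline; }

public:
  ~TinyVector() {
    if (!isSmall()) delete[] Begin;  // free only if we grew to the heap
  }
  std::size_t size() const { return Size; }
  T &operator[](std::size_t I) { return Begin[I]; }

  void push_back(const T &V) {
    if (Size == Cap) {               // current buffer full: fall back to the heap
      T *Fresh = new T[Cap * 2];
      for (std::size_t I = 0; I < Size; ++I) Fresh[I] = Begin[I];
      if (!isSmall()) delete[] Begin;
      Begin = Fresh;
      Cap *= 2;
    }
    Begin[Size++] = V;
  }
};
```

As long as the element count stays at or below N, no allocation ever happens, which is the common case the talk optimizes for.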
@NoNameAtAll2 3 years ago
black magic
@rubberduck2078 2 years ago
>llvm >Fast
@jonesconrad1 5 years ago
i just got lost on xkcd.com for 10 minutes
@soonts 8 years ago
@20:00 Microsoft has had a very similar implementation in their ATL since 2003. Their implementation is called CAtlPlex, and it is used internally by many ATL containers.
@BlairdBlaird 2 years ago
Apple also uses that at the language level (in both ObjC and Swift, IIRC). In fact, on 64-bit they use not only the low bits but also the high bits, because all extant 64-bit implementations only have a 48-bit address space at the moment (and x86-64 is architecturally limited to 52 bits of physical address).
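The high-bit variant mentioned here can be sketched as a hypothetical pack/unpack pair that assumes user pointers fit in the low 48 bits. That assumption already fails on CPUs with 57-bit virtual addresses (5-level paging), which is why such schemes are always gated per platform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: if user-space addresses fit in 48 bits, the
// top 16 bits of a pointer-sized word are free for a tag. Even less
// portable than low-bit packing -- guard it per platform.
constexpr unsigned AddrBits = 48;
constexpr std::uintptr_t AddrMask = (std::uintptr_t(1) << AddrBits) - 1;

inline std::uintptr_t packHigh(void *P, std::uint16_t Tag) {
  std::uintptr_t Raw = reinterpret_cast<std::uintptr_t>(P);
  assert((Raw & ~AddrMask) == 0 && "pointer uses more than 48 bits");
  return Raw | (std::uintptr_t(Tag) << AddrBits);
}
inline void *unpackPtr(std::uintptr_t V) {
  return reinterpret_cast<void *>(V & AddrMask);
}
inline std::uint16_t unpackTag(std::uintptr_t V) {
  return std::uint16_t(V >> AddrBits);
}
```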
@onosendaiAAB 8 years ago
Hi Chandler, I may be misinterpreting you, but at kzbin.info/www/bejne/rHbPi5Zsr7h8jq8 are you implying that modern CPUs do indirect memory prefetching? e.g. are you implying that CPUs will work out you are iterating through an array of pointers, read one or more indices ahead in that array, and then prefetch the memory pointed to by that pointer?
@deadalnix 8 years ago
Modern Intel CPUs do it to some extent. I'm not aware of any other architecture doing it.
@onosendaiAAB 8 years ago
Interesting, do you have any further information or references about that?
@onosendaiAAB 8 years ago
I've just done some tests on my Ivy Bridge, and I'm pretty sure it doesn't do indirect memory prefetching.
@deadalnix 8 years ago
Note that prefetching on a pointer load is bound to be slow no matter what. If you want to measure whether the CPU does prefetching, use the same algorithm twice: one version doing prefetching explicitly using a compiler intrinsic, and the other letting the CPU do its thing. If you see no significant perf difference, your CPU does prefetching. If the one with explicit prefetching is faster, then your CPU doesn't do prefetching. I wouldn't be able to tell you if your specific CPU does it. Some Intel CPUs do, but I'm not sure which ones.
@noahgoldstein3951 2 years ago
@onosendaiAAB I think you're right there is no indirect prefetching. Possibly he means that the execution of later iteration loads can occur speculatively and in parallel to the first load because there is no dependency.
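The A/B measurement suggested in this thread might be sketched like this: sum through an array of pointers once plainly, and once issuing an explicit software prefetch a few iterations ahead via `__builtin_prefetch` (a GCC/Clang builtin). Timing is omitted; wrap each call in a clock and compare on a working set larger than cache.

```cpp
#include <cstddef>
#include <vector>

// Plain traversal: each iteration dereferences one pointer.
long sumPlain(const std::vector<int *> &Ptrs) {
  long S = 0;
  for (int *P : Ptrs) S += *P;
  return S;
}

// Same traversal, but explicitly prefetching the target of a load
// `Ahead` iterations in the future. If the CPU already does indirect
// prefetching, this should make no measurable difference.
long sumPrefetch(const std::vector<int *> &Ptrs, std::size_t Ahead = 8) {
  long S = 0;
  for (std::size_t I = 0; I < Ptrs.size(); ++I) {
    if (I + Ahead < Ptrs.size())
      __builtin_prefetch(Ptrs[I + Ahead]); // hint at a future iteration's load
    S += *Ptrs[I];
  }
  return S;
}
```

For the comparison to be meaningful, the pointers should land in a shuffled order so the hardware stride prefetcher cannot help on its own.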
@SolomonUcko 4 years ago
Is this the previous talk? kzbin.info/www/bejne/nHmxnoWhr917jdU ("Efficiency with Algorithms, Performance with Data Structures")
@MrAbrazildo 8 years ago
_Value.template_, I don't even know what this means.
@barrettellisadair 8 years ago
It is a disambiguation for accessing a templated member of an object that is itself a dependent type.
@xfreeman86 6 years ago
It's hard to parse as a human, and you just happened to get it wrong is all. Read it as (Value).(template is)(). It's for when you call a template member function with type parameters that cannot be deduced from its formal parameters.
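A minimal example of the disambiguation being described, with hypothetical `Holder`/`fetch` names:

```cpp
#include <cassert>

// Inside `fetch`, `Value` has a dependent type, so without the
// `template` keyword the `<` of `get<int>` would parse as less-than.
struct Holder {
  template <typename T> T get() const { return T(42); }
};

template <typename V>
int fetch(const V &Value) {
  return Value.template get<int>(); // tells the parser `get` is a template
}
```

Note the type argument `int` cannot be deduced from any function parameter, which is exactly the case described above.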
@NXTangl 5 years ago
@@xfreeman86 ...wait, you're Gordon Freeman's brother, aren't you?
@insideloop22 8 years ago
As usual, it is a joy to listen to Chandler's talks. I am working on an open-source library of data structures for high-performance computing, and one of the first things I had to do was rebuild a custom std::vector. One of the many reasons is that std::vector v(n); fills the vector with zeros, which trashes the cache and prevents you from using the "first touch" policy, which is extremely important for NUMA architectures such as dual-socket nodes. Alignment is also an issue for getting efficient vectorization in HPC. These things could be "solved" with allocators, but I just hate them, as I don't want them to change the type of my containers. I have a few questions though:
- Do you have an llvm::Vector, or do you use llvm::SmallVector as a replacement for std::vector? Do you also use llvm::Vector as a replacement for std::array?
- There is a BumpPtrAllocator, but it does not seem possible to use it with SmallVector. Do you have any design ideas for allocators? I just can't get them to work the way I want. I hate both the way they are done in the C++ standard library and the Bloomberg implementation.
@ChandlerCarruth 8 years ago
We use SmallVector pretty much everywhere. On very rare occasions we'll want what std::vector provides, but it's unusual. std::array is a totally different beast -- it is a statically sized array which has essentially no overlap with the vector use cases. I suspect there are some places where we could use std::array and currently use llvm::SmallVector, but I think they're rare.
@ChandlerCarruth 8 years ago
And regarding allocators -- for LLVM, the pattern we fall into more often is needing to control allocation of the *objects* with allocators, not the buffers in the data structures. For those buffers, tcmalloc, jemalloc, or any of the other optimized free-list based malloc libraries will do a fine job.
@henrikvallgren1943 8 years ago
I tried to get attention to this issue a while back: groups.google.com/a/isocpp.org/forum/#!msg/std-proposals/b946VongJjY/RYYj--v63qsJ
@insideloop22 8 years ago
There are too many things in the standard library that are broken for me, and I am glad I decided to write my own containers. It provides me so many useful things. As I work mainly on numerical simulation, all I really need is a good std::vector implementation. Having my own allows me to:
- Have an il::Vector v(n) that does not fill the vector with 0 (same with float, int, etc.)
- Have an il::Vector v(n) that gets initialized to NaN in debug mode (very useful if you want to track uninitialized values)
- Add my own constructors, such as il::Vector v(n, il::align, 64), if I want to align the buffer to a cache line
- Easily count the number of copy-constructor calls in il::Vector (if I want to make sure that move semantics are done properly)
- Have il::Vector::size() return a signed integer (I hate unsigned integers and all the bugs they lead to)
- Write an il::SmallVector as Chandler pointed out
- so many useful things...
I like the C++ language, but I just have so many troubles with the standard library. I am glad to hear that LLVM worked on their own containers. I can point people who say I am stupid not to use the standard library to this kind of talk.
@MrAbrazildo 8 years ago
So, did you already write your own, or just want to?
@TheEmT33 5 years ago
i love this guy
@tribalfromthetrapdontrappi3030 6 years ago
Nice!
@spudtaters8419 8 years ago
The SmallVector slide at 5:26 is confusing me because of omissions: 1) Can all small-size vectors be upgraded to heap storage on overflow? 2) Is SmallVectorImpl what you use for function params? I assume so, it just sounds like a weird / wordier name.
@insideloop22 8 years ago
1) Yes, with SmallVector v; they can call v.resize(10); 2) The name is a bit weird, but I don't know which one they could have chosen
@xfreeman86 6 years ago
For 2), the names should have been switched. Then you allocate SmallVectorImpl but pass around SmallVector.
@sebastianmestre8971 5 years ago
@@xfreeman86 what about allocating SmallVector and passing around BasicSmallVector
@JohnDlugosz 5 years ago
6:17 Slide 6 -- How does the SmallVectorImpl::push_back know whether to free the old buffer? Without knowing the original N, it can't tell if it's been reallocated to the heap already, or was still using the small buffer.
@pazdziochowaty 4 years ago
It can check whether Begin points right after the SmallVectorImpl object - that is where the derived class keeps the inline buffer.
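The check described in this reply can be sketched as follows. This is a simplified layout assumption, not LLVM's exact code (the real version lives in SmallVector.h and is more careful): the derived class's inline buffer is taken to start immediately after the base subobject, so the base can recompute that address and compare it with Begin without ever knowing N.

```cpp
#include <cassert>

// Base class with no knowledge of the small size N.
class SmallVectorBase {
protected:
  void *Begin = nullptr;
  // Address just past the base subobject -- where the sketch assumes
  // the derived class's inline buffer begins.
  const void *getFirstEl() const { return this + 1; }
  bool isSmall() const { return Begin == getFirstEl(); }
};

// Usage sketch: a derived class holding the inline buffer.
struct SV : SmallVectorBase {
  int Buf[4];
  SV() { Begin = Buf; }
  bool small() const { return isSmall(); }
  void growToHeap() { Begin = new int[8]; } // sketch leaks; real code frees in its dtor
};
```

So push_back can decide whether to free the old buffer purely from the pointer value: if Begin still points at the in-object address, nothing was heap-allocated.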
@max0x7ba 8 years ago
Bug in slide 21: `(Size >= (End - CurPtr))` should be `(Size > (End - CurPtr))`.
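For reference, the boundary condition under discussion looks like this in a stripped-down bump-pointer sketch (a hypothetical `Bump`, not the real BumpPtrAllocator): with the half-open range [CurPtr, End), a request of exactly End - CurPtr bytes still fits, so the overflow test wants `>`, matching the suggested fix.

```cpp
#include <cassert>
#include <cstddef>

struct Bump {
  char *CurPtr, *End;  // half-open range of free bytes in the slab
  void *allocate(std::size_t Size) {
    if (Size > std::size_t(End - CurPtr))
      return nullptr;  // real code would start a new slab here
    void *R = CurPtr;  // bump: hand out the front of the free range
    CurPtr += Size;
    return R;
  }
};
```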
@stdafx 5 years ago
In the original source code it looks a bit different: "if (Adjustment + SizeToAllocate
@MrAbrazildo 8 years ago
I don't get the point of the SmallVector class having a fixed size, when the goal of containers is to change it. I can only guess that 'N' is the maximum size the "simulated container" will ever get, right? But how would you know that?
@ChandlerCarruth 8 years ago
It's not a fixed size -- it's an initial *allocation* size. The container still is dynamically sized like std::vector.
@MrAbrazildo 8 years ago
+Chandler Carruth I saw the push_back function; however, it only adds one element. And even if it creates another buffer, how would Begin and End work with two or more arrays, without extra code to handle the situation? I guess you are talking about something low-level that would allow you to delete a compile-time array and reallocate it, unlike what C/C++ allows, right?
@ChandlerCarruth 8 years ago
No, this is totally built with what C++ already allows you to do. The begin and end pointers start off pointing at the inline small buffer, and if extra space is needed it gets allocated and the begin and end pointers are moved to point at the new buffer. You can find all the code for the real implementation here: github.com/llvm-project/llvm-project/blob/master/llvm/include/llvm/ADT/SmallVector.h
@MrAbrazildo 8 years ago
Chandler Carruth So the penalty is the risk of having an unused inline array after the storage grows. Good idea, though. The pointers can also return to the array if memory is freed. How much faster is an array than std::list? For GCC, some years ago, I measured that an array simulating std::map is 6-7 times faster.
@von_nobody 7 years ago
BTW, how about intrusive lists? Or when you say "list", do you mean a `List` of pointers rather than of values? www.codeofhonor.com/blog/avoiding-game-crashes-related-to-linked-lists These have very good properties if lists are very dynamic and objects can move between lists. Of course, there is one big drawback: an object can belong to only one list at a time.
@MrAbrazildo 8 years ago
> 5:00, in the derived class, you could have used the base constructor instead (C++11 and above): _using SmallVectorImpl::SmallVectorImpl;_. > 8:36, at least for GCC, _std::pair_ halves performance compared to 2 separate values. It was shocking to me (absurd geek moment)! However, I guess that an std::array, with first() and second() functions, fixes this issue. _Edit: nope. This array is as slow as std::pair._
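The inheriting-constructor suggestion can be shown with a toy pair of classes (hypothetical names, not LLVM's actual hierarchy): the derived class re-exposes the base's constructors instead of writing forwarding ones.

```cpp
#include <cassert>

class Impl {
  int Cap;
public:
  explicit Impl(int C) : Cap(C) {}
  int capacity() const { return Cap; }
};

class Small : public Impl {
public:
  using Impl::Impl; // C++11: inherits Impl(int), no forwarding boilerplate
};
```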
@EvanED 8 years ago
Re: pair, what are you comparing? Separate correlated arrays for keys and values (e.g. `KeyType keys[N]; ValueType values[N];`) vs. an array of pairs?
@MrAbrazildo 8 years ago
+EvanED I'm comparing: 1) int m; int n; 2) std::pair p; 3) class Pair { std::array a; public: inline const int &first () const {return a[0];} inline int &first () {return a[0];} The same for second, returning a[1]; }; Case 1 is more than 100% faster than the other ones. I also tried a 4th case: packing a pair inside a size_t variable, which turned out to be more than 100% slower than case 1 too.
@Sopel997 8 years ago
the thing is, an array is homogeneous
@alexeiz 8 years ago
Contrary to what Chandler says, bit packing is slow. You won't find it used very often in modern code. Bit packing makes sense only if you save a substantial amount of data (megabytes).
@MattGodbolt 8 years ago
Do you have any qualification for that? On the machines I use (Haswell server class), bit packing is almost free, taking around 2/3rds of a CPU cycle to shift and mask.
@DeGuerre 8 years ago
Bit packing almost always wins if it means your working set fits in cache, or for anything that is to be transmitted over a network.
@deadalnix 8 years ago
Bit packing: 1/2 cycles.
L1: 3-4 cycles. If your data fits in L1, then bit packing is probably not worth it. L1 is 32k.
L2: 10-12 cycles. If your data fits in L2, it's probably a good idea, but I'm not sure you'll win a lot. L2 is 256k.
If you manipulate anything bigger than this, bit packing is DEFINITELY worth it. Most programs are here.
Caveat: bit packing makes your code bigger, so if you are icache bound - you have a large executable with a fairly flat profile - then you may want to reconsider.
@SerBallister 8 years ago
Depends on your use case: if you are primarily setting/clearing bits, then it will need more instructions compared to, say, using an array of bytes; but if you are testing them, particularly multiple bits at the same time, then you make savings.
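The testing-side saving mentioned here looks like this in a small sketch (hypothetical flag names): several packed flags can be checked with a single mask-and-compare, where a byte-per-flag layout would need one load per flag.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t FlagA = 1u << 0;
constexpr std::uint32_t FlagB = 1u << 3;
constexpr std::uint32_t FlagC = 1u << 7;

// One AND plus one compare, regardless of how many bits the mask names.
inline bool allSet(std::uint32_t Bits, std::uint32_t Mask) {
  return (Bits & Mask) == Mask;
}
inline bool anySet(std::uint32_t Bits, std::uint32_t Mask) {
  return (Bits & Mask) != 0;
}
```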