std::simd: How to Express Inherent Parallelism Efficiently Via Data-parallel Types - Matthias Kretz

Views: 18,462

CppCon

A day ago

Comments: 18

@__hannibaalbarca__ a year ago

std::simd, finally.

@方浩宇-h1i a year ago

yeah...

@Roibarkan a year ago

28:17 [slides 31-32] I think "old-school C developers" would define Pixel as a union of a single uint32_t and a struct with four uint8_t members, and use that union to simplify the reading and writing code. Such approaches are undefined behavior in C++ (they break the strict aliasing rules, I believe). I'm not sure whether that C-style state of mind could guide us when designing how C++ should do it. Perhaps we should allow std::simd for T's that are aggregates of same-type "vectorizable" member variables? Perhaps this is a generalization that could implicitly allow simd, as mentioned at 48:22. Great talk, thanks Matthias!
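For readers unfamiliar with the union trick mentioned above, here is a minimal sketch of a well-defined C++ alternative: copying the object representation with std::memcpy instead of reading an inactive union member. The Pixel layout and the helper names pack and unpack are assumptions made for illustration; they are not taken from the talk's slides.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical Pixel layout: four 8-bit channels, as described in the comment.
struct Pixel {
    std::uint8_t r, g, b, a;
};

static_assert(sizeof(Pixel) == sizeof(std::uint32_t),
              "Pixel must have the same size as a 32-bit integer");

// Copy the bytes of a Pixel into a 32-bit integer.
// Unlike reading the inactive member of a union, this is well defined in C++.
inline std::uint32_t pack(const Pixel& p) {
    std::uint32_t bits;
    std::memcpy(&bits, &p, sizeof bits);
    return bits;
}

// Copy the bytes of a 32-bit integer back into a Pixel.
inline Pixel unpack(std::uint32_t bits) {
    Pixel p;
    std::memcpy(&p, &bits, sizeof p);
    return p;
}
```

Optimizing compilers usually turn such a fixed-size memcpy into a single load or store, so this form tends to cost nothing compared with the union. In C++20, std::bit_cast expresses the same conversion more directly.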

@scion911 a year ago

I absolutely love this, for my current project/library this is absolutely a game changer for portability.

@redram4574 8 months ago

very useful video

@dat_21 a year ago

It's a cool concept, but in practice that will mean even more spoon-feeding the compiler to get the code you want.

@eclipse4419 a year ago

Awesome!!

@PaulJurczak a year ago

@4:00 I'm curious why fake_modify/fake_read are used instead of passing the initial x value as a parameter and returning the result.

@cranil a year ago

Because the compiler might simply remove the loop if you don't use the result later. As for the fake_modify at the start, I think it's there to keep the compiler from precomputing the result at compile time.
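A rough sketch of how such helpers are commonly written follows. These definitions are an assumption, not the ones from the talk: they rely on GCC/Clang extended inline assembly, and bench_kernel is a hypothetical function added only to show where the two calls sit.

```cpp
// Assumed implementation (GCC/Clang only): an empty asm statement that claims
// to read and write x in memory, so the optimizer must treat x as unknown.
template <class T>
inline void fake_modify(T& x) {
    asm volatile("" : "+m"(x));
}

// Assumed implementation: an empty asm statement that claims to read x,
// so the computation that produced x cannot be discarded as dead code.
template <class T>
inline void fake_read(const T& x) {
    asm volatile("" : : "m"(x));
}

// Hypothetical benchmark kernel showing the placement of both helpers.
inline void bench_kernel(int n) {
    int x = 0;
    fake_modify(x);              // the compiler can no longer assume x == 0
    for (int i = 1; i <= n; ++i) {
        x += i;                  // the work being measured
    }
    fake_read(x);                // the result counts as used
}
```

Neither helper emits any instructions of its own, so the value of x is unchanged at run time; only the optimizer's knowledge about it is affected.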

@blacklion79 11 months ago

Intel's left hand: push SIMD into all languages it could, including many mask defined operations. Intel's right hand: don't give us, simple people, AVX-512 for 10 years.

@Roibarkan a year ago

Great talk! It seems that exploiting ILP when using simd can be very beneficial. Will library/compiler vendors be allowed to "do it for us"? E.g., is the default size() of std::simd strictly mandated by the hardware, or will specific compiler/library vendors be allowed to choose a larger size() (perhaps based on compiler flags) to exploit ILP? Perhaps the ABI tag that was mentioned is able to support such desires.
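To make the ILP point concrete, here is a small sketch that uses plain floats as a stand-in for simd objects (it does not use std::simd itself). Four independent accumulators break the dependency chain of a reduction, so the CPU can overlap the additions. The function name sum_ilp4 is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sum with four independent accumulators. Consecutive additions do not depend
// on each other, so several can be in flight at once. Each accumulator stands
// in for one simd register in a vectorized version.
inline float sum_ilp4(const std::vector<float>& v) {
    float acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        acc0 += v[i];
        acc1 += v[i + 1];
        acc2 += v[i + 2];
        acc3 += v[i + 3];
    }
    // Handle the remaining zero to three elements.
    float tail = 0;
    for (; i < v.size(); ++i) {
        tail += v[i];
    }
    return (acc0 + acc1) + (acc2 + acc3) + tail;
}
```

Note that this changes the order of the floating-point additions, so the result can differ in rounding from a strictly sequential sum. A simd type with a larger size() would apply the same idea inside the library instead of in user code.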

@GeorgiyChipunov 3 months ago

Cool

@Alexander_Sannikov a year ago

if you actually care about the performance of your data-parallel code, your PC has a special massively powerful hardware component that's specifically designed to maximize the throughput of this exact kind of task. it's called a GPU.

@MrHaggyy a year ago

Only a few of the systems that have SIMD also have a graphics processor. And if they have one, it's only as powerful as needed for graphics. Think of servers, industrial machines, cars, home and kitchen devices, etc.

@ckjdinnj a year ago

Sending data to the GPU and reading back a result is also pretty slow, so for algorithms that use recursion or dynamic programming the GPU doesn't make for a great resource.

@panjak323 6 months ago

Amdahl's law. GPU processing is only ever worth it when the compute time greatly outweighs the serial time (in this case the atrocious PCIe transfer times).
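For reference, Amdahl's law gives the overall speedup as 1 / ((1 - p) + p / s), where p is the fraction of the run time that can be accelerated and s is the speedup of that fraction. A minimal sketch follows; offload_pays_off is a hypothetical helper that models the transfer time as purely serial overhead, which is a simplification.

```cpp
// Amdahl's law: overall speedup when a fraction p of the run time
// is accelerated by a factor s and the rest stays serial.
inline double amdahl_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

// Hypothetical break-even check: offloading only pays off when GPU compute
// time plus transfer time is less than the time the CPU would need.
inline bool offload_pays_off(double cpu_seconds,
                             double gpu_seconds,
                             double transfer_seconds) {
    return gpu_seconds + transfer_seconds < cpu_seconds;
}
```

With p = 0.5, even an arbitrarily fast accelerator cannot push the overall speedup past 2, which is why short kernels surrounded by transfers rarely benefit.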

@okharev8114 a month ago

Yeah, but reading back and forth from the GPU isn't worth it unless you have the scale of data. For general-purpose operations, CPUs are way more efficient.