28:17 [slides 31-32] I think “old-school C developers” would define Pixel as a union of a single uint32_t and a struct with four uint8_t members, and try to use this union to simplify the reading and writing code. Such approaches are undefined behavior in C++ (they break the strict aliasing rules, I believe). I’m not sure whether that C-style state of mind should guide us when designing how C++ should do it. Perhaps we should allow std::simd for T’s that are aggregates of same-type “vectorizable” member variables? Perhaps this is a generalization that can implicitly allow simd, mentioned at 48:22. Great talk, thanks Matthias!
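For readers wondering what the well-defined alternative to the union trick looks like, here is a minimal sketch. The Pixel layout and the helper names (to_bits, from_bits, check_round_trip) are assumptions made for illustration, not the definitions from the slides. It copies the bytes with std::memcpy, which compilers typically turn into a single register move; in C++20, std::bit_cast expresses the same thing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical Pixel type: four 8-bit channels, as described in the comment.
struct Pixel {
    std::uint8_t r, g, b, a;
};
static_assert(sizeof(Pixel) == sizeof(std::uint32_t), "Pixel must be exactly 32 bits");

// Reinterpret the four channels as one 32-bit word without a union.
// Copying the object representation is well defined; the numeric value
// of the result depends on the platform's byte order.
inline std::uint32_t to_bits(const Pixel& p) {
    std::uint32_t v;
    std::memcpy(&v, &p, sizeof v);
    return v;
}

// Inverse of to_bits: rebuild a Pixel from a 32-bit word.
inline Pixel from_bits(std::uint32_t v) {
    Pixel p;
    std::memcpy(&p, &v, sizeof p);
    return p;
}

// Sanity check: converting to bits and back preserves every channel.
inline void check_round_trip(const Pixel& p) {
    const Pixel q = from_bits(to_bits(p));
    assert(q.r == p.r && q.g == p.g && q.b == p.b && q.a == p.a);
}
```

Only the round trip is checked here, since which channel ends up in the low byte is platform-specific.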
@scion911 1 year ago
I absolutely love this. For my current project/library, this is a game changer for portability.
@redram45748 months ago
very useful video
@dat_21 1 year ago
It's a cool concept, but in practice that will mean even more spoon-feeding the compiler to get the code you want.
@eclipse4419 1 year ago
Awesome!!
@PaulJurczak 1 year ago
@4:00 I'm curious why fake_modify/fake_read are used instead of passing the initial x value as a parameter and returning the result.
@cranil 1 year ago
Because the compiler might simply remove the loop if you don’t use the result later. And the modify comes first, I think, to keep the compiler from precomputing the result at compile time.
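A rough sketch of how such helpers are commonly written may make the reply above concrete. This is an assumed re-creation, not the code from the talk: it uses GCC/Clang extended inline assembly (so it will not build with MSVC), and run_kernel is a hypothetical stand-in for the benchmarked loop.

```cpp
#include <cassert>

// Assumed implementation (GCC/Clang only): an empty asm statement that claims
// to read and write x in memory. The optimizer must treat x as unknown
// afterwards, so it cannot precompute anything that depends on x.
template <class T>
inline void fake_modify(T& x) {
    asm volatile("" : "+m"(x));
}

// Assumed implementation (GCC/Clang only): an empty asm statement that claims
// to read x. The computation producing x therefore cannot be removed as dead code.
template <class T>
inline void fake_read(const T& x) {
    asm volatile("" : : "m"(x));
}

// Hypothetical benchmark body: the arithmetic stays in the generated code
// even though every input is a compile-time constant.
inline float run_kernel(int iterations) {
    assert(iterations >= 0);
    float x = 1.0f;
    for (int i = 0; i < iterations; ++i) {
        fake_modify(x);       // x is now opaque to the optimizer
        x = x * 0.5f + 1.0f;  // the work being measured
        fake_read(x);         // the result counts as used
    }
    return x;
}
```

The helpers emit no instructions of their own and never change the value; they only constrain what the optimizer is allowed to assume. Passing x as a function parameter and returning the result works only as long as the call is not inlined into a caller that knows the argument.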
@blacklion7911 months ago
Intel's left hand: push SIMD into every language it can, including many mask-defined operations. Intel's right hand: don't give us, simple people, AVX-512 for 10 years.
@Roibarkan 1 year ago
Great talk! It seems that exploiting ILP when using simd can be very beneficial. Will library/compiler vendors be allowed to “do it for us”? For example, is the default size() of std::simd strictly mandated by the hardware, or will specific compiler/library vendors be allowed to choose a larger size() (perhaps based on compiler flags) to exploit ILP? Perhaps the ABI tag that was mentioned is able to support such desires.
@GeorgiyChipunov3 months ago
Cool
@Alexander_Sannikov 1 year ago
If you actually care about the performance of your data-parallel code, your PC has a special, massively powerful hardware component that's specifically designed to maximize the throughput of this exact kind of task. It's called a GPU.
@MrHaggyy 1 year ago
Only a few systems that have SIMD also have a graphics processor. And if they have one, it's only as powerful as needed for graphics. Think of servers, industrial machines, cars, home and kitchen devices, etc.
@ckjdinnj 1 year ago
Sending data to the GPU and reading back a result is also pretty slow, so for algorithms that use recursion or dynamic programming the GPU doesn’t make for a great resource.
@panjak3236 months ago
Amdahl's law. GPU processing is only ever worth it when the compute time greatly outweighs the serial time (in this case, the atrocious PCIe transfer times).
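The trade-off described above can be written down as a back-of-the-envelope estimate. This sketch is only an illustration: the function names and the simple model (transfer time is purely serial overhead added to the GPU compute time) are assumptions, and real timings would have to be measured.

```cpp
#include <cassert>

// Estimated speedup from offloading, under the assumed model
//   speedup = cpu_time / (gpu_time + transfer_time).
// All arguments are hypothetical timings in the same unit (e.g. milliseconds).
inline double offload_speedup(double cpu_time, double gpu_time, double transfer_time) {
    assert(gpu_time + transfer_time > 0.0);
    return cpu_time / (gpu_time + transfer_time);
}

// Offloading only pays off when the estimated speedup exceeds 1, i.e. when
// the compute time saved is larger than the time spent on transfers.
inline bool offload_pays_off(double cpu_time, double gpu_time, double transfer_time) {
    return offload_speedup(cpu_time, gpu_time, transfer_time) > 1.0;
}
```

With made-up numbers, a job that takes 100 ms on the CPU, 10 ms on the GPU, and 10 ms to transfer comes out five times faster, while a 1 ms job with a 2 ms transfer is slower on the GPU no matter how fast the kernel runs.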
@okharev8114 1 month ago
Yeah, but copying data back and forth between the CPU and the GPU isn't worth it unless you have the scale of data; for general-purpose operations, CPUs are way more efficient.