Writing cache friendly C++ - Jonathan Müller - Meeting C++ 2018

48,150 views

Meeting Cpp

5 years ago

Writing cache friendly C++
Jonathan Müller
Meeting C++ 2018
Slides: meetingcpp.com/mcpp/slides/

Comments: 39
@sanderbos4243 2 years ago
7:43 I'm pretty sure that since each box represents 4 bytes (32 bits, one int), the example with a stride of 16 bytes (4 boxes, i.e. 4 ints) should have its green boxes at 0, 4, 8 instead of 0, 3, 6, which results in 75% waste instead of 71%.
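A small sketch of the arithmetic in the comment above, assuming 4-byte ints and a stride that is a multiple of the element size and no larger than one cache line. The helper names wasted_fraction and strided_sum are hypothetical, not from the talk. With a 16-byte stride only 4 of every 16 bytes brought into cache are read, so 1 - 4/16 = 75% of the loaded data is wasted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of the bytes brought into cache that a strided loop never reads.
// Assumes the stride is a multiple of the element size and no larger than
// one cache line, so every fetched line is only partially used.
double wasted_fraction(std::size_t stride_bytes, std::size_t elem_bytes) {
    assert(elem_bytes > 0 && stride_bytes >= elem_bytes);
    return 1.0 - static_cast<double>(elem_bytes) / static_cast<double>(stride_bytes);
}

// Reads only every `stride_elems`-th int; the ints in between still travel
// through the cache because they share cache lines with the ones we read.
long long strided_sum(const std::vector<int>& v, std::size_t stride_elems) {
    long long sum = 0;
    for (std::size_t i = 0; i < v.size(); i += stride_elems) {
        sum += v[i];
    }
    return sum;
}
```

With 4-byte ints, wasted_fraction(16, sizeof(int)) evaluates to 0.75, matching the corrected figure.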
@Sebastian-lz5ue 4 years ago
6:17: "..the fast caches are also the slowest."
@CzipperzIncorporated 4 years ago
I think he meant to say "smallest".
@mapron1 5 years ago
19:10: You say a uint32_t* must always be 4-byte aligned on x86, but is that strictly true? I could have an unaligned pointer to an int32, couldn't I? Yes, it degrades performance, but it is possible, right?
@foonathan 5 years ago
Yes, technically you can have an unaligned pointer. Then that trick doesn't work anymore, correct.
@mapron1 5 years ago
Thanks for the answer. And for the awesome talk, too :)
@Carewolf 4 years ago
I think you can in x86 assembler, but I don't think it is valid C++. Or rather, it works, but it is undefined behavior and a sanitizer will complain loudly.
@llothar68 4 years ago
@@Carewolf No complaints if you use a pragma. Normally there are no complaints at all, because you use low-level casts to get unaligned access. I love low-level bit fuckery.
@Carewolf 4 years ago
@@llothar68 You can also use memcpy; a good compiler will optimize the copy away but leave valid code. On x86 that means plain unaligned memory reads.
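A minimal sketch of the memcpy approach described in the reply above, assuming the goal is reading and writing a std::uint32_t at an arbitrary byte offset. std::memcpy has no alignment requirement, so this stays well-defined C++; whether the copy actually compiles down to a single unaligned load depends on the compiler and optimization level. load_u32 and store_u32 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a std::uint32_t from a possibly unaligned address without undefined
// behavior: memcpy copies bytes and has no alignment requirement.
std::uint32_t load_u32(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// The matching store; bytes are written in the machine's native byte order.
void store_u32(unsigned char* p, std::uint32_t value) {
    assert(p != nullptr);
    std::memcpy(p, &value, sizeof value);
}
```

Usage: store_u32(buf + 1, x) followed by load_u32(buf + 1) round-trips x even though buf + 1 is deliberately misaligned for a 4-byte type.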
@sanderbos4243 2 years ago
Amazing talk, thanks Jonathan! :-)
@simonhrabec9973 11 months ago
Neverhood :)
@sanderbos4243 11 months ago
@@simonhrabec9973 Taste
@sanderbos4243 2 years ago
19:29 Pretty sure it's 62 bits of information again, so 61 bits + 1 bit. The 61 is because the bool is padded to take up as much space as the uint32_t*, which means the lowest _3_ bits of the "a" field will stay 0. This is just like his previous explanation of 62, but you keep the padded bool in mind. EDIT: I'm definitely wrong, see miguel's reply to me.
@miguelveganzones5103 9 months ago
What matters here is that it points to a 32-bit type with 4-byte alignment; that's how you lose 2 bits of information. The bool within the struct just adds more padding but is otherwise irrelevant to the pointer.
@TheJGAdams 8 months ago
Why is it 62 bits, though? If you're compiling for x64, you should have the full 64 bits. I'd guess old games only supported 2 GB of RAM because they used 31 bits? But they were also patched to use the full 32. What's going on here?
@sanderbos4243 8 months ago
@@TheJGAdams The context here is that if you have an 8-byte pointer to a 4-byte data type (the std::uint32_t), we assume the 4-byte data type is aligned to a 4-byte boundary. If the pointer only ever points to addresses on a 4-byte boundary, the two least significant bits of the address are constant and predictable, so those bits don't carry useful information. So a 64-bit pointer would carry 62 bits of information. "...on ARM-based systems you cannot address a 32-bit word that is not aligned to a 4-byte boundary. Doing so will result in an access violation exception. On x86 you can access such non-aligned data, though the performance suffers a little since two words have to be fetched from memory instead of just one."
@TheJGAdams 8 months ago
@@sanderbos4243 I'm more confused now. I don't know what you're talking about; I was asking about the pointers themselves. They store an address, not the data it points to. So "an 8-byte pointer to a 4-byte data type" isn't my question. The question is: why is the pointer 62 bits? Why are the 2 bits constant and predictable? Old games used to support only 2 GB and could be patched to support 4 GB, since 31 bits is about 2 billion. Also, can you explain alignment? I don't understand why it would take 2 words. You don't access memory by words, you access it by cache line. Also, the word size is CPU specific; it can be 32 or 64 bits nowadays.
@sanderbos4243 8 months ago
@@TheJGAdams The 8-byte pointer indeed has 64 bits' worth of possible states; that's completely correct. But the video's "information" metric comes from information theory (you'll find better explanations of that if you look it up). The point is that if you have an array of 4-byte ints, you *know* that any address pointing to one of those ints will be 4-byte aligned. Simply put, if you print the addresses of the ints, they go +0x0, +0x4, +0x8, etc. So the "information" metric from this video takes the number of bits in an 8-byte pointer (64) and subtracts 2, simply because the last 2 (least significant) bits will always be 0. So "information" says: "Yeah yeah, of course those last two bits are zero for this address! You didn't need to tell me that, I can see it from the size of the thing I'm pointing at being 4 bytes (2 bits)! I only care about its offset!"
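The "bits of information" arithmetic from this thread can be written down as a sketch, assuming 8-bit bytes and alignof(std::uint32_t) == 4, as on mainstream x86-64 and ARM64 targets. The function names are hypothetical; they simply compute the pointer width minus log2(alignof(T)).

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Low address bits that are always zero in a correctly aligned T*.
// alignof(T) is always a power of two, so this loop computes log2(alignof(T)).
template <typename T>
constexpr unsigned known_zero_bits() {
    unsigned bits = 0;
    for (std::size_t a = alignof(T); a > 1; a /= 2) {
        ++bits;
    }
    return bits;
}

// "Information" in the sense used in the talk: the pointer's width in bits
// minus the bits that the pointee's alignment already tells us.
template <typename T>
constexpr unsigned pointer_information_bits() {
    return static_cast<unsigned>(sizeof(T*) * CHAR_BIT) - known_zero_bits<T>();
}
```

On a 64-bit target, pointer_information_bits<std::uint32_t>() gives 64 - 2 = 62, while a char* keeps all 64 bits because char has alignment 1.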
@kenilmehta4247 5 years ago
20:18 How is sizeof(Normal) equal to 8 bytes?
@RyanCahoon 5 years ago
He said earlier (18:10) that enums are 4 bytes on his machine; then there is 1 byte each for the bool and the uint8_t, followed by 2 bytes of padding.
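A hypothetical struct matching the layout described in the reply above; the real member names and order in the talk may differ. The enum's underlying type is fixed to 32 bits to reproduce the "enums are 4 bytes" assumption, and the four enumerators mirror the a, b, c, d states mentioned further down in the comments.

```cpp
#include <cstddef>
#include <cstdint>

// Underlying type pinned to 32 bits so the enum is 4 bytes on every compiler.
enum class Kind : std::uint32_t { a, b, c, d };

// Hypothetical stand-in for the talk's `Normal`.
struct Normal {
    Kind kind;          // 4 bytes at offset 0
    bool flag;          // 1 byte at offset 4
    std::uint8_t small; // 1 byte at offset 5
    // 2 bytes of tail padding so sizeof is a multiple of alignof(Kind) == 4
};
```

On common ABIs where bool is 1 byte, sizeof(Normal) comes out as 8: 6 bytes of members plus 2 bytes of padding.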
@Rhumage 5 years ago
19:00 I still don't understand where 62 comes from
@phonlolol5153 5 years ago
He assumes that the std::uint32_t type has the proper 4-byte alignment. This means that the pointer, which points to a std::uint32_t, can only point to addresses like byte 0, byte 4, byte 8, byte 12, byte 16 and so on, so the last 2 bits are essentially always zero.
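A sketch of the low-bit trick discussed here: because a properly aligned std::uint32_t* always has its two lowest bits clear, those bits can hold a 2-bit tag. TaggedPtr is a hypothetical class, not the talk's code, and it assumes that converting a pointer to std::uintptr_t and back preserves its value (implementation-defined, but true on mainstream platforms). As mapron1 and foonathan note earlier in the comments, an unaligned pointer breaks the trick, which is what the first assert guards against.

```cpp
#include <cassert>
#include <cstdint>

// Stores a std::uint32_t* and a small tag in one pointer-sized integer.
class TaggedPtr {
public:
    // Mask of the low bits that alignment guarantees to be zero (0b11 when alignof is 4).
    static constexpr std::uintptr_t tag_mask = alignof(std::uint32_t) - 1;

    TaggedPtr(std::uint32_t* ptr, unsigned tag)
        : bits_(reinterpret_cast<std::uintptr_t>(ptr) | tag) {
        assert((reinterpret_cast<std::uintptr_t>(ptr) & tag_mask) == 0); // pointer must be aligned
        assert(tag <= tag_mask);                                        // tag must fit in the free bits
    }

    // Clears the tag bits to recover the original address.
    std::uint32_t* ptr() const {
        return reinterpret_cast<std::uint32_t*>(bits_ & ~tag_mask);
    }

    // Extracts only the tag bits.
    unsigned tag() const { return static_cast<unsigned>(bits_ & tag_mask); }

private:
    std::uintptr_t bits_;
};
```

The design trades a mask operation on every access for storing the extra state without growing the object beyond one pointer.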
@sanderbos4243 2 years ago
17:03 Shouldn't sizeof(Bad) == 24 have been the right answer instead of what's in the presentation? The largest type is uint64_t (8 bytes), so field a takes 8 bytes and fields c, d and e packed together fit in another 8 bytes, so 8 + 8 + 8 == 24?
@LewiLewi52 a year ago
The processor can only read N bytes from an address evenly divisible by N. Consider a struct with an int64 followed by an int32: the starting address of the int32 is divisible by 4. But following the int64 with an int8 and then an int32 would place the int32 on an address not divisible by 4, so it needs padding.
@sanderbos4243 a year ago
Thank you!! struct { i64; i32 }: 8 / 4 = 2, so no padding is needed. struct { i64; i8; i32 }: 9 / 4 is not an integer, so the i8 gets padded.
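A sketch of the two layouts from this exchange, assuming a typical 64-bit ABI such as x86-64 or AArch64 where std::int64_t is 8-byte aligned (on 32-bit x86 Linux, for example, it is only 4-byte aligned inside structs and the sizes change). The struct names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Each member goes at the next offset that is a multiple of its alignment;
// the total size is then rounded up to the struct's largest alignment.
struct I64I32 {
    std::int64_t a; // offset 0
    std::int32_t b; // offset 8: 8 % 4 == 0, no padding needed
    // 4 bytes of tail padding -> sizeof == 16
};

struct I64I8I32 {
    std::int64_t a; // offset 0
    std::int8_t  c; // offset 8
    // offset 9 is not a multiple of 4, so 3 padding bytes follow c
    std::int32_t d; // offset 12
    // 12 + 4 == 16 is already a multiple of 8 -> sizeof == 16
};
```

Both structs end up the same size: the int8_t costs nothing extra here because it fits into space that was padding anyway.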
@konrad3688 5 years ago
Should I prefer using "sorted_set + vector" over std::map / std::unordered_map? Are there any benchmarks for this?
@foonathan 5 years ago
Here are benchmarks against boost::flat_map, which is similar: stackoverflow.com/a/25027750 You should prefer it to std::map, but obviously an O(1) hash table is better than an O(log n) search. std::unordered_map is still not ideal, however. There are better hash table implementations out there, see foonathan.net/meetingcpp2018.html for some links.
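A minimal sketch of the sorted-vector idea, in the spirit of boost::flat_map but not its API: FlatMap is a hypothetical class with int keys and std::string values. Lookups are an O(log n) binary search over contiguous memory; inserts are O(n) because later elements shift, which is why this layout suits maps that are read far more often than they are modified.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Key/value pairs kept sorted by key in one contiguous vector.
class FlatMap {
public:
    // Inserts or overwrites a value; O(n) because later elements are shifted.
    void insert(int key, std::string value) {
        auto it = std::lower_bound(data_.begin(), data_.end(), key, key_less);
        if (it != data_.end() && it->first == key) {
            it->second = std::move(value);                   // overwrite existing key
        } else {
            data_.insert(it, Entry{key, std::move(value)});  // keep the vector sorted
        }
        assert(std::is_sorted(data_.begin(), data_.end(),
                              [](const Entry& x, const Entry& y) { return x.first < y.first; }));
    }

    // O(log n) binary search; returns nullptr when the key is absent.
    const std::string* find(int key) const {
        auto it = std::lower_bound(data_.begin(), data_.end(), key, key_less);
        return (it != data_.end() && it->first == key) ? &it->second : nullptr;
    }

    std::size_t size() const { return data_.size(); }

private:
    using Entry = std::pair<int, std::string>;
    // Comparator for lower_bound: (element, searched key).
    static bool key_less(const Entry& e, int key) { return e.first < key; }
    std::vector<Entry> data_;
};
```

Unlike std::map, neighbouring entries sit next to each other in memory, so a binary search touches a few cache lines instead of following a pointer per tree level.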
@Thiago1337 a year ago
18:09 I don't understand this part. Why is it 2 bits of information?
@neohashi3396 a year ago
The enum has 4 states: a, b, c and d. In order to count to 4 in binary you need 2 bits: 00 01 10 11
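The same counting argument as a sketch: the number of bits needed to distinguish n states is ceil(log2(n)). bits_needed is a hypothetical helper, written with a loop so it works in C++14 constexpr.

```cpp
#include <cstdint>

// Smallest number of bits that can distinguish `states` different values,
// i.e. ceil(log2(states)). The bits < 64 guard avoids an undefined 64-bit shift.
constexpr unsigned bits_needed(std::uint64_t states) {
    unsigned bits = 0;
    while (bits < 64 && (std::uint64_t{1} << bits) < states) {
        ++bits;
    }
    return bits;
}
```

For the enum above, bits_needed(4) is 2; a bool (2 states) needs 1 bit, and a fifth enumerator would push the count to 3.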
@antonios-m4291 2 years ago
One of the more unclear C++ presentations, I must say.
@tal500 3 months ago
This one assumes a lot of background knowledge about memory performance.