Writing cache friendly C++ - Jonathan Müller

Writing cache friendly C++ - Jonathan Müller - Meeting C++ 2018

50,958 views

Comments: 40

@sanderbos4243 2 years ago

7:43 I'm pretty sure that since each box represents 4 bytes/32 bits/an int, the example with a stride of 16 bytes (4 boxes or 4 ints) here should have its green boxes at 0, 4, 8 instead of 0, 3, 6, which results in a waste of 75%, instead of 71%.
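The arithmetic behind the 75% figure can be sketched as follows. This is a minimal illustration, assuming each box is 4 bytes and that every byte of the 16-byte stride ends up in the cache even though only one box per stride is read. `wasted_fraction` is a hypothetical helper name, not something from the talk.

```cpp
#include <cstddef>

// Fraction of fetched bytes that is never used when only `element_size`
// bytes out of every `stride` bytes are actually read.
// Assumption: the whole stride is brought into the cache.
constexpr double wasted_fraction(std::size_t element_size, std::size_t stride) {
    return 1.0 - static_cast<double>(element_size) / static_cast<double>(stride);
}

// One 4-byte int used out of every 16 bytes: three quarters are wasted.
static_assert(wasted_fraction(4, 16) == 0.75, "stride of 4 ints wastes 75%");
// Contiguous access (stride equals element size) wastes nothing.
static_assert(wasted_fraction(4, 4) == 0.0, "contiguous access wastes 0%");
```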

@sanderbos4243 2 years ago

17:03 Shouldn't sizeof(Bad) == 24 have been the right answer, instead of what's in the presentation?: The largest type is uint64_t or 8 bytes, so field a == 8 bytes and fields c, d and e packed together fit in another 8 bytes, so 8 + 8 + 8 == 24?

@LewiLewi52 1 year ago

The processor can only read N bytes at an address evenly divisible by N. Consider a struct with an int64 followed by an int32: the starting address of the int32 is divisible by 4. But following the int64 with an int8 and then the int32 would place the int32 at an address not divisible by 4, so padding is needed.

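@sanderbos4243 1 year ago

Thank you!! So for struct { i64; i32 } the i32 starts at offset 8, and 8 / 4 = 2, so no padding is needed. For struct { i64; i8; i32 } the i32 would start at offset 9, and 9 / 4 is not an integer, so padding is inserted after the i8.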
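The two layouts in this exchange can be checked with `offsetof`. The sketch below uses hypothetical struct and field names (`TwoFields`, `ThreeFields`, `padding_before_b`) and assumes a typical 64-bit ABI without packing pragmas. The concrete offsets in the comments are what mainstream x86-64 compilers usually produce, not a guarantee of the standard.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical structs mirroring the two cases discussed above.
struct TwoFields {
    std::int64_t a;  // offset 0
    std::int32_t b;  // offset 8: already a multiple of 4, no padding needed
};

struct ThreeFields {
    std::int64_t a;  // offset 0
    std::int8_t  c;  // offset 8
    // the compiler typically inserts 3 padding bytes here
    std::int32_t b;  // would be offset 9 without padding; typically 12
};

// Without packing pragmas, each member sits at a multiple of its alignment.
static_assert(offsetof(TwoFields, b) % alignof(std::int32_t) == 0, "b is aligned");
static_assert(offsetof(ThreeFields, b) % alignof(std::int32_t) == 0, "b is aligned");

// Number of padding bytes between the end of `c` and the start of `b`.
constexpr std::size_t padding_before_b() {
    return offsetof(ThreeFields, b) - (offsetof(ThreeFields, c) + sizeof(std::int8_t));
}
```

On a typical x86-64 build, `padding_before_b()` evaluates to 3 and `sizeof(ThreeFields)` is 16.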

@Thiago1337 2 years ago

18:09 I don't understand this part. Why is it 2 bits of information?

@neohashi3396 2 years ago

The enum has 4 states: a, b, c and d. In order to count to 4 in binary you need 2 bits: 00 01 10 11
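The same counting argument generalizes to any number of states. Here is a small sketch, assuming C++14 or later for the `constexpr` loop. `bits_for_states` is a made-up helper name used only for illustration.

```cpp
// Smallest number of bits whose 2^bits combinations cover `states` values.
constexpr unsigned bits_for_states(unsigned long long states) {
    unsigned bits = 0;
    // Stop at 64 so the shift below never exceeds the width of the type.
    while (bits < 64 && (1ull << bits) < states) {
        ++bits;
    }
    return bits;
}

// Four enumerators (a, b, c, d) map onto 00, 01, 10, 11.
static_assert(bits_for_states(4) == 2, "4 states need 2 bits");
// A fifth state no longer fits and needs a third bit.
static_assert(bits_for_states(5) == 3, "5 states need 3 bits");
```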

@sanderbos4243 2 years ago

19:29 Pretty sure it's 62 bits of information again, so 61 bits + 1 bit. The 61 is because of the bool being padded to take up just as much space as the uint32_t*, which means the first _3_ lower bits of the "a" field will stay 0. This is just like his previous explanation of 62, but you just keep the padded bool in mind. EDIT: I'm definitely wrong, see miguel's reply to me.

@miguelveganzones5103 1 year ago

What matters here is that it is pointing to a 32-bit type with 4-byte alignment; that's how you lose 2 bits of information. That there is a bool within the struct just adds more padding but is otherwise irrelevant for the pointer.

@TheJGAdams 1 year ago

Why is it 62 bits though? If you're compiling for x64 you should have a full 64 bits. I can guess old games only supported 2 GB of RAM because it's 31 bits? But they also patched it to use a full 32. What's going on here???

@sanderbos4243 1 year ago

@TheJGAdams The context here is that if you have an 8-byte pointer to a 4-byte data type (the std::uint32_t), we're assuming the 4-byte data type is aligned to a 4-byte boundary. If the pointer only ever points to addresses that are on a 4-byte boundary, the address's two least significant bits are constant and predictable, and so those bits don't carry useful information. By the same logic, a pointer to a uint64_t would carry 61 bits of information. "...on ARM-based systems you cannot address a 32-bit word that is not aligned to a 4-byte boundary. Doing so will result in an access violation exception. On x86 you can access such non-aligned data, though the performance suffers a little since two words have to be fetched from memory instead of just one."

@TheJGAdams 1 year ago

@sanderbos4243 I'm more confused now. I don't know what you're talking about, but I was asking about pointers themselves. They store an address, not the data they point to. So, an 8-byte pointer to a 4-byte data type? Not my question. The question is, why is the pointer 62 bits? Why are the 2 bits constant and predictable? Old games used to only support 2 GB and they could be patched to 4 GB, e.g. 31 bits is 2 billion. Also, can you explain alignment? I don't understand why it would take 2 words. You don't access memory by words, you access it by cache line. Also, word size is CPU specific. It can be 32 or 64 bits nowadays.

@sanderbos4243 1 year ago

@TheJGAdams The 8-byte pointer indeed has 64 bits worth of possible states, that's completely correct. But what the video's "information" metric represents is related to the field of information theory (you'll find better explanations of that if you look it up). The point being that if you have an array of 4-byte ints, you *know* that any address that points to one of those ints will be 4-byte aligned. Simply put, if you print the addresses of the ints they'll go +0x0, +0x4, +0x8, etc. So the "information" metric from this video takes the number of bits an 8-byte pointer can address (64), and subtracts 2 from it simply because those last 2 (least significant) bits will always be 0. So "information" says "Yeah yeah, of course those last two bits are zero for this address! You didn't need to tell me that, I can see that from the size of the thing I'm pointing at being 4 bytes (2 bits)! I only care about its offset!"
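The claim that the addresses go +0x0, +0x4, +0x8 can be observed directly. This is a rough sketch, assuming `alignof(std::uint32_t)` is 4 and that converting a pointer to `std::uintptr_t` yields the plain numeric address (implementation-defined, but true on mainstream platforms). Both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// True when the two least significant address bits are zero,
// i.e. the address is a multiple of 4.
bool low_two_bits_clear(const std::uint32_t* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0x3u) == 0;
}

// Checks every element of an array of 4-byte integers.
bool all_elements_aligned(const std::uint32_t* first, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (!low_two_bits_clear(first + i)) {
            return false;
        }
    }
    return true;
}
```

For an ordinary array such as `std::uint32_t values[8]`, `all_elements_aligned(values, 8)` is expected to return true, which is why those two bits tell you nothing new.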

@Rhumage 6 years ago

19:00 I still don't understand where 62 comes from

@phonlolol5153 6 years ago

He assumes that the std::uint32_t type has the proper 4-byte alignment. This means that the pointer, which points to a std::uint32_t, can only point to addresses like byte 0, byte 4, byte 8, byte 12, byte 16 and so on. So the last 2 bits are essentially always zero.
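Because those two bits are always zero, they can in principle hold other data. The class below is only a sketch of that idea, not code from the talk: `TaggedPtr` and its members are hypothetical names, and it assumes the pointer is 4-byte aligned and that the round trip through `std::uintptr_t` preserves the address, as it does on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>

// Stores a 2-bit tag in the unused low bits of an aligned pointer.
class TaggedPtr {
public:
    TaggedPtr(std::uint32_t* ptr, unsigned tag)
        : bits_(reinterpret_cast<std::uintptr_t>(ptr) | (tag & kTagMask)) {
        // The trick only works if the low bits were zero to begin with.
        assert((reinterpret_cast<std::uintptr_t>(ptr) & kTagMask) == 0);
    }

    // Masks the tag off again to recover the original address.
    std::uint32_t* pointer() const {
        return reinterpret_cast<std::uint32_t*>(bits_ & ~kTagMask);
    }

    // Returns the 2-bit value stored alongside the pointer.
    unsigned tag() const {
        return static_cast<unsigned>(bits_ & kTagMask);
    }

private:
    static constexpr std::uintptr_t kTagMask = 0x3;
    std::uintptr_t bits_;
};
```

The object stays the size of one pointer while carrying two extra bits, at the cost of a mask on every access.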

@Sebastian-lz5ue 5 years ago

6:17: "..the fast caches are also the slowest."

@CzipperzIncorporated 5 years ago

I think he meant to say smallest

@notmewooshme9916 5 months ago

I think he meant fast caches are smallest..

@mapron1 6 years ago

19:10 - You say a uint32_t* must always be 4-byte aligned on x86, but is that strictly true? I could have an unaligned pointer to an int32, couldn't I? Yes, there is a performance penalty, but it is possible?

@foonathan 6 years ago

Yes, technically you can have an unaligned pointer. Then that trick doesn't work anymore, correct.

@mapron1 6 years ago

Thanks for the answer. And for the awesome speech, too :)

@Carewolf 5 years ago

I think you can in x86 assembler, but I don't think it is valid C++. Or rather it works, but is undefined behavior and a sanitizer will complain loudly.

@llothar68 4 years ago

@Carewolf No complaints if you use a pragma. Normally no complaints at all, because you use low-level casts to get unaligned access. I love low level bit fuckery.

@Carewolf 4 years ago

@llothar68 You can also use memcpy; a good compiler will optimize the copy away but leave valid code, which on x86 means an unaligned memory read.
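The memcpy approach mentioned here might look like the sketch below. `load_u32` is a hypothetical helper name. The result uses the machine's native byte order, and whether the copy is compiled down to a single load depends on the compiler and optimization level.

```cpp
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from a possibly unaligned address without
// ever forming or dereferencing a misaligned std::uint32_t pointer.
std::uint32_t load_u32(const unsigned char* bytes) {
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);  // well-defined for any alignment
    return value;
}
```

Calling `load_u32(buffer + 1)` on a byte buffer is valid C++, whereas casting `buffer + 1` to `std::uint32_t*` and dereferencing it would be undefined behavior.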

@kenilmehta4247 5 years ago

20:18 How is sizeof(Normal) equal to 8 bytes?

@RyanCahoon 5 years ago

He said earlier (18:10) that enums are 4 bytes on his machines, then 1 byte each for the bool and uint8, then 2 bytes of padding
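That breakdown can be reproduced with a struct of the same shape. The definition below is a guess at the layout being described, not the slide's actual code: `Kind`, `Normal`'s field names, and `normal_padding` are all hypothetical, and the sizes assume a platform where `int` is 4 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// A scoped enum without an explicit underlying type uses int.
enum class Kind { a, b, c, d };

struct Normal {
    Kind kind;           // 4 bytes on platforms where int is 4 bytes
    bool flag;           // 1 byte
    std::uint8_t small;  // 1 byte
    // trailing padding rounds the size up to a multiple of alignof(Kind)
};

// Bytes in Normal that belong to no member.
constexpr std::size_t normal_padding() {
    return sizeof(Normal) - (sizeof(Kind) + sizeof(bool) + sizeof(std::uint8_t));
}
```

On a typical x86-64 build this gives `sizeof(Normal) == 8` with `normal_padding() == 2`, matching the 4 + 1 + 1 + 2 breakdown.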

@sanderbos4243 2 years ago

Amazing talk, thanks Jonathan! :-)

@simonhrabec9973 1 year ago

Neverhood :)

@sanderbos4243 1 year ago

@simonhrabec9973 Taste

@konrad3688 6 years ago

Should I prefer using "sorted_set + vector" rather than std::map / unordered_map? Are there any benchmarks for this?

@foonathan 6 years ago

Here are benchmarks against boost::flat_map, which is similar: stackoverflow.com/a/25027750 You should prefer it to std::map, but obviously an O(1) hash table is better than an O(log n) search. std::unordered_map is still not ideal, however. There are better hash table implementations out there, see foonathan.net/meetingcpp2018.html for some links.
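For readers unfamiliar with the sorted-vector approach being compared here, a lookup can be sketched with `std::lower_bound`. This is a minimal illustration, not `boost::flat_map`'s implementation: `Entry` and `find_value` are hypothetical names, and the vector is assumed to be already sorted by key.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;

// Binary search over contiguous storage. Requires `entries` sorted by key.
// Returns a pointer to the mapped value, or nullptr if the key is absent.
const std::string* find_value(const std::vector<Entry>& entries, int key) {
    auto it = std::lower_bound(
        entries.begin(), entries.end(), key,
        [](const Entry& entry, int k) { return entry.first < k; });
    if (it != entries.end() && it->first == key) {
        return &it->second;
    }
    return nullptr;
}
```

The search is still O(log n) like `std::map`, but the elements live in one contiguous allocation instead of separately allocated tree nodes. Insertion into the middle of the vector is O(n), so this layout suits data that is built once and queried often.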