I think memory packing was first introduced to me by The Lost Art of Structure Packing by Eric S. Raymond. This feels like the follow-up to that article. Fastest 46 minutes of my life; such a good talk it flew by.
@mamertvonn13 күн бұрын
It feels constraining to have an array for each boolean/enum type, since those arrays have to be immutable to preserve the integrity of the indexes (used as pointers in this context).
@khuntasaurus8817 күн бұрын
The store-booleans-out-of-band example is kind of silly. 99% of the time you'd have more than one boolean in a real-world application. Let's say you have the same Monster: you'd want isHostile, isAlive, isEnraged, etc. If you're memory constrained, you can pack up to 32/64 booleans into an integer as enum masks and do monster.state & IsAliveEnum. Yes, I get it's an example, but the multiple-array option is a terrible example for literally any real-world scenario. Don't forget you're also introducing a mem copy if you have to move an object from one array to another. Are those arrays on the stack or the heap? Bad example.
@terryscott52419 күн бұрын
bro function calls are so wasteful. fuck it dog im just gonna inline every function in glibc
@ghostpeppered452428 күн бұрын
"not optimized for memory, but for fitting on the slide" you mean for fitting into L1 cache! We may or may not reduce total memory allocated in RAM, but the primary value is that we've indexed the data for fast search/iteration.
@3bdo3idАй бұрын
24:20 With my C++ background, this point about "tagged union" blocked me from continuing the talk, but after a quick chat with ChatGPT I learned that Rust stores a tag for each (safe, tagged) union and uses it as a discriminant to indicate which type in the union is the active one; add the padding to that 1 byte and you get another 4 bytes, adding up to the 28 bytes. Edit: in C++, this is equivalent to a struct of an int and a union (plus static manipulation functions).
@ChimiChanga1337Ай бұрын
Yes, tagged unions are called `enum` in Rust. You might enjoy this blog, which dives deeper into the memory layout and memory usage of Rust enums: https://alic.dev/blog/dense-enums
@3bdo3id23 күн бұрын
@@ChimiChanga1337 I found time to read it today, and it is amazing how saving memory down to the last bit forces us to lose a good cache-hit ratio. I think in my next project I will go with the simple SoA with two arrays, saving only the padding of the tag. Thanks for the recommendation.
@10XengineeringАй бұрын
Where's the regex lib for Zig?
@das_zolАй бұрын
I want to know what tools I can use to inspect these structures. debugger? I'd be instantly 1000x better at solving these puzzles if I had visibility into what was actually happening.
@mattshuАй бұрын
me: "ok just another talk in front of a room full of developers, no big deal, you got this" _starts the lecture saying "hard and long"_ social anxiety: sorry i'm late
@BlackCodeMathАй бұрын
Learning so much from this talk.
@markykid87602 ай бұрын
This video really helped me with all the naked humans with braces who are not bees.
@tristanboyle44502 ай бұрын
🥱🥱🥱
@grimvian2 ай бұрын
Interesting video. Eskild Steenberg addresses a lot of these topics in an 8-year-old video, How I program C: kzbin.info/www/bejne/amWWhoGbfNd5pa8
@r_j_p_2 ай бұрын
Basically write your code in Fortran.
@eggmeister66412 ай бұрын
Where can I access the slides from this talk?
@b1zzler3 ай бұрын
This is all great, but I can't help but think of how much more painful it would be to extend and refactor code like this without breaking everything.
@dreanwave3 ай бұрын
Throughout the whole talk I was thinking: What about bit fields?
@anonymousperson4203 ай бұрын
Worst thing about the internet, it makes me feel old. This guy started on the sega genesis? My goodness that came out ages after my first game.
@Izopropilen3 ай бұрын
So after all these talks, Andrew Kelley sits on stream talking about how Zig has so many bugs because its compiler is slow. DoD didn't help?
@wegvS3 ай бұрын
I'm a web developer and will probably never use any of this stuff, but it's super interesting!
@ZarviroffSerge3 ай бұрын
Brilliant!
@EmberMage81923 ай бұрын
One thing I noticed about booleans is that in some cases people duplicate implicit information with them, which means such booleans can be replaced with a function. For example, with the monster struct, you can detect whether a monster is alive by simply checking if HP > 0 (unless your game logic allows a monster to be alive otherwise, of course). Sometimes booleans duplicate the implicit state of a nullable field. For instance, I've seen a user account record with an email and a flag called activated, but the account was only considered activated if it had an email, so you can just delete the boolean and use an email != null check.
@masterchief15203 ай бұрын
My previous company had that, lmao. Mobile verification and email. It was a fucking integer field 🥸
@petevenuti73553 ай бұрын
How do i get a system to treat a hard drive or flash drive AS main memory, fool it into thinking the storage device is RAM, then treat the ram as l2 cache? Can it be done with a driver or does it have to be done in the firmware?
@tmkcelll3 ай бұрын
Isn't this just swapping/paging? Why would you want to do this? It would be insanely slow.
@petevenuti73553 ай бұрын
@@tmkcelll No, it absolutely would not be swapping/paging; I am trying to avoid paging altogether and just use a block device directly AS RAM! I'm trying both to bypass motherboard limits on supported RAM and to make programs believe I have more physical RAM than I would even have room to hold the page file for when swapping! Asking if it was the same thing makes me wonder if you watched the video. Sorry for the sarcasm, but I am tired of people repeating that same response when I'm explicitly saying that's not what I want to do; it's very frustrating.
@tmkcelll3 ай бұрын
@@petevenuti7355 If that's the case, I think the only way to accomplish it is by modifying the operating system itself. I'm not sure L2 can even be changed (I doubt it, since it's built into the CPU itself). The system that enables paging essentially treats swap like extra memory in the address space, so I'm not really sure what else you want beyond that. You can allocate a whole device as swap space and use it as memory in Linux, so there's that, and you can probably limit usable memory to almost nothing to emulate what you want to achieve. And yes, I did watch the video in full.
@petevenuti73553 ай бұрын
@@tmkcelll Yeah, I'm not sure how the MMU on the motherboard and the memory management of the operating system work together, or whether they can override each other; I was hoping somebody would know. I remember really old systems where you could put RAM on a card on the bus. As for the actual L2 cache, you are most likely right, as it's in the CPU chip; however, if the operating system has anything to do with it, such that it's not just microcode on the CPU that handles it, it would be cool if any real RAM that was available could be used in that role.
@EvilTim19114 ай бұрын
Man, this guy is pretty good at Zig, he should join the team developing the language
@emmanuelbyiringiro72072 ай бұрын
He's the creator of Zig hahahaha
@Multigor962 ай бұрын
@@emmanuelbyiringiro7207 r/whoosh
@shadyabhiАй бұрын
@@emmanuelbyiringiro7207 That was the joke here. :D
@propov18028 күн бұрын
If only he said that in the first 10 seconds.
@aftalavera5 күн бұрын
You haven't heard anything yet! Watch him hailing satan. There is something wrong with this guy! kzbin.info/www/bejne/faeVemqnidqrbcksi=N4DL99SuBcSGOXQ1&t=752
@AhmadAli-kv2ho4 ай бұрын
Can someone correct me, I'm new to this: is he saying that it's 1+4+3+8+3+4+1, which adds up to 24? Or is it not 1 and 3 for new lines?
@khatdubell4 ай бұрын
The most important takeaway from this is that he doesn't know what OOP is.
@maskettaman14884 ай бұрын
As neat as some of these tricks and techniques are, I am absolutely rejecting your PR if you do pretty much anything talked about here. Code quality is wayyyy more important than technically saving a few bytes
@rusi62194 ай бұрын
@@maskettaman1488 you mean dividing perfectly readable functions into 5 classes and 25 methods LOL
@maskettaman14884 ай бұрын
@@rusi6219 No. I mean not jumping through hoops and writing arcane code so that a bool and an int can share a byte. There are very very very few cases where that kind of optimization is necessary, considering the quality of the code suffers greatly as a result. But also, yes, you should be breaking up functions into smaller units. If you have 1 function that does 20 things... you should probably look at breaking it up and organizing things a bit.
@rusi62194 ай бұрын
@@maskettaman1488 hahaha "quality of code" is about the last thing a customer is thinking about as he's staring at a frozen screen due to fifty destructors destructing other destructors
@maskettaman14884 ай бұрын
@@rusi6219 I can only assume you're new to development in general if you think "fifty" destructors would even be perceptible to an end user. Have you never seen a stack trace from a modern web server or something? Also, destructors have literally nothing to do with cryptic bit tricks to save a few bytes on your data structures like the video is discussing You really seem to not know what you're talking about
@rusi62194 ай бұрын
@@maskettaman1488 I can only assume you're new to development in general if you think "cryptic bit tricks to save a few bytes" has no effect on how well software runs at scale You really seem to not know what you're talking about
@onsearchfocus4 ай бұрын
Great talk. No fluff. Just gold!
@Ariccio1235 ай бұрын
I'm not a zig programmer but I was very familiar with llvm for a while.... This is gold!!
@RobertGardnerEngineer5 ай бұрын
"I programmed my compiler like an ECS based video game and it dramatically improved performance" was not the talk I expected to listen to today. That is a cool take. It makes me wonder if there is value in a language where this is how it stores things and processes under the hood. Like, can you write a language where ECS is not *a* paradigm, it's *the* paradigm.
@pawello875 ай бұрын
I have doubts whether the monster example is relevant in the context of DOD. Let's say I want to update the position of all bees. I would have to iterate through the whole set and use an if condition to pull out only the instances I am interested in. That is still clogging the cache with data I am not interested in (all non-bee monsters). In the DOD approach this would be split into two (or more) separate systems, which would automatically eliminate the need for encoding/tagging.
@nicolas3935 ай бұрын
great talk
@Marco-sz6mq5 ай бұрын
Nice talk. I am just wondering if someone could suggest a book that goes into more detail on hardware-related stuff, maybe to better understand compilers and low-level features. For example, like the slide at 5:20.
@ChimiChanga13375 ай бұрын
Patterson and Hennessy for computer architecture; the Dragon Book for compilers.
@Marco-sz6mq5 ай бұрын
@@ChimiChanga1337 thank you
@yapayzeka5 ай бұрын
6:16 Very important gotcha! "Math is faster than every memory operation; maybe you should repeat the math instead of memoizing the result."
@tHaH4x0rАй бұрын
Personally I don't really see this as a useful gotcha in most cases; misleading at best and poor practice at worst. We have two ways of writing a function that performs an operation on variable X: first, calculating the result and later in the function doing the calculation again (as seems to be suggested in the talk); second, calculating the result of the operation and 'buffering' it in Y, which we can later fetch to use in our function.

In either case, when the function is first run, the block of memory where X resides has a cache miss and X is added to the cache. If we do not buffer, we later take the variable out of the cache again and perform our operations on it. However, if you temporarily save the result of the operation into Y, Y will be located on the stack, and the stack is almost always going to be cached, because stack variables are accessed relatively often. Hence my conclusion that in nearly all situations the two have similar access times, as both are going to be cached; the 'non-buffered' solution just requires additional math operations.

At least, that is how I think it works. My own specialization is not deep in computer science and/or computer architecture, so if someone with more knowledge is able to chip in on this I would appreciate it.
@Sean-jg9sdАй бұрын
@@tHaH4x0r This is correct, but really what he should have said in the talk is that it's better to calculate a non-trivial expression on the fly than to recall a stored value that may have already been evicted from the cache, or to just never store it in the first place. This is the same thinking as not initializing something that's used once, because you can just pass it as a literal or an expression; that way you're not paying for the extra memory operations. His point was just to say: think about where you're paying for memory operations, especially allocations.
@Reashu11 күн бұрын
@@tHaH4x0r Yes, there are caveats. Math is faster than memory if you already have the operands. If you're going to have to look up x and y, you might as well look up x + y.
@michaeln.81855 ай бұрын
Great talk! I look forward to using some of these strategies. Open question: 18:21 In languages with compilers that support #pragma pack, is there any advantage to storing booleans out-of-band? Assuming non-sparsity of the boolean field would you still not incur cache line misses regardless of whether the bool is in the structure or an array outside, since the cpu is forced to load the same # of bytes?
@PassifloraCerulea4 ай бұрын
The whole point of storing struct fields at their natural alignment is so the CPU doesn't have to do extra work. Back in the 90s, top-of-the-line RISC CPUs might even crash your program or trigger some OS interrupt to handle unaligned access. I believe x86 tends to be more forgiving about this, but I'm not sure. It is likely that your compiler will go to great lengths to avoid unaligned memory access, using multiple smaller reads or writes instead. As always, looking at assembly output and benchmarking your particular program will be necessary.
@tacticalassaultanteater96785 ай бұрын
Right, I guess my intuition about performance can go out the window now, because all of those steps sound incredibly slow
@zxuiji5 ай бұрын
3:09 nice :)
@tordjarv38025 ай бұрын
Damn, I should learn Zig
@yash11525 ай бұрын
30:10 The plug I was looking for.
@yash11525 ай бұрын
20:27 The speaker image covered the content, and I don't understand this example either.
@yash11525 ай бұрын
20:52 Ohh, it's about a language feature where the language lets you create an array for each member of a struct.
@torgnyandersson4035 ай бұрын
About the lexer and retokenizing. That's unnecessary work for strings and other tokens where the length isn't the same for all tokens of that type. Instead, just have one token for "begin string literal" and one for "end string literal". That's what I've done in my own projects. You get tiny token struct and no retokenizing.
@sadhlife6 ай бұрын
16:03 Zig has them now.
@Fedor_Dokuchaev_Color6 ай бұрын
Bit fields? You don't have to use multiple arrays and enums, which make the code less readable.

struct Monster {
    unsigned long long life_points : 63;
    unsigned long long is_alive : 1;
};

That's it. You can pack a lot of data if you know the maximum/minimum range of values in each case. (Note a bit-field's width can't exceed the width of its declared type, so a 63-bit field needs a 64-bit base type.)
@ratakor3 ай бұрын
Or in this case just check whether life_points is lower than or equal to 0, but this is a very simple example. For the Zig compiler I think Andrew's strategy is good.
@minastaros6 ай бұрын
Hello, bitfields?
@YmanYoutube6 ай бұрын
Why the fuck does a talk about an abstract CS concept have 67k views in 3 weeks? This field is GIGA oversaturated.
@rt15176 ай бұрын
In my opinion, this approach has an impact on code simplicity and maintainability. On one side you have a structure that contains what you are interested in; on the other side, well, it is more complicated, and I would even say more bug-prone and harder to debug.

Also, he focuses only on the size of the data, while the way you access the data is also very important. For example, it is supposed to be way faster to visit all the elements of an array than all the elements of a linked list, because with the array the memory accesses are very close to each other, so you stay in the cache and the processor can even predict which memory address will be accessed next. With a linked list, you are accessing all over the place, so most of the time the next item won't be in the cache.

Don't get me wrong, it is very nice that Andrew is trying to make the compiler fast, and there are a lot of slow compilers out there. But I am not sure that many of us mortals need that level of optimization.
@Nors2Ka5 ай бұрын
I would say Andrew misunderstood the meaning of DoD, and what he showcased here is straight-up optimization, a step most mortals do not have to take, as their code is 1000x slower for other reasons entirely, i.e. using pointers from a linked list to refer to individually heap-allocated items. There is also this funny thing where he optimized the most irrelevant part of the compiler; LLVM will gobble 98% of the compile time regardless (unless Zig can use its own fast backends for debug builds). But I do want to say that the way you refer to "readability" sounds like you don't know what you're talking about, and the "readable" code you would spew out would be anything but actually readable.
@rt15175 ай бұрын
@@Nors2Ka For me, readability is about how easy it is to understand the code. There is an infinite number of ways to implement a feature in a program, and some of them are better than others according to some criteria, for example readability or performance. Regarding readability, I would judge it by the amount of time and effort an average developer needs to understand a piece of code. At 22:29, on the left, I instantly know and understand what a Monster is and what it is capable of from its struct definition. But on the right, if I look at the Monster struct, I don't know that it can hold items. And if I find the held_items hashmap, I am not sure about its key. Is it the Monster index in the Monster list? Is the hashmap dedicated to the Monsters, or is it shared with the NPCs or other possible item holders? Should I clean something up in the hashmap when a Monster is deleted? For me, the left version seems a lot easier to maintain and add features to (for example, looting).
@Average-Lizard5 ай бұрын
I feel the same way. I know in my typical projects (large web and console apps), the code being easily read and maintained is much more valuable than saving KB of memory or a fraction of the time of a single web request. I guess these kinds of optimizations are for specific use cases, but nonetheless it's interesting to hear about and have in the tool belt.
@toby99993 ай бұрын
I think so. I've spent the last 20 years working on high-performance software where these techniques make the difference between being the best and lagging behind. For those developing web front ends or bloatware, or coding in Java or Python, etc., not so much, if at all, in my opinion.
@TheOnlyFeanor6 ай бұрын
The example at 20:00 doubles the number of memory accesses for each struct (one with the animation id, another one for the monster type), unless the reduction in footprint makes it all fit into a single cache line.
@jaredcramsie1826 ай бұрын
I was going to comment this as well. Splitting the data like this will always require two cache reads on the initial load*. If the out-of-band technique is applied to more data, then each application incurs an additional cache read; looked at as an individual access, each use of the technique comes with a guaranteed cache miss. Array data is stored contiguously in memory, so in this example, if you expect to access several elements in a row, much of the data will already be in cache, because each struct is less than one cache line in size. If you expect to go through all the data in any order (and it is small enough to fit in cache), then this results in fewer cache misses in total, because there are simply fewer cache lines to access. These benefits can artificially inflate the measured savings of the technique when people benchmark it by iterating over an entire array of data, or test it in a limited context that doesn't factor in the accumulation of external data in cache from other processes, if the data/procedure is not used often. *For completeness, there are some special cases where you don't have two cache reads. One cache read: I assume this is what you were referring to in your comment. If you happen to be storing less than 64 bytes of total monster data, 7 or fewer monsters in this example (assuming the 'animation' data starts cache-aligned and the 'kind' data does not), then you can fit 3 extra monsters into the cache line, where it could only store 4 monsters per cache line previously. Three or four cache reads: in this example his data size is a divisor of the cache-line length; if it weren't, there might be a chance of performing another cache read on each access.
@Norfirio5 ай бұрын
This technique is very useful for instances where you do multiple operations on the Monster and you only need one (or a few) fields at a time. For example, in a game you will have an update function and a render function. If you iterate with the padded struct, you waste time going over the anim field in the update function and then waste time going over the kind field in the render function. You just have to be careful of your use case, as with any optimization methods.
@Oler-yx7xj6 ай бұрын
6:50 using light speed as a measure of how fast an operation executes is funny. It also makes you understand how fast modern computers are: having memory as fast as L1 cache 1 meter away from the CPU is literally impossible
@policyprogrammer6 ай бұрын
Avail yourself of bitfields. They're part of the standard and they can really save your bacon space-wise. In particular, I think they're better than encoding multiple boolean values into an enum, because of maintainability. Imagine you have these encoded enums and you have to write a function like is_human; it's working just fine, but then someone adds a new variable and re-encodes all the enums so that you have to include some more, and you forget to update the function. Another advantage of bitfields is that you can take advantage of your knowledge of the sizes you actually use. If you know you only need 24 bits, you can allocate just 24 bits and use the next eight for something else.