El Chuchube
3:23
3 years ago
Comments
@colonthree
@colonthree 7 days ago
26:27 enum flags to the rescue? TwT)/
@Dygear
@Dygear 8 days ago
I think memory packing was first introduced to me by The Lost Art of Structure Packing by Eric S. Raymond. This feels like the follow-up to that article. Fastest 46 minutes of my life; such a good talk, this flew by.
@mamertvonn
@mamertvonn 13 days ago
It feels constraining to have an array for each boolean/enum type, since those arrays have to stay stable to preserve the integrity of the indexes (used as pointers in this context).
@khuntasaurus88
@khuntasaurus88 17 days ago
The store-booleans-out-of-band example is kind of silly. 99% of the time you'd have more than one boolean in a real-world application. Let's say you have the same Monster: you'd want isHostile, isAlive, isEnraged, etc. If you're memory constrained, you can pack up to 32/64 booleans into an integer as enum masks and you'd do monster.state & IsAliveEnum. Yes, I get that it's an example, but the multiple-array option is a terrible example for literally any real-world scenario. Don't forget you're also introducing a memcpy if you have to move an object from one array to another. Are those arrays on the stack or the heap? Bad example.
@terryscott524
@terryscott524 19 days ago
bro function calls are so wasteful. fuck it dog im just gonna inline every function in glibc
@ghostpeppered4524
@ghostpeppered4524 28 days ago
"not optimized for memory, but for fitting on the slide" you mean for fitting into L1 cache! We may or may not reduce total memory allocated in RAM, but the primary value is that we've indexed the data for fast search/iteration.
@3bdo3id
@3bdo3id 1 month ago
24:20 Coming from a C++ background, this point about "tagged union" was blocking me from continuing the talk, but after a quick chat with ChatGPT I learned that Rust has a tag for each (safe, tagged) union and uses it as a discriminant to indicate which type in the union is the active one; add the padding to that 1 byte and you get another 4 bytes, adding up to the 28 bytes. Edit: in C++, this is equivalent to a struct of an int and a union (plus static manipulation functions).
@ChimiChanga1337
@ChimiChanga1337 1 month ago
Yes, tagged unions are called `enum` in Rust. You might enjoy this blog post, which dives deeper into the memory layout and memory usage of Rust enums: https://alic.dev/blog/dense-enums
@3bdo3id
@3bdo3id 23 days ago
@@ChimiChanga1337 I found time to read it today, and it is amazing how saving memory to the last bit forces us to lose a good cache-hit ratio. I think in my next code, I will go with the simple SoA with two arrays saving the padding of the tag only. Thanks for this recommendation.
@10Xengineering
@10Xengineering 1 month ago
Where's the regex lib for Zig?
@das_zol
@das_zol 1 month ago
I want to know what tools I can use to inspect these structures. A debugger? I'd be instantly 1000x better at solving these puzzles if I had visibility into what was actually happening.
@mattshu
@mattshu 1 month ago
me: "ok just another talk in front of a room full of developers, no big deal, you got this" _starts the lecture saying "hard and long"_ social anxiety: sorry i'm late
@BlackCodeMath
@BlackCodeMath 1 month ago
Learning so much from this talk.
@markykid8760
@markykid8760 2 months ago
This video really helped me with all the naked humans with braces who are not bees.
@tristanboyle4450
@tristanboyle4450 2 months ago
🥱🥱🥱
@grimvian
@grimvian 2 months ago
Interesting video. Eskild Steenberg addresses a lot of these topics in an 8-year-old video, How I program C: kzbin.info/www/bejne/amWWhoGbfNd5pa8
@r_j_p_
@r_j_p_ 2 months ago
Basically write your code in Fortran.
@eggmeister6641
@eggmeister6641 2 months ago
Where can I access the slides from this talk?
@b1zzler
@b1zzler 3 months ago
This is all great, but I can't help but think of how much more painful it would be to extend and refactor code like this without breaking everything.
@dreanwave
@dreanwave 3 months ago
Throughout the whole talk I was thinking: What about bit fields?
@anonymousperson420
@anonymousperson420 3 months ago
Worst thing about the internet: it makes me feel old. This guy started on the Sega Genesis? My goodness, that came out ages after my first game.
@Izopropilen
@Izopropilen 3 months ago
So after all these stand-up talks, Andrew Kelley sits on stream talking about how Zig has so many bugs because its compiler is slow. DoD didn't help?
@wegvS
@wegvS 3 months ago
I'm a web developer and will probably never use any of this stuff, but it's super interesting!
@ZarviroffSerge
@ZarviroffSerge 3 months ago
Brilliant!
@EmberMage8192
@EmberMage8192 3 months ago
One thing I noticed about booleans is that in some cases people duplicate implicit information with them, which means such booleans can be replaced with a function. For example, with the monster struct, you can actually detect whether a monster is alive by simply checking if HP > 0 (unless your game logic allows a monster to be alive otherwise, of course). Sometimes booleans duplicate the implicit state of a nullable field. For instance, I've seen a user account record with an email and a flag called activated. But a user account was only considered activated if it had an email, so you can just delete the boolean and use an email != null check.
@masterchief1520
@masterchief1520 3 months ago
My previous company had that lmao. Mobile verification and email. It was a fucking integer field 🥸
@petevenuti7355
@petevenuti7355 3 months ago
How do I get a system to treat a hard drive or flash drive AS main memory, fool it into thinking the storage device is RAM, and then treat the RAM as L2 cache? Can it be done with a driver, or does it have to be done in the firmware?
@tmkcelll
@tmkcelll 3 months ago
Isn't this just swapping/paging? Why would you want to do this? It will be insanely slow.
@petevenuti7355
@petevenuti7355 3 months ago
@@tmkcelll No, it absolutely would not be swapping/paging. I am trying to avoid paging altogether and just use a block device directly AS RAM! I'm trying both to bypass motherboard limits on supported RAM and to make programs believe I have more physical RAM than I would even have room to hold the page file for when swapping! Asking if it was the same thing makes me wonder if you watched the video. Sorry for the sarcasm, but I am tired of people repeating that same response when I'm explicitly saying that's not what I want to do; it's very frustrating.
@tmkcelll
@tmkcelll 3 months ago
@@petevenuti7355 If that's the case, I think the only way to accomplish it is by modifying the operating system itself. I'm not sure L2 can even be changed (I doubt it, since it's built into the CPU itself). The system that enables paging essentially treats it like extra memory in the address space, so I'm not really sure what else you want beyond that. You can allocate a whole device as swap space and use it as memory in Linux, so there's that, and you can probably limit usable memory to almost nothing to emulate what you want to achieve. And yes, I did watch the video in full.
@petevenuti7355
@petevenuti7355 3 months ago
@@tmkcelll Yeah, I'm not sure how the MMU on the motherboard and the memory management of the operating system work together or are capable of overriding each other; I was hoping somebody would know. I remember really old systems where you could put RAM on a card on the bus. As for the actual L2 cache, you are most likely right, as it's in the CPU chip. However, if the operating system has anything to do with it, such that it's not just microcode on the CPU that handles it, it would be cool if any available real RAM could be used in that role.
@EvilTim1911
@EvilTim1911 4 months ago
Man, this guy is pretty good at Zig, he should join the team developing the language
@emmanuelbyiringiro7207
@emmanuelbyiringiro7207 2 months ago
He's the creator of Zig hahahaha
@Multigor96
@Multigor96 2 months ago
@@emmanuelbyiringiro7207 r/whoosh
@shadyabhi
@shadyabhi 1 month ago
@@emmanuelbyiringiro7207 That was the joke here. :D
@propov1802
@propov1802 8 days ago
If only he said that in the first 10 seconds.
@aftalavera
@aftalavera 5 days ago
You have not heard anything! Watch him hailing Satan. There is something wrong with this guy! kzbin.info/www/bejne/faeVemqnidqrbcksi=N4DL99SuBcSGOXQ1&t=752
@AhmadAli-kv2ho
@AhmadAli-kv2ho 4 months ago
Can someone correct me, I'm new to this: is he saying that it's 1+4+3+8+3+4+1, which adds up to 24? Or is it not 1 and 3 for the new lines?
@khatdubell
@khatdubell 4 months ago
The most important takeaway from this is that he doesn't know what OOP is.
@maskettaman1488
@maskettaman1488 4 months ago
As neat as some of these tricks and techniques are, I am absolutely rejecting your PR if you do pretty much anything talked about here. Code quality is wayyyy more important than technically saving a few bytes
@rusi6219
@rusi6219 4 months ago
@@maskettaman1488 you mean dividing perfectly readable functions into 5 classes and 25 methods LOL
@maskettaman1488
@maskettaman1488 4 months ago
@@rusi6219 No. I mean not jumping through hoops and writing arcane code so that you can have a bool and an int share a byte. There are very, very, very few cases where that kind of optimization is necessary, considering the quality of the code suffers greatly as a result. But also, yes, you should be breaking up functions into smaller units. If you have 1 function that does 20 things... you should probably look at breaking it up and organizing things a bit.
@rusi6219
@rusi6219 4 months ago
@@maskettaman1488 hahaha "quality of code" is about the last thing a customer is thinking about as he's staring at a frozen screen due to fifty destructors destructing other destructors
@maskettaman1488
@maskettaman1488 4 months ago
@@rusi6219 I can only assume you're new to development in general if you think "fifty" destructors would even be perceptible to an end user. Have you never seen a stack trace from a modern web server or something? Also, destructors have literally nothing to do with cryptic bit tricks to save a few bytes on your data structures like the video is discussing. You really seem to not know what you're talking about.
@rusi6219
@rusi6219 4 months ago
@@maskettaman1488 I can only assume you're new to development in general if you think "cryptic bit tricks to save a few bytes" has no effect on how well software runs at scale You really seem to not know what you're talking about
@onsearchfocus
@onsearchfocus 4 months ago
Great talk. No fluff. Just gold!
@Ariccio123
@Ariccio123 5 months ago
I'm not a zig programmer but I was very familiar with llvm for a while.... This is gold!!
@RobertGardnerEngineer
@RobertGardnerEngineer 5 months ago
"I programmed my compiler like an ECS based video game and it dramatically improved performance" was not the talk I expected to listen to today. That is a cool take. It makes me wonder if there is value in a language where this is how it stores things and processes under the hood. Like, can you write a language where ECS is not *a* paradigm, it's *the* paradigm.
@pawello87
@pawello87 5 months ago
I have doubts whether the monster example is relevant in the context of DOD. Let's say I want to update the position of all bees. I would have to iterate through the whole set and use an if condition to pull out only the instances I am interested in. That is still clogging the cache with data I am not interested in (all non-bee monsters). In the DOD approach this would be split into two (or more) separate systems, which would automatically eliminate the need for encoding/tagging.
@nicolas393
@nicolas393 5 months ago
great talk
@Marco-sz6mq
@Marco-sz6mq 5 months ago
Nice talk. I am just wondering if someone could suggest any book that goes into more detail on hardware-related stuff, maybe to better understand compilers and low-level features. For example, like the slide at 5:20.
@ChimiChanga1337
@ChimiChanga1337 5 months ago
Patterson and Hennessy for computer architecture; the Dragon Book for compilers.
@Marco-sz6mq
@Marco-sz6mq 5 months ago
@@ChimiChanga1337 thank you
@yapayzeka
@yapayzeka 5 months ago
6:16 very important gotcha! "Math is faster than every memory operation. Maybe you should repeat the math instead of memoizing the result."
@tHaH4x0r
@tHaH4x0r 1 month ago
Personally I don't really see this as a useful gotcha in most cases; it's misleading at best and poor practice at worst. We have two ways of writing a function that performs an operation on variable X. First, calculating the result, and later in the function doing the calculation again (as seems to be suggested in the talk). Second, calculating the result of the operation and 'buffering' it in Y, which we can later fetch to use in our function. In either case, when the function first runs, the block of memory where X resides has a cache miss and X is added to the cache. If we do not buffer, we later take the variable out of the cache again and perform our operations on it. However, if you temporarily save the result of the operation into Y, Y would be located on the stack, and the stack is almost always going to be cached, because stack variables are accessed relatively often. Hence my conclusion that in nearly all situations they have similar access times, as both are going to be cached; the 'non-buffered' solution would just require additional math operations. At least, that is how I think it works. My own specialization is not deep in computer science and/or computer architecture, so if someone with more knowledge is able to chip in on this, I would appreciate it.
@Sean-jg9sd
@Sean-jg9sd 1 month ago
@@tHaH4x0r This is correct, but really what he should have said in the talk is that it's better to calculate a non-trivial expression on the fly than to recall a stored value that may have already been evicted from the cache, or to just never store it in the first place. This is the same thinking as not initializing something that's used once, because you can just pass it as a literal or an expression. That way you're not paying for the extra memory operations. His point was just to say: think about where you're paying for memory operations, especially allocations.
@Reashu
@Reashu 11 days ago
@@tHaH4x0r Yes, there are caveats. Math is faster than memory if you already have the operands. If you're going to have to look up x and y, you might as well look up x + y.
@michaeln.8185
@michaeln.8185 5 months ago
Great talk! I look forward to using some of these strategies. Open question: 18:21 In languages with compilers that support #pragma pack, is there any advantage to storing booleans out-of-band? Assuming non-sparsity of the boolean field, would you still not incur cache line misses regardless of whether the bool is in the structure or in an array outside, since the CPU is forced to load the same number of bytes?
@PassifloraCerulea
@PassifloraCerulea 4 months ago
The whole point of storing struct fields at their natural alignment is so the CPU doesn't have to do extra work. Back in the 90s, top-of-the-line RISC CPUs might even crash your program or trigger some OS interrupt to handle unaligned access. I believe x86 tends to be more forgiving about this, but I'm not sure. It is likely that your compiler will go to great lengths to avoid unaligned memory access, using multiple smaller reads or writes instead. As always, looking at assembly output and benchmarking your particular program will be necessary.
@tacticalassaultanteater9678
@tacticalassaultanteater9678 5 months ago
Right, I guess my intuition about performance can go out the window now, because all of those steps sound incredibly slow
@zxuiji
@zxuiji 5 months ago
3:09 nice :)
@tordjarv3802
@tordjarv3802 5 months ago
Damn, I should learn Zig
@yash1152
@yash1152 5 months ago
30:10 the plug i was looking for
@yash1152
@yash1152 5 months ago
20:27 The speaker image covered the content, and I don't understand this example either.
@yash1152
@yash1152 5 months ago
20:52 Ohh, it's about a language feature where the language lets you create an array for each member of the struct.
@torgnyandersson403
@torgnyandersson403 5 months ago
About the lexer and retokenizing: that's unnecessary work for strings and other tokens where the length isn't the same for all tokens of that type. Instead, just have one token for "begin string literal" and one for "end string literal". That's what I've done in my own projects. You get a tiny token struct and no retokenizing.
@sadhlife
@sadhlife 6 months ago
16:03 Zig has them now
@Fedor_Dokuchaev_Color
@Fedor_Dokuchaev_Color 6 months ago
Bit fields? You don't have to use multiple arrays and enums, which make the code less readable: `struct Monster { unsigned long long life_points : 63; unsigned long long is_alive : 1; };` That's it. You can pack a lot of data if you know what range of values is the maximum/minimum in each case.
@ratakor
@ratakor 3 months ago
Or in this case just check whether life_points is 0 or lower, but this is a very simple example. For the Zig compiler I think that Andrew's strategy is good.
@minastaros
@minastaros 6 months ago
Hello, bitfields?
@YmanYoutube
@YmanYoutube 6 months ago
Why the fuck does a talk about an abstract CS concept have 67k views in 3 weeks? This field is GIGA oversaturated.
@rt1517
@rt1517 6 months ago
In my opinion, this approach has an impact on code simplicity and maintainability. On one side you have a structure that contains what you are interested in. On the other side, well, it is more complicated, and I would even say more bug-prone and harder to debug. Also, he focuses only on the size of the data, while the way you access the data is also very important. For example, it is supposed to be way faster to visit all the elements of an array than to visit all the elements of a linked list, because with the array the memory accesses are very close to each other, so you stay in the cache and the processor can even predict what memory address will be accessed next. With a linked list, you are accessing all over the place, so most of the time the next item won't be in the cache. Don't get me wrong, it is very nice that Andrew is trying to make the compiler fast, and there are a lot of slow compilers out there. But I am not sure that many of us mortals need that level of optimization.
@Nors2Ka
@Nors2Ka 5 months ago
I would say Andrew misunderstood the meaning of DoD, and what he showcased here is straight-up optimization - a step most mortals do not have to take, as their code is 1000x slower for other reasons entirely, e.g. using pointers from a linked list to refer to individually heap-allocated items. There is also this funny thing where he optimized the most irrelevant part of the compiler; LLVM will gobble 98% of the compile time regardless (unless Zig can use its own fast backends for debug builds). But I do want to say that the way you refer to "readability" sounds like you don't know what you're talking about, and the "readable" code you would spew out would be anything but actually readable.
@rt1517
@rt1517 5 months ago
@@Nors2Ka For me, readability is about how easy it is to understand the code. There is an infinite number of ways to implement a feature in a program, and some of them are better than others according to some criteria, for example readability or performance. Regarding readability, I would judge it by the amount of time and effort an average developer needs to understand a piece of code. At 22:29, on the left, I instantly know and understand what a Monster is and what it is capable of from its struct definition. But on the right, if I look at the Monster struct, I don't know that it can hold items. And if I find the held_items hashmap, I am not sure of the key of the hashmap. Is it the Monster index in the Monster list? Is the hashmap dedicated to the Monsters, or is it shared with the NPCs or other possible item holders? Should I clean something up in the hashmap when a Monster is deleted? For me, the left version seems a lot easier to maintain and add features to (for example, looting).
@Average-Lizard
@Average-Lizard 5 months ago
I feel the same way. I know in my typical projects (large web and console apps), the code being easily read and maintained is much more valuable than saving KB of memory or a fraction of the time of a single web request. I guess these kinds of optimizations are for specific use cases, but nonetheless it's interesting to hear about and have in the tool belt.
@toby9999
@toby9999 3 months ago
I think so. I've spent the last 20 years working on high-performance software where these techniques make the difference between being the best or lagging behind. For those who're developing web front ends or bloatware, or coding in java or Python, etc. not so much, if at all, in my opinion.
@TheOnlyFeanor
@TheOnlyFeanor 6 months ago
The example at 20:00 doubles the number of memory accesses for each struct (one with the animation id, another one for the monster type), unless the reduction in footprint makes it all fit into a single cache line.
@jaredcramsie182
@jaredcramsie182 6 months ago
I was going to comment this as well. Splitting the data like this will always require two cache reads on the initial load *. If the out-of-band technique is applied to more data, then each application will incur an additional cache read. Looked at as an individual access, it is like it comes with a guaranteed cache miss for each time you apply the technique. Data for arrays is stored contiguously in memory. In this example, if you expect to access several elements in a row, then much of the data will already be in cache, because the size of each struct independently is less than one cache line. If you expect to go through all the data in any order (and it is small enough to fit in cache), then this results in fewer cache misses total, because there are simply fewer cache lines to access. These benefits can artificially inflate the performance savings of the technique when people benchmark it by iterating over an entire array of data, or test it in a limited context that doesn't factor in the accumulation of external data in the cache from other processes, if the data / procedure is not used often. * For completeness, there are some special cases where you don't have two cache reads: One cache read: I assume this is what you were referring to in your comment. If you happen to be storing less than 64 bytes of total monster data, 7 or fewer monsters in this example (assuming the 'animation' data is cache-aligned to start with and the 'kind' data is not), then you can fit 3 extra monsters into the cache line, where it could only store 4 monsters per cache line previously. Three or four cache reads: In this example his data size divides the cache-line length evenly; if it didn't, there may be a chance of having to perform another cache read for each access.
@Norfirio
@Norfirio 5 months ago
This technique is very useful for instances where you do multiple operations on the Monster and you only need one (or a few) fields at a time. For example, in a game you will have an update function and a render function. If you iterate with the padded struct, you waste time going over the anim field in the update function and then waste time going over the kind field in the render function. You just have to be careful of your use case, as with any optimization methods.
@Oler-yx7xj
@Oler-yx7xj 6 months ago
6:50 using light speed as a measure of how fast an operation executes is funny. It also makes you understand how fast modern computers are: having memory as fast as L1 cache 1 meter away from the CPU is literally impossible
@policyprogrammer
@policyprogrammer 6 months ago
Avail yourself of bitfields. They're part of the standard and they can really save your bacon space-wise. In particular, I think they're better than encoding multiple Boolean values into an enum because of maintainability. Imagine you have these encoded enums and you have to write a function like is_human; it's working just fine, but then someone adds a new variable and re-encodes all the enums so that you have to include some more, and you forget to update the function. Another advantage of bitfields is that you can take advantage of your knowledge of the sizes you actually need. Like if you know you need only 24 bits, you can just allocate 24 bits and use the next eight for something else.