In the intro to this talk... I was thinking "is he going to put the shirt on or not?!" ... I wasn't disappointed.
@N1c0_792 10 days ago
Great talk, interesting and very understandable throughout
@Bvngee 8 days ago
Great talk! Very engaging, enjoyed it.
@karlohlen9320 11 days ago
Nice talk! Very well done.
@sadhlife 11 days ago
Where can I buy the shirt?
@CoolestPossibleName 8 days ago
Are you planning on doing a talk on DOD too?
@SimGunther 11 days ago
There's CPU-local performance, which these DOD talks cover pretty well, and then there's inter-coprocessor and inter-computer performance. While DOD can help with data layout to make the most of each cache line, it can also get people stuck in the weeds. It's best saved for when you know the problem requirements and the bare minimum data that must be returned at each step, from the DB all the way down to the (co)processors. Then you can see whether that data is too little to make good use of coprocessors that make bulk data processing super fast, whether the loops mean a more "OOP" layout makes sense, or whether SoA/AoSoA layouts are more practical for maintenance and runtime performance.
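For anyone unfamiliar with the acronyms, here is a minimal sketch of the three layouts being contrasted; the `x`/`y` fields and the block width are hypothetical, chosen only for illustration:

```python
# AoS ("array of structs"): each element keeps its fields together,
# which is the classic "OOP" layout mentioned above.
aos = [{"x": 1.0, "y": 2.0}, {"x": 3.0, "y": 4.0}]

# SoA ("struct of arrays"): one contiguous array per field, so a loop
# touching only "x" never pulls "y" bytes into the cache.
soa = {"x": [1.0, 3.0], "y": [2.0, 4.0]}

# AoSoA ("array of structs of arrays"): fixed-width SoA blocks chained
# into an array; here one block of width 2.
aosoa = [{"x": [1.0, 3.0], "y": [2.0, 4.0]}]

# All three encode the same logical data, just laid out differently:
assert aos[1]["x"] == soa["x"][1] == aosoa[0]["x"][1] == 3.0
```

The trade-off is exactly the one the comment raises: SoA/AoSoA favor bulk per-field processing, while AoS keeps each logical element together.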
@morglod 9 days ago
Do you have a comparison of how many cache misses you get with SoA versus without it? Because in my mind it should cause more cache misses: usually you check the tag and then go to the data, so you first load tags into cache, then load the data into cache, then load the next tag, etc.
@mlugg5499 9 days ago
It's not a problem that the accesses jump back and forth between those regions, because the CPU has lots of cache. If you imagine a hypothetical CPU with two cache lines for data accesses, then one line could store a block of tags, the other a block of data; neither gets evicted until we move onto the next block of tags/data. Of course, in reality, CPUs have orders of magnitude more cache lines than this -- so there's no problem here at all, the chance of those 2 lines getting evicted while you're e.g. in a hot loop over all your instructions/nodes/etc is basically zero. So, data fragmentation of this form isn't really an issue; it only matters once your data is fragmented enough that you can't make full use of a cache line before it's evicted.
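The point about blocks can be made concrete with a toy model (all sizes here are assumptions for illustration, not measurements of any real CPU): walking parallel tag and data arrays only ever touches a handful of distinct cache lines, and each line is fully consumed before the walk moves past it.

```python
# Toy model: count the distinct 64-byte "cache lines" touched while
# walking parallel tag and data arrays, as in the hypothetical CPU above.
LINE = 64
TAG_SIZE, DATA_SIZE = 1, 8  # assumed sizes: 1-byte tag, 8-byte payload

def lines_touched(n):
    touched = set()
    for i in range(n):
        touched.add(("tags", (i * TAG_SIZE) // LINE))   # line holding tag i
        touched.add(("data", (i * DATA_SIZE) // LINE))  # line holding data i
    return len(touched)

# Walking 64 elements touches one line of tags plus eight lines of data;
# with more than two cache lines available, the current tag line and the
# current data line never evict each other.
assert lines_touched(64) == 1 + 8
```

So the alternating tag/data access pattern costs one miss per line, not one miss per element.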
@morglod 8 days ago
@@mlugg5499 okay, thank you!
@morglod 8 days ago
@@mlugg5499 I can't find any info about "two L1" caches anywhere. Everywhere it's described as one L3 containing several L2s, each containing several L1s. Do you have any link about it? Something very detailed?
@hemerythrin 8 days ago
@@morglod The comment doesn't say anything about L1 caches, so I assume there's some sort of misunderstanding. Caches are divided into "lines" (usually 64 bytes), where each line shares a single tag and is loaded/evicted as a unit.
@morglod 8 days ago
@@hemerythrin Okay, so it actually gets a lot of L1 misses, but L2 is OK. Understood, thank you!
@ukyoize 11 days ago
13:38 one could just have a single "flag" tag, with the data describing which kind of flag it is
@mlugg5499 10 days ago
Do you mean collapse `plus` and `minus` into one tag which uses `data` to encode which "actual" tag it is? If so, yep, that's sometimes a helpful strategy! We don't do it for the AST, since 1 byte gives us enough tags anyway, so there's no point. However, for ZIR, one similar thing we do (which I didn't cover in the talk for simplicity/time reasons) is a tag called `extended`, which essentially sacrifices 2 bytes of our 8-byte payload for a second tag enum. The reason is that, broadly speaking, a small number of ZIR instructions are used very frequently, and a lot of instructions are used very rarely. So, for frequently used ones, it's helpful to have the whole 8-byte payload to minimize the need for `extra`; but for the rarely used ones (think instructions for things like inline asm), we can happily sacrifice a bit of memory.
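A rough sketch of that scheme; the tag values, field widths, and helper names here are hypothetical illustrations, not Zig's actual ZIR encoding:

```python
import struct

# Each instruction is a 1-byte primary tag plus an 8-byte payload.
# For the hypothetical EXTENDED tag, the first 2 bytes of the payload
# hold a secondary tag, leaving only 6 bytes of data.
COMMON_ADD = 0   # hypothetical "hot" instruction: full 8-byte payload
EXTENDED = 255   # hypothetical escape tag for rare instructions
SUB_ASM = 7      # hypothetical secondary tag (e.g. inline asm)

def encode_common(tag, payload):
    # <B = 1-byte tag, <Q = 8-byte little-endian payload
    return struct.pack("<BQ", tag, payload)

def encode_extended(sub_tag, payload):
    assert payload < 1 << 48  # only 6 payload bytes remain
    return struct.pack("<BH", EXTENDED, sub_tag) + payload.to_bytes(6, "little")

def decode(insn):
    tag = insn[0]
    if tag == EXTENDED:
        (sub_tag,) = struct.unpack_from("<H", insn, 1)
        return tag, sub_tag, int.from_bytes(insn[3:9], "little")
    (payload,) = struct.unpack_from("<Q", insn, 1)
    return tag, None, payload

# A hot instruction keeps all 64 payload bits...
assert decode(encode_common(COMMON_ADD, 2**64 - 1)) == (COMMON_ADD, None, 2**64 - 1)
# ...while a rare one trades 16 of them for a second tag enum.
assert decode(encode_extended(SUB_ASM, 123)) == (EXTENDED, SUB_ASM, 123)
```

Both encodings stay 9 bytes per instruction; the rare instructions just spend part of that fixed budget on a bigger tag namespace.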
@Bobbias 9 days ago
@@mlugg5499 I was wondering why you guys did that. Wasn't expecting to just stumble into the answer in a YouTube comment, but I'm happy I now know.