Very nice approach and well explained! I hope to see more like this :)
@valcron-1000 5 years ago
This is some top quality content
@evgeniygorbachov3970 4 years ago
My sincere thanks for sharing this. So interesting, watched it in one breath.
@ExecutionUnit 6 years ago
Bloody dynamic_cast! Great video.
@dholmes215 6 years ago
Note that dynamic_cast performance can actually vary wildly depending on implementation and circumstance. Both libstdc++ and libc++ have a configurable optimization that replaces the strcmp() with a simple pointer comparison, which apparently is turned off by libstdc++ by default for compatibility reasons, but on by default in libc++ last time I checked. This can improve its performance by an order of magnitude. Of course, it would still be way slower than what was demonstrated later in the video.
@MattGodbolt 6 years ago
Thanks for the comment! I was unaware it could be changed: do you have a link to this?
@dholmes215 6 years ago
I wrote a thing about it here once: dholmes215.github.io/c++/2017/06/30/fun-with-typeid.html The bit under "What's going on?" mentions -D__GXX_MERGED_TYPEINFO_NAMES. Note that I didn't really dig any deeper than that, so I honestly have no idea if fiddling with that is a good idea. The main lesson I learned was to avoid dynamic_cast when it wasn't exactly what I needed anyways.
@dholmes215 6 years ago
I looked it up again and found this change: github.com/gcc-mirror/gcc/commit/62164d4c3690257a01326783bed77c15831d52b9 ... which apparently is where the libstdc++ developers decided always defaulting to off was the safest choice. That's the limit of my understanding, though.
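To make the strcmp-versus-pointer point concrete, here is a minimal sketch of the two ways a runtime could decide whether two `std::type_info` objects describe the same type. The helper names `sameTypeByName` and `sameTypeByPointer` are made up for illustration; this is not the libstdc++ or libc++ source, only the general idea the comments above describe.

```cpp
#include <cstring>
#include <typeinfo>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

// Conservative check: compare the implementation-defined type names
// character by character. This works even if several copies of the
// same name string exist in the process (for example across shared
// libraries), but costs a string comparison each time.
bool sameTypeByName(const std::type_info& a, const std::type_info& b) {
    return std::strcmp(a.name(), b.name()) == 0;
}

// Fast check: compare the addresses of the name strings. This is only
// correct when the toolchain guarantees a single merged copy of each
// name per process, which is the assumption the configurable
// optimization relies on.
bool sameTypeByPointer(const std::type_info& a, const std::type_info& b) {
    return a.name() == b.name();
}
```

If names are not merged, the pointer version can report two identical types as different, which is presumably why a library would be cautious about enabling it by default.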
@AndreasStenmark 6 years ago
Nice video Matt! I spend my free time tuning things in a very similar manner, only in a different domain. Out of interest, have you compared your itoa() to Alexandrescu's by any chance at all?
@pranaykothapalli3980 5 years ago
Beautiful video. I'd love to see more like this, waiting for more videos from you :)
@Xtcent 5 years ago
Love your compiler explorer, nice!
@parnmatt 5 years ago
SOH (\x01 or ^A) is Start of Heading ... frankly they should have gone with US (\x1f or ^_) the unit separator
@hl2mukkel 6 years ago
Damn I missed the videos you uploaded since spectre because youtube didn't notify me! Gonna turn on the notifications now! I'm very passionate about performance and efficiency so I love these videos! Thank you very much for them. By the way, have you tried Rust and if so, what do you think of it?
@jubbernaut 5 years ago
Very interesting stuff Matt!
@DenMarket-uc1nq 6 months ago
Brilliant!
@kejith3853 5 years ago
Would it make sense to build a lookup table? Then you could just pass the quantity as the key and get the string back. It's a lot of wasted memory, but maybe it could be a little faster?
@MattGodbolt 5 years ago
My later examples use look up tables to do two digits at a time. If you're proposing a lookup table for every number, you'll need to dedicate about 20+GB of RAM which is quite a lot :). And at some point the cache misses are far more expensive than the small amount of processing needed to use a smaller table (that fits into L1)
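As a rough illustration of the "two digits at a time" idea in the reply above, here is a minimal sketch. It is not the code from the video: the function name `formatUnsigned` and the table name `kDigitPairs` are assumptions made for this example. The table is only 200 bytes, so it fits comfortably in L1 cache.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// 200-byte table holding "00", "01", ..., "99" back to back.
static const char kDigitPairs[] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

// Hypothetical helper: formats a 32-bit unsigned value as decimal text,
// peeling off two digits per division instead of one.
std::string formatUnsigned(std::uint32_t value) {
    char buf[10];                    // UINT32_MAX has 10 decimal digits
    char* p = buf + sizeof(buf);     // fill the buffer from the back
    while (value >= 100) {
        const std::uint32_t pair = value % 100;
        value /= 100;
        p -= 2;
        std::memcpy(p, kDigitPairs + 2 * pair, 2);
    }
    if (value >= 10) {
        p -= 2;                      // two leading digits remain
        std::memcpy(p, kDigitPairs + 2 * value, 2);
    } else {
        *--p = static_cast<char>('0' + value);  // single leading digit
    }
    assert(p >= buf);
    return std::string(p, buf + sizeof(buf));
}
```

Halving the number of divisions is the main saving; returning a `std::string` here is only for convenience, and a latency-sensitive version would more likely write straight into the caller's buffer.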
@itssuperninja 6 years ago
This is an amazing video. Thank you.
@afigegoznaet 6 years ago
Special thanks for the link to the cool and exciting secret things
@MattGodbolt 6 years ago
Which link? :)
@afigegoznaet 6 years ago
godbolt.org obviously, I'm playing with the code right now, without having to retype everything
@MattGodbolt 6 years ago
Oh I see! :) Hardly a secret hehe! You had me worried for a moment that I'd accidentally leaked something I shouldn't have...
@afigegoznaet 6 years ago
Nah, it's a failed joke related to your DRW job description in some places. You've got a fan here ;)
@MattGodbolt 6 years ago
Haha, right, my blog description. I was just panicking, so your joke definitely worked!! Thanks!
@YouAreUnimportant 3 years ago
shift left and shift right might be simple operations but they are very slow on some intel chips.
@MattGodbolt 3 years ago
Can you say which? None of the chips I've looked at (from Conroe onward) take more than a cycle for regular left and right shifts (source: uops.info/table.html, e.g. uops.info/html-instr/SHL_M16_0.html and others). Some involving the carry have a longer latency, but I've never seen a compiler emit those (mainly as they are so slow).
@utromvecherom 6 years ago
How were the measurements performed? And in general, how do you measure a particular piece of code?
@MattGodbolt 6 years ago
I think I described it in the video: I ran the code a number of times and took the average. This is less than ideal, but gave a decent enough idea without having to get into the weeds of "how to measure performance", which is a whole other talk! I can't find the source now... I did have it around at one point and I'll post it here when I find it :)
@utromvecherom 6 years ago
Do you mean you run a piece of code in a loop? Let me elaborate a bit: I've played around with Agner Fog's test suite (www.agner.org/optimize/#testp) to measure pieces of code, and the results (measured in cycles; they can be converted to ns by dividing by 3.6 for my CPU) are quite random across different runs of a test program (each run measures a piece of code N times). For simple code like pow(sin(x),2)+pow(cos(2),2) I get stats like (avg=236, stddev=60), (380, 120), (450, 60), and that is after I throw out the 14 min and 14 max values out of 128 (=N) measurements in a single run. So I'd be very excited if there were a technique for getting stable cost estimates for small pieces of code. Running code in a loop reveals the true cost to some degree, but it is a less realistic scenario than the case where the piece of code is part of a bigger path, surrounded by other code that touches different memory locations, uses registers, and so on.
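For readers who want to try the "run it many times and summarize" approach discussed above, here is a minimal sketch using `std::chrono`. It is not the harness from the video (that source was not posted); the names `timeBatches` and `median` are assumptions for this example. It reports nanoseconds per call for each batch, and the median is offered as a summary that is less sensitive to outliers than a plain average.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs fn() callsPerBatch times per batch and
// returns the average nanoseconds per call for each batch.
template <typename Fn>
std::vector<double> timeBatches(Fn&& fn, std::size_t batches,
                                std::size_t callsPerBatch) {
    assert(callsPerBatch > 0);
    using Clock = std::chrono::steady_clock;  // monotonic clock
    std::vector<double> nsPerCall;
    nsPerCall.reserve(batches);
    for (std::size_t b = 0; b < batches; ++b) {
        const auto start = Clock::now();
        for (std::size_t i = 0; i < callsPerBatch; ++i) {
            fn();
        }
        const auto stop = Clock::now();
        const std::chrono::duration<double, std::nano> elapsed = stop - start;
        nsPerCall.push_back(elapsed.count() /
                            static_cast<double>(callsPerBatch));
    }
    return nsPerCall;
}

// Median of the samples; returns 0.0 for an empty input.
double median(std::vector<double> samples) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : (samples[mid - 1] + samples[mid]) / 2.0;
}
```

The caveat raised in the comment still applies: a tight loop keeps caches and branch predictors warm, so these numbers are likely to be optimistic compared with the same code running inside a larger path. The compiler may also optimize away work whose result is unused, so the measured function needs an observable side effect.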
@perryizgr8 6 years ago
I wrote the newOrder() function and I'm calling it with random data, but I don't want to profile the calling function and the random generation etc. So how do you instruct perf to record and report stats for only one function (newOrder)? Very informative presentation btw!
@MattGodbolt 6 years ago
I didn't -- I just literally did `perf record` and `perf report`. I did edit out functions from the bottom of the list that weren't interesting (until the end slides when I show both the function in question and main())
@perryizgr8 6 years ago
Looks like my compiler is inlining newOrder() and maybe that's why it simply doesn't show up in perf's report. But I'm able to annotate main() and follow it till I reach where the call to newOrder() is supposed to be, but it is literally just a direct callq __sprintf_chk.
@MattGodbolt 6 years ago
Awesome...that's pretty much what I saw. When I got to the sprintf() version slides (~11mins) you'll see there's no mention of newOrder in the profile, just the vfprintf/xsputn etc of the implementation of sprintf. So you're seeing the same :)
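If the goal is to see a function as its own row in `perf report` even though the optimizer would inline it, one option on GCC and Clang is the `noinline` attribute. The sketch below is an assumption-laden stand-in: the real `newOrder` signature from the video is not shown in this thread, so the parameters and the `"qty=%u"` format string here are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for newOrder(). The attribute asks GCC/Clang
// not to inline the function, so it keeps its own symbol and samples
// taken inside it are attributed to it rather than to the caller.
__attribute__((noinline))
int newOrder(char* out, std::size_t outSize, unsigned quantity) {
    assert(out != nullptr);
    // snprintf returns the number of characters it would have written,
    // not counting the terminating null byte.
    return std::snprintf(out, outSize, "qty=%u", quantity);
}
```

Blocking inlining changes the code being measured, so this is better suited to finding where time goes than to quoting final numbers.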