I love the level of performance and versatility C++ provides. C is fine, but C++ provides a lot of useful additional features without sacrificing performance. I'm not into full-blown OOP. I tend to use classes as "structs with functions", which I find conceptually helpful for controlling scope.
@mrkmed70506 күн бұрын
Is radix sort good for sorting 500 numbers in under 5500 operations?
@BanedonDeWhite9 күн бұрын
3:38 sneaky!!! Taking advantage of the language's conditional
@BanedonDeWhite9 күн бұрын
Knowing hardware helps make software! Alan Kay is cool to listen to, and so are Ken Thompson, Chuck Moore and Linus.
@hanelyp110 күн бұрын
When I was going to school, home computers were only starting to have instruction caching like modern processors (the 80486), and the cache miss penalty wasn't nearly as steep. A modern video card GPU, where branchless is the only game in town for shader programming, was unheard of. Branchless programming wasn't anything the professors would have known.
@mscott5432111 күн бұрын
Example: int Smaller(int a, int b) {...} Stannis Baratheon: int Fewer(int a, int b) {...}
What bad advice? The concepts presented here are very good, and they have been used in high-performance software for decades. I've used them in my chess engine, as have others, and I'm using them in a compiler project I'm working on for a proprietary language. Roughly a 50% performance increase across a set of 20,000 unit tests just by eliminating 'if' statements. Perhaps your comment was sarcasm, but yes, there's plenty of bad advice, and the bad advice is all in the comments.
@lepidoptera93375 күн бұрын
@@toby9999 So you were doing it wrong in the first place. Sound of one hand clapping. ;-)
@farcry78714 күн бұрын
ily:3
@MAli-wu4rx14 күн бұрын
Excellent !
@dfields951115 күн бұрын
Gr8, always wanted to understand this in C++
@3DProgramming18 күн бұрын
On the other hand, AoS should be better for cache coherence in some applications, if I am not wrong. For example, for storing vertex data in graphics programming.
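To make the AoS versus SoA trade-off in the comment above concrete, here is a minimal sketch. The `VertexAoS` and `VerticesSoA` types and the two helper functions are hypothetical names made up for illustration, not anything from the video. Which layout wins depends on the access pattern: a loop that needs every attribute of a vertex tends to suit AoS, while a loop that touches a single attribute tends to suit SoA.

```cpp
#include <cstddef>
#include <vector>

// Array of structures: all attributes of one vertex sit next to each other.
struct VertexAoS {
    float x, y, z;   // position
    float u, v;      // texture coordinates
};

// Structure of arrays: each attribute lives in its own contiguous array.
struct VerticesSoA {
    std::vector<float> x, y, z;
    std::vector<float> u, v;
};

// Uses x only, but every cache line fetched also carries y, z, u and v.
float sum_x_aos(const std::vector<VertexAoS>& vertices) {
    float total = 0.0f;
    for (const VertexAoS& vert : vertices) total += vert.x;
    return total;
}

// Reads one dense array of x values and nothing else.
float sum_x_soa(const VerticesSoA& vertices) {
    float total = 0.0f;
    for (float x : vertices.x) total += x;
    return total;
}
```

Both functions compute the same sum; only the memory layout they walk over differs, so any speed difference would have to be measured on the target machine.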
@peppidesu21 күн бұрын
branchless programming is also useful for cryptographic algorithms because side channels are a thing
@lepidoptera933713 күн бұрын
Only if you have full hardware access. If you have full hardware access, then side channels are the least of your worry.
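For readers wondering what the side-channel point in this thread looks like in code, a rough sketch of a comparison whose running time does not depend on where the inputs differ might be the following. The function name `constant_time_equal` is an assumption for illustration. A compiler is free to transform this loop, so real cryptographic code should rely on a vetted library rather than on a sketch like this.

```cpp
#include <cstddef>
#include <cstdint>

// Compares two buffers without exiting early on the first mismatch.
// Every byte is always visited, so the loop count depends only on len.
bool constant_time_equal(const std::uint8_t* a, const std::uint8_t* b, std::size_t len) {
    std::uint8_t diff = 0;
    for (std::size_t i = 0; i < len; ++i) {
        // XOR is zero for equal bytes; OR accumulates any difference.
        diff |= static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return diff == 0;
}
```

An early-exit `memcmp`-style loop would return sooner when the first bytes differ, which is the kind of timing difference the branch-free version tries to avoid.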
@hodayfa000h22 күн бұрын
Watching this at 1.5x cuz I already know this stuff, but I watch for a refresher. Also, I loved that joke about "eating their ram" 😂
@thanatosor26 күн бұрын
MPLAB writes program memory from the bottom to the top.
@era275527 күн бұрын
Your closing brace placement is insane
@saisharannagulapalli96829 күн бұрын
Thank you so much. Even after 9 years, I found it very useful for learning the concepts for my job.
@greywolf271Ай бұрын
Does anyone know if Creel is OK ?
@tamaskonkoly7037Ай бұрын
Hi Creel! I found a mistake in the ToUpperASM3 proc, where the code reads: cmp bl, 'z'; cmovg r8d, r9d; cmp bl, 'a'; cmovg r8d, r9d. One of the cmovg instructions should be cmovl!
@woliveiraxsАй бұрын
A little late to the party, but just to throw it out there. NEVER and I mean NEVER write code like this before actually testing if you need performance improvement. Like many have said, more often than not, this will backfire and you will gain nothing but code complexity, as compilers are becoming better and better at dealing with these scenarios. The best way to go about performance tuning is to do whatever you need to do first, and then and only then, if you NEED it to be faster, start looking into alternatives. This is because, especially with low level languages, it is likely you won't need any performance tuning right away, save for some specific cases where the benefits are clearly outlined and it is standard practice. Also, if branchless is ever used, document the function or write comments on top of the statements. Your future self, and your team will thank you.
@toby99995 күн бұрын
You're stating the bleeding obvious, which is to not optimise too early. But what is too early? The later one leaves it, the harder it becomes, and the easier it becomes to introduce bugs. When developing high performance software, optimisation should be in the forefront of one's mind before writing any code. It should be part of the design process.
@paulkossikАй бұрын
I remember in retrospect that the main influence on my first code was whether I could read it. I was taught to design first in pseudo-code and my code looked very similar to my sketch. This video is about making your code branchless, but it displays pretty well how non-beginner code looks. The main influence on individual lines is scraping against the limits of the hardware.
@rasitsimsek9400Ай бұрын
X86 assembler syntax is odd compared to other CPUs like Motorola or ARM. The register names are strange, like eax, edx ...
@logoleptАй бұрын
I see a portrait. Quite quaint.
@dincraft1716Ай бұрын
I've watched many of your videos about Direct2D and Direct3D. Could you please continue making tutorials?
@goodtimejoe1325Ай бұрын
Holy... Why is this comment section full of hypercompetent people? I'm just going through uni, and reading some of these comments makes me really understand that we are standing on the shoulders of giants.
@RobertShaneАй бұрын
Is it safe for t to be signed in the last example? When I try to do -(~t) in c# it says Operator '-' cannot be applied to operand of type 'ulong'
@ferb1131Ай бұрын
"I hate your video, I only care about which formula is faster for integers." - Linus Torvalds
@ruskiikoshkaАй бұрын
Weirdest place where I expect to hear from Al Di Meola. Great video.
@shaiksoofi3741Ай бұрын
Best
@petergriffin8086Ай бұрын
It's 3am and I'm watching x86 assembly slideshows from 11 years ago
@unzaptable7 күн бұрын
Oh, count me in too. This guy is a legend. I hope he continues to make new videos!
@orychowawАй бұрын
Just a side note: In C++ all function names are mangled, regardless of them being class members or not. That enables function overloading.
@TheBypasserАй бұрын
a) In fact, a "branch" is not just a conditional jump; it is every occasion your code gets "derailed", i.e. your instruction address register (usually called a Program Counter, or PC) gets modified. An interrupt, a function call (with either a static or a dynamic address), a function return, an interrupt return, or just a locked-in-place goto-style jump: those all cause your program counter to take an arbitrary value instead of advancing a fixed number of bytes. b) The branchless approach is a pretty scary thing because it is very hardware-bound: code that relies on a C bool being either 0 or 1 may cause a real mess of register setups on many RISC cores. Skipping one single mov instruction is usually not a big deal performance-wise, though. c) The uppercase example is suboptimal. You could do "bool cnd1 = (ch >= 'a'); bool cnd2 = (ch <= 'z'); ch ^= ((char)cnd1 & (char)cnd2) << 5;" :P
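The XOR trick from point c) can be written out as a small function. This is only a sketch under the assumption of ASCII input, and the name `to_upper_xor` is made up for illustration. It works because lowercase ASCII letters differ from their uppercase forms only in bit 5 (value 32). Whether the compiler emits branch-free machine code for it depends on the target and optimisation level.

```cpp
// Branch-free ASCII upper-casing using XOR, assuming ASCII input.
inline char to_upper_xor(char ch) {
    // Each comparison yields 0 or 1; bitwise AND avoids the short-circuit of &&.
    const unsigned in_range =
        static_cast<unsigned>(ch >= 'a') & static_cast<unsigned>(ch <= 'z');
    // For 'a'..'z' this flips bit 5 (clearing it); otherwise it XORs with 0.
    return static_cast<char>(ch ^ (in_range << 5));
}
```

For example, `to_upper_xor('a')` gives `'A'`, while characters outside `'a'..'z'` come back unchanged.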
@jcamargo2005Ай бұрын
I spent a lot of time in my studies learning how there was nothing better than O(n log n). But then I learned about radix sort... It is not based on comparisons, so this rule does not apply to it. What a cool algorithm. Sorting networks are also interesting for small lists.
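As a rough illustration of why radix sort sidesteps the comparison bound, here is a minimal least-significant-digit sketch for 32-bit unsigned integers. The function name `radix_sort_u32` and the choice of 8-bit digits are assumptions for the example, not the version shown in the video. It never compares two elements; it only counts digits and turns the counts into positions with a prefix sum.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort: four stable counting passes, one per byte of the key.
void radix_sort_u32(std::vector<std::uint32_t>& data) {
    std::vector<std::uint32_t> scratch(data.size());
    for (unsigned shift = 0; shift < 32; shift += 8) {
        // Count how many keys have each possible byte value at this position.
        std::array<std::size_t, 256> count{};
        for (std::uint32_t value : data) ++count[(value >> shift) & 0xFFu];

        // Exclusive prefix sum: count[d] becomes the first output index for digit d.
        std::size_t running = 0;
        for (std::size_t& c : count) {
            const std::size_t occurrences = c;
            c = running;
            running += occurrences;
        }

        // Scatter in input order, which keeps the pass stable.
        for (std::uint32_t value : data) scratch[count[(value >> shift) & 0xFFu]++] = value;

        // After an even number of passes the result ends up back in data.
        data.swap(scratch);
    }
}
```

The work is proportional to the number of elements times the number of digit passes, which is why the O(n log n) lower bound for comparison sorts does not apply.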
@MariusSchwendtmayerАй бұрын
In embedded systems I often inline boolean conditions as ints and use 2D matrix of function pointers so: true + true for the "x" index and true + false for the "y" translates into executing the code at index [2, 1]
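One way to read the dispatch scheme described above is sketched below. All names (`handler`, `kTable`, `dispatch`) are hypothetical, and the handlers just return a value that encodes their table position so the lookup is easy to check. Note that the indirect call still changes the program counter, so this replaces a chain of conditional jumps with a table lookup rather than removing every branch.

```cpp
// Placeholder handlers: each returns value plus a code identifying its slot.
template <int Row, int Col>
int handler(int value) {
    return value + Row * 10 + Col;
}

using Handler = int (*)(int);

// 3x3 table, because the sum of two bools ranges over 0, 1 and 2.
const Handler kTable[3][3] = {
    {handler<0, 0>, handler<0, 1>, handler<0, 2>},
    {handler<1, 0>, handler<1, 1>, handler<1, 2>},
    {handler<2, 0>, handler<2, 1>, handler<2, 2>},
};

// Sums two conditions per axis and calls the handler stored at [row][col].
int dispatch(bool x1, bool x2, bool y1, bool y2, int value) {
    const int row = static_cast<int>(x1) + static_cast<int>(x2);
    const int col = static_cast<int>(y1) + static_cast<int>(y2);
    return kTable[row][col](value);
}
```

With `true + true` on the first axis and `true + false` on the second, `dispatch` calls the handler at index [2][1], matching the example in the comment.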
@josephmanning2129Ай бұрын
10:17 could be improved with d[i] = d[i] - 32 * (d[i] >= 'a' && d[i] <= 'z');
@josephmanning2129Ай бұрын
Nevermind 😂
@ah-lx9xiАй бұрын
Interesting, I have a question, if you could kindly answer. If I were to compare two images, for example cameraImage and storedImage, do I compare pixel against pixel to check if the images match? Or is there one command which compares all pixels in one C++ function or a conditional IF statement? Thanks.
@playdertyАй бұрын
Hello, you make such cool videos, this is exactly what I wanted to find. All with detailed explanations and all that. Although I look at it now and realize that you did this 13 years ago, yo, I was 5 years old then. Well, in general, in the current world of the fast-growing Internet, this still remains relevant. Love it, thank you
@yellowrose0910Ай бұрын
More assembly instructions do not necessarily mean longer runtime: different instructions take differing number of cycles.
@dbdejonge20812 ай бұрын
Bad decoding: char(letter) is enough 😂
@unzaptable2 ай бұрын
Where is the king?
@jcamargo20052 ай бұрын
Brilliant and refreshing explanation
@carelhaasbroek15752 ай бұрын
Yeah. I'm way too stupid for this....
@medielectro2 ай бұрын
The smaller_branchless() function does not have any branching instruction. But... its code will be larger and will take more space (memory) and more time to execute. 😂 This happens because although we do not see any "branching" in the instruction return( a*(a<b) + b*(b<=a) ) there are hidden branchings. Because (a<b) and (b<=a) are conditional instructions and must involve branching. I have examined this in practice with the Microchip XC16 compiler and found that the smaller_branchless() function takes 7 extra memory locations compared to the smaller() function.
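For reference, the two functions being compared can be sketched as follows. The branchless body is the expression quoted in the comment; the body of `smaller` is an assumed straightforward version, since the original is not reproduced in this thread. Whether the comparisons compile to branches, to flag-setting instructions, or to conditional moves depends on the compiler and target, so results on XC16 need not match results on x86.

```cpp
// Straightforward version: returns the smaller of a and b.
int smaller(int a, int b) {
    if (a < b) {
        return a;
    }
    return b;
}

// Exactly one of (a < b) and (b <= a) is 1, so exactly one term survives.
int smaller_branchless(int a, int b) {
    return a * (a < b) + b * (b <= a);
}
```

When the inputs are equal, `(a < b)` is 0 and `(b <= a)` is 1, so the result is `b`, which equals `a` anyway.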
@markmanning29212 ай бұрын
The problem with the PIC architecture is that if there are opcodes in the pipeline and you set a breakpoint, the controller is going to potentially continue to execute opcodes AFTER the breakpoint has been hit. This is why with a PIC you must inject two nops after the one you want to stop at. These are called the skid bucket because this is where execution will come to a screeching halt. The problem with this is if you are *required* to fly what you test then you cannot remove any of those breakpoint skid bucket opcodes.
@xyz-vrtgs2 ай бұрын
Slightly comment technically coring the things u multiply by a branch is slightly faster
@gigigigiotto16732 ай бұрын
Technically you are wrong when saying "this instruction does this in only 1 clock cycle" as when decoded by the cpu in micro ops, it might take more than 1 cycle to actually execute
@tornoutlaw2 ай бұрын
That smaller function irks me, returns a if a < b or b if b <= a. one of the two being smaller is not a tertium non datur, or do I have a brainfart?
@fergalhennessy7752 ай бұрын
the prefix sum part is SO COOL thank you for this great video mr creel
@astralfoxy17872 ай бұрын
Am I wrong, or at 15:20 did you obviously just swap the addresses of fish and dog, and do nothing with "vtables"? And you call "dog" with the "fish" name and vice versa
@Sokrates95002 ай бұрын
3:50 is this even branchless? The CPU can't do this arithmetic ahead of time so what's the point?
@Sokrates95002 ай бұрын
I guess the explanation at the start wasn't really clear, the "predicting" only decodes instructions ahead of time rather than actually execute them