I learned to program in the 80s when compilers stunk, and it was a piece of cake to beat them with hand-coded assembly. As a result, many projects were written in assembler to run on older and newer hardware. The advent of efficient compilers was a godsend, and for work I was glad to see assembly sidelined. But for fun I still code in assembly, because building high-level features like lambda functions or garbage collectors from the ground up teaches you a great deal.
@sallylauper82223 жыл бұрын
Yeah, I thought it was really inarestin that he said that today to write faster assembly you have to know all the tricks of the compilers.
@SunMasterXIV3 жыл бұрын
I used Lattice C (and 68k assembly) on the Amiga in the 80s, and I thought it was pretty good. But the way modern compilers are able to optimize the code is sometimes amazing. It doesn't help hand-tailoring assembly that so many x64 CPU variations are available, with instruction execution times that vary between them.
@AURORAFIELDS3 жыл бұрын
68000 is a good example of why C compilers are not good for everything. A lot of the efficient code relies on passing arguments via registers, while C relies on stack frames. Memory access on the 68000 is really slow, so automatically C will be slow too.
@mheermance3 жыл бұрын
@@AURORAFIELDS true, but many C compilers implement fast call linkage. They pass by registers and the called function saves on the stack if it calls another function.
@Ehal2563 жыл бұрын
@@mheermance finding a compiler that does that for the 68k nowadays however, is quite difficult. GCC doesn't, and while llvm recently added support, I doubt it does either. Maybe something from the 80s, but I'd rather code things by hand when performance is really important.
@randyscorner94343 жыл бұрын
With current compiler technology there is one area where the move to assembly provides massive advantages. That is when you can vectorize the code to fully use the SSE and MMX extensions. For one routine, unrolling the loop 1 time fit the register set, allowed 8 wide vector calculations and increased the overall performance of a high end electronic piano by 12X. This was sufficient to move the program off a new Mac to a RPI3. The load went from 40% of the CPU on the Mac to 9% of the CPU on the RPI3 with just one thread. Getting to this point with a high level programming language requires a different compiler and coupling that to C or C++ is much harder than doing the 60 assembly instructions by hand. It's all about how badly one or two routines dominate the runtime. It's often the case that these "hotspots" can get extra love and show major performance improvement. Of course, the best optimization would be to stop using Python as production code.....
@thomasmaughan47983 жыл бұрын
"Of course, the best optimization would be to stop using Python as production code" LOL 🙂
@FM-tq2gs Жыл бұрын
Newbie question: why can't compilers do that kind of optimization? Will they be able to one day?
@Mr8lacklp Жыл бұрын
@@FM-tq2gs they will be able to do it sometimes in the future, but there are really two problems here. One is that the compiler can only do an optimization if it can prove that it won't change the behavior of the program for any value it might possibly see, and it simply doesn't have all the information, as all it sees is the source code. You might for example have a number that represents the day of the week, so *you* know it's never going to be greater than seven, but the compiler can't know that, so it can't apply any optimizations that assume that the number won't be greater than seven. So there are some optimizations you can do that are literally impossible for a compiler, no matter how advanced. The other problem is that both finding an optimization and proving that it doesn't change the behavior of the code are very difficult, and not generally things computers can do at all. This is where compilers are steadily getting better, but it's very possible that there are some optimizations that will just never be worth the longer compile times or the effort of implementing them.
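To make the day-of-week point concrete, here is a minimal C sketch. The function names are made up for illustration, days are assumed to be numbered 0..6, and `__builtin_unreachable()` is a GCC/Clang extension rather than portable C. The first function has to stay correct for every possible `unsigned`, so the compiler must keep the general modulo. The second passes along the range fact that only the programmer knows, which is what permits a cheaper compare-and-wrap.

```c
/* Hypothetical helpers: days are numbered 0..6. */

/* Correct for any input, so the compiler must keep the general modulo. */
unsigned next_day_general(unsigned day)
{
    return (day + 1u) % 7u;
}

/* The programmer promises day < 7. Passing anything else is undefined
 * behaviour, and that promise is what allows cheaper generated code. */
unsigned next_day_assumed(unsigned day)
{
#if defined(__GNUC__)
    if (day >= 7u)
        __builtin_unreachable();   /* GCC/Clang extension */
#endif
    return (day + 1u) % 7u;
}
```

Whether a given compiler actually emits different code for the two versions depends on the compiler and flags; comparing the generated assembly is the only way to know.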
@FM-tq2gs Жыл бұрын
@@Mr8lacklp thank you for the explanation!
@robegatt Жыл бұрын
@@Mr8lacklp yeah, that is why some programming languages are better than others... a Pascal compiler could easily do what you said in the first example.
@ChiliTomatoNoodle4 жыл бұрын
Really good information quality and density here. This guy knows his stuff.
@WhatsACreel4 жыл бұрын
Means a lot brus! You are a legend, Chili :)
@classicnosh3 жыл бұрын
@@WhatsACreel - He's not wrong. I learned Pascal and C wasn't really taught in my school since Pascal was considered "academic". Assembly was also easier in those days since the microcomputers were much smaller and it was possible to really understand the memory map. Nowadays, the philosophy is very different. The rule of thumb is, don't try to outsmart the compiler. ;)
@tootaashraf13 жыл бұрын
The c++ guy
@Andoxico2 жыл бұрын
ayy it's papa Chili
@spacewolfjr4 жыл бұрын
I work in CyberSecurity and end up using assembly a lot when reverse engineering / disassembling malware, it's an essential skill for that kind of work
@shanehebert3963 жыл бұрын
Well... you have to since I doubt the malware writers are going to give you the source and all you have is the executable ;)
@tappineapple33813 жыл бұрын
Did you go to college? If so what did you major in? I am currently a junior in high school and I would like to further learn about reverse engineering and getting better with stuff like IDA and reclass. Any advice?
@y2ksw13 жыл бұрын
Agreed.
@y2ksw13 жыл бұрын
@@tappineapple3381 I suggest disassembling viruses. Most of them are brilliant examples of engineering and most of them are made by true masters of the art. The next step I suggest is to make your own operating system. If you master this step, you will have no problem solving any other problem you may come across.
@tappineapple33813 жыл бұрын
@@y2ksw1 Thank you! I have been following the tutorials on guided hacking and I have very much enjoyed reversing video games, and I feel like malware would be the next best step. Now, making an operating system scares me.
@wingunder3 жыл бұрын
"If you can help yourself, try not to write a virus." 😂😂😂 You should put this quote on a t-shirt. Your sense of humor is simply wicked 👍
@OpenGL4ever Жыл бұрын
I love that line. And the background to that is, if you can do that, you don't need to write a virus. You will also find a well-paid job without having to drift into the criminal corner to make a lot of money.
@craigmhall Жыл бұрын
I rarely write in assembly any more, but it's good to know for:
- debugging release / optimized code
- studying the generated assembly and finding ways to tweak the source code to generate better assembly
- generally understanding how the machine works, what is expensive and what is not
Жыл бұрын
This! I personally write asm only as a hobby for microcontrollers, where cycle-level timing is sometimes required (the rest of the time C suffices), but I read it a lot more as disassembled code for the reasons you mentioned.
@lgrantcdg3 жыл бұрын
Excellent talk! In the 1970s at General Motors Research Labs, they ran an experiment with a PL/I-based computer graphics system. They recoded a few high-usage routines in assembly language. The system got faster. Then they recoded them in PL/I and the system got even faster. Then they recoded them in assembly language again, and it got faster still. It turned out that each time they recoded the routines, they improved the algorithm, and that made much more of a difference than which language they used.
@guillermoleon02163 жыл бұрын
First Assembly I ever learned was for the Z80 and I absolutely loved it! I don't use it at work but getting to know it taught me a lot about how computers work.
@kevinjensen30563 жыл бұрын
Been programming in assembly and C since '79. Assembly is still widely used in my field of embedded programming, but I haven't needed to resort to it for years. The code density that an expert on the CPU can achieve in assembly is incredible. Still, most of what you've said is correct for most complex CPUs, but some comments are a little inaccurate for embedded processors today. Most MCU core instructions are still atomic, but the problem of multithreaded read/write race conditions still applies when the data size is less than the bus width. This sort of issue appears in most interview tests for embedded programmers. You really should do a lecture on race conditions at the sub-instruction level (as you just did), the instruction level, the thread level, the O/S level and even beyond. Liked your lecture on radix sort. Never tried that one before. Keep up the good work.
@mattias36684 жыл бұрын
There are some cases where you want to use assembly for performance because the compiler will not choose the best instructions for you. For example, if you are doing addition on bigints, you will probably want to use the add-with-carry instruction, which the compiler probably will not be able to figure out that it can use. And there are probably a large number of very specialised instructions like this; I imagine, for example, that the compiler won't use the SHA or AES instructions. Not only are there different assembly languages for different architectures, you also have different dialects for different assemblers.
@WhatsACreel4 жыл бұрын
I absolutely agree!
@shanehebert3963 жыл бұрын
You would hope that if you are using a library that's implemented bigint or SHA/AES that the people who wrote the library used intrinsics to implement the library calls.
@mattias36683 жыл бұрын
@@shanehebert396 Actually, I wouldn't necessarily hope that. When I implemented addition for bigint, GCC didn't have a good intrinsic for doing add with carry (I don't know if it has now). The closest it had was addition with overflow detection, which it couldn't optimise, so inline assembly was necessary for good performance. So you want your bignum to use inline assembly in this case, and then just add a portable fallback for unknown architectures. In other situations, intrinsics may work just as well, but in these cases you still need a portable fallback, so the only reason to use intrinsics instead of inline assembly in these situations is that the intrinsics may be supported for multiple architectures, and hopefully most compilers will recognise them. But that's not necessarily the case, and it is more likely that they will recognise the inline assembly. Similarly, intrinsics for SHA/AES, if there even are any, are not portable.
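For readers who have not seen it, the "portable fallback" mentioned above might look roughly like the sketch below. This is an illustration only, not the code from the comment: the function name `bigint_add` and the little-endian array of 64-bit limbs are assumptions. It detects the carry by checking for unsigned wrap-around, which is exactly the pattern a hardware add-with-carry instruction would do in one step.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst = a + b over n 64-bit limbs, least significant
 * limb first. Returns the carry out of the top limb (0 or 1). */
uint64_t bigint_add(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + carry;   /* may wrap only if a[i] is all ones */
        uint64_t c1 = s < carry;      /* 1 if that addition wrapped */
        s += b[i];
        uint64_t c2 = s < b[i];       /* 1 if the second addition wrapped */
        dst[i] = s;
        carry = c1 | c2;              /* both cannot be 1 at the same time */
    }
    return carry;
}
```

Whether a particular compiler turns this loop into a chain of add-with-carry instructions is something to check in the generated assembly rather than assume.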
@shanehebert3963 жыл бұрын
@@mattias3668 yeah, that's the beauty of conditional compilation ;) if the arch is detected, use the version of the library that uses intrinsics, if not, fall back to the library made from portable code. Then it's up to the library providers (or an interested 3rd party in the case of open source) to add to the project. But yes, you're also at the mercy of the compiler and how it generates code (gcc, in your case, with add with carry).
@andrewdunbar8283 жыл бұрын
Rotate instructions are also not accessible from your high level language. Endian-switching instructions used to be inaccessible too but various compiler + CPU combos I looked at a while ago could recognize most ways to do endian switching in C and produce the right ASM code... but not always!
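As a sketch of the "ways to do it in C" mentioned here: the two functions below are common idioms for a rotate and a byte swap written with plain shifts and masks. The names are made up for illustration. Many optimizing compilers recognise patterns like these and may emit a single rotate or byte-swap instruction, but as the comment says, that is not guaranteed, so it is worth checking the output.

```c
#include <stdint.h>

/* Rotate left by r bits. Masking r keeps every shift count in 0..31,
 * which avoids undefined behaviour when r is 0 or 32. */
uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31u;
    return (x << r) | (x >> ((32u - r) & 31u));
}

/* Reverse the byte order of a 32-bit value (endian switch). */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```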
@ricos14974 жыл бұрын
If I'm to take just one thing from this video, it's that I shouldn't write viruses. One virus, absolutely fine - or recommended perhaps - viruses, not. Great advice, thanks.
@WhatsACreel4 жыл бұрын
Hahaha :)
@clickrick3 жыл бұрын
I'm glad you got to the point that there are assembly languages for just about every processor and didn't allow people to assume that x86 is all there is. As someone who has written assembler on ICL 1900, IBM 360 & 370, DEC PDP 11, as well as microprocessors like the 6502 and Z80, I've become aware of just how different the fundamental architectures are, in particular addressing modes.
@Guztav13374 жыл бұрын
You should get more cushions/backdrop in the room, there is a bit of echo in the background.
@mrdouble3 жыл бұрын
Was thinking the same, looks like an expensive mic though :/
@swharden3 жыл бұрын
The condenser microphone is "too nice". It's picking up every little echo in the room. A dynamic microphone or a basic gaming headset (microphone closer to the mouth) could be better options for this space. Edit: audio is good in later videos
@starpawsy3 жыл бұрын
Most successful assembly program I wrote was in 1992. I did a square root function using Newton's method that was faster than what the compiler of the day provided in the maths library! In those days, the width of the floating point divide register was 80 bits. Dunno what it is today. This might not work today. As an aside, some people might say "only 80 bits"? Well, consider that 80 bits == 24 significant decimal digits. Consider that if you measure the diameter of the known universe to 24 significant figures, the last figure is less than the classical diameter of a hydrogen atom.

Newton's method for calculating the square root of x: start with a guess, call it a. Calculate b = x/a. Take the average c = (a+b)/2. That will be closer than either a or b. Use c as your next guess for a and iterate. Keep going until a & b vary only by 1 in the LSB.

The challenge was making a really really good guess for a that works for all numbers. I hit on the idea of dividing the exponent by 2 (shift right by 1), and zeroing all but the most significant bit of the mantissa. For negative exponents you do the opposite - double the value of the exponent. This actually worked really well!

Here's a worked example. Square root of 10 (well actually 10.000000000000000000000000000000): start with 3. 10 / 3 = 3.33... 3 + 3.33... = 6.33... Divide by 2 = 3.166... In one iteration, you've got 2 decimal places.
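The steps above can be sketched in C roughly as follows. This is not the original 1992 routine: it assumes IEEE 754 64-bit doubles instead of the 80-bit format, the function names are invented, and it simply halves the unbiased exponent for the initial guess (for both positive and negative exponents) while clearing the stored mantissa bits. It only handles positive, finite inputs.

```c
#include <stdint.h>
#include <string.h>

/* Initial guess: keep only the exponent, halved, with a zero mantissa
 * field, i.e. a power of two close to sqrt(x). Assumes IEEE 754 doubles. */
static double initial_guess(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    int e = (int)((bits >> 52) & 0x7FF) - 1023;   /* unbiased exponent */
    bits = (uint64_t)(e / 2 + 1023) << 52;        /* mantissa field cleared */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Newton's method: a <- (a + x/a) / 2 until the guess stops changing. */
double newton_sqrt(double x)
{
    if (!(x > 0.0))
        return 0.0;   /* sketch only: zero, negatives and NaN are not handled */
    double a = initial_guess(x);
    for (int i = 0; i < 64; i++) {   /* iteration cap in case of a two-value flip */
        double b = x / a;
        double c = (a + b) / 2.0;
        if (c == a)
            break;
        a = c;
    }
    return a;
}
```

For x = 10 the guess here is 2 rather than 3, so it takes an extra iteration compared with the worked example, but the sequence 2, 3.5, 3.178..., 3.1623... converges the same way.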
@draconite2 жыл бұрын
#1: This does depend on the architecture you're building for. Compiling for the 68000 with GCC, it's easy to beat the compiler if you know what you're doing
@OpenGL4ever Жыл бұрын
You've already made an assumption here, using a specific compiler. On the other hand, if you use a compiler that is optimized for the use of fast calls and 68k, then it can look different.
@brannonharris46423 жыл бұрын
Reductive learning. Discovering what something is not is seemingly more potent than only pondering on what that thing is. Love this video!
@herrbonk36353 жыл бұрын
2:34 _"That one clockcycle is called the latency"_ Not really, that one cycle is called _throughput_ in these contexts. The latency *for simple instructions* (like ALU reg,reg/im) usually equals the number of pipeline stages. In a simple pipelined CPU, that would be: fetch+decode+calculate+write result, i.e. 4 stages and so 4 clock cycles. For the 486, that was five stages and five cycles, for the P4 it was around 20 stages and cycles, and so on (again for simple instructions like ALU reg,reg/im).
@laurelsporter3 жыл бұрын
But calculate can be repeated ad nauseam, and as long as that can go on, write can be hidden. The full pipeline isn't executed fully for each instruction before the next one executes.
@herrbonk36353 жыл бұрын
@@laurelsporter Yes, that's the basic idea with a "pipeline", i.e. having all the stages of the instruction execution fully overlapping, so that (different stages of) several instructions in a sequence can be processed at the same time. (Typically instruction fetch -> decode -> effective address calculation -> operand fetch -> ALU -> write-back.)
@TellowKrinkle2 жыл бұрын
Don't know how people talked about the 486, but on modern processors, when people talk about latency, they mean the number of cycles from when the register value is first needed to when it's available to the subsequent instruction. If your CPU has forwarding circuitry (like every modern processor), that's only the number of calculation stages. For the example of an `inc rax`, if you had four of those in a row, the cpu would fetch all four in parallel, decode them all in parallel, and calculate them serially, with each one forwarding its result to the next without waiting for writeback. In the end, four (dependent) `inc rax`s would run in four consecutive clock cycles, which is why `inc` is considered to have a latency of just 1 cycle, not 20 or however many a modern processor's pipeline has. The throughput of inc is not 1 but 1/4 for a skylake processor, meaning that the processor can execute four non-dependent inc's in one clock cycle.
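One way to picture the dependent versus non-dependent distinction from a higher level is the sketch below. The function names are made up, and this is only an illustration of the idea: the first loop forms a single dependency chain, where each add needs the previous result (latency-bound), while the second keeps four independent chains that a superscalar core could in principle work on in parallel (throughput-bound). No timing claim is made here, and an optimizing compiler may well transform either loop on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every addition depends on the one before it. */
uint64_t sum_one_chain(const uint64_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four accumulators: four independent dependency chains. */
uint64_t sum_four_chains(const uint64_t *v, size_t n)
{
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same value because unsigned integer addition is associative; with floating point the results could differ slightly.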
@TerjeMathisen3 жыл бұрын
Congratulations Creel, you've managed to create a very informative set of videos on x86 asm, all stuff that I would have loved to have back in the days, starting in 1982 when I had to write interrupt drivers in hex. :-) PS. I went on to use asm on everything from video (DVD & BluRay) & audio codecs (ogg vorbis), crypto (AES competition), games (Quake) and I still write some really low-level code, usually using compiler intrinsics since Visual Studio doesn't allow inline asm anymore. :-(
@hell0kitje4 жыл бұрын
Glad to see you back, mate :) I started with your C++ vids and now I'm discovering asm, keep posting more!
@WhatsACreel4 жыл бұрын
Thanks, will do!
@DownhillAllTheWay3 жыл бұрын
12:15 "Assembly language is the language of the hardware." Permit me to nit-pick. *_Machine language_* is the language of the hardware. Asm is a near-English representation of it. Many years ago, I had access to a Data General Nova computer (it was the back-up machine on a customer site). I knew how to swap modules, and I was OK at hardware maintenance (scopes, and that sort of stuff) but I didn't know anything about computers at the time. By reading the manual, I entered a 3 (in binary) into a memory address, and a 6 into another address using the front-panel switches, then I wrote an instruction in machine code to add them together - and it produced a 9 in the destination address - a thrill that I remember to this day. I learned the machine code pretty well on that machine, and wrote an assembler in binary code. I had been intending to write diagnostics on the machine, but I moved on before I did that, and never used my (rather strange) assembler. Well, I had never seen an assembler up to that point, so I didn't have much to go on.
@ancapftw91133 жыл бұрын
The best example I saw was a guy making a 6202 (I think) program by writing to a ram chip and feeding it into the processor. He showed what the assembly would look like, but had to program it in hex code.
@alberto30284 жыл бұрын
ASM is perfect for bootloaders and some parts of OS
@WhatsACreel4 жыл бұрын
It is indeed! UEFI changed the necessity a little, but certainly low level OS code is one of the most important use cases for ASM! Cheers for watching mate :)
@lewiscole51934 жыл бұрын
Assembly language gives complete control of the hardware to the programmer in a way that no HLL can, in no small part because assembly language is processor architecture specific, while an HLL is supposed to be processor architecture independent. So, it's not that "ASM is perfect for bootloaders and some parts of OS", it's that there is no other way to get there from here using an HLL.
@WhatsACreel4 жыл бұрын
@ozan o. I would love to :) Judging by the recent reviews of Apple’s new M1, I think maybe ARM will give x86 a very good shake very soon! We might be witnessing the beginnings of the fall of x86 in the laptop and desktop markets...? Unbelievable! Not sure when I can cover these things, but they’re certainly on my to-do list. Thanks for the suggestions, and cheers for watching :)
@lewiscole51934 жыл бұрын
@ozan o. OSs have to change over time to meet new hardware and/or user demands, or else they die off. Unix is no different and has evolved over time to be different than what it originally started out as. So in a very real sense, I suspect that Tony Hoare's famous saying, “I don't know what the language of the year 2000 will look like, but I know it will be called Fortran,” has applicability to OSs with "Linux"/"Unix" being substituted for "Fortran". And keep in mind that there are already environments where "Linux"/"Unix" is not king ... real time environments such as can be found in cars, where QNX, a proprietary message passing microkernel based OS (which can run on ARM based systems by the way), is already more common. Yet, thanks to the Posix standard and the QNX people's interest in it, QNX now offers a similar interface ("abstraction") to application programs so that their developers feel warm and fuzzy about it. I suspect the same thing will likewise happen with any OS that depends on C, including Fuchsia.
@lewiscole51934 жыл бұрын
@ozan o.
> As you know, processes never really pause in posix, I don't know if it was due to hardware restriction or design error during constructing of unix back then.

I don't know what you mean by "processes never really pause in posix". Posix is an interface standard for OSs that just happens to look like the interface that Unix/Linux typically used to present. It's not an OS itself. An OS can be something other than Unix/Linux entirely under the hood and yet present a Posix compliant interface, as is the case with QNX, which is a proprietary message passing microkernel based OS that is Posix compliant as I indicated before. To the extent that Posix was supposed to look like Unix/Linux to the outside world (programmer), various interface calls such as a file read or write do block (pause) because that's what they historically did in Unix/Linux in The Good Old Days. That doesn't mean that an OS can't natively use non-blocking interfaces internally which look like they are blocking to the user.

> there is also root privilege problem.

Again, I don't know what you mean since Posix isn't an OS.

> Plus Android turn into giant layers of burger. I guess that's why google wanna leave Android.

Android *IS* Linux by another name. Really.

> if any other new os becomes complicated and consist of many layers in the future, it will be loop then they will be wandering new solutions in the future:).

Again, OSs change over time or they die. To the extent that everyone thinks that what they want done is the way things should be, OS developers are likely to toss in lots of crap to satisfy different users. If you want a lean, mean OS for your specific machine(s)/application(s), feel free to write one yourself ... and spend forever doing it.
@SimGunther4 жыл бұрын
Gotos are NOT considered harmful.
Wormholes, on the other hand, are considered VERY harmful.
@k7iq4 жыл бұрын
If one does not like "goto" then just rename it to jmp and then it's OK because it's what the compiler might output in assembly anyway ! 😁
@imperatoreTomas4 жыл бұрын
Goto is my favorite function
@programaths4 жыл бұрын
In BASIC, well, it was very present. I learned that on my own and was used to putting GOTO everywhere, as it was the way to skip code based on a value: "ON x GOTO label1,label2,label3" (or line numbers!). Then I used GOTO also to recycle code (as in GOSUB). Very good for state machines too, even if I didn't know it had a name. Then I had to take Visual Basic courses at school and the teacher was pulling her hair out reading my code... no FOR and IF, GOTO worked just fine. On top of that, I kept my habit of reusing code. I am not even sure I would be able to understand my own code, as I totally forgot that habit. Still, I have good memories of that because the teacher ended up saying she would not correct it anymore and would just give points for it working as intended. ^^ At the same time, others had trouble understanding what a variable was and I had already implemented Snake and Sokoban just for fun :-D (As devs, we find it to be very simple, but I taught a bit too and this is a huge hurdle!)
@LionKimbro4 жыл бұрын
Wormhole = en.wikipedia.org/wiki/COMEFROM
@roygalaasen4 жыл бұрын
@@programaths when I started out with computer classes back in 1991, we had to draw flowcharts before we were allowed to write a single line of code. Only one entry point, one exit point and no lines were allowed to cross, essentially banning goto entirely. Now my favourite programming language, Swift, sometimes forces you to use a label to tell which loop you want to BREAK out of, which is essentially a goto in disguise. My brain cringes but I have to get used to it lol Edit: to clarify. Break in all programming languages breaks out of the nearest LOOP. If you are in a switch .. case you will still break out of the nearest loop. In Swift you will break out of the switch case, still stuck in the loop unless you label the loop you want to break out of.
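For comparison, C has no labeled break, so leaving two nested loops at once is one of the places where a plain `goto` is still a common idiom. The sketch below uses a made-up function, `find_in_grid`, that searches a row-major grid and jumps out of both loops as soon as the target is found.

```c
#include <stddef.h>

/* Hypothetical helper: search a rows x cols grid stored row by row.
 * Returns 1 and fills *out_row / *out_col if target is present, else 0. */
int find_in_grid(const int *grid, size_t rows, size_t cols, int target,
                 size_t *out_row, size_t *out_col)
{
    int found = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                *out_row = r;
                *out_col = c;
                found = 1;
                goto done;   /* leaves both loops, like a labeled break */
            }
        }
    }
done:
    return found;
}
```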
@Alex-op2kc3 жыл бұрын
Here's an alternative definition: An assembly language is a set of mnemonics and other language elements defined by an assembler that let you write symbolic statements that map to hardware instructions. Under that definition, there can be multiple assembly languages per architecture. For example, there are multiple assemblers for x86: MASM, NASM, YASM, and fasm. And each define a different, although very similar, assembly language.
@robertobokarev439 Жыл бұрын
Nasm has the finest "classical" syntax, while all you wanna do looking at masm is to go back to C. Can't tell anything about fasm and yasm, don't have enough experience
@johnyoungquist65404 жыл бұрын
Talking about assembly in general across different processors is fraught with trouble. I do embedded apps in 8051 assembly only. In fact I wrote the assembler. I can promise that C in the 8051 environment is at least 500% slower and also 500% bigger than assembly, even for simple things that C should be good at. It is widely accepted that compilers use a tiny fraction of the instruction set and leave a lot behind. It is easy to point out that ordinary languages contain no information to help compilers use special instructions or constructs. The assembly programmer will recognize an AES algorithm and use the AES instructions; a C compiler won't. In modern processors the compiler code generator could hold a significant advantage over the programmer with a detailed knowledge of architecture magic like pipelines, cores, caches, threads. I don't know how they handle the moving target of the new processor of the week or tell what processor they will run on. One processor's optimization is another's downfall. In contrast, the assembly programmer wizard may better the C code speed by 100 times or more with devilishly clever thinking and detailed knowledge of the whole instruction set. One thing that is universally overlooked is how similar assembly and high level applications are. Apps are typically constructed of functions tailored to do common things for that app. If you need 98 digits of precision you'll be writing routines to handle that in any language. These modules are easy to define and test and to spread among several programmers. We build bricks first, then walls later. A function call is about the same complexity and work to implement in any language. Now all of a sudden apps in all languages are basically function calls and logically look about the same. Neither is more difficult than the other. The planning stage and logic can be nearly identical for any language.
@donjindra4 жыл бұрын
Exactly. People who don't regularly program in assembler have no idea how much faster assembler is than any high level language. Compiler optimization cannot compete with a programmer who knows the instruction set intimately and can tailor the use of those instructions for a particular task. A 10x improvement in speed is pretty normal. OTOH, a poor programmer is not going to benefit much from assembler code. You have to know what you're doing. The 8051 is a good example. That cpu is so weird a compiler can't deal with it efficiently. A compiler does better with something like ARM.
@SimonBuchanNz4 жыл бұрын
@@donjindra compiler optimisation can definitely best any reasonable amount of effort for the majority of code, assuming you're not using the trivial C implementations that come with microcontrollers - inlining and avoiding pipeline stalls is drudge work that's better to let the computer handle, especially when your problem is getting something working or cleaning up a mess, not making something faster. Not always: there's always going to be some cases that confuse a compiler enough that it's easier for you to use assembly than to figure out how to mangle your code so the compiler does the right thing, but advanced instructions are available through intrinsics, and compilers will auto-vectorize loops, and so on. The low hanging fruit is getting picked all the time.
@donjindra4 жыл бұрын
@@SimonBuchanNz I don't know why you think that. In fact, I don't even know what sort of code you have in mind. I don't advocate using assembler to add two register-width numbers.
@SimonBuchanNz4 жыл бұрын
@@donjindra sorry, could you clarify what I said that you have an issue with? I was talking about your statement that "a compiler can never compete with [an assembly] programmer": trivially true in that said assembly programmer could at worst use the same instructions, but not practically true. Not sure where you're getting adding numbers from, but if that's literally all you're doing, then actually yeah, you probably will beat a compiler. It's the 50kloc of "adding two numbers" that's not worth the absurd effort to keep optimized in assembly, and mixing and matching can (depending on your baseline) actually pessimize the code since the compiler can't inline now.
@donjindra4 жыл бұрын
@@SimonBuchanNz Concerning adding numbers I said the opposite of what you think I said. If the task is simple, such as adding two numbers, the compiler does just fine. There's no point in resorting to assembler. It's the complicated, time consuming tasks that benefit from assembler. Compiler optimization was done by assembly language programmers. But they optimize general cases. They aren't magicians. They can't predict all particular cases. Therefore they cannot optimize for all of them. I have no idea what you mean by the end of your comment.
@_mrgrak4 жыл бұрын
The best programming related content on youtube right now. Creel explains complex topics simply, truly a great teacher. Looking forward to the next video!
@roax2062 жыл бұрын
Though from my understanding, assembly is mostly just machine code but replacing the binary instruction IDs with short nicknames for the instruction. Technically any compiled "higher level" language will be converted into assembly at one point (unless the person who wrote the compiler is a masochist and memorized all the instruction ID numbers). The main point when assembly becomes quicker then simply relies on whether the problem is easier to express in assembly language rather than the HLL used and to what level you are willing to manually optimize the assembly code.
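The "short nicknames for instruction IDs" idea can be illustrated with a toy lookup table. The sketch below is nowhere near a real assembler: it only knows four single-byte x86 instructions that take no operands, and the names `OPS` and `encode` are invented for this example. Real instructions with operands need much more encoding logic than a table lookup.

```c
#include <string.h>

/* A few single-byte x86 instructions that take no operands. */
struct op {
    const char   *mnemonic;
    unsigned char opcode;
};

static const struct op OPS[] = {
    { "nop",  0x90 },
    { "ret",  0xC3 },
    { "int3", 0xCC },
    { "hlt",  0xF4 },
};

/* Returns the opcode byte for a known mnemonic, or -1 if it is unknown. */
int encode(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof OPS / sizeof OPS[0]; i++) {
        if (strcmp(OPS[i].mnemonic, mnemonic) == 0)
            return OPS[i].opcode;
    }
    return -1;
}
```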
@ParagonX133 жыл бұрын
i'm a young person and i taught myself reverse engineering/assembly over the past several years (messing around with disassemblers and searching my questions on the internet) and actually enjoyed it way more than i thought i would... at first it was just a means to an end but i very quickly grew fascinated with it all. i have no idea what to do with this passion though other than hobby projects... :p
@OpenGL4ever Жыл бұрын
If you need a playground. Many open source audio and video codecs are already optimized for the x86 and ARM architectures, but this is not yet the case for the RISC-V architecture. So you could buy a single board computer (SBC) with a RISC-V CPU and then see what could be optimized there. You would need to learn RISC-V assembly though.
@Cubinator734 жыл бұрын
15:49 I think you got something wrong there. Obviously, assembly is needed in all sorts of things like programming compilers and optimizing low-level routines. The "misconception" that "assembly language is no longer needed due to optimizing compilers" expresses the fact that your average programmer doesn't need to write assembly himself because far more competent people already did it and made their optimized routines available in the optimizing compiler. I myself only ever used assembly to explore how CPUs work and how compilers optimize stuff, but I never NEEDED to write my own assembly code for my own projects.
@lewiscole51934 жыл бұрын
That's nice ... OTOH being a former OS maintainer/developer, I used assembly a lot, not just because most of the OS was also written in assembly (which it was), but because it gave me control over data/code placement that no available compiler did/could, which was especially important in the bootstrap code I was responsible for the care and feeding thereof. And I suspect that's still true ... the hardware defines and uses data structures that I don't want/need a compiler guessing what sort of code should be generated for.
@WhatsACreel4 жыл бұрын
Yes, I do wish that the proper position of ASM was expressed more clearly in computer science education. I was taught to fear the language during my degree, encouraged to neglect it entirely. Maybe it’s different in other institutions? I do not disagree entirely with the sentiment. But I do think it is skewed a little too far away from ASM. I think learning ASM for OS development or to understand the CPU are excellent applications! Cheers for watching and commenting folks :)
@lewiscole51934 жыл бұрын
@@WhatsACreel I have no idea how ASM is being taught in schools these days, but back when I was a student -- just after the dinosaurs had been killed off by an asteroid -- there was no question that any non-impaired human could outdo a compiler in terms of generating fast/small code. The reason why you were supposed to use an HLL was because it increased programmer productivity. Studies had supposedly been done that showed that the average number of DEBUGGED lines of code that could be produced per programmer per day was about TEN (10) independent of programming language. And because each HLL statement typically turned into more than one ASM line, that meant that if you could use an HLL, you should because you could potentially get more done using an HLL than you could ASM especially in terms of code that was supposedly "portable" across platforms. There were also supposedly studies that showed a wide variation in programmer output as well and so YMMV, but familiarity with a particular language also had a lot to do with programmer productivity (I don't recall how much). The gist of this is that I usually write in ASM because that's what I'm most familiar with, and because I'm no longer getting paid for what I write, it's my choice. I can speak C if I have to, but I don't consider myself fluent and I simply don't see the need to spend time becoming more fluent in C when I can do what I want probably (?) faster in ASM. What bothers me is that people who seem to shy away from using ASM seem to think that there's something fundamentally different in how you generate ASM code versus an HLL thrown at a compiler. To me, though, that's not the case. When I occasionally do write HLL code, I do the exact same thing that I do when I write ASM code, the only difference being how far "down" I "refine" the code before I come to a valid HLL or ASM statement. 
I just don't understand what it is that makes people think there's something special when it comes to how to write ASM code versus HLL code. It makes me think that maybe too much time is spent teaching the structure of various HLLs and not enough on how to think and solve problems. Just my opinion ....
@WhatsACreel4 жыл бұрын
@@lewiscole5193 Ha! I know the feeling! I learned in the 90’s. Things have changed a lot since then. Especially Assembly language. It’s gone from maybe 100 instructions and 16 registers to massive SIMD register files and 3000 instructions! I certainly agree that programmer productivity and portability are very important. And the choice of language is a big part of that. Sometimes ASM is a good fit, and sometimes it is not. I do love how fast it can be, and how flexible. There’s some brain-melting, deep trickery that is natural to ASM, which is too low level to be practical in HLL’s. But for the most part, anything is pretty achievable in any language, and so it becomes a matter of choosing the best tool for the job. I couldn’t agree more! The problem with ASM is the perception of it. Folks shy away from it in a way that might not be warranted. It’s just a language, after all. IMHO, it’s a really fun and powerful language. I do love a good bit of HLL code too, but ASM will always hold a special place for me. If for nothing else, I made a video about ASM 10 years ago and put it up on KZbin, and have since built this little channel :)
@lewiscole51934 жыл бұрын
@@WhatsACreel Ten years? My how time goes by when you're having "fun".
@3Balala34 жыл бұрын
Great video, helps a lot understanding assembly's place and purpose nowadays. Also great timing. Tomorrow I have an exam in assembly. We are programming on an emulated DOS program. Really, really interesting... :D
@theDemong0d4 жыл бұрын
In my experience writing assembly (mostly to capitalize on AVX), yes the function call overhead is a huge performance hit, but you need to write your program in assembly anyways because when you switch to AVX intrinsics, you need to know what assembly you want the intrinsics to produce. Writing the function first in assembly makes it easy to translate into AVX intrinsics, and the intrinsics should allow you to write C++ that compiles almost exactly instruction-for-instruction identical to your handwritten assembly. Yeah, it's not quite as cool as your program running your handwritten x86, but it's the next best thing and with the call overhead eliminated, you can reap large performance boosts.
@emjizone Жыл бұрын
3:53 This "one instruction per cycle" might be true for the oldest machines, with no clever vectors and lookups and with a very limited set of instructions. This might explain why people believe it to be still true today. In that case you'd have to program most of usual math functions yourself (modulo, square root, etc…) and they would take several cycles anyways.
@CallousCoder Жыл бұрын
ARM 64 CPUs actually have a couple of assembly dialects. You have your AArch64 but also your Thumb instructions (in the 32-bit AArch32 state), which are a more compact instruction encoding to save space.
@VTdarkangel2 жыл бұрын
I had to do some SPARC assembly programming when I was in school. The real advantage of it was when we had to do hardware interfaces. Those functions could have been done in C, but when I broke the object files down, I found out that the compiler was inserting a bunch of extra commands that were completely unnecessary, such as settings in the master register for features that weren't being used. By doing the interfaces in assembly, I could bypass all of that.
@y2ksw13 жыл бұрын
I have been programming for a vast part of my life in Assembly, and the most challenging task was to write code in such a way that it runs in parallel in the separate pipelines (superscalar). The example you have given would have been rewritten, possibly longer, in order to get the parallel mechanism working. One way would be:
mov ebx, eax
inc eax
nop
inc ebx
So the first two run together, and the rest again. And we would gain at least 2 clock cycles. However: assembly made a lot of sense in the old days. Now, with multi-core multi-scalar processors and the brilliant optimisation of compilers, Assembly code has pretty much died out. I still use it on special hardware though. I am eyeballing the Raspberry Pi Pico, for example 😊
@OpenGL4ever Жыл бұрын
inc eax
mov ebx, eax
Does the same job as your code and requires less RAM.
@y2ksw1 Жыл бұрын
@@OpenGL4ever It's not a question of memory, but to get part of this code running in a different pipeline and thus double up the speed.
@y2ksw1 Жыл бұрын
Your code would run 4 times slower
@OpenGL4ever Жыл бұрын
@@y2ksw1 Why should it? In my opinion it runs at the same speed. Your code might do
mov ebx, eax
inc eax
in its own pipeline, but
nop ; does nothing
and
inc ebx
depends on the mov ebx, eax before.
@y2ksw1 Жыл бұрын
@@OpenGL4ever If you first do an operation on eax, and then use it to assign its value to another register, it stalls and waits to settle just that tiny bit, which doesn't allow the code to move to the other pipeline. I have been timing these instructions very accurately and your assumptions, while technically correct, perform way less efficiently. On time critical applications, such as the real time graphics manipulation I was working on, the code alignment and sometimes illogical reordering of instructions made the difference between fluent and stuttering graphics. I mainly got the filter and render code prepared by graphics specialists and my task was to speed it up. But also big number mathematics and operating system libraries. Most of them grew noticeably in size, but were of unmatched speed.
@BlackStarEOP3 жыл бұрын
8:10 "Race conditions are brilliant" :D (y) Thumbs up for that... Tracking down race conditions has been the most difficult part of my career as a software engineer. If you implement something using more than 1 thread, if you carefully think things through, there's not much you can do wrong. However... when suddenly one guy in your team says "yes I know how to improve the performance, just put this and this into its own thread" then you know you need to buckle up. You're in for one hell of a ride...
@jeffm27873 жыл бұрын
I was writing x86 before it was called x86. Did 6502, 6809, etc. as well. Stopped when the 486 came out.
@brorelien84474 жыл бұрын
14:43 I partially disagree with you on this point. Some processors like the 6502 have a small instruction set which can be easily learned (only around 56 instructions). I know an 8 bit CPU can't really be compared with a modern x64, but some embedded CPUs still use these simpler 8 bit instruction sets. Otherwise I like the video.
@y2ksw13 жыл бұрын
Well, some 8 bit processors have a lot of instructions. Of course, if you group, then almost any processor has only a few: Add, subtract, multiply, divide, invert, move. That's about it. When I teach, I actually point out that most processors can only add and negate. They do it in a very efficient way though.
@NoNameAtAll23 жыл бұрын
RISC-V >_>
@stevem34323 жыл бұрын
I began learning assembly at uni this semester and I actually enjoy it. Thanks for these videos.
@DigitalPhage3 жыл бұрын
"x86 Assembly Language Misconceptions" would be a more apt title, however a good video.
@TheBypasser3 жыл бұрын
Oh yeah, say Arduino compared to pure AVRASM is like a snail vs a ballistic missile (just like for the most of the RISC cores, HLL vs ASM that is).
@niclash3 жыл бұрын
Misconception; x64 Instruction Set is a typical one. The micro controllers are typically magnitudes easier to learn fully. And then there are the funky/academic outliers, like 1 OpCode Instruction Set. But the majority of Assembly Languages out there have dozens, maybe 100 and a bit, instructions, and not the thousands in the Intel/AMD world.
@mikefochtman71643 жыл бұрын
Good information. When we had some ASM instruction dependencies, we sometimes would look down a few lines and see if we could move some other instruction in between the dependent instructions. That meant we could space out the two dependent instructions to let the first one finish and give another ALU something to do while the first one crunched. Also worked on a different processor that had a special increment. Used in the OS interrupt handling, it had a couple of instructions that were non-interruptible so we could guarantee that the increment and store would be atomic.
@spacewolfjr4 жыл бұрын
The legend returns! Thanks Mr. Creel.. man..
@programaths4 жыл бұрын
First year in school: Compute the volume of a cone...in assembly! Most students were blocked on the division!!! That's when they learn overflow AND underflow. I do not remember the ins and outs, but the division gives you a good ride if you didn't pay attention to the curriculum. Then it's while you are doing your work that you realize that registers can be split in different ways, and that there is a flag register too. At that time (15 years ago), there was "help PC" with nice explanations of all of this... Another difficulty of assembly is that it's "verbose". In a higher level language, "if" is identified as is. In assembly it's CMP+JNE,JEQ,JZ,JNZ,JNP. And even conditions with conjunctions or disjunctions become challenging. Another nicety was using the stack for local variables instead of trying to guess which register is safe to use ^^ It's a bit cloudy, because it's far away now. But that wasn't that easy! It's a gymnastic on its own! But overall, whatever the language, programming is really complicated. It's all about solving problems and expressing the solution as code...And most of the time, the problem to be solved is also to be found!
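To make the division trouble concrete, here is a minimal sketch in C of the same exercise with integer arithmetic only. The function names, the 16-bit input types and the 355/113 approximation of pi are all assumptions chosen for illustration, not anything from the original course. It shows the two classic traps: dividing too early truncates to zero, and the final quotient may not fit the destination.

```c
#include <assert.h>
#include <stdint.h>

/* Integer volume of a cone, V = pi * r^2 * h / 3, with pi approximated
   by the fraction 355/113 (an assumption made for this sketch). */
uint32_t cone_volume(uint16_t r, uint16_t h) {
    /* Multiply first, in a 64-bit intermediate. With 16-bit inputs the
       product 355 * r * r * h always fits in 64 bits. */
    uint64_t numerator = 355u * (uint64_t)r * r * h;
    uint64_t volume = numerator / (113u * 3u);

    /* The quotient still has to fit the 32-bit result. On x86, DIV
       raises a divide error in that situation; here we just assert. */
    assert(volume <= UINT32_MAX);
    return (uint32_t)volume;
}

/* The naive order: dividing early throws information away.
   355 / 113 becomes 3, and h / 3 becomes 0 for any h below 3. */
uint32_t cone_volume_divide_first(uint16_t r, uint16_t h) {
    return (355u / 113u) * r * r * (h / 3u);
}
```

For r = 3 and h = 2 the true volume is about 18.85; the multiply-first version returns 18, while the divide-first version returns 0.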
@WhatsACreel4 жыл бұрын
So true! Cheers for watching :)
@Lantalia3 жыл бұрын
So, with regards to #1, inline assembly skips the function call overhead; the main reason to do it is to use instructions not yet supported by your compiler
@AngDavies3 жыл бұрын
Minor nit/clarification: while you definitely need to know assembly on a deep level to be able to code an optimising compiler- after all, it's a program that turns code in a given language into as efficient/fast machine code representation as possible. That doesn't mean you necessarily should write one in assembly itself- it wouldn't make faster code, only code, faster. The better option is often to write the compiler in the language that you intend to compile with. You spend loads of time writing a compiler that can create really optimised code for a given platform, build it using some existing compiler, which doesn't make very optimised code, and so the compiled compiler takes ages to compile code. But now you've just created a program that turns your code in your language into optimised machine code, so just feed the original code through the new compiler, and you now have an optimised optimizing compiler :D Having just "GCC" that compiles to your machine is so much better than having to find a version of GCC tailored to your exact platform
@amigalemming Жыл бұрын
15:45 I am too lazy to plan register usage myself, thus I use LLVM to generate real assembly code for me. But I inspect the results regularly in order to find weaknesses in LLVM or my code.
@WolfCoder3 жыл бұрын
The only time I've written assembly was for the 6502 (because it's fun), the Z80 clone in the Gameboy (because it's fun and the only compiler I found was terrible and couldn't handle ROM paging well, etc.) and the ARM7TDMI in the GBA where, while there's a port of gcc for it, you still have to write assembly for heavy duty subroutines like interrupts, audio engines, etc. as the compiler optimizations don't seem to work as well in the gcc port. For x86-64 though? Uh.. I think I'll let the compiler have the 'fun' when it comes to that.
@seneca9833 жыл бұрын
14:36 "The difficulty of assembly is the number of instructions." Is this part specific to x86?
@kelvinyonger88853 жыл бұрын
afaik this whole video is for in-vogue modern uarchs (x86/x64, ARM)
@seneca9833 жыл бұрын
@@kelvinyonger8885 Doesn't ARM have far fewer instructions than x86?
@0MoTheG Жыл бұрын
0:50 I disagree. Using inline asm does not take much overhead. In the simplest case it is just a mov to set up the register you want to change. A register to register mov does not take any clk cycles.
@RufianEmbozado Жыл бұрын
Assembly will always retain two strong points. First, when you learn to code in assembly you go through a rush of "illuminations" (I'm always thinking of 8 bit platforms because they are simple enough to have a grasp on all the landscape, and because I'm that old. Nothing is yet done, you push and pull all those pesky bits all over the place "by hand", a blazingly fast hand) that put a lot of pieces of the information science puzzle right into place. Second, there is an inherent beauty in assembly code. Motorola 68000 had a beautiful, beautiful assembler (I crashed on it with an Amiga 500 and, man, what a joy it was! All those fancy chips at your command... Most missed piece of hardware ever). I never got that feeling when I tried to code assembly on i386. I still think learning to write assembly for any CPU is worth the price. No need to do great things, just some humble tasks. You'll have the ride of your life (as a nerd, at least) and won't fall for those kinds of misconceptions. Great video, of course. Assembly has the virtue to dispel all sorts of misconceptions. But assembly itself is covered by some key misconceptions which keep it from teaching all it can.
@pierce83084 жыл бұрын
Two dumb doubts: 1.) 2:15, By "single clock cycle", do we mean the cycle for a pipeline stage? Cuz I recall that for nonpipelined processors, 1 clock cycle is meant to describe the execution time for one instruction, and the size of the cycle must be large enough to accommodate even the slowest instruction. So "1 clock cycle" is usually a term to describe one pipeline stage delay (of the slowest stage), since most processors are pipelined these days? 2.) 6:00, Aren't ASM instructions atomic in the sense that they *will* complete whole? As in ASM instructions are not atomic with respect to each other, but with respect to themselves, since two ASM instructions' execution steps can be interleaved (like in the INC example you described), however a *single* ASM instruction will complete whole and not be incomplete, for example: it can't be interrupted by an incoming interrupt/trap. Just started learning about Comp Org and Arch, so pardon if the queries are too silly to ask. Thanks
@vikassm3 жыл бұрын
Fantastic video and channel! Subbed. My 2¢ about the poor audio: Use your mobile phone with a ~5$ lapel mic to capture your "B-Roll" audio 🙂 That way if your nice desktop mic doesn't record for some reason, the backup audio from your cellphone is still wayyyyy better than the absolute garbage camera mic. Just clap once (Aaand ACTION) at the beginning and the end of each take to simplify A/V sync during editing.
@connclark21543 жыл бұрын
I think one thing that wasn't mentioned was that assembly allows you flexibility that higher level languages do not. With this flexibility you can implement more efficient algorithms. For example, in between assembly routines you can return more than one value from a function by using a custom calling convention. It's the ability to leverage those freedoms that gives assembly its power and performance.
@bigshrekhorner Жыл бұрын
That's not something exclusive to Assembly. C is able to do this by using pointers as function arguments. Even higher level languages are also able to do this by using tuples that mix types (or simply the same type), or with methods similarly to C, if they allow memory management concepts like pointers. Compilers and compiler engineers are extremely smart and definitely way smarter than me or you. That means that if you have thought of an efficient implementation of an algorithm in Assembly, it's also pretty likely the compiler engineers have also thought of it and implemented it. At least if we are talking about mainstream compilers, like GCC or Clang (for the case of C/C++)
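As a rough illustration of the C side of this exchange, here is a small sketch of two common ways to hand back two results from one function. The names divmod_result, divmod_struct and divmod_ptr are made up for this example; the standard library's own div() does the same job with a div_t struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Option 1: return a small struct by value. */
typedef struct {
    int quotient;
    int remainder;
} divmod_result;

divmod_result divmod_struct(int n, int d) {
    assert(d != 0);
    divmod_result r = { n / d, n % d };
    return r;
}

/* Option 2: return one value and write the other through an out-pointer. */
int divmod_ptr(int n, int d, int *remainder) {
    assert(d != 0 && remainder != NULL);
    *remainder = n % d;
    return n / d;
}
```

Under the x86-64 System V calling convention a struct this small is normally returned in registers rather than through memory, so option 1 need not cost more than a hand-rolled convention. Whether a compiler actually matches hand-written assembly in a given case is still something to measure.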
@PaulaBean Жыл бұрын
When the rubber hits the road, you can always benchmark the speeds of your C++ code against assembly code. Measurement trumps speculation. Thanks for the nice video!
@danepane5273 жыл бұрын
The algo sent me here.. was watching a bunch of Coach McGuirk videos.. subbed!
@BrightBlueJim3 жыл бұрын
So to summarize a couple of things you said: 1) Functions written in assembly don't really run faster than compiled functions. 6) Assembly is still necessary for low-level optimization, where speed is really important. Also, your point on atomic operations applies just as directly to C and C++, or indeed for ANY program written to take advantage of multi-threading.
@SolarLantern424 Жыл бұрын
0:57 I assume you are talking about "inline assembly" or something? Machine Code has to be faster by definition. It is after all what the compiler is going to produce at the end of the day. So yes assembly is faster to execute, it has to be. That isn't a misconception. Whether the speed benefit of writing in assembly is worth it (will anyone notice for instance!) is another question.
@thadtheman37513 жыл бұрын
Actually part of the complexity of assembler comes from the fact that "decorations" of instructions are not uniform. To clarify I will make up an example (it's been a while so don't expect this to be a real world example). You might have
INC A,N
increase A by N. A might be a memory location and N a number (direct addressing)
INC $A, N
A might be a memory location pointed to by a memory location (indirect addressing)
INC [$A],N
N might be a memory location
INC A,$N
...
The thing is that some commands accept some of these addressing modes and others do not. A JMP for example might accept all addressing modes, but a JSR would not. So it gets complicated keeping track of which instruction does what.
@porky11182 жыл бұрын
1:30 Is it really true that calling assembly causes overhead? I think it's possible to do inline assembly, which just replaces a certain part of the function with assembly. At least that feature exists in Rust. I'm not sure if it's really inlined into the function, or if it's just called this way because you can just write it without creating a new file. But at least theoretically it should be possible.
@TellowKrinkle2 жыл бұрын
Yes, inline assembly will be inlined into the function. It's a good way to write smaller pieces of assembly without having to do entire loops in it, and without having to deal with various OSes' calling conventions.
@den2k8853 жыл бұрын
Compilers optimize very well... for general purpose code, without knowing its data layout. It's very unlikely that a compiler will use SIMD instructions, and in the rare cases it does, it won't make use of the inner characteristics of your problem, as it has no knowledge of them. Using Assembler I managed to double a linear Sobel algorithm's performance and triple a segmented integral table algorithm's performance. Not even the Intel compiler managed to equal those times.
@cthutu3 жыл бұрын
INC RAX won't execute in one clock cycle on a x86 because of fetching and decoding. However, pipelining can make it seem like it does.
@dcocz39083 жыл бұрын
I agree, but there are lots of situations where the compiler simply fails. For example, gnuarm won't use multiple load and store properly, which for me generated much larger code that wouldn't fit in SRAM, so it had to run with wait states from flash on my project. Rewriting it in hand-written assembly allowed me to get a much smaller function, allowing it to be moved into SRAM with the data that was required by the application, and that is where I got a really large speed improvement. Using just the compiler, I couldn't have done it without swapping the micro for one with a larger memory footprint
@PvblivsAelivs3 жыл бұрын
I have seen many people say that compilers do these wonderful tricks and that hand-coded assembly language is not (generally) faster than a compiler's output. While there may be some compilers that do this, no compiler I have actually used does so. "You might get the right result." Especially if you use the lovely little LOCK. Any processor that can feasibly be part of a multi-processor system needs a way of executing at least certain instructions without interference from other processors. "The CPU will perform the instruction a lot slower." It will if two processor units are trying to access the same memory at the same time. After all, one must stall. But the processor that "gets there first" has a negligible performance penalty. It was a two-cycle penalty on the 8086. (I only have timing information up to the 486.)
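For readers following the atomicity discussion, here is a small sketch in C of why a plain increment of shared memory can lose updates and what the locked version buys you. The first function does not use real threads; it replays one unlucky interleaving by hand so the outcome is deterministic. The function names are made up for this example. On x86-64, compilers typically implement atomic_fetch_add with a LOCK-prefixed instruction, though the exact instruction chosen is up to the compiler.

```c
#include <assert.h>
#include <stdatomic.h>

/* A plain increment of memory is really three steps: load, add, store.
   This replays one bad interleaving of two threads A and B by hand. */
int lost_update_demo(void) {
    int shared = 0;

    int a = shared;   /* thread A loads 0          */
    int b = shared;   /* thread B loads 0 as well  */
    a = a + 1;        /* thread A adds             */
    b = b + 1;        /* thread B adds             */
    shared = a;       /* thread A stores 1         */
    shared = b;       /* thread B stores 1 again   */

    return shared;    /* 1, although two increments ran */
}

/* With a C11 atomic, each increment is one indivisible
   read-modify-write, so no update can slip in between. */
int atomic_increment_demo(void) {
    atomic_int shared = 0;

    int previous = atomic_fetch_add(&shared, 1);
    assert(previous == 0);          /* fetch_add returns the old value */
    atomic_fetch_add(&shared, 1);

    return atomic_load(&shared);    /* 2 */
}
```

The same reasoning applies whether the increment was written in assembly or in C: without the atomic operation, the two halves of another thread's update can land between your load and your store.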
@thomasmaughan47983 жыл бұрын
There was a time when assembly was much faster than compiled code, but eventually the compiler optimizations produced code that executed efficiently. Depending on what one is doing, assembly is considerably smaller. A function in COBOL to parse a text file was 30 kilo-words and took 30 seconds to execute; I re-wrote it in assembly and it produced an executable that was only 3 kilo-words and parsed the same file in 3 seconds. 1/10th the size and ten times faster! But that extreme example is partly a result of COBOL not really being a good choice for that sort of thing, and my re-write also used static linking; everything it needed was already linked in the executable so at run time, no "fixups" were needed.
@derzweistein89734 жыл бұрын
Where do I learn "everything that [I need to learn] about a computer" to gain significant speed in assembly? (especially the fun hardware stuff like OoO Execution, Loop Streaming, different Execution Engines)
@sambrown94943 жыл бұрын
Very interesting stuff, enjoying these videos. Hope you don't mind my asking - is that microphone actually turned on? It's a bit echoey like it's the camera microphone doing the recording across the room ..? Looking forward to more vids! Thx :)
@sambrown94943 жыл бұрын
Ha umm sorry! I commented and only then read the description. Already covered. Just so you know I was paying attention! ;) Rock on ...
@GogiRegion3 жыл бұрын
I’ve actually looked into virus programming, mostly out of curiosity, and it looks like good hackers will use C and then compile to assembly for optimization, then assemble it. That’s assuming that you need high level functions in order to do what you need, you want it to take up as little space as possible so it’s harder to detect, and possibly want to remove null bytes (which is supposed to allow your code to work with a wider array of hacks since some rely on a lack of null bytes). It’s actually an interesting topic, and from what I was reading, it sounds like C is preferred over assembly for the same reason Linux is written primarily in C.
@wrtlpfmpf3 жыл бұрын
One thing that doing a small project in assembler can really help with is coding style. I used to write multiple screen long functions with control structures nested several levels deep. Writing in assembler can really teach you how to write code that is as simple as possible, yet correct. I once did that for a little project on an ATMega. Those are cute little 8-Bit micro controllers. Since they have different addresses for RAM and Flash, programming them in assembler is a lot less painful than, for example, C. Anyhow that project really helped me write readable code when I later did C projects. I later played around with those microcontrollers in C and looking at the assembly created by the compiler I have to say that it's highly dense. (The rationale behind assembler was that I had more experience with AVR assembler and that that code would use the remaining flash program storage as data storage, something that is even harder to do in C)
@trashtrashisfree Жыл бұрын
I always wrote a good macro library for the assembly I was working in. System 360/370 didn't even have stacks so my first priority was writing things to push and pull values and create subroutines. Everyone else was hand-cutting every single line. Far more error free. Same for other issues in 6502.
@tchiwam3 жыл бұрын
Would be fun to see a video on transforming locked multithread to lockless thread with a thread manager and completely lock less multithread manager.
@BobDiaz1233 жыл бұрын
When I program the Microchip PICs in Assembly, the code is very simple. The RISC instructions are only 33 or 35 depending on the core used. The fun part is making the task work in the PIC's limited memory.
@GodzillaGoesGaga3 жыл бұрын
PIC's are glorified state machines. They have no stack!! At least the early ones I used!!
@gideonz74b2 жыл бұрын
@Creel: Executing an instruction in one cycle does *not* mean that the *latency* is one cycle. It means that the *throughput* is one instruction per cycle. The latency is always a lot more than that, because it has to pass through the pipeline.
@michaelbuerge3 жыл бұрын
Great stuff. Interesting and relevant info. Thanks. Allow me a remark about audio: You invested in a nice mic. Now you might want to think about the room you're recording in. Maybe put something absorbing in place to reduce room reverberation.
@SimonClarkstone3 жыл бұрын
4:58 I am amazed how slow division is. I thought that was 90s or 00s slowness, and that modern CPUs took 5 cycles or something.
@alienrenders3 жыл бұрын
To make things worse, there's usually only one ALU that can perform the instruction. If you don't use vector operations, it's downright dreadful performance for multiplies and division.
@MaximYudayev4 жыл бұрын
That's mostly applicable to general-purpose CISC, no? For example RISC, namely ARM, RISC-V (okay, not always), PIC and other embedded processors and DSPs, execute instructions in one clock cycle and seem to be the main targets for optimization in ASM where compilers are not smart enough to take advantage of all the ins and outs of the dedicated CPU.
@WhatsACreel4 жыл бұрын
Yes, I do recall the PIC is designed to run instructions at the same speed. Except for branching, maybe?? My memory is a little shaky there. ARM takes different times for instructions much like x86. I was definitely thinking mostly about x86 in this video, but most of it is applicable to other hardware. Cheers for watching mate :)
@nordgaren23582 жыл бұрын
Hey, Creel! Was curious what you mean by the overhead of jumping to assembly? Everything compiles down to assembly, so I am curious why the overhead? Thanks for the videos! Cheers!
@WhatsACreel2 жыл бұрын
Oh, just meant calling the function. Often the compiler will inline functions, but if you use ASM, then it can't. It's going to be the time to set up the stack, pass parameters, and the jump to the function itself. Hope this helps, and thanks for watching :)
@nordgaren23582 жыл бұрын
@@WhatsACreel Yea, that's what I was guessing, but I figured I'd confirm! Thanks again, Creel!
@RT55J3 жыл бұрын
The effectiveness of unrolling loops as a performance optimization can vary wildly depending on the caching situation. If your architecture has no cache to worry about, then it would give a definite performance boost. However, if you have an instruction cache to worry about, then (depending on the size of the unrolled loop vs the cache) you might suffer a performance decrease from the extra instruction fetching from RAM.
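To show what is actually being traded here, a minimal sketch in C of a summing loop and a hand-unrolled version of it. The function names and the unroll factor of four are arbitrary choices for illustration. The unrolled body does fewer loop tests per element and keeps four independent accumulators, at the price of more instruction bytes, which is exactly the code size that can hurt when an instruction cache is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Straightforward loop: one add and one loop test per element. */
uint64_t sum_simple(const uint32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Unrolled by four, with separate accumulators so the four adds
   in one iteration do not depend on each other. */
uint64_t sum_unrolled4(const uint32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    uint64_t t0 = 0, t1 = 0, t2 = 0, t3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        t0 += a[i];
        t1 += a[i + 1];
        t2 += a[i + 2];
        t3 += a[i + 3];
    }
    /* Handle the 0 to 3 elements left over. */
    for (; i < n; i++) {
        t0 += a[i];
    }
    return t0 + t1 + t2 + t3;
}
```

Both functions return the same sum; which one is faster depends on the CPU, the cache situation and what the compiler already does on its own, so it is worth timing both rather than assuming.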
@KeinNiemand2 жыл бұрын
0:45 well, perfectly optimised assembly is in theory faster than any high level language so it's sort of right. It's just that compilers these days are pretty good at optimising stuff so most manually written assembly is probably a lot slower than what a compiler would produce. This of course doesn't mean the compilers are perfect though, so it's probably still possible to write something in assembler that ends up faster than the C equivalent even with all the compiler optimisations
@lyingcat90223 жыл бұрын
Ever play with the new RISC-V instruction set?
@DanEllis3 жыл бұрын
I was a bit puzzled by the first one. You seemed to be suggesting that _calling_ code written in assembly language was slower (disregarding the actual execution time of the function). But of course that's not so. Regarding malware, it's sometimes necessary to write code in assembly language to strictly control what machine code is generated. For example, to ensure there are no zeros. Finally, "assembly language is etched into the chip". Not really, though. The ISA doesn't dictate the syntax of the assembly language. For example, x86 has two very different syntaxes (Intel and... the other one. AT&T?)
@rfvtgbzhn Жыл бұрын
From what I heard, you can get a significant performance boost in some cases by disassembling the compiled code and rewriting parts in Assembly language.
@kindpotato3 жыл бұрын
"race conditions are brilliant" This guy is awesome.
@emjizone Жыл бұрын
15:36 The issue being one line of C++ might end up in dozens of lines of ASM once compiled.
@jp5000able2 жыл бұрын
Back in the early 80's I did some 6502 assembly programming. What made it so difficult, the cpu was only 8 bits. There were no instructions for 16 bit numbers and floating point numbers.
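As a sketch of what that extra work looks like, here is a 16-bit addition built only from 8-bit additions and a carry, written in C but mirroring the usual 6502 pattern of CLC, ADC on the low bytes, then ADC on the high bytes. The type and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit value held as two separate bytes, as on an 8-bit CPU. */
typedef struct {
    uint8_t lo;
    uint8_t hi;
} u16_bytes;

/* Add two 16-bit values one byte at a time, passing the carry along. */
u16_bytes add16(u16_bytes x, u16_bytes y) {
    unsigned lo_sum = (unsigned)x.lo + y.lo;          /* ADC on the low bytes  */
    unsigned carry = lo_sum >> 8;                     /* the carry flag        */
    assert(carry <= 1u);                              /* it is a single bit    */
    unsigned hi_sum = (unsigned)x.hi + y.hi + carry;  /* ADC on the high bytes */

    /* Keep the low 8 bits of each partial sum; a carry out of the
       high byte is dropped, so the result wraps at 16 bits. */
    u16_bytes result = { (uint8_t)lo_sum, (uint8_t)hi_sum };
    return result;
}
```

Adding 0x12FF and 0x0001 this way gives low byte 0x00 and high byte 0x13, that is 0x1300. Wider numbers just repeat the high-byte step once per extra byte.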
@NomenNescio993 жыл бұрын
A long time ago in a galaxy far far away, before the time when gcc used the MMX instruction set to optimize vector arithmetic, there were sometimes huuuge gains to be had from inlining some assembly code.
@Alex-op2kc3 жыл бұрын
Creel's back on his cubemaps!
@phinok.m.6283 жыл бұрын
14:37 Well ... it depends. Most microcontrollers for example don't have that many instructions. Some only have 20-30 instructions. And I think in that case, the difficulty is rather to write complex programs with so few basic instructions. If you ask me, Assembly is really easy to learn, but really difficult to program with. While higher level languages like C are more difficult to learn but easier to program with. If you had a language that only let you connect NAND gates to each other somehow, you could still write pretty much any program with that. But you would have to figure out how to compute the simplest things. And most importantly, how to do it efficiently. In my opinion, it's really quite easy to learn Assembly. But you sure as hell ain't out-optimizing a C compiler anytime soon.
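As a toy illustration of the NAND point, here is a C sketch where the only logic primitive is a one-bit `nand` and everything else, up to an 8-bit adder, is wired from it. All the names are invented for this example, and the shifts in `add8_nand` are just the wiring that feeds bits in and out, not part of the logic.

```c
/* The one primitive: NAND on single bits (each argument is 0 or 1). */
static unsigned nand(unsigned a, unsigned b) { return !(a && b); }

/* Everything else is built only from nand(). */
static unsigned not_(unsigned a)             { return nand(a, a); }
static unsigned and_(unsigned a, unsigned b) { return not_(nand(a, b)); }
static unsigned or_(unsigned a, unsigned b)  { return nand(not_(a), not_(b)); }
static unsigned xor_(unsigned a, unsigned b) {
    unsigned n = nand(a, b);
    return nand(nand(a, n), nand(b, n));
}

/* One-bit full adder: sum and carry-out from two bits and a carry-in. */
static void full_adder(unsigned a, unsigned b, unsigned cin,
                       unsigned *sum, unsigned *cout) {
    unsigned s1 = xor_(a, b);
    *sum  = xor_(s1, cin);
    *cout = or_(and_(a, b), and_(s1, cin));
}

/* 8-bit ripple-carry adder; the result wraps around at 256. */
unsigned add8_nand(unsigned a, unsigned b) {
    unsigned result = 0, carry = 0;
    for (int i = 0; i < 8; i++) {
        unsigned s;
        full_adder((a >> i) & 1u, (b >> i) & 1u, carry, &s, &carry);
        result |= s << i;
    }
    return result;
}
```

It works, but a single addition costs dozens of gate evaluations, which is the "how to do it efficiently" problem in miniature.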
@DukeDudeston3 жыл бұрын
"You can do a lot of stupid things in any language" I was able to delete ntfs.sys in a language called "DarkBASIC" when I first started out. So yes. You can do a lot of stupid things in languages.
@EvilSandwich3 жыл бұрын
I like to program for old systems like the Apple II and the NES, so I code a lot in 6502 ASM. Believe me, you start to miss high level after a while. You guys ever try Hello World when you have to explain to the computer how to read and print strings before it can even do that? Heck, the NES doesn't even have ANY internal ROM, so you have to draw the letters manually before you can even start on strings. lol
@DIYRepairHour3 жыл бұрын
Lock? Apart from caching, the penalty can differ vastly depending on which CPU it is and how the clocks between the CPU and the memory controller are configured...
@michaelmoorrees35853 жыл бұрын
Still write assembly code for microcontrollers, such as the AVR and 8051 lines. Those are a bit painful. Writing assembly on the old Motorola HC line was beautiful, in comparison. I have a hierarchy of pain. If the final binary is less than 4K, it's gonna be assembly. If 16K or larger, it will mostly be high level (i.e. C), but some critical areas will still be in assembly. Optimizing compilers are similar to autorouters, when laying out a PCB. They will screw things up. Often you have to go into the trenches, and do some manual labor.
@Misteribel2 жыл бұрын
18:01 misconception #10: assembly language is etched in the CPU. It is not. The machine instructions are. Assembly language is the human-readable form of the machine code, and it maps (usually) one to one. If they were the same, assemblers wouldn’t be needed, nor would instruction-to-opcode tables be needed. I’m sure you know this, just wanted to dot the i’s and cross the t’s.
@tabletopjam48943 жыл бұрын
Interesting about dep. scaling, how you might actually be able to do more in a cycle by putting instructions slightly out of order. I suck at ASM... don’t plan to program anything in that, but it’s still cool to see what my HLLs are doing under the hood.
@microdocker Жыл бұрын
Very good and explanatory shot. One small weird thing (not related to the topic) is that the guy is literally sitting in front of a mic and still recording his voice on the on-camera microphone ^_^
@xeridea3 жыл бұрын
Older compilers were known for producing slow code, and assembly was often used, especially in early consoles. Modern compilers are highly optimized. Besides all the basic stuff, they have all sorts of tricks for optimizing multiply, divide, and what instructions to use, even specific to CPUs if you want. Sometimes CPUs have weird quirks that compiler developers can take advantage of, or at least avoid penalties. Optimizing multiply and divide goes beyond obvious stuff, like bitshifts for powers of 2; they have all sorts of tables of methods for various numbers. Often they can even convert loops into SIMD instructions automatically. If not, doing SIMD completely manually is very tedious, though there are methods available in some lower level languages to make it a bit easier. Some things can still be hand optimized, but that requires very in-depth knowledge of CPUs, and even then, may not even be faster. For most purposes, not worth it, though some low resource embedded systems, some drivers, and some other niche cases benefit.
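Two small C sketches of the multiply and divide tricks, written out by hand. The function names are made up, and this is only the kind of rewrite a compiler might choose, not what any particular compiler emits. The constant 0xAAAAAAAB is the usual "magic number" for unsigned 32-bit division by 3 (it is 2^33 / 3 rounded up).

```c
#include <stdint.h>

/* x * 10 without a multiply: 10x = 8x + 2x. */
uint32_t times10_shift(uint32_t x) {
    return (x << 3) + (x << 1);
}

/* x / 3 without a divide: multiply by ceil(2^33 / 3) in 64 bits,
 * then shift right by 33. Exact for every 32-bit unsigned x. */
uint32_t div3_magic(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```

In normal code you would just write `x * 10` and `x / 3` and let the compiler pick; the point is that the "obvious" instruction is often not the one that ends up in the binary.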
@sergiomarroquinjr35873 жыл бұрын
I always seem to learn something new from you. Keep it up!
@codenamelambda3 жыл бұрын
Well, I'm pretty sure even naive assembly is going to be faster than Python. Though that isn't exactly a fair comparison. Also, inline assembly is a thing in many compiled languages, so "stay in asm as long as possible" is not *entirely* true. Sure, the compiler is still hands off there, but it *can* inline the function around the inline assembly.
@monad_tcp3 жыл бұрын
17:53 of course assembly is used, by the optimizing compilers themselves. If you are really good at assembly, you ought to teach the compiler how to do it. But programming compilers is an entire other art form / engineering.
@gFamWeb3 жыл бұрын
I've always pictured the talk about clock speed as analogous to how fast a car's tire can spin. Sure it can spin very fast, but if you don't have that good of traction on the tires, it's not going to help much. Same with throughput and clock speed.
@okaro65953 жыл бұрын
IMO the engine RPM is better.
@ug3333 жыл бұрын
Great information, great knowledge. Side note: what's up with the audio?
@erwinmulder13383 жыл бұрын
I grew up programming home computers in the 1980s. You had to write assembly (and sometimes even translate it to numbers by hand) to make anything that would run faster than at a snail's pace. I mean 8-bit computers at 3.5MHz are not incredibly fast at anything. So if you had BASIC, which was interpreted (not even compiled), that was SUPER slow. You couldn't even draw an entire screen in one second most of the time. These days, I mostly work with assembly in writing (toy) compilers for my own programming languages. In the end, what any compiler really does is basically translate the source code to assembler instructions.