Your passion for assembly language is hugely inspiring!
@WhatsACreel 2 years ago
Thank you my friend!! Cheers for watching :)
@NeilRoy 2 years ago
Agreed.
@blimolhm2790 2 years ago
I seem to learn more from people with a temperament like yours: jovial and unassuming in instruction. Thanks for keeping the topic straightforward, short and sharp.
@kamertonaudiophileplayer847 2 years ago
Finally, somebody does some real programming.
@olteanumihai1245 2 years ago
Highly underrated channel! Keep up the good work!
@dylanconway7190 2 years ago
I could watch videos like these all day. Great way to learn something new and refresh my memory. Awesome video!!!
@change_profile_n8755 2 years ago
So I found your content because I'd like to start with an understanding of how computers work. I just started out learning Java as well as assembly. I don't do that for commercial reasons but out of fascination. Thank you for your work! Greetings from Switzerland 🍾 BTW: I have no experience in programming/computer architecture, but built an 8-bit calculator in Minecraft (Redstone). This gave me a huge fascination with core concepts in CS and EE. Highly recommend this!
@Eidolon2003 2 years ago
I highly recommend Ben Eater if you're interested in the nitty gritty details of the hardware. I built a redstone computer as well as a physical one based on what I learned from his videos!
@change_profile_n8755 2 years ago
@@Eidolon2003 I actually was watching his "Hello, World from scratch" video about a week ago :D. Oh wow, a physical as well? How was the process?
@Eidolon2003 2 years ago
@@change_profile_n8755 It was a long but very rewarding process. Being able to say that I fully understand how it functions is really cool, especially since I didn't really stick to Ben's design at all. His is really simple, but not very capable tbf. I think knowing how a simple computer like that works makes it easier to understand how a modern x86/64 system works too. Honestly modern computers never cease to blow me away. They're so complex it's insane lol
@TerjeMathisen 2 years ago
The sort3() function via XOR is very neat, you could in fact do it in scalar using 64-bit integer regs! The scariest part when sorting fp numbers happens when you have infinities or NaNs: i.e. the "add everything, then subtract the min and max" approach fails completely, both with a single Inf or a single NaN, even though the ordering is at least defined when you have a mix of regular numbers and a single Inf. With NaN, all comparisons return false!
@BlueRider. 2 years ago
To sort 3 numbers, I'd rather suggest this method, which requires 6 operations (instead of 8 in the video):
min3 = min(min(a,b), c); // 2 operations
max3 = max(max(a,b), c); // 2 operations
med3 = max(min(a,b), min(c, max(a,b))); // 4 operations, but can be done in 2 operations instead!
As the temporary results of "min(a,b)" and "max(a,b)" can be kept in registers from the first steps, this method requires only 6 min/max operations!!! (BTW, no issue any more with floating point precision)
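A minimal C sketch of the six-operation scheme above, assuming plain `float` inputs with no NaNs. The helper names `min2`, `max2` and `sort3_minmax` are made up for illustration and are not from the video; on x86 the two helpers would presumably correspond to MINSS/MAXSS.

```c
/* Two-input min/max; each always returns one of its inputs unchanged. */
static float min2(float x, float y) { return x < y ? x : y; }
static float max2(float x, float y) { return x > y ? x : y; }

/* Sort three floats so that out[0] <= out[1] <= out[2] using six
   min/max operations. Hypothetical helper, not code from the video. */
void sort3_minmax(float a, float b, float c, float out[3])
{
    float lo = min2(a, b);            /* op 1, reused below */
    float hi = max2(a, b);            /* op 2, reused below */
    out[0] = min2(lo, c);             /* op 3: smallest     */
    out[2] = max2(hi, c);             /* op 4: largest      */
    out[1] = max2(lo, min2(c, hi));   /* ops 5 and 6: median */
}
```

Because every output is one of the inputs passed through unchanged, nothing is rounded along the way, which is the precision point made in the comment.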
@treelibrarian7618 1 year ago
Well, yes, but actually no. The XOR solution is faster, since the 2 serial xors that come after finding min3 and max3 can happen in less time than the 2 equivalent min/max ops, (latency of fp min/max = fp add/sub = 4, latency of xor = 1). The initial 2 xors can happen concurrently with the min/max ops on the third unused vector port. For the same reason using XOR also reduces load on the 2 vector min/max capable ports and enables faster looping of the whole sequence, although the difference in reality is minimal (8/3 vs. 6/2 cycles throughput - about 11% faster).
@programaths 2 years ago
Oh, just discovered CMOV 😢 That would have been extra useful when I was doing CS. The worst part is that I read through helppc at the time to find useful mnemonics we didn't learn. I don't know how I missed that one! That shows how basics can benefit everyone ^^
@TerjeMathisen 2 years ago
CMOV (on x86 CPUs) is very rarely a win! The branch predictors are so good that it is _almost_ always faster to simply load the one possible return value, then branch over a single instruction that loads the alternative:
;; EAX has a, EBX has b, return the smaller of a & b:
    cmp eax,ebx
    jl done
    mov eax,ebx
done:
When EAX is the prevalent smaller value, the CPU will predict this correctly and run the entire block in a single cycle (or even less if there is some other work which can overlap). With EBX being the return value we also have to execute the MOV, but this can be done in the renamer and so doesn't actually take any cycles! 🙂 The CMOV version will always take the same number of cycles, typically 2 or 3. (There are other architectures where CMOV is much faster, sometimes down to one or zero cycles.)
@Double-Negative 2 years ago
The renamer can take extra time depending on surrounding code, so it’s not always free
@TerjeMathisen 2 years ago
@@Double-Negative Sure, I thought that was clear from the way I wrote it, but I see now that it wasn't. Anyway, absolute worst case a MOV REG,REG takes a single cycle unless the CPU is from before about 1992 (Intel 486). 🙂
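To make the comparison in this thread concrete, here is a rough C sketch of the two shapes of `min`: one written as a branch, and one as branch-free mask arithmetic, which is roughly the effect a conditional move achieves. The function names are hypothetical, and whether a compiler actually emits `jl`, `cmovl` or something else for either version depends on the compiler and its flags.

```c
#include <stdint.h>

/* Branchy form: mirrors the cmp / jl / mov sequence. */
int32_t min_branch(int32_t a, int32_t b)
{
    if (a < b)
        return a;
    return b;
}

/* Branch-free form: turn the comparison into an all-ones or all-zeros
   mask and use it to select one of the two inputs. */
int32_t min_select(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a < b);   /* -1 (all ones) if a < b, else 0 */
    return (a & mask) | (b & ~mask);    /* a when mask is set, otherwise b */
}
```

Both functions return the same value for every input; only the control flow differs.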
@maxmuster7003 2 years ago
Using Intel syntax:
1. Fast addition
16-bit instructions:
LEA bx, [bx+si] ; no memory access, no flags touched, result has to fit the target
32-bit:
LEA ecx, [ecx+eax]
@ged9925 13 days ago
Excellent!
@allmycircuits8850 2 years ago
A drawback of these sorting methods is that they can't be applied if there are not only "keys" but also values which should be sorted with these keys. Plain old bubble sort in that case, I presume... Nice video nevertheless!
@programaths 2 years ago
Reminder of XOR (⊕) properties:
a⊕a=0
a⊕b=b⊕a
(a⊕b)⊕c=a⊕(b⊕c)=a⊕b⊕c
a⊕0=a
So, if we call the 3 registers a, b and c, and the min and max m and M respectively, we have the following expressions:
a⊕b⊕c (xor the 3)
(a⊕b⊕c)⊕m⊕M (and xor with min and max)
Let's say m=a and M=c (could be any pair), then the expression becomes:
(a⊕b⊕c)⊕a⊕c
Per the above properties we can remove the parentheses:
a⊕b⊕c⊕a⊕c
We can move values and group them:
(a⊕a)⊕b⊕(c⊕c)
We can also reduce the parentheses:
0⊕b⊕0
Which evaluates to:
b
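The derivation above can be checked with a small C sketch that applies XOR to the raw bit patterns of three floats. This is only an illustration under stated assumptions: `float` is a 32-bit IEEE 754 value, no input is a NaN, and the names `bits_of`, `float_of` and `median3_xor` are invented here rather than taken from the video.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Reinterpret a float as its bit pattern and back, without converting the value. */
static uint32_t bits_of(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float float_of(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

static float min2(float x, float y) { return x < y ? x : y; }
static float max2(float x, float y) { return x > y ? x : y; }

/* Median of three: XOR all three bit patterns together, then XOR the
   minimum and maximum back out. What remains is the middle value, bit for bit. */
float median3_xor(float a, float b, float c)
{
    float lo = min2(min2(a, b), c);
    float hi = max2(max2(a, b), c);
    uint32_t all = bits_of(a) ^ bits_of(b) ^ bits_of(c);
    return float_of(all ^ bits_of(lo) ^ bits_of(hi));
}
```

For example, `median3_xor(1e30f, 1.0f, 1e-30f)` returns exactly `1.0f`, whereas adding the three values and subtracting the min and max would lose the `1.0f` to rounding.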
@andrepoelman416 2 years ago
Nice video! I am probably too late to the party, but I think you didn't answer question 3. The question was how to count bits in a dword on an 8086 (16-bit processor). No fancy bitcount instructions there.
@zeyogoat 2 years ago
You've been teaching this chem teacher to code for years now. Cheers! One question I have: How would you sort *four* numbers in asm?
@WhatsACreel 2 years ago
Ha! I reckon BubbleSort would do the trick :)
@infinitesimotel 2 years ago
I love assembly, the problem is that it gets too addictive and you want to use it everywhere LoL..
@michaelwilkes0 1 year ago
If we can use xor to get perfect float math, why don't all CPUs just always do that? Why is floating point error still a problem we have to deal with? Even assuming that trick does not work with multiply and divide, making add and subtract perfect would be amazing.
@williamdrum9899 2 years ago
I have a question about ARM Assembly. If you use malloc, will the kernel try to give you a pointer that is 8-bit rotatable (i.e. can be loaded into a register using a single instruction?)
@mikesbasement6954 2 years ago
Another potential problem with the additive method is the potential for overflow. Sure, it's not likely with three values, but it is possible. What instruction format do you prefer (AT&T, Intel, NASM)? Over the years I've found I'm liking nasm more.
@zxuiji 2 years ago
12:55, nah if I was to code that I would've skipped eax completely:
mov ecx, 17
mov edx, 32
cmp ecx, edx
cmovl edx, ecx
ret
@Pistolsatsean 1 year ago
I got a question about assembly. If you have a determinate size loop, does it execute faster if written out line by line? (even if marginally so)
@yusufat1 2 years ago
what happened to FADD (float add) instruction, why do we use SIMD all the time for one floating-point value?
@mp-lv8bw 1 year ago
faster and easier to use simd registers
@lohphat 2 years ago
How do compilers generate object code which can run on the variety of AMD64 family CPUs? There are so many variants which have extended complex action opcodes, how can the compiler know when to use those opcodes? I know there are compiler flags but in software distribution it’s impossible to know ahead of time which CPU instructions are supported. How is this handled at runtime?
@rsa5991 2 years ago
There is a CPUID instruction that returns information about supported instruction sets. You can patch your code at the start of the program. You can also compile several versions and make the installer pick one, depending on CPUID. But the default behavior is to target an old enough CPU and just crash if an even older one is present.
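One common shape of the runtime approach described above is to pick a function pointer once at startup. The sketch below assumes GCC or Clang on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins wrap the CPUID query; on any other compiler or architecture it simply falls back to the baseline. The names `sum_baseline`, `sum_wide` and `pick_sum` are hypothetical, and `sum_wide` is only a stand-in for a routine that would really use AVX2 instructions.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Baseline version: safe on any CPU the program targets. */
static long sum_baseline(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Stand-in for a version written with newer instructions (e.g. AVX2).
   Here it is just an unrolled loop so the sketch stays portable. */
static long sum_wide(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        total += v[i];
    return total;
}

/* Choose an implementation once, based on what the running CPU reports. */
sum_fn pick_sum(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_wide;
#endif
    return sum_baseline;
}
```

A caller would do `sum_fn sum = pick_sum();` once and then call `sum(data, count)` wherever the routine is needed.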
@gower1973 2 years ago
What’s your day job? Are you a systems engineer? Or do you contribute to open source projects, is that ebook just for patreons or can anyone read it?
@pierreuntel1970 2 years ago
Oh yes, I remember when I was a young lad and started writing my first code in AutoIt and trying to figure out what ASM is, and I was like... "WTF are these? Are they just there for moving numbers around, adding and subtracting them? What for?" as I was trying to create a program with a nice UI and messageBox and stuff... I'm pretty sure there are many people out there having the same question when looking at ASM at first ;) One day it just clicks, and I still have no idea what I'm doing with ASM most of the time but can read and understand some parts of it.
@Eidolon2003 2 years ago
Could you possibly do an explanation of how to call C library functions like puts() from assembly, or maybe just link to a guide with the correct answer? I've found a couple different guides online and I couldn't get it to work for one reason or another. I'm just not experienced enough to know why. I'm using VS2022 btw
@mp-lv8bw 1 year ago
he covered this 11 years ago kzbin.info/www/bejne/qqmpiZx8lsuHisU
@ChrisM541 2 years ago
Excellent video, cheers for the upload. I wish there were conditional moves back in the day with the 6502/10, then again, I like spaghetti. For the count the set bits question, using that 8-bit 6510, I would look at bit shifting, e.g. ASL of the value being examined (split over 4 bytes), examining the carry flag and incrementing a counter if set. I wrote the following as one way to do the job.
        ldy #4          ;4 bytes
lp0     lda Data-1,y
        ldx #8          ;8 bits
lp1     asl
        bcs BitSet
lp2     dex
        bne lp1
        dey
        bne lp0
        rts
BitSet  inc Count
        bne lp2         ;faster than jmp and ok to use as long as Count never wrapped to 0
;faster if below in zero page
Count   byte $0
Data    byte %11110000,%00000111,%10101010,%11100111
@rsa5991 2 years ago
Instead of conditional jumping, you can use ADC #0 to add CF to A.
@ChrisM541 2 years ago
@@rsa5991 That's a really good idea. Could you provide a working example for the 4 bytes? - my brain was too sore to write an optimised version for the 4 bytes.
@rsa5991 2 years ago
@@ChrisM541 I don't have any 6502 tools, so I cannot confirm it working, but:
        ldy #4          ;bytes
        lda #0          ;count
byteLoop
        ldx Data-1,Y
        stx 0           ;use zero page to store current byte
        asl 0           ;"prime" the loop
bitLoop
        adc #0          ;spoils ZF, so should be before ASL
        asl 0           ;sets CF for ADC, ZF on last '1' shifted out
        bne bitLoop
        adc #0          ;count last '1'
        dey
        bne byteLoop
        rts             ;reg A holds the result
UPDATE: Checked it on an online emulator, seems to be working. Runs in 312 cycles vs. your 506.
@ChrisM541 2 years ago
@@rsa5991 Fantastic! Works perfectly! - I've got CBM Prg Studio installed, working on an old platformer CDU magazine entry I submitted a 'wee while' ago - wish I had that utility then. Always nice to see different ways to solve problems, cheers.
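For readers who don't speak 6502, the idea in this thread (shift the value left one bit at a time, add the bit that falls out to a counter, stop once nothing is left) looks roughly like this in C. It is a sketch of the same technique, not a translation of either routine; the function name is made up, and the top bit of the word stands in for the carry flag.

```c
#include <stdint.h>

/* Count set bits by shifting left and accumulating the bit that would
   land in the carry flag. Stops early once no set bits remain, like the
   "bne bitLoop" test in the ADC #0 version. */
unsigned count_bits_shift(uint32_t v)
{
    unsigned count = 0;
    while (v != 0) {
        count += v >> 31;   /* top bit plays the role of the carry flag */
        v <<= 1;            /* shift it out, as ASL does */
    }
    return count;
}
```

With the four data bytes from the example above packed into one word, `count_bits_shift(0xF007AAE7u)` gives 17.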
@bpark10001 1 year ago
Your second scheme is what I refer to as "converting algorithmic operation to arithmetic operation", eliminating branches. The routine is "straight-line code". The complexity of code is proportional to the number of branches in it.
Your scheme for calculating the number of 1's in a number only works if you have the special instruction. Without one, I have a scheme for determining if the number has one or fewer 1's in it. Copy the number to another register. Decrement one of the numbers. Then do a bitwise AND between the 2 numbers. If it is zero, the number had one or fewer 1's.
ROUNDING CRAP: get RID of floating point math! Floating point math belongs only in hastily "slapped together" programs written to get a quick answer. Most programmers are too lazy to properly scale their numbers.
For the sort: as you explain in your sort videos, there are 3! = 6 possible outcomes. Do 3 compares, say 1&2, 2&3, & 1&3. After each compare, shift the carry flag (it is set if 1st arg >= 2nd arg) with SHIFT LEFT (with carry) into a precleared register. You have a 3-bit number (8 possible outcomes, of which 6 are "legal"). Use this to index into a look-up table of the swaps required. Put the index of the "get" of the swap into the table of 8 entries. Example: let's say : A>B, A
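The copy, decrement and AND test described above fits in one line of C. A minimal sketch, with a made-up function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when n has zero or one bits set. Subtracting 1 flips the lowest
   set bit and everything below it, so ANDing with the original clears
   that bit; if nothing is left, there was at most one set bit. */
bool at_most_one_bit(uint32_t n)
{
    return (n & (n - 1u)) == 0;
}
```

Note that zero also passes the test, which matches the "one or fewer" wording.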
@gilmannayeem4340 2 years ago
But that comment is 4 months old
@dennisrkb 2 years ago
Damn mate it's nice to have you back but you put on some weight. Please don't let it get any worse!
@WhatsACreel 2 years ago
Ha! Sure did! :)
@andersjjensen 2 years ago
This was really interesting. I've only ever come across assembly in the Linux kernel's architecture dependent code. It looks like you need to be a certain kind of masochist to enjoy the challenge of writing actual problem solving code in assembly... I should give it a try...
@unperrier5998 2 years ago
There's another concern with the method of sorting 3 numbers using the min/max/subtract technique that is worse than losing floating point precision: if all three floating point numbers are close enough to the absolute maximum representation, adding them will overflow. Not sure what an overflow looks like with floating points, but if it's like with integers you'll get something very wrong in the end. In any case, thanks for the video, that's interesting. I'd be up for a follow-up with more usual patterns and tricks. And maybe another video about ARM and RISC-V assembly at some point?
@aaron6807 2 years ago
An overflow in floating point will probably either be an Inf or a NaN
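A small C sketch of what happens in this thread's scenario, assuming IEEE 754 `float` arithmetic with the default rounding mode and no fast-math flags; the function name is invented for illustration. With three values near `FLT_MAX` the sum overflows to infinity and stays there, and if one input is already infinite the final subtraction becomes infinity minus infinity, which is NaN.

```c
#include <float.h>

static float min2(float x, float y) { return x < y ? x : y; }
static float max2(float x, float y) { return x > y ? x : y; }

/* Median via "add all three, then subtract the min and the max".
   Each step is stored in a float so it is rounded to float precision. */
float median3_addsub(float a, float b, float c)
{
    float lo = min2(min2(a, b), c);
    float hi = max2(max2(a, b), c);
    float sum = a + b;      /* may overflow to +infinity */
    sum = sum + c;
    sum = sum - lo;
    sum = sum - hi;         /* infinity - infinity gives NaN */
    return sum;
}
```

So unlike integer overflow, nothing wraps around: `median3_addsub(FLT_MAX, FLT_MAX, FLT_MAX)` comes back as infinity rather than some small wrong number, while ordinary inputs such as `(1, 2, 3)` still give 2.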
@RickeyBowers 2 years ago
Glad to see you laying some of the groundwork for assembly.
@first-thoughtgiver-of-will2456 2 years ago
We need to get all the language experts in a room (you being one of them) and create another assembly abstraction like C but with modern memory protection and better/modern op representation built in to the syntax but still being a "mid/low level" structured typed functional programming language that closely represents the codegen.
@alexvitkov 2 years ago
ok
@lapatatadelplato6520 2 years ago
you're not gonna get a functional programming language if you're abstracting assembly. You'd be better off making a procedural language bc it fits the architecture more, but C already exists, so I don't see the point.
@NeilRoy 2 years ago
Fascinating stuff. Love your videos, I'm always impressed by your knowledge and find this all VERY interesting! Keep up the good work, thanks. 🙂
@WhatsACreel 2 years ago
Cheers mate! Thanks for watching :)
@davidprock904 2 years ago
So what about writing an application like FreeCAD or Autodesk Fusion 360 in assembly language?
@WhatsACreel 2 years ago
I would personally write the forms, buttons and front end in C++ or C#, and just keep ASM for the number crunching. I'm not sure I have the engineering skill to organize a very large scale, 100% Assembly project like that! It would certainly be a challenge :)
@pyromen321 2 years ago
RollerCoaster Tycoon comes to mind!
@adivp7 2 years ago
That would be a massive drain of time and effort. Not much to gain and much to lose. What's to be written directly in assembly has to be important enough to be justified being written in assembly.
@maxmuster7003 2 years ago
x86 conditional jump instructions:
for unsigned values
JA jump above
JB jump below
...
for signed values
JG jump greater
JL jump less
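The reason there are two families of jumps is that the same bit pattern orders differently depending on whether it is read as signed or unsigned. A small C sketch of the distinction, with hypothetical function names; a compiler would typically pick a JB/JA-style condition for the first function and a JL/JG-style one for the second.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned comparison: the kind of test JB ("below") is used for. */
bool below_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}

/* Signed comparison: the kind of test JL ("less") is used for. */
bool less_signed(int32_t a, int32_t b)
{
    return a < b;
}
```

The pattern 0xFFFFFFFF is the largest unsigned 32-bit value but is -1 when read as signed, so `below_unsigned(0xFFFFFFFFu, 1u)` is false while `less_signed(-1, 1)` is true.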
@quantumlightum 7 months ago
Just wanted to say a huge thanks for your videos about modern x64 assembly. I'm a teacher and I'm preparing a course about x64 assembly, and your videos helped me a lot. Many thanks. Subscribed to the channel. Concerning the side effect of 32-bit operations on the high 32 bits of 64-bit registers: after doing some research, it seems that there are no physical RAX, RBX etc. registers, but there is a bank of registers, and registers are allocated depending on instructions and then merged when instructions are completed... maybe for optimisation reasons it is faster to just put zeros in the high 32 bits... but indeed it's a strange effect.
@kjrl818 1 year ago
I've been learning about the open source RISC-V assembly. Liking it so far. Also, keep up the good work.
@FalcoGer 2 years ago
Okay, adding two numbers together is easy. but that's not really helpful, is it? What we want isn't to read the result off in the debugger, or to change our code to change which numbers to add. In other words, I/O is missing. I know to deal with I/O you use syscalls, system interrupts or in embedded devices access the mapped memory of attached devices and read/write values to specified addresses which map to those devices, possibly in response to an interrupt.
@ebbflow4591 2 years ago
Excellent stuff, amazing channel!!!!!
@rsk5714 2 years ago
Hey man you got nice skills and look & talk similar to Mr.rocky balboa ! 👍🙂
@thatcrockpot1530 2 years ago
I'm always happy to see you posting
@decky1990 2 years ago
Do you have Irish in your family??
@dennisrkb 2 years ago
Does a jump always have to immediately follow a cmp? Or could you execute some other instructions in-between?
@WhatsACreel 2 years ago
Some instructions don't affect the flags, so you can execute some instructions between. Mostly MOV doesn't change the flags. Usually the CMP and Jcc are close by though.
@williamdrum9899 2 years ago
Usually on x86 the answer is yes, but ultimately it depends on the instructions you're using. On ARM, RISC-V, and MIPS you can do whatever you want in between.
@pyromen321 2 years ago
If you look at compiler output, jumps are often put far after cmps or other flag altering instructions! I’ve seen a loop where the comparison was a SUBS instruction at the very top of the loop like 40 instructions before the branch. I think compilers strive for this because it essentially guarantees that the comparison result will be completely done before the branch is hit, preventing branch prediction miss penalties.
@maxmuster7003 2 years ago
Imagine: since the Intel Core2 architecture we can execute 4 integer instructions in parallel, if there is no dependency between them and if the code has a good mixture of complex and simple instructions in the pipelines. This is not a CISC CPU, it is a mixture of RISC and CISC. The CPU splits complex x86 instructions into micro-ops to execute on some of the RISC units.
@dennisrkb 2 years ago
@@pyromen321 Could that actually become counter-productive at some point? I.e., could it happen that by the time you reach the jmp, the cmp has been evicted from the instruction cache?
@programaths 2 years ago
Also, fp errors are "weird" as the gap between "consecutive" numbers just widens like crazy as you get far from 0. (Expected, since the mantissa has a finite precision ^^) I think that fp introduces too much weirdness because of that and can be a big hurdle for beginners.
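The widening gap can be measured directly. The C sketch below assumes a 32-bit IEEE 754 `float` and only handles positive, finite inputs below the largest float; the function name is made up. It relies on the fact that, for such values, adding 1 to the bit pattern gives the next representable number.

```c
#include <stdint.h>
#include <string.h>

/* Distance from a positive, finite float to the next representable
   float above it. Not meant for zero, negatives, infinities, NaNs
   or the largest finite float. */
float gap_above(float x)
{
    uint32_t u;
    float next;
    memcpy(&u, &x, sizeof u);        /* view the float as raw bits */
    u += 1;                          /* next bit pattern = next float up */
    memcpy(&next, &u, sizeof next);
    return next - x;
}
```

At 1.0 the gap is 2^-23 (about 0.00000012), at 1024.0 it is 2^-13, and at 16777216.0 it is already 2.0, so from there on odd integers cannot be represented at all.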
@zxuiji 2 years ago
Just a note for those implementing FPN comparisons via binary, treat the sign, exponent & mantissa as separate comparisons:
int cmpf( fpn a, fpn b )
{
    int sigA, sigB, expA, expB, flip;
    intmax_t manA, manB;
    /* Extract info */
    ...
    /* a set sign bit means negative, so that value is the smaller one */
    if ( sigA - sigB ) return -(sigA - sigB);
    /* same sign: a larger exponent or mantissa means a larger magnitude,
       which is the larger value for positives and the smaller for negatives */
    flip = sigA ? -1 : 1;
    if ( expA - expB ) return flip * (expA - expB);
    return flip * cmp(manA,manB,bits);
}
fpn minf( fpn a, fpn b ) { return cmpf(a, b) < 0 ? a : b; }
fpn maxf( fpn a, fpn b ) { return cmpf(a, b) > 0 ? a : b; }
Doing it that way avoids the possibility of incorrect return values (provided I got the signs the right way round in cmpf)
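A runnable C sketch of the field-by-field comparison idea above, under the assumption of 32-bit IEEE 754 `float` and with NaNs left unhandled. The names are invented here, and the sign conventions are this sketch's own: a set sign bit means negative, and for two negative values the larger magnitude is the smaller number.

```c
#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* Returns <0, 0 or >0 like strcmp, comparing sign, exponent and
   mantissa as separate fields. NaN inputs are not handled. */
int cmp_float_fields(float a, float b)
{
    uint32_t ua = bits_of(a), ub = bits_of(b);
    int sigA = (int)(ua >> 31), sigB = (int)(ub >> 31);
    int expA = (int)((ua >> 23) & 0xFF), expB = (int)((ub >> 23) & 0xFF);
    int32_t manA = (int32_t)(ua & 0x7FFFFF), manB = (int32_t)(ub & 0x7FFFFF);
    int result;

    if ((ua << 1) == 0 && (ub << 1) == 0)
        return 0;                        /* +0 and -0 compare equal */
    if (sigA != sigB)
        return sigB - sigA;              /* negative is below positive */
    if (expA != expB)
        result = expA - expB;            /* larger exponent, larger magnitude */
    else
        result = (manA > manB) - (manA < manB);
    return sigA ? -result : result;      /* flip the order for negatives */
}
```

For example, `cmp_float_fields(-1.0f, -2.0f)` is positive because -1 is the larger value even though its exponent field is the smaller one.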
@zxuiji 2 years ago
19:17, I thought you would do x = min(a,b), y = max(b,c), z = (a+b+c)-(x+y) **Edit:** Gave it more thought and noticed a scenario where the wrong answer would be given, I'll leave finding that as a thought exercise for peops who care
@SimGunther 2 years ago
I think the important takeaway here is that if you haven't experienced the pain of making your assembler for a fictional CPU, you don't truly know the assembly meta.
@dookshi 2 years ago
Great content, always keeping me stoked for the next video. For clarity's sake, don't you think you should update the leftover comments that still state that xoring "sums" or "subtracts"? You even sinfully say it out loud. 🙂 It accumulates and extracts, which is good enough and just what we want, but far from adding or subtracting. Keep it up pleeeease! 👍
@aaron6807 2 years ago
I love your videos, I can just watch them without having to think too much but still learn a lot
@waynemv 2 years ago
How, in assembly language, does one write a function that takes two or more arguments and returns a result? And how does one afterwards call that function from other languages, such as Python, C++, C#, or F#?
@WhatsACreel 2 years ago
Well, there’s videos on here for doing some of these things. Mostly very old videos. Calling from C++ is easier than C#, and I found that writing a wrapper in C++, and then calling that from C# was maybe the best way? C# just has a lot of extra type safety and memory management issues you have to work with. To call native code from C#, I have found it convenient to compile the native code to a DLL. Then you use something like ‘interopservices’ and ‘importdll’ from C# to import the functions you want to use. Something like that, you’d have to look up the details. As for calling native ASM from C++, there’s a lot of ways. If you’re in 32 bit, you can code inline ASM. If you’re in x64, then it’s a little trickier, but a lot of the videos on my channel here involve calling ASM from C++, so maybe if you have a look at one of the early ASM and C++ vids, you will see one way to do this. I’m pretty sure we did this in the very first video I uploaded. You might want to try assemble to a library file, either LIB or DLL, and link to that in your C++. I do not usually do this in these videos because they’re usually just little code snippets, but in a real project it helps to set things out like that. Then you’re looking for how to call a native DLL from C++, which is bound to get plenty of results on googs. As for Python and F#, I must say I have no idea sorry. Hope this helps, have a good one :)
@waynemv 2 years ago
Thank you. I've figured out how to do some of that. I have DLLs created from legacy code written in Fortran that I then call from C# using interop services. But in that situation, the Fortran compiler made the DLL for me, so I didn't learn much, if anything, about the internal layout of the DLL in the process. Do you have any link to clear instructions on how to code a DLL from scratch in pure assembly?
@devmishra18 1 year ago
I don't even wanna learn assembly, but I still watch your videos as they make me feel smart.