Chip designer here. One thing to note is that above a certain size, eventually you can't compute the carry lookahead in a reasonable amount of time. In this case we will often combine carry lookahead logic with a technique called *carry select*. For example, let's say we can compute the sum of two 32 bit numbers in one clock cycle using carry lookahead (CL), but we don't have enough time to compute a 64 bit sum. In that case, we could combine three 32 bit CL adders. One of the adders computes the sum of the 32 least-significant bits, one computes the sum of the most significant 32 bits assuming the carry in bit will be 0, and one computes the sum of the most significant 32 bits assuming the carry in bit will be 1. Then once we know the actual carry bit from the LSB adder, we just use it to select (hence the name) which of the two MSB possibilities was the real answer (via a simple 2:1 mux), and ignore the incorrect result.
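A rough behavioral sketch of that carry-select scheme in Python (the 32-bit CL adders are modeled with plain big-integer arithmetic, and the widths are just illustrative):

```python
MASK32 = (1 << 32) - 1

def add32(a, b, carry_in):
    """Stand-in for a 32-bit carry-lookahead adder: returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32

def add64_carry_select(a, b):
    """64-bit add built from three 32-bit adders.

    The low adder and BOTH high adders (one assuming carry-in 0, one
    assuming carry-in 1) run in parallel; the low adder's carry-out then
    selects which high result is real -- the 2:1 mux in hardware.
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo_sum, lo_carry = add32(a_lo, b_lo, 0)
    hi_if_0 = add32(a_hi, b_hi, 0)   # speculative: carry-in will be 0
    hi_if_1 = add32(a_hi, b_hi, 1)   # speculative: carry-in will be 1
    hi_sum, _ = hi_if_1 if lo_carry else hi_if_0  # the "select" step
    return (hi_sum << 32) | lo_sum
```

In hardware the point is that both speculative high sums are computed during the same clock cycle as the low sum, so the only extra delay is the mux.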
@adamalbee34337 ай бұрын
On a software level, graphics shaders often contain a bunch of this kind of *branchless* decision making: calculate *every possible result*, then discard the incorrect ones.
@sammy55767 ай бұрын
What a legend
@GordonWrigley8 ай бұрын
Thank you for not stopping at ripple adders. I get tired of the dozens of YouTube videos that cover the basics and stop there. Like all the different videos that cover pathfinding up to A* but no further.
@ArneChristianRosenfeldt8 ай бұрын
I think it is interesting that every microprocessor used carry look ahead for the program counter. Maybe the slow RCA 1802 did not.
@GordonWrigley8 ай бұрын
@@ArneChristianRosenfeldt What do you find interesting about that? PC needs a fast increment on a decent word size, carry look ahead is good for that without being too many gates.
@ArneChristianRosenfeldt8 ай бұрын
@@GordonWrigley Microprocessors in the 70s sacrificed a lot to fit into a single chip. But they did not sacrifice CLA, a technology unknown to a lot of computer “experts”.
@Darkev778 ай бұрын
Wow wow wow, I swear I was diving into the hardware part of computer science just days ago to get a better and lower-level understanding of cpu and hardware as a software developer and you just uploaded this!!! Please continue on explaining how the hardware works on a low-level because it’s so complex
@DFPercush8 ай бұрын
You can find people who design their own custom CPUs on YouTube. Ben Eater and James Sharman are two I know of. A primer course on digital logic design might help as well. Stuff like combinational circuits, sum of products or product of sums, Karnaugh maps (until you learn that somebody made a program in the 80s called Espresso that is infinitely better at minimizing truth tables), finite state machines and latches, registers, multiplexers and selectors - it helps to be familiar with all that. It's not too bad. The basic building blocks are incredibly simple; there are just so many ways to put them together.
@MaxPicAxe8 ай бұрын
I like the smile that my face generated when you mentioned combining CLAs, unfortunately that smile didn't propagate to everyone else around me who thought I was some lunatic staring at nonsense on his phone
@SunroseStudios8 ай бұрын
maybe you need to generate smiles instead of propagating them :)
@elijahf8 ай бұрын
The robots make this a lot easier to understand thank you for taking time to make this
@neshploda176 ай бұрын
I love how the little robots react to your dialog, following along with each step. It makes it super clear and easy to follow. Especially at 1:35, when the full adder robot squints at you, annoyed that you called what they do "very simple" xD
@canaDavid18 ай бұрын
8:40 Actually, it is usually in picoseconds (1000 ps = 1 ns, 10¹² ps = 1 s) for highly integrated circuits, with CPUs reaching several GHz and cycle times on the sub-nanosecond scale
@ArneChristianRosenfeldt8 ай бұрын
The N64 could add 64 bits in a single cycle at 90 MHz and had a 3 to 5 stage pipeline. In current deep pipelines, is an Add still single cycle?
@khatharrmalkavian33068 ай бұрын
Addition on modern desktops usually has a reciprocal throughput of 1/4 cycle, meaning the core can perform 4 independent additions in parallel in a single cycle (it has multiple ALUs, each completing an add in one cycle).
@ArneChristianRosenfeldt8 ай бұрын
@@khatharrmalkavian3306 So like the Pentium MMX from 1997? The Atari Jaguar from 1993 has vec4 add. I just could not find out why it needs 2 cycles. I guess something with the bus. You could always leave some bits at zero to stop the carry between your fields in a struct. Then AND the padding bits back to zero afterwards.
@khatharrmalkavian33068 ай бұрын
@@ArneChristianRosenfeldt Nah, they just have multiple adders in the ALU because they're so simple and easy to slip in between registers. They can execute in a single cycle because they don't actually wait for a clock signal to run the calculation, so you just read the existing output from the relevant adder into a register.
@ArneChristianRosenfeldt8 ай бұрын
@@khatharrmalkavian3306 I think that CPU design lost me with DECs alpha. But an FPGA like design with a big matrix sounds interesting. A checkerboard made up of registers and adders fields. Then the register renaming algorithm routes the data through it. And a router may know when we need the carry flag / do bit logic and can decide to use 4 bit full adders and not fully propagate the carry. A shift not aligned to 4 needs to propagate the carry up to 3 digits.
@Valo_iO8 ай бұрын
This is amazing! I'm studying CS, and they never taught this to us in our Computer Architecture/Logic Gates classes...
@ArneChristianRosenfeldt8 ай бұрын
Yeah, how would you lay out those gates?
@sporksto43728 ай бұрын
Well, they did teach us.
@ArneChristianRosenfeldt8 ай бұрын
@@sporksto4372 I mean, I would love to see the circuit with gates in the video and how the information flows through it. I wanted this to be the heart of my CPU simulation, but it eats so much space if untangled, and it still needs bidirectional flow. I could probably read it, but it lacks clarity for noobs and will not impress them.
@DrunkenUFOPilot8 ай бұрын
I was fascinated by this kind of stuff since I was a kid, (mumble) decades ago. In all my time studying electronics, chip design, fiddling with logic chips, and explaining stuff as a lab instructor, I've never encountered a decent explanation of look-ahead logic. Until now. This is very well done!
@cmilkau8 ай бұрын
Actually at uni we learned to make hierarchical CLA consisting of just copy-pasting slightly modified single bit CLA. It's a very small step from what is shown here. Missed opportunity but maybe someone reads this comment and solves it as homework. The hierarchical CLA takes O(n) hardware pieces and O(log n) time steps to compute all carries for n-bit addition, compared to O(n²) hardware pieces and O(1) time steps for the variant shown in the video
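A sketch of the group-combine step behind the hierarchical CLA described here (the helper names are mine, and the real circuit also distributes intermediate carries back down the tree, which this omits):

```python
def gp(a_bit, b_bit):
    """Per-bit (generate, propagate) pair."""
    return a_bit & b_bit, a_bit ^ b_bit

def combine(hi, lo):
    """Merge the (G, P) pairs of two adjacent groups: the combined group
    generates a carry if the high half generates one, or if it propagates
    a carry that the low half generates."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def group_gp(bits):
    """Tree-reduce per-bit (G, P) pairs (LSB first) into one (G, P) for
    the whole group in O(log n) combine levels -- the same slightly
    modified single-bit CLA cell copy-pasted at every level."""
    while len(bits) > 1:
        nxt = []
        for i in range(0, len(bits) - 1, 2):
            nxt.append(combine(bits[i + 1], bits[i]))  # (high, low) pair
        if len(bits) % 2:
            nxt.append(bits[-1])  # odd leftover passes through unchanged
        bits = nxt
    return bits[0]
```

The resulting group G equals the carry-out of the group (with carry-in 0), and P tells you whether a carry-in would ripple all the way through.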
@Henrix19988 ай бұрын
Isn't 7:52 basically it?
@CheaterCodes8 ай бұрын
I've seen the same number online, but I can't understand how the cost can be O(n); I would think it's O(n log n) like in the video
@Tynach5 ай бұрын
@@CheaterCodes They might be referring to carry chain adders, which can't be described very well with logic gates. They're a transistor-level optimization that basically lets the generate/propagate logic set up the chain of transistors to act as a wire, and at different parts of the wire the high/low state will be whatever the carry value needs to be for the adders next to it. Sort of. There's also shenanigans involving a clock signal and depending on the capacitance of the transistors to help stabilize the result, but I know less about that because I've only looked into it for implementing in custom Kohctpyktop level solutions (there's a 'Kohctpyktop Sandbox' someone made online that I absolutely love).
@yash11528 ай бұрын
6:23 that cute nod oh lol
@vampire_catgirl8 ай бұрын
Carry Cancel Adders in Minecraft are a thing that is very cool as well, though they don't have a real world equivalent
@ghasttastic19128 ай бұрын
what is that
@kitlith8 ай бұрын
@@ghasttastic1912 It's essentially a compact version of a carry lookahead adder that takes advantage of signal strength mechanics and the ability to create instant diodes by having redstone run up a transparent block. (Grain of salt for the following explanation; it's based on my understanding of the original post describing it. If you want more details, do a search for Carry Cancel Adder and click the link that points to the Open Redstone Engineers forum.) A CCA generates two signals for each input bit: a Generate and a Carry Cancel signal, the latter of which is the inverse of Propagate. These are all output into the same Generate tower and Carry Cancel tower, and their signal strength is compared at every bit/height to determine the carry status. (In short: is the previous generate closer than the previous cancel?) So really I'd say that this video *does* describe the real-life equivalent of the carry cancel adder; it's just that the carry cancel adder has a bunch of Minecraft-specific optimizations. (It's a similar story for the CLE or Carry Look Everywhere, though I think that one only relies on instant diode behavior.)
@Hubcat_8 ай бұрын
@@ghasttastic1912 They work essentially the same way as explained in this video, but they make clever use of signal strength to do the carry lookahead, which allows the adder to run much faster than Minecraft would otherwise allow, since it essentially turns the two-operation process into a one-operation process, at least in the realm of Minecraft. Under the hood it may still take more operations, but the intentional delays in Minecraft redstone components matter much more in redstone than the lag. Edit: For a better explanation/more info, watch mattbatwings' video Logical Redstone Reloaded #4
@stanleydodds98 ай бұрын
Carry cancel adders do exactly the same thing as ripple adders. They aren't a different algorithm. They are only faster than a simple ripple carry adder in Minecraft because the carry propagation is based on methods that send signals instantly (within a single tick) in Minecraft; this includes changing the signal strength of distant redstone dust, or changing the state of a wall much higher up, both of which happen instantly. It's totally possible to make a normal ripple carry adder in Minecraft that's just as fast, using whatever instant wire method you want to. People just started calling the simplest ones "carry cancel" adders because of how they interpret carry propagation when it uses redstone components efficiently, but it does exactly the same thing as a full adder.
@a-bombmori73938 ай бұрын
I actually was wondering if technical Minecraft players did something like this while watching this video, so thanks for sharing.
@Max-df5zp5 ай бұрын
This is the first and only video/creator successfully getting this across, and not only that, I am having so much fun watching this. I cannot express my gratitude enough, wow!
@theutz8 ай бұрын
This video is great and your whole channel is fantastic! Keep up the good work!
@bathfun8 ай бұрын
Crazy, i really did not know this. Thank you
@yoonseongdo33038 ай бұрын
I understood everything about logic circuits except this one. And this explains it so well, awesome!
@HeilTec3 ай бұрын
Playing Hard Chip and building adders makes this video entertaining and enlightening.
@proturtleman8 ай бұрын
It’s been a while, but I’m so glad to see another video from you! It’s great, as always. The robots are really cool and they help make the subject simple enough to understand while still having depth and helping us understand the core details. Thank you for the video!
@nikbl4k8 ай бұрын
This is the first time i watched a video this intricate and understood it on the first watch.
@agsystems82208 ай бұрын
Im a fan of the 3 to 2 adder. When you are adding more than two numbers you don't have to wait for each calculation to finish before putting more information in. If each bit takes 3 inputs and outputs a value and a carry then you can run it as a ripple by replacing one of the inputs with the values and another with the carries shifted, and iterating till the carry disappears. The cool bit is that still leaves one input to be folded in every tick. You can add lots of numbers very efficiently.
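The 3-to-2 adder described here is usually called a carry-save adder. A minimal Python model (bitwise operations on big integers stand in for the parallel per-bit hardware):

```python
def carry_save(a, b, c):
    """3:2 compressor applied bitwise: three addends in, a sum word and a
    (shifted) carry word out -- no carry ever propagates between bits."""
    s = a ^ b ^ c                         # per-bit sum
    carry = (a & b) | (a & c) | (b & c)   # per-bit carry (majority)
    return s, carry << 1

def add_many(nums):
    """Fold many addends through carry-save stages: each new number is
    absorbed in constant time, and only the final two words need a real
    carry-propagating addition."""
    s, c = 0, 0
    for n in nums:
        s, c = carry_save(s, c, n)
    return s + c   # one conventional add at the end
```

This is exactly the "fold one more input in every tick" property the comment describes: the carry-save stage has constant depth regardless of word width.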
@milakohen6308 ай бұрын
Love these animations ❤
@oscarvanslijpe5818 ай бұрын
Very good and clear animation!
@ritwikchandra46128 ай бұрын
Great explanation with awesome animation. Now I really get the 4-bit lookahead carry adder
@memesalldayjack32678 ай бұрын
this is awesome, very well explained and easy to follow
@AliceErishech8 ай бұрын
I used a schematic of a 4-bit look-ahead carry adder to design an 8-bit one in Logisim Evolution a few years ago to ensure I understood how it worked. The difference in complexity was something like 4-5x. The 8-bit one was *way* more complex than the 4-bit one.
@maxmn58218 ай бұрын
Very nice video, thank you! Waiting for Mr Parker to do it in dominoes in a sports hall, and Mr Mould to do it hydraulically with a blue fluid
@christopherjaya3428 ай бұрын
It's actually nuts that carry-look-ahead adder has the time complexity of O(log n) instead of O(n)
@ArneChristianRosenfeldt8 ай бұрын
But a power requirement of O(n log n)
@tacticalassaultanteater96786 ай бұрын
I imagine in circuitry most of this can be streamlined; adders can set their Carry and Carry Ready eagerly if A == B, and a big AND tree can collect all the CR values into the Output Ready of the whole adder. This results in a completely flat chain and a single large aggregated signal.
@TinusBruins8 ай бұрын
The biggest revelation for me was that the basis for a carry-lookahead is the same circuit as a half-adder. And since a full adder is built from 2 half-adders, you already have all your generate and propagate values available, and you only have to add the second stage of the carry-lookahead.
@CrazyGaming-zh3qk8 ай бұрын
Why is this channel so underrated 😭😭 Truly one of the best, if not the best, content on YouTube
@user-pr6ed3ri2k8 ай бұрын
1:55 the intro immediately made me realize that the faster method of look ahead just does the adding and carrying at the same time and FASTER
@user-pr6ed3ri2k8 ай бұрын
4:48 and and or are yummy
@user-pr6ed3ri2k8 ай бұрын
8:30 recursive gaming
@shy-watcher8 ай бұрын
It's good to note that in a ripple adder both the number of adders and the time-to-add grow linearly as we process numbers with more bits. With a carry-lookahead the time stays the same, but the amount of components grows as a square of the number of bits. See how formulas create a triangle shape at 6:31. So this is not a free optimization, you pay with more silicon and more complexity. I didn't know about carry-lookahead, this seems kind of fundamental to a practically useful CPU. Thanks for a great video!
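The "triangle" of formulas at 6:31 can be written out directly. A sketch of the flat carry-lookahead equations for 4 bits: the inner loop below builds the same two-level AND-OR terms shown in the video (each carry is one big OR of AND terms over the inputs only), and the growing number of terms per carry is exactly where the quadratic gate count comes from.

```python
def cla_carries(a, b, n=4, c0=0):
    """Carry out of each bit via the closed-form CLA equations:
    C[i+1] = G[i] | P[i]G[i-1] | ... | P[i]...P[1]G[0] | P[i]...P[0]C0."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate
    carries = []
    for i in range(n):
        total, prop = 0, 1
        for j in range(i, -1, -1):     # one AND term per lower bit
            total |= prop & g[j]
            prop &= p[j]
        total |= prop & c0             # the carry-in term
        carries.append(total)
    return carries  # carries[i] is the carry OUT of bit i
```

The Python loop is serial, but every `total` corresponds to a two-gate-deep AND-OR network, so in hardware all n carries settle simultaneously.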
@ArneChristianRosenfeldt8 ай бұрын
You pay with silicon, but as others have said only log (n). Not a real triangle. A ripple adder in CMOS switches twice: once when the inputs are applied and then when the ripple runs through. Best to use two lines: low means: latch inputs. One line high means that a carry ripples through, other line high means that a non-carry ripples through. Then pull down both lines starting from lsb
@shy-watcher8 ай бұрын
Unexpected, I was sure the triangle was real. I should probably try and implement this in some model to really understand it.
@ArneChristianRosenfeldt8 ай бұрын
@@shy-watcher the upfront cost is quite big though. Z80 gets away with a single 4 bit adder for 16 bit adds. Now let’s say we accept the carry from the last instruction late in the cycle, then for full speed we would need 4x2 adders in the leaf of the tree alone. ( like 8 16-pin packages on the breadboard).
@i.am.abhi7478 ай бұрын
Love to see well created deep knowledge. It will be awesome to watch similar video for multiply algorithm
@dixztube8 ай бұрын
Boy, reading about this in a book and then seeing the video really makes it make sense. Great job!!!!
@vladislavvlasov58258 ай бұрын
Thanks a lot, waiting for new videos! Wanna see trees and graphs here
@timsmith25258 ай бұрын
This is such a clear explanation. Great job! Thank you.
@Chrisionvision8 ай бұрын
Boys, new Spanning Tree video just dropped!
@Wegbm95 ай бұрын
The explanation is amazing thank u so much for it i really appreciate it 🎉❤
@roronoa_d_law10758 ай бұрын
I don't understand why we can compute all the carries at the same time in one step. What's the difference between computing the carry of a pair of bits and computing the sum of a pair of bits? Why aren't the carries computed one by one for each pair, just like the addition?
@capability-snob8 ай бұрын
Well, "one step" is a bit of a lie. In practice, fan-in for multi-input AND and OR gates is not quite linear, so the "tree" structure of several lookahead circuits demonstrated later in the video is used. So you get one step for each layer in the tree.
@David_Box8 ай бұрын
If you just do the operations as described, the carry-calculators still effectively rely on the output of all the previous bits. The only way in which I see it being faster is if instead of simply oring the results together, you iteratively check over them from the "closest" to furthest bit in order, adding significant overhead to each operation. This would make it vary in time depending on the input, something you cannot really take advantage of unless you modify how clock cycles work.
@ArneChristianRosenfeldt8 ай бұрын
N64 could add 64 Bits in single cycle and check the sign in the next instruction (cycle) and you tell me how. Interestingly, the 64 bit on Jaguar were actually vectors of 16 bit - no carry all the way. 6502 timing depends on the carry, and I hate it. I also hate the 68k MUL time dependency. I kinda understand the early out on 386, but that thankfully also went away. Floating point addition can lead to a very small sum. So a floating point unit better can detect leading 0s in a single cycle ( while adding with CLA ) and also has access to a barrel shifter to normalize the float in constant time. But I also sinned. I thought of a CPU which just waits a state after any ADD. ARM has this dedicated fast 4 bit counter for loops. Larger loops would be made of nested loops. The SH2 loads two instructions in a pair. So the instruction pointer has 2 cycles available to increment. One could even interleave it on a two shared 16 bit adders with intermediate carry flag. Logic functions get their own ALU.
@Ryrzard7 ай бұрын
There's no clock involved. The whole propagation is asynchronous. The carry calculator uses bits generated in parallel all at once. There's no signal that has to propagate through each bit of the number sequentially, unlike a ripple adder.
@ArneChristianRosenfeldt7 ай бұрын
@@Ryrzard for low power consumption and high speed, I thought if a skewed timing pulse could run through the register file and lift up the chosen register bits onto the bitlines just in time for the ripple.
@Ryrzard7 ай бұрын
@@ArneChristianRosenfeldt No real reason to do that. Just let it propagate through naturally and latch the result on the next clock edge like normal.
@ArneChristianRosenfeldt7 ай бұрын
@@Ryrzard The reasons I had in mind were that in CMOS every switch costs, so there had better be only one switch per gate per cycle. All latencies need to match. The other reason is that the CPU could run skewed all the time and at a higher clock rate. The problem is that the flags would be latched too late to be available for the next instruction. So ADC and branches would need a wait state.
@old486whizz8 ай бұрын
I don't quite get how time is saved. With the lookahead logic, it essentially has to perform the add anyway to generate the carry bit... It reads the bits in, calculates, etc., so how does it end up speeding up the calculation, since you're still waiting?
@ArneChristianRosenfeldt8 ай бұрын
The carry jumps over the groups. This allows you to have modern CPUs with 64 bit. They do a ripple add for all the groups of 4 bits. In the next stage they repeat this for the 16 bit groups. And finally they calculate the carry bit for 64 bit. Thus, a 64 bit CPU is only 3 times as slow as an Intel 4004.
@yaus05278 ай бұрын
Great video. Could you make a booth multiplier video?
@ArneChristianRosenfeldt8 ай бұрын
Does anyone actually do this? I just don’t get it. Why not use ripple carry for multiplication? You store a carry per bit and use it in the next stage. The ARM in the GBA only does 8 stages per cycle. So 32x32 but takes 4 cycles? There would need to be a final add. So 2-5 cycles! MAC should need one less cycle because it could start with the carries from the last MAC. Similar for other instructions. Only when branches, saturation, or div are involved, the accumulator would need to be transferred into the register file. In the other comment they mention dadda and Wallace, which use (full) adders within each column and only process carries in the next stage. Seems like this reduces latency even if we use an accumulator made of two registers which represent a sum. Doesn’t booth need to supplement each adder with an inverter and a barrel shifter? Sounds expensive. So it is for microcode which makes the instruction so slow that we don’t even… Since the tree multiply algorithms don’t really care where the operands come from, I think that the booth 10 and 01 cases are no problem. A full adder is kinda expensive, so skipping may be interesting with dadda or Wallace . Also we Need 3 bits to add. So there would be a fabric which can do some skips. But with a 10101 pattern, a second cycle is needed. And we need a register to remember how far we got. The fabric is pretty cheap in CMOS if we construct it as power switches to a lot of busses. So the output driver of gets power. Or the bus sensor gets power. Then fill the 3 bit groups . Looks like if we exceed a skip length, stuff gets nasty. So better have a full fabric. Only othe last and first group of 3 can utilize the law of commutation to remove some gates. Booth may be great for the Z80 with its sole 4 bit adder or for 32 bit factors on the 16 Bit ALU of the 68k.
@LaserFur8 ай бұрын
Given the number of transistors available nowadays, it could be faster to have each group calculate both results: one with a carry coming in and one without. Then the results are selected at the end. That way the group adder does not have to wait for the carries to be calculated.
@wayando8 ай бұрын
But then the selection is done after waiting, which doesn't solve the problem of waiting.
@LaserFur8 ай бұрын
@@wayando But the wait for each group is smaller, since it's fewer total gates. So an 8-bit group would have 16 gates to get to the output carry, and all the groups would be done in parallel. Then the carries would just take 8 groups to get to 64 bits, which would be another 16 gates to get the final result. I see it as easier to explain, since the lookahead math is the same as the carry math.
@eric-seastrand7 ай бұрын
Love this channel.
@mehmetdemir-lf2vm8 ай бұрын
good video. please explain fast hardware integer multipliers too.
@capability-snob8 ай бұрын
100%. Dadda and Wallace multipliers are just lots of full adders, but are a complete mind bend when first examined.
@ArneChristianRosenfeldt8 ай бұрын
@@capability-snob They look so similar. Both seem to need a carry-lookahead add at the end.
@ArneChristianRosenfeldt8 ай бұрын
Now I guess I may have some misconception how a division circuit looks like, which in a parallel operation spits out 2 bits plus a carry. Like doing 4 comparisons in parallel.
@capability-snob8 ай бұрын
@@ArneChristianRosenfeldt Yes - this is why many processors can offer fused multiply-add at the same performance as just a multiply - the network just simplifies a large number of additions into one final addition! I don't know of any pipelined hardware division circuitry, which is why a lot of systems (notably, GPUs) just approximate the reciprocal and then perform multiplications to get it accurate. It takes a bit longer, but you can perform several divisions at once, or mix in other operations. Modern CPUs use this mechanism for integer division internally, which is why division tends to take 21-29 cycles. From what I can guess from the timing and power data, the floating-point division unit on modern CPUs does parallel trial subtraction. The divisor is statically multiplied by a sequence of small numbers, and these are compared with (effectively, subtracted from) the remainder. The index of the smallest positive result is taken as the next few bits of the quotient. It takes 8 cycles to perform division on 64-bit mantissas, but only one instruction on the core can be using this unit at a time.
@ArneChristianRosenfeldt8 ай бұрын
@@capability-snob I would build a pipeline of that digit based method for 8 bits followed by that iterative multiplication algorithm for 16, 32, 64 followed by one 64 bit carry look ahead check to get 100% correct integer division.
@jamesalewis8 ай бұрын
I built a Rust arbitrary size integer system, and a carry look-ahead may speed things up! Especially if I figure out how to parallelize the process.
@joshuascholar32208 ай бұрын
On a von Neumann architecture you can't get enough parallelism for the extra steps to break even. But if you were adding super large numbers on a GPU architecture, then it would pay off.
@jamashe8 ай бұрын
Great vid bro. Thank u so much. How do you do these animations by the way?
@MAGNETO-i1i8 ай бұрын
Incredible! What books would you recommend to read to know more about how a cpu works?
@zokalyx8 ай бұрын
Haven't finished it myself but I've seen good reviews of "The Elements of Computing Systems"
@AmanYadav-ry3xr8 ай бұрын
"But How Do It Know?" by J. Clark Scott. It starts with simple gates (AND, OR, NOT, NAND, ...) and using those gates builds a simple 8-bit CPU and memory.
@ArneChristianRosenfeldt8 ай бұрын
I wonder how a CPU can route the data from a register to the integer units so fast. Surely carry lookahead and bit logic don't run on the same ALU anymore. Is the barrel shifter extra?
@yensteel8 ай бұрын
Great video! will you continue to explain other concepts of computing in regards to mathematics?
@zuksofalter8 ай бұрын
Teacher Brian! #AlwaysInsightful
@iganic75746 ай бұрын
As a non-computer-science student, half of this goes over my head
@plemli8 ай бұрын
It is possible to optimize the special cases when
- adding or subtracting 1, as in an up or down counter
- one of the operands is an arbitrary but fixed number, as in an increment-by-n counter
- the operands are known to have a (widely) different number of bits, e.g. adding an 8-bit to a 64-bit number.
It would be interesting to compare the output of your favorite VHDL or Verilog synthesiser for the general and these special cases. After that, consider adding more than two operands efficiently.
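For the first special case (adding 1), generate terms vanish because the constant has only its LSB set, so the lookahead collapses to a single AND chain: the carry into bit i is just the AND of all lower bits. A small illustrative sketch (the function name is made up):

```python
def inc(a, n=8):
    """Increment via the collapsed lookahead of a counter: each sum bit is
    a[i] XOR (AND of all lower bits), wrapping around at n bits."""
    out = 0
    chain = 1          # carry into the current bit; starts as the +1 itself
    for i in range(n):
        bit = (a >> i) & 1
        out |= (bit ^ chain) << i
        chain &= bit   # the carry continues only while bits are 1
    return out
```

In hardware, that `chain` is one wide AND per bit position rather than the full triangle of CLA terms, which is why counters get away with much cheaper lookahead.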
@ArneChristianRosenfeldt8 ай бұрын
Program counter circuits look almost like an adder, even if the CPU does branch jump-length calculation on the ALU. The only thing is counting down in a sprite engine for the x ordinate. The zero flag is calculated from msb to lsb. The msbs don't change before zero. So the zero flag is faster than the carry. Or keep a carry every 4 bits and compensate for the delay when loading the counter. Same thing with 8+64: there can always be a carry. For the 6502 it would be cool to have two values of the high byte prepared. Jumps are within 256 targets. Still, the high byte needs to steal a cycle from the low-byte ALU (concurrently with the first fetch at the target).
@ArneChristianRosenfeldt8 ай бұрын
MUL comments show how to add multiple operands. X86 does this for some addressing modes with segment.
@yash11528 ай бұрын
1:48 TIL: chaining of digital units gives them the name "ripple". Thanks.
@Naveenyadav-kt8luАй бұрын
Very well explained, but you made a mistake by not completing the value C1 at 6:23; it should include the addition of another term, P0C0
@godofcows46498 ай бұрын
7:53 The circuit is the same for all the adders. The reason he says it's grouped into four-bit adders is that that's the practical number of adders that fits on a 16-pin DIP package
@ArneChristianRosenfeldt8 ай бұрын
Ben Eaters SAP!
@godofcows46498 ай бұрын
@@ArneChristianRosenfeldt His videos are really cool - though anyone is a madman to use breadboards lol
@evansjahja7118 ай бұрын
Thank you
@Firmyourthoughts_7773 ай бұрын
Amazing
@adissentingopinion8487 ай бұрын
Aw heck yeah. Computer Engineering 201. Hit me with some of that Double Dabble with a real life 7 segment. Maybe even get some sweet AI hype with Wallace and Dadda multipliers and MAC operations.
@flameofthephoenix83958 ай бұрын
Very nice!
@abhishekrajanofficial7 ай бұрын
Superb information. How do you create such wonderful animations, Can you please tell me the source name or tool name?
@merseyless8 ай бұрын
I did not know this. Where can I go to find more digital logic shortcuts like this?
@hlibprishchepov3227 ай бұрын
Either I'm too stupid or the explanation is too hard, but I came up with my own solution. The way I see it: we compute the positions with an obvious carry and an obvious no-carry, then fill in all the unknowns using the regular algorithm. Doing it that way we split our addition into chunks of two on average, but the longest chunk slows the system down. Is my solution the same as in the video?
@fdk70148 ай бұрын
That's really clever!
@kabuuu_voky6 ай бұрын
Really great explanation, need you as a teacher lol, thanks
@samwalko8 ай бұрын
It might be worth mentioning that sometimes, there is a carry-in into the least significant bit, which means there's an additional term to the CLU formulas. But very good video nonetheless!
@ArneChristianRosenfeldt8 ай бұрын
Like the ADC instruction?
@samwalko8 ай бұрын
@@ArneChristianRosenfeldt I was actually just thinking of subtraction, which is typically performed by bitwise inverting the second operand and carrying in a 1 to the LSB.
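A tiny model of that trick (illustrative 8-bit width): subtraction becomes addition of the bitwise complement with a carry-in of 1 to the LSB, which is exactly the extra C0 term in the CLU formulas.

```python
def add_with_carry_in(a, b, c0, n=8):
    """n-bit adder with a carry-in to the least significant bit."""
    mask = (1 << n) - 1
    return (a + b + c0) & mask

def subtract(a, b, n=8):
    """a - b in two's complement: a + ~b + 1, i.e. invert b and carry in 1."""
    mask = (1 << n) - 1
    return add_with_carry_in(a, (~b) & mask, 1, n)
```

This is why real ALUs route a carry-in into bit 0: the same adder does both ADD and SUB (and ADC/SBC) just by choosing the C0 value and whether to invert the second operand.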
@Z3rgatul8 ай бұрын
The last part is a bit confusing. We can't just calculate the carry-lookahead for groups of 4 bits - we actually have to do the 4-bit sums for them, right? And once we have the CL for the groups, we run the 4-bit sums again?
@kitlith8 ай бұрын
Propagate for a group of 4 bits is the AND of the Propagate signals for all 4 individual bits, and the Generate signal for the group is G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 - a carry generated at any bit that then propagates through all the bits above it, I believe.
@ArneChristianRosenfeldt8 ай бұрын
I think this is all much easier to understand if you are used to the zero and carry flags in an 8-bit CPU. Then let us do a SUB. If the SUB internally set the zero flag for the group, we know that the group will propagate externally. If the group receives a carry, we need to DEC. For minimal delay we could have precalculated this.
@bmk25615 ай бұрын
4:13 If there is a carry input, then there would be a carry output; but at 4:19 there is no carry input, so why does it still propagate a carry?
@johnopalko52238 ай бұрын
Thank you for the wonderful explanation. I had run across the concept of carry-lookahead ages ago, but it all seemed like witchcraft. My question is: can it be done with purely combinatorial logic, or do you need sequential logic to synchronize things?
@ArneChristianRosenfeldt8 ай бұрын
Combinatorial. Multiplication also. You could even unroll division, but with so many gates in the critical path, every synthesiser will insert pipeline stages.
@johnopalko52238 ай бұрын
@@ArneChristianRosenfeldt Thanks! Just for grins, I recently designed a simple 4-bit clocked adder/subtractor using ripple carry. I'll see if I can design one with carry-lookahead. My challenge to myself is to design these things without looking up how others have done it. I start with truth tables, derive the Boolean equations, and translate those into gates. A lot of people my age do Sudoku puzzles to keep their brains agile. I do this. Yeah, I know: weird.
@nevokrien958 ай бұрын
This is like SIMD but for adders
@jalapenogaming97408 ай бұрын
we need videos on algorithms
@weckar8 ай бұрын
I 100% still don't get this. With every carry still being dependent on every previous carry, how is it any different from ripple propagation directly? I understand that if an addition generates a carry you can ignore everything prior to that, of course.
@perplexedon98346 ай бұрын
I think I've got my head around it. The big-picture idea is that even though heaps of comparisons are made in the carry lookahead, those comparisons can be implemented in parallel, whereas ripple propagation is inherently serial and thus slower.

Review 6:40. Even though the formula for C_n+1 depends on C_n, we can walk that back to get a closed formula. This closed formula has the same shape for all bits; only the parameters depend on the bits. Even though it has many different G and P parameters, it can be evaluated in two logical steps: a set of AND operations computed in parallel, then a set of OR operations computed in parallel. Calculating each value of G requires only an AND gate, and calculating P requires only an OR gate, and these can also be done in parallel with one another for all bits.

Now it's important to understand the actual metal of the chip. The physical silicon is all NAND gates, so we can estimate the time an operation takes by the maximum number of NAND gates a signal has to traverse. A full adder has 9 NAND gates, with the longest path consisting of 6 NAND gates. An AND gate is implemented with two NAND gates, and an OR gate is implemented with 3 NAND gates, with a longest path of 2.

All that together means that, regardless of length, it takes only 6 NAND gate delays to generate the carry for the nth bit, no matter how high n is, and you can do this for all bits in parallel. Waiting on the carry to ripple would take 6n delays, and obviously 6 ≤ 6n for all n > 0. Once you have the carry, you are only one full adder away, which is 6 more NAND gate delays. Therefore with carry lookahead no signal has to travel through more than 12 NAND gates to complete the addition of two numbers of any size, whereas the naive ripple adder sends a signal through 6n NAND gates, where n is the number of bits. This is clearly better for 3 or more bits.
I'm not sure of this, but I think the reason we stop at 4 bits is a practical geometry issue. Implementing the very large formulas for the 64th bit would mean a lot of NAND gates in a small space, all closely connected. At some stage the bottleneck will become the time that it takes for an electron to get to the next gate, rather than to go through it.
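The closed-formula idea above can be sketched in Python. The nested loops below are just notation for a flat two-level circuit: every product term is one wide AND, and each carry is one wide OR, all evaluable in parallel in hardware:

```python
def lookahead_carries(a, b, n, c0=0):
    """All carries of an n-bit add from the flattened (closed-form) formula.

    c[i+1] = g[i] | p[i]&g[i-1] | ... | p[i]&...&p[0]&c0
    In hardware every product term is one wide AND and each carry one wide OR,
    all evaluated in parallel; the Python loops are only notation for that.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]   # generate bits
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # propagate bits
    carries = [c0]
    for i in range(n):
        terms = []
        for j in range(i, -1, -1):
            prod = g[j]
            for k in range(j + 1, i + 1):
                prod &= p[k]              # one wide AND per product term
            terms.append(prod)
        tail = c0
        for k in range(i + 1):
            tail &= p[k]                  # the carry-in term
        terms.append(tail)
        carries.append(int(any(terms)))   # one wide OR per carry
    return carries

def cla_add(a, b, n):
    """n-bit sum using the precomputed carries (result truncated to n bits)."""
    c = lookahead_carries(a, b, n)
    return sum((((a >> i) ^ (b >> i) ^ c[i]) & 1) << i for i in range(n))
```

Running `cla_add(13, 7, 8)` gives 20; the top carry in `lookahead_carries` is the adder's carry out.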
@AmanYadav-ry3xr8 ай бұрын
Make next video on CRC(Cyclic Redundancy Check)
@ArneChristianRosenfeldt8 ай бұрын
Ah yeah, how CDs jump over dust and scratches!
@netheritecraftondrugs51268 ай бұрын
For an 8-bit number, what you could do is divide it into two 4-bit numbers and do the calculations simultaneously
@ArneChristianRosenfeldt8 ай бұрын
@@ArneChristianRosenfeldt That is what the video shows, no? Anyway, 8-bit CPUs need to deal with 16-bit instruction pointers. The Z80 actually uses a 4-bit adder for bytes. The 6502 uses an 8-bit ripple adder. Ben Eater does what you propose in his SAP.
@matteofrattini91338 ай бұрын
I'm confused... If the carry-lookahead formulas at 6:30 are calculated in a recursive way, aren't they all waiting for the previous ones to finish their work just like the adders? I get that this would be very convenient for longer operations like multiplication, but single-bit addition is literally two logic gates 🤔
@capability-snob7 ай бұрын
Recursive in the sense that when you break up a 64-bit addition into a 32-bit addition, and those into 16-bit additions etc, you only have 6 steps before you're adding 1-bit numbers. These 6 steps are much fewer than the 63 steps you need to take in order when using a ripple-carry addition. By breaking the problem into similar sized chunks at every level and computing them in parallel, you arrive at the answer in a logarithmic number of recursive steps. You have to do more work, sure - but by focusing on breaking the problem up in this way, you can do more of the work in parallel and get the output much sooner.
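A sketch of that recursive halving in Python: (G, P) pairs merge associatively, so adjacent blocks can be combined pairwise in log2(n) rounds (a parallel-prefix-style sketch, not any particular chip's circuit):

```python
def combine(hi, lo):
    """Merge (G, P) of a low block and the block above it into one wider block."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    # The wide block generates if the high half does, or the low half
    # generates and the high half propagates it onward:
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def carry_out(a, b, n, c0=0):
    """Carry out of an n-bit add (n a power of two) by pairwise merging."""
    pairs = [((a >> i) & (b >> i) & 1, ((a >> i) | (b >> i)) & 1)
             for i in range(n)]
    while len(pairs) > 1:                          # log2(n) rounds...
        pairs = [combine(pairs[i + 1], pairs[i])   # ...each round's merges
                 for i in range(0, len(pairs), 2)]  # happen in parallel
    g, p = pairs[0]
    return g | (p & c0)
```

For 64 bits this is 6 merge rounds instead of 63 ripple steps, which is the logarithmic-depth claim above.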
@User-pi3nf6 ай бұрын
This is awesome, thank you
@darqed8 ай бұрын
Great video! Although I already knew the main idea. Wouldn't packing up into 2-bit pairs each time be faster, though, because then it would just be a binary tree to search?
@trevinbeattie48888 ай бұрын
I think that would be slower because you’re doubling the propagation delays. Remember that the point of the carry-lookahead circuit is to eliminate propagation delays for the entire group at the cost of increasing circuit complexity at a rate of N², so the choice of group size is a matter of balancing speed versus transistor count.
@ArneChristianRosenfeldt8 ай бұрын
Ripple is quite fast per bit, and in CMOS it can use the inversion trick. Not sure if that works for lookahead. Also, the 4-input NAND is kind of the default gate, and you can make more use of it with groups of 4.
@zoey36058 күн бұрын
4:20 I don't understand how this propagates a carry when there is no carry output?
@kakyoindonut32138 ай бұрын
Binary logic was the first thing I learned before starting programming, yet I only found out about this carry-lookahead algorithm after watching this video
@filipwolinski89158 ай бұрын
The fact that this is more efficient seems paradoxical to me. Isn't stacking so many gates together to look ahead for the carry the same as stacking full adders together? I know it isn't, but it doesn't seem intuitive.
@MuyGordoGato8 ай бұрын
Do an experiment. Assign a delay to each gate type (nand, and, or, nor, invert etc). Then design a full adder with ripple carry, and another with CLA. Chain eight bits together, and count the number of gates of each type the carry propagates through. I guarantee your paradox will resolve itself.
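A toy version of that experiment, with assumed unit delays (made-up numbers, not real transistor timing):

```python
# Made-up unit delays per gate type (real cell libraries differ):
DELAY = {"and": 1, "or": 1}

def ripple_carry_depth(n):
    """Worst case: the carry re-derives through one AND + one OR per full adder."""
    return n * (DELAY["and"] + DELAY["or"])

def cla_depth(n):
    """g/p in one AND level, then every carry from one wide AND + one wide OR."""
    gp = DELAY["and"]                    # g_i = a_i & b_i (p_i in parallel)
    carry = DELAY["and"] + DELAY["or"]   # flattened two-level carry formula
    return gp + carry                    # independent of n (given wide-enough gates)

for n in (4, 8, 16, 32, 64):
    print(f"{n:2d} bits: ripple depth {ripple_carry_depth(n):3d}, CLA depth {cla_depth(n)}")
```

Even with this crude model the gap is obvious: ripple depth grows linearly with the word size while the lookahead depth stays flat (until fan-in limits force grouping, as the video discusses).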
@kenrowe1678 ай бұрын
But doesn't the generate/propagate have to "ripple" from low lookahead to high lookahead? Since each successive lookahead generator needs to know the status of all preceding lookahead generator decisions in order to make its own decision.
@ArneChristianRosenfeldt8 ай бұрын
It's speculation. You could say that each block of 4 bits calculates the result for both cases: incoming carry or not. See the Spectre bug in modern CPUs and branch prediction.
@sphynx_owner82248 ай бұрын
How do the byte's generate and propagate values get decided?
@ferenccseh40378 ай бұрын
But if the generate/propagate formula needs the generate/propagate of all the bits before, how do they not have to wait for each other?
@Ryrzard7 ай бұрын
The P and G of each pair of bits can be computed in parallel, without looking at any other bits. The last adder can then get its carry-in from the calculated P/G values through 2 stages of ordinary logic (ANDs and ORs) instead of having to wait for the carry to propagate through 4/8/whatever full adders.
@ferenccseh40377 ай бұрын
@@Ryrzard Well, thanks for summarising the video, but that still doesn't answer my question. The video states that the P bits need the last bit's P&G to figure out if it propagates. But if it needs that, how does it not have to wait for it?
@Ryrzard7 ай бұрын
@@ferenccseh4037 It didn't state that. If you listen carefully, P and G can be figured out entirely based on the bit pair and nothing else.
@ferenccseh40377 ай бұрын
@@Ryrzard Doesn't Bit(n) only propagate if Bit(n-1) also propagates or generates in some cases?
@Ryrzard7 ай бұрын
@@ferenccseh4037 But you don't need to perform the actual addition to determine that. All the information needed to determine every bit's carry is available in a single step.
@kelpR8 ай бұрын
lets go!! a vid!
@proceduralism3768 ай бұрын
truly one of the vids of all time
@bytesandbikes8 ай бұрын
This is (as far as I can tell) the same as a tree of full adders that pass forward rather than along
@prosamis8 ай бұрын
I don't get it. So we fix having to wait for a process... by adding another process? Which also has terms that cannot be known simultaneously and relies on previous numbers? I don't see what we accomplished here besides trimming down the time for exactly the cases that generate or don't propagate, while it sounds like propagating carries run into the same issue
@dertechl66288 ай бұрын
The propagating/generating properties are not computed sequentially. They are done simultaneously for each digit.
@prosamis8 ай бұрын
@@dertechl6628 how is the outcome of the propagating determined without looking at parts before, and hence having a sequential part?
@dertechl66288 ай бұрын
@@prosamis At 6:50 you can see that after the propagate/generate properties for each digit are computed (which happens in parallel), the carry values depend only on these results. If we have OR and AND gates with sufficiently many inputs, we can then compute the carries in two steps, by applying the ANDs in parallel and then the ORs in parallel.
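Written out for c2, the two steps look like this (a tiny Python sketch of the same formula):

```python
def c2(g0, p0, g1, p1, c0=0):
    """c2 = g1 OR (p1 AND g0) OR (p1 AND p0 AND c0).

    The two products are the parallel AND step; the final OR is the second step.
    Each g/p input comes from one bit pair alone, so nothing here waits on a
    previous carry.
    """
    t1 = p1 & g0       # bit 0 generated, bit 1 passes it on
    t2 = p1 & p0 & c0  # the carry-in rides through both bits
    return g1 | t1 | t2
```

For a = 11b and b = 01b we get g0 = 1, p0 = 1, g1 = 0, p1 = 1, so c2 = 1, matching 3 + 1 = 100b.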
@poutineausyropderable71088 ай бұрын
Doesn't seem faster to me. You still have to wait for the end of the carry lookahead to do the addition step. It feels like doing the same task, just one after another before the addition rather than during it. The only advantage is hard-wiring a 4-way OR-and-AND circuit.
@byeronkactus8 ай бұрын
not me using this for redstone builds! ( srsly though it was really useful )
@milakohen6308 ай бұрын
Brian 🎉
@nicholas_0x7f8 ай бұрын
Cute robots
@wolf71158 ай бұрын
We have to go deeper
@ArneChristianRosenfeldt8 ай бұрын
Why don’t you go check the commented die shots of chips from the 70s?
@chipichipichapachapaWHY8 ай бұрын
I thought the lowest bit would be a half adder (since it will never have a carry in)
@urgtuiop54558 ай бұрын
ADC: add with carry. The instruction needs a carry in so you can chain instructions together for adding/subtracting larger numbers. Full adders in a CPU implementation need carry-in and carry-out circuitry; otherwise you would need an additional ADD #1 and some extra conditional jump instructions.
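The chaining that ADC enables looks like this in a software sketch (generic 64-bit limbs, not any particular ISA):

```python
MASK64 = (1 << 64) - 1

def adc(x, y, carry_in):
    """64-bit add with carry in: returns (sum mod 2**64, carry out)."""
    total = x + y + carry_in
    return total & MASK64, total >> 64

def add_multiword(a_limbs, b_limbs):
    """Add two equal-length little-endian lists of 64-bit limbs by chaining ADC."""
    out, carry = [], 0
    for x, y in zip(a_limbs, b_limbs):
        s, carry = adc(x, y, carry)  # each limb's carry out feeds the next
        out.append(s)
    return out, carry
```

Note this is exactly ripple carry again, just at the granularity of whole words instead of single bits: the software chain is serial even if each ADC uses lookahead internally.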
@TildaAzrisk8 ай бұрын
If all you are doing is adding two numbers that are the same number of bits as what the addition circuitry takes in, a half adder on the first bit works no problem. However, with some clever extra circuitry, an adder can do a lot more than just addition. When an adder is part of an Arithmetic Logic Unit (ALU), having an optional carry in on the first bit lets the ALU do more things than if that carry in were not possible.
@ArneChristianRosenfeldt8 ай бұрын
@@TildaAzrisk Like subtract and increment.
@DominikJaniec8 ай бұрын
interesting ;)
@bayurukmanajati12248 ай бұрын
Now we should apply it to decimal numbers, so humans can do it as well. XD
@jitxhere5 ай бұрын
This is CS
@spezial-m91468 ай бұрын
did not understand why it is faster. :(
@SamiSaba28 ай бұрын
1/0? HOW
@NopeNopeNope91247 ай бұрын
For someone with the name "Spanning Tree" which pops up when I search spanning tree, I'd expect more spanning tree videos, but all I get is YOU talking about NOT spanning tree stuff, BRUH 😭😭😭😭 I wanna know about spanning trees man what the HECK Like yeah lemme call myself Planar Graph or Dijkstra's Algorithm or Fibonacci Sequence so every term links to a youtube channel NOT talking about the topic BRO!!!!!!
@lukamtc91888 ай бұрын
So basically, you use the fact that computers can do many things at once whereas humans can't, so the fact that all these "propagate" and "generate" values are a waste of time for humans doesn't mean they are for computers too. I wonder if AI is going to do math in a fashion similar to us, even using intuitive rules for whether a number is divisible by 7 (the VSauce video), or if the computer's methods will always be faster?