I'm 67 yo. I'm amazed that the training I was given in the 80's on early microprocessors, combined with the fun I had writing op-codes for my Commodore 64 enabled me to follow this. Thanks for the instruction!
@marrygrim1996 жыл бұрын
Coool!
@kricku6 жыл бұрын
"I'm 67 yo" is now one of my favorite phrases
@Chris473684 жыл бұрын
You were always ahead of your time 😀
@christopherlawley18424 жыл бұрын
you are not alone
@ShawSumma7 жыл бұрын
He is a really slow c compiler. I’ll stick to my usual command GCC.
@ShawSumma7 жыл бұрын
Very verbose also
@leberkassemmel7 жыл бұрын
And he only supports ARM! And not Open Source!
@klaxoncow7 жыл бұрын
I don't know. He's telling us what he's doing and explaining it all, so doesn't that technically make him an open source compiler?
@leberkassemmel7 жыл бұрын
Or someone just set the -v flag.
@moshly647 жыл бұрын
JSR $DEADBEEF
@Thompson82007 жыл бұрын
I'd love to see an explanation of 'side-channels' and how you turn a timing of a memory operation into a specific value from memory.
@talleddie817 жыл бұрын
The timing is not turned into a value. The timing of the operation is used to determine whether the CPU read the value from the cache or main memory.
@mduckernz7 жыл бұрын
talleddie81 And, to add to this: if you know when sensitive (e.g. kernel) operations are being executed, you can figure out where they're actually stored, bypassing ASLR. It's a pretty noisy side channel, so it may take many probing operations to gather data, but since billions of operations are executed per second, it doesn't take much real time to get some interesting data. The longer you do it, the more precisely you can home in on your target address.
@Thompson82007 жыл бұрын
Since a computer might have 16+ GB of RAM how do you even start to get an idea of where in the memory you need to be looking if all you know is that it did have to hit the RAM due to the timing?
@talleddie817 жыл бұрын
As Matthew Ducker said, it is possible to break the ASLR. What you then can figure out is where the user data and kernel data are stored in RAM. As far as figuring out what specific data is stored at each address, that is a very difficult and complicated topic. As far as your original question, the timing is only used to determine where the data came from. Knowing that the data came from the cache can be a clue to an attacker that the data was from a previous operation. In the case of an attack, this previous operation could be a memory read forced by the attacker that should not have occurred.
@Thompson82007 жыл бұрын
Thanks for the replies!
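The replies above describe the timing as a yes/no signal (cache or main memory) rather than a value. One way to picture how that signal can still reveal a value is the toy simulation below. It is only a sketch: `ToyCache`, `victim`, `recover_secret` and both latency constants are made up for illustration, and nothing here measures real hardware.

```python
# Toy simulation only: the latencies are invented numbers, not measurements.
CACHE_HIT_NS = 1      # assumed cost of a read served from cache
CACHE_MISS_NS = 200   # assumed cost of a read served from main memory


class ToyCache:
    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def read(self, line):
        """Return the simulated latency of reading `line`, then cache it."""
        latency = CACHE_HIT_NS if line in self.lines else CACHE_MISS_NS
        self.lines.add(line)
        return latency


def victim(cache, secret):
    # The victim touches exactly one line, chosen by the secret byte.
    cache.read(secret)


def recover_secret(cache):
    # The attacker never reads the secret itself. It times one read of
    # each candidate line and picks the line that came back fast.
    timings = {line: cache.read(line) for line in range(256)}
    return min(timings, key=timings.get)


cache = ToyCache()
cache.flush()
victim(cache, secret=42)
print(recover_secret(cache))
```

The point of the sketch is that the secret is used as an index, so "which line was fast" and "what the value was" become the same question. Real attacks have to deal with noise and repeat the measurement many times.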
@tarcal877 жыл бұрын
_"using the Computerphile paper in a _*_radically_*_ different orientation"_ such a rebel :D
@bananya60204 жыл бұрын
7:35
@NikiHerl7 жыл бұрын
I have a request / maybe constructive feedback: I think it would be neat if you could update / create new Computerphile playlists. There are tons of videos I'd like to rewatch, but it's a bit of a pain to look for them one-by-one. Specifically I'd want to rewatch all the explanations of exploits/security breaches, for example
@dDAMKErkk6 ай бұрын
6 years on and no reply, which says “I am wrong”.
@Xulfer7 жыл бұрын
"...that we talked about in the caching video, many years ago" *cuts to video clip of the same shirt*
@Ozzymandiyas7 жыл бұрын
Consistency is something that is sorely needed on YT.
@BeoandIsa7 жыл бұрын
not the same shirt, look closer...
@ryke_masters6 жыл бұрын
Not actually the same shirt, but there is more than a passing resemblance...
@felipemartins64334 жыл бұрын
_tom scott wants to know your location_
@chswin3 жыл бұрын
That’s how you know he is the real deal…
@martinkunev99117 жыл бұрын
Assuming integers, some more time can be saved if the multiplication is done earlier (it can run in parallel with load instructions).
@erikengheim11064 жыл бұрын
Nice job! I had to click through a few explanations before I got to this one. Went straight to the point and kept me engaged, without getting buried in technical details.
@edmundkorley88927 жыл бұрын
Thank you for mitigating the screeching sound of the markers!
@Revan123456787 жыл бұрын
Another thing that I noticed, that wasn't mentioned in the video, is that reordering the code also frees up registers for reuse. For example (6:55) load r0; load r1; add r0 = r0+r1; load r2;
@gordonrichardson29727 жыл бұрын
Valid point, but that opens up a whole new layer of complexity...
@Revan123456787 жыл бұрын
Hey, if designing a CPU was easy, everyone would be doing it xD
@KohuGaly7 жыл бұрын
Yes, Intel CPUs can actually do this. However, they typically do the exact opposite. Consider this code: ... add r0 = r0+r1; load r1; ... Notice that the load instruction needs to wait for the add instruction to finish, because they use the same register. The Intel CPU will simply use a different free register in the load instruction and adjust the rest of the code accordingly (this is known as register renaming): ... add r0 = r0+r1; load r2; ...
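The "use a different free register" idea from the reply above can be sketched in a few lines. This is a simplified model, not how any particular CPU implements it: the `rename` function, the tuple format and the `p0`, `p1`, ... physical register names are all assumptions made for the illustration.

```python
from itertools import count


def rename(program):
    """Give every register write a fresh physical register (p0, p1, ...).

    `program` is a list of (op, dest, sources) tuples that use
    architectural register names such as "r0".
    """
    fresh = (f"p{i}" for i in count())
    mapping = {}   # architectural name -> physical register holding its value
    renamed = []
    for op, dest, sources in program:
        # Sources read whichever physical register currently holds the value.
        phys_sources = []
        for src in sources:
            if src not in mapping:
                mapping[src] = next(fresh)
            phys_sources.append(mapping[src])
        # The destination always gets a new physical register, so a later
        # write can never clobber a value an earlier instruction still needs.
        mapping[dest] = next(fresh)
        renamed.append((op, mapping[dest], tuple(phys_sources)))
    return renamed


program = [
    ("add", "r0", ("r0", "r1")),   # r0 = r0 + r1
    ("load", "r1", ()),            # overwrites r1, which the add still reads
]
for instr in rename(program):
    print(instr)
```

After renaming, the load writes a physical register that the add never reads, so in this model the load no longer has to wait for the add.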
@JoshuaHillerup7 жыл бұрын
I'm confused why the processor would ever do the optimizing, instead of a combination of the compiler/interpreter (for the particular bit of code) and the OS (for different processes and whatnot) doing all the optimizing, since those actually have all the information about what will be run.
7 жыл бұрын
Actually they do not have all the information about what will be run (if that were the case, it could speed programs a lot). You have to take into account the dynamic factors. Values in cache, branch prediction, utilization of individual cores (hyperthreading) etc. all affect the program execution severely and they're very hard to predict during compilation (although compilers of course try to do their best and you can help them with profile guided optimization).
@gordonrichardson29727 жыл бұрын
The processor is the only one that knows whether an item has been fetched from memory previously, and is in the cache, which provides a huge speedup. The compiler cannot possibly know the contents of the cache, although it should do some optimisation of its own. BTW, modern software can be rather inefficient, and if it weren't for fast CPUs, things would sometimes go very slowly...
@radarspace7 жыл бұрын
That's exactly how Intel's Itanium CPUs work.
@zeikjt7 жыл бұрын
If the compiler or interpreter were to try to do it, that would be what's known as a premature optimization, because you'd be optimizing for an assumed/theoretical CPU instead of knowing what it's actually capable of. It could be that your optimizations work well for a select few or even a great number of CPUs on the market today, but tomorrow will come and new CPUs will be released and your modified code could very well perform worse on those. You should just let the CPU itself do what it knows it can do.
@JoshuaHillerup7 жыл бұрын
ZeikJT if your compiler knows what CPU it will run on (and given the size of actual executable machine code versus the size of storage there's no reason not to include all existing CPUs), then it can target all of them. If a new CPU is built you can recompile your code to make it the most optimized.
@larryg23207 жыл бұрын
Since Dr. B is right-handed I would like to recommend that the camera be located over his left shoulder instead of his right. Love the shows.
@sebastiankumlin95425 жыл бұрын
It's just amazing how much time goes into making these videos. Thank you!
@magnum3337 жыл бұрын
What a great channel, thank you for this.
@tsmupdatertsm76337 жыл бұрын
Thanks a lot for your work! I really like those videos with Dr. Bagley. He explains everything very well. And the deep level of how computers work is very interesting.
@DavidHamby-ORF-487 жыл бұрын
Nicely presented. I thought of the CDC 7600 designed by Seymour Cray as you were using the Acorn RISC machine in your example. The 7600 was superscalar & pipelined with a multiply unit, divide unit, adder, load/store unit, all 60 bit floating point. Integer operations were 48 bits using the same units but exponent fixed at zero. The Fortran compiler did critical path scheduling of expression evaluation in code generation. An instruction word stack handled decode and issue. Tight loops fit in the IWS and executed without instruction fetch.
@eldebo997 жыл бұрын
The color palette at 5:53, the left side, with example line "01 LDR R0, a", is challenging to read by my color-deficient eyes. Please reconsider that particular font / background color combo.
@powel54516 жыл бұрын
William Hebert no
@scatterlogical7 жыл бұрын
I think the unfortunate situation (like any security) is that this is not a pure computing problem, but a human one. Imagine how much more efficient computers and networks could be without the overhead of dealing with untrustworthy influences. :/
@carlosgarza316 жыл бұрын
A hardware bug that allows user-level programs access to kernel space or to other user-level processes' memory address space defeats the purpose of having virtual memory security in the first place. We should all be outraged that speculative branch prediction doesn't block cache memory writes on instructions that failed the branch prediction. From what I can tell, engineers were well aware of this problem but ignored it because they assumed it would be difficult to exploit, given what seemed to be the random nature of cache page reading and writing and the extra cost of blanking out a cache page or blocking the writing of that cache page during a failed branch prediction. People wanted faster recovery during a failed branch prediction for marketing their CPUs. Now they've got more marketing by allowing them to sell Spectre/Meltdown-proof CPUs.
@debanikdawn70097 жыл бұрын
"I'm out of order?! You're out of order! The CPUs are out of order!"
@bwzes037 жыл бұрын
Debanik Dawn If I was half the CPU I used to be, I'd take a pipeline to this place! Out of order ? Who do you think you are talking to? I've been around you know!
@SproutyPottedPlant7 жыл бұрын
The Orona lift(elevator) is out of service! Out of order! press the alarm button
@gideonmaxmerling2044 жыл бұрын
with programs like these, many modern CPUs will send a few memory fetch requests one after the other. while the CPU is waiting for the memory it usually does other tasks. when the memory arrives, it might arrive out of order (out of order as in, you get b, then a, then d, then c) so it will compute the calculations by the order of arrival.
@leberkassemmel7 жыл бұрын
Anyone noticed the CD hanging out of the right iMac?
@billparsons33417 жыл бұрын
Anyone notice that he was wearing the same shirt in the cache flashback video from a few years ago?
@kigtod7 жыл бұрын
Bill Parsons yes - Looks like Sean's continuity briefing paid off.
@awirstam7 жыл бұрын
Maybe an off-topic question about CPUs, in the time frame of around 1998 - 2006: was the PowerPC actually faster than the x86, as Apple always stated, even though the clock frequency was a lot lower?
@linawhatevs83897 жыл бұрын
12:40 actually, instruction 8 (MUL) could happen earlier, during 6 and 7. It still wouldn't be faster than the reordered code, though.
@PaulsPubAndBrew7 жыл бұрын
Why wouldn't it take more cycles to analyze and determine an optional order than you'd save by using that new order? Does the compiler that originally compiled the code handle this? Or is this truly on the fly?
@gordonrichardson29727 жыл бұрын
Modern CPUs have sufficiently complex hardware to analyse instructions several steps before they are actually executed. IMO the example chosen is simplistic, and not a good example of how pipelining works in practice.
@Disthron7 жыл бұрын
Super scaler? There was a Sega arcade hardware called the Sega Super Scaler, though I think that was referring to its ability to scale sprites. Look at games like After Burner, Outrun and Thunder Blade, just to name a few.
@thejedijohn7 жыл бұрын
Great Video!!! I still have some questions: What part of the CPU looks at the instructions and evaluates a better order to execute them in? How does that not take more time than just executing in the order they were given? And do compilers like GCC rearrange the order first, or is it usually the cpu's job. If the C compiler does rearrange the order, can it inform the CPU that it's already been optimized, and to not waste time checking?
@KnightRiderDDR7 жыл бұрын
It's really funny how for the past 20 years no one mentioned this issue, but now when it is known the comment section of every video about Meltdown and Spectre is full of experts on the matter.
@gordonrichardson29727 жыл бұрын
When the X86 architecture started out more than 40 years ago, the design was entirely open, and exploiting flaws was trivially easy. Security features have been added in layers over the last few decades , while maintaining backward compatibility of instruction sets and memory addressing modes. At the same time numerous enhancements have been added, all adding to overall complexity. This is not how you would design a secure CPU from the ground up, and it does not surprise me when vulnerabilities proliferate. Trading-off speed and convenience, versus security and robustness, is seldom a winning strategy. On a personal note, some of us old fogeys were around 20-30 years ago, writing low-level machine code and understanding how the CPU worked, and well aware of (some of) the vulnerabilities.
@0xCAFEF00D7 жыл бұрын
Well, I have a similar surprise. Not that people know about it, but that I've seen multiple new, popular, programmer-friendly sources on pipelining and how it works just this year, before Spectre and Meltdown. It's an odd coincidence and I wonder what the catalyst is. Maybe it's just me being human and seeing patterns where there are none. But CppCon had a talk covering it just now in 2017 and I can't recall any other talks that have. I've watched those a lot. I was introduced to this in 2013-2014, I think.
@gordonrichardson29727 жыл бұрын
One popular issue that underlies this is the simple question: Why should I upgrade to an expensive new CPU, when due to heat dissipation limits, the maximum clock speed is pretty much the same as last year's model? Moore's Law has not ended, but it continues to be implemented in ways that are not obvious to the layperson. With previous generations of processors the differences were large and quantifiable. Now it's all about cache size, incremental improvements, and reduced power consumption. IMO discussing these fundamental factors has forced the topic of speculative execution into the public consciousness, whereas it was previously known only to a limited number of geeks...
@KnightRiderDDR7 жыл бұрын
It is kind of strange how this issue was revealed when, according to some, we have reached the limit of traditional CPUs (silicon chips). If it is not a mere coincidence, I can speculate and say that now that silicon chips can't get more powerful at the same rate that they were before, CPU makers will have to find another way to pitch us their new products: "Look at our new CPU. It is not more powerful than our previous ones but it has a new architecture and is not vulnerable to Meltdown and Spectre, so you'd better buy it!" But this is ONLY speculation. I have my doubts that Intel would be willing to lose so much stock value over this.
@HenryLoenwind7 жыл бұрын
This general issue has been known for a long time---cryptographic processors are hardened against it. Those things aren't used because they are faster (often they are not), but because they take extra measures against a variety of out-of-band timing attacks. This is just the first time someone looked for and found a way to exploit it on a general purpose CPU with usable results instead of just some academic "oh, interesting". (Also, add media hype.)
@47Mortuus Жыл бұрын
FYI - the way this fictional CPU executes the code also uses Instruction-Level-Parallelism. I don't think there is any useful CPU design that has either but not both, which means they go hand in hand.
@flyball1788 Жыл бұрын
Spent my life on the H/W side of the fence as a developer, and have NEVER understood why problems like this, which could be addressed by having architecture-specific compilers written once and used once to generate optimised code, are always moved into H/W creating massive complexity (and hence bugs that turn up months later and can't be retro-fixed) and burning power on every single execution cycle on every single machine every single time it runs that bit of code. I agree that, usually, generalisation = slow and optimisation = complex, but surely it's only logical to put the complexity into that part of the system that can be easily changed when problems arise (as they always do with complexity) and which only entail effort/energy/time once at the start of the process. For H/W, the KISS mantra reigns supreme and complexity should be reserved for those things that can't be done up-front.
@bananya60204 жыл бұрын
tl;dr: optimization isn't all about using the fewest instructions, it's about using them in the right order and sometimes using a less "efficient" instruction to achieve parallelization so you can use as much of the CPU's power at once as possible.
@SparxableTunes7 жыл бұрын
Dr. Bagley always delivers to the forefront of my curiosities. I hope to be an example of one of the individuals who may never see the footsteps of higher education, and nevertheless prove that we can indeed be veritable complements to the field of computer science.
@jaywye3 жыл бұрын
How does an out-of-order CPU work? Is there a separate module that reorders instructions?
@dharma66620137 жыл бұрын
Wouldn't the time taken by the CPU to re-order the instructions wipe out any time gained by being able to perform those instructions in parallel? In other words, re-ordering the instructions makes it quicker to do them, but you waste time re-ordering before you can start.
@mduckernz7 жыл бұрын
dharma6662013 No, as this is usually performed by the decoder. The ALU and L/S units aren't yet involved. At this stage they will also perform things like checking to see whether data needed by the decoded operation requires data not in cache - if it's not, it will be prefetched, so that it is in cache when it's needed later. This is also where branch prediction comes in - if a branch hasn't been executed yet, it doesn't know whether data used by each branch will be needed, so it will gather the data for the operations involved in the branch it predicts will be taken based on previous behaviour. It may also perform speculative execution (this depends on the design of the specific CPU implementation)
@dharma66620137 жыл бұрын
Please forgive my ignorance, but that just seems to "kick the can down the road". Something, somewhere, has to spend time re-ordering things. The result is that the CPU can run things faster. How do we know, and how do we measure, how much the time used re-ordering compares to the time saved *by re-ordering*?
@vringar97927 жыл бұрын
dharma6662013 I would assume that chip designers and their respective companies have done quite some testing on this. You might want to look up which generation of chips was the first one to implement such a thing and how much faster they got.
@vringar97927 жыл бұрын
dharma6662013 tl;dr: thinking about how long something might take is faster than doing it.
@DFPercush7 жыл бұрын
CPUs have an instruction prefetch where the next instructions are loaded into cache before they are executed, usually in 16-byte segments. That gets into branch prediction, and what if you jump to a different address. But the main takeaway regarding instruction reordering, and pipelining in general, is that it can be done _combinatorially_ - meaning a logic circuit that does not use clock cycles, but acts as a direct function on its own. As soon as you feed in the input, given some gate delays, the output appears on the other side. For the purposes of this discussion, just think of it as being an instant process. It's a very long and complicated "if" statement that happens all at once in hardware.
@momokoko88117 жыл бұрын
If the assembly was originally written in the optimal order, will the CPU's useless attempt to reorder them cause an overhead?
@gordonrichardson29727 жыл бұрын
Not likely. During design and testing the CPU will be optimised to avoid this kind of wastage. Modern processors actually have huge amounts of overhead, but this is all geared towards the fastest outcome. Low-power alternative processors that have less overhead, continue to be available for specialised applications.
@wherestheshroomsyo7 жыл бұрын
4:20 that "c" is moving! What? Did that happen in editing?
@jamma2466 жыл бұрын
My knowledge of how a physical processor actually works is low, but I am a mathematician by trade and find this optimisation procedure quite interesting. So I don't know if what I'm about to say actually makes sense. But: The two sets of instructions in this video only differed in the order of operations. The only data that seems to be needed to run the code in the theoretically most efficient way possible is what dependencies there are between the instructions; whether they can be run concurrently; and the time that each operation takes. I guess the rub is that the latter isn't really deterministic (or perhaps it is, up to a reasonable margin of error?). Still: is a simple on-the-fly optimisation (that is actually implemented at the moment) essentially one which chooses instructions that allow other concurrent ones? If module A of the processor is awaiting a new instruction, then first it looks at those available, then prioritises those which allow, say, for a computation on a currently unused module B (which is perhaps prioritised as a slower component of the processor)... and so on in a similar fashion? I guess the mathematical structure I have in my mind is a kind of dependency tree which forms part of the data of the instructions, perhaps with some other weights so as to incentivise some operations (those which take place on slower components of the processor). Lots of gaps here, but I find this optimisation problem theoretically quite interesting and would like to know the current state of the art. It reminds me a lot of FP, where because of lazy evaluation you can ensure that functions are performed in an order so as to not have superfluous operations. Sounds like similar ideas could be useful here.
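The dependency-structure idea in the comment above can be tried out with a small greedy scheduler. This is a toy model under stated assumptions, not a description of real hardware: there is one load/store unit (`"LS"`) and one ALU, neither is pipelined, the latencies are invented, and the instruction list is a hypothetical program rather than the one from the video.

```python
def schedule(program, out_of_order):
    """Return the cycle at which the last instruction finishes.

    program: list of (name, unit, latency, deps) in program order, where
    deps names the earlier instructions whose results are needed.
    """
    finish = {}                       # name -> cycle its result is ready
    unit_free = {"LS": 0, "ALU": 0}   # cycle at which each unit is idle again
    pending = list(program)
    cycle = 0
    while pending:
        for instr in list(pending):
            name, unit, latency, deps = instr
            ready = all(d in finish and finish[d] <= cycle for d in deps)
            if ready and unit_free[unit] <= cycle:
                finish[name] = cycle + latency
                unit_free[unit] = cycle + latency
                pending.remove(instr)
            elif not out_of_order:
                break   # in-order: a stalled instruction blocks all later ones
        cycle += 1
    return max(finish.values())


# Hypothetical program: r = (a * b + 1) + (c + d)
program = [
    ("a", "LS", 3, ()),
    ("b", "LS", 3, ()),
    ("t1", "ALU", 3, ("a", "b")),    # multiply
    ("t2", "ALU", 1, ("t1",)),       # add constant
    ("c", "LS", 3, ()),
    ("d", "LS", 3, ()),
    ("t3", "ALU", 1, ("c", "d")),
    ("r", "ALU", 1, ("t2", "t3")),
]
print("in order:    ", schedule(program, out_of_order=False))
print("out of order:", schedule(program, out_of_order=True))
```

The out-of-order run finishes sooner in this model because the loads of `c` and `d` start while the multiply is still busy, instead of queueing behind the instruction that depends on it. The policy here is simply "oldest ready instruction first"; weighting by slower units, as the comment suggests, would be a refinement of the same loop.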
@luckyluckydog1237 жыл бұрын
BTW I think the Pentium Pro from 1995 was the first Intel CPU with out-of-order (as well as speculative) execution. The original Pentium (1993) didn't support those features, AFAIK.
@jasondoe25967 жыл бұрын
I think the Pro was indeed the first Intel with speculative execution, not sure about out-of-order. *edit:* apparently both
@snkline7 жыл бұрын
The original Pentium was superscalar but didn't support OOE that is correct. In the P5's case it had two execution units that could execute instructions in parallel, but it didn't make any decisions more complicated than "Can I execute the next instruction in the second pipeline or not". The Pentium tried to pair off instructions. Pairs could enter both pipelines, while unpaired instructions could only enter the primary pipeline.
@DanielMarrable7 жыл бұрын
I would like to see him explain hyper-threading
@appychd7 жыл бұрын
Very well explained
@y__h7 жыл бұрын
On the serious note though, rather than superscalar architecture, isn't it more effective if we put two pipelines in the CPU and both of them sharing the same execution units?
@postvideo977 жыл бұрын
Yoppy Halilintar This is what SMT does I believe.
@galier27 жыл бұрын
Short answer. No.
@mduckernz7 жыл бұрын
postvideo97 In a sense, yes, except that there is only one pipeline. While one thread has a particular execution unit tied up - say, waiting for data to arrive from main memory, which can take hundreds of operations in CPU-time... note that this would occur due to a failure in branch prediction; normally, it would have already noticed ahead of time that this data would be required and requested it in advance already, so that it would already be in cache or even a register, unless it predicted wrongly that it wouldn't be required - you can instead execute operations for a different thread that doesn't need that data, or that execution unit.
@jasondoe25967 жыл бұрын
Yoppy Halilintar, two pipelines sharing the same execution units is pretty much _the opposite_ of what you want, because delays during the execution and complex dependencies would "stall" _both_ of them. *edit:* Matthew is right, that's not what SMT (aka hyperthreading for Intel) does.
@jasondoe25967 жыл бұрын
Guy Maor, how does multicore "share the same execution units" ?!
@ms-ex8em4 жыл бұрын
Did Lander have sound too?? Thanks.
@richardmiklos7 жыл бұрын
Can the clones execute Order 66, while the CPU is executing these instructions? I mean they don't depend on each other or anything.
@TheDuckofDoom.7 жыл бұрын
And now we move to multi core cache management and prefetching?
@MichaelQuantum6 жыл бұрын
If people would compile their own software, you could do all this optimization with the compiler and CPUs could be a lot simpler, with much less power draw, while still being just as fast in the final execution.
@ms-ex8em4 жыл бұрын
Hello did Lander ever have sound at all?? Thanks.
@KipIngram9 ай бұрын
I think we took a misstep in processor design decades ago. Modern processors have become so complex that no one person can understand all of them (I mean really, REALLY understand - down to the gate level of what's going on in all cases). As a result, we wind up with things like Spectre/Meltdown and so on, which happen because the left hand doesn't know what the right hand is doing. What we chose to do decades ago was to add complex logic to our cores, in an effort to get them to execute code faster. We've gotten to the point where all that stuff represents more of the logic on the chips than the actual compute logic does. What we should have done instead was to embrace the multi-core idea much, MUCH sooner. We should have kept our cores dirt simple, and just piled more and more and more of them onto the chip. Use ALL of the logic for the business of computing. Of course, this would have required us to face multi-thread programming much sooner than we otherwise did, but we've wound up having to face it anyway. If we'd just swallowed that pill sooner then we would NOT have processors that no one can understand and I wager that we would have much more secure, reliable systems that didn't plague us with all of the difficulties that our current processors do. You can't really say "That wouldn't have worked as well," because we DON'T KNOW. Software would have evolved in a different way, and we don't have the software we'd have gotten from that other path, so we don't really know where we'd be on overall performance at this point. We let the tail wag the dog at every turn, though, and now we are where we are. I don't know if there will ever be a way out. Generally speaking, though, I oppose letting whatever body of legacy software we happen to have "at the moment" dictate how we design future hardware. The hardware design should lead, and the software design should follow.
@pontuz27 жыл бұрын
Is there any overhead in the CPU by re-ordering the instructions during OOE?
@FrodorMov7 жыл бұрын
Well the CPU, or the execution of instructions is not used for reordering. Within the CPU, obviously, some component is required to analyze instructions and dependencies to re-order them. Obviously this costs some area on the chip, and energy, but in the end it should make execution faster.
@pontuz27 жыл бұрын
Thanks for the reply. Now that I think about it, the overhead of a potential re-order (+ new execution time) obviously has to be smaller than the original execution time in order to actually enhance the performance.
@gordonrichardson29727 жыл бұрын
The main benefit of out-of-order execution is not to re-order the instructions, but to ensure that the CPU doesn't sit idle while waiting for data to be fetched from memory. In almost all cases there is something else useful that can be done, rather than doing nothing!
@xponen7 жыл бұрын
What if we re-order the instruction ourselves? would the CPU still do the re-ordering part?
@BrianCairns7 жыл бұрын
In short, yes. Out-of-order designs typically require *much* more die area compared to an in-order design, and they also tend to use more power. In-order designs need higher clocks to have the same performance as an out-of-order design, but they still tend to be more efficient for low-medium performance levels. For the highest performance, you just can't clock an in-order design any higher (or it becomes inefficient to do so), and an out-of-order design is better. There are a number of modern, medium-performance in-order designs for exactly this reason, most notably the ARM Cortex-A53, which is the primary core used in virtually every low-end and mid-range smartphone (because of cost). The Cortex-A53 is also paired with higher-performance cores in higher-end smartphones, which allows the higher-power out-of-order cores to shut off when the phone is idle or under light loads (ARM calls this big.LITTLE; there's also a new version called DynamIQ).
@simonnomis1233217 жыл бұрын
Shouldn't you run B,C, and D at the beginning so the multiply can run at the same time as W?
@WanderAway7 жыл бұрын
While we're here, may I suggest another video on how adders/multipliers are built in the CPU itself? Maybe explain the difference between ripple carry adders and carry lookaheads and that kind of stuff :D
@JoQeZzZ7 жыл бұрын
Wouldn't it be more beneficial to do the multiplying first? Because surely a MULT takes more time than an ADD?
@mduckernz7 жыл бұрын
Joris Not necessarily. It depends on the particular values. Some multiplications can be done in a single cycle. Notably, power-of-two multiplications (for integers, anyway) will just be converted to bit-shifts (a single cycle operation), but there are still others that may also take only a single cycle. Divisions are worse (again, except powers of two, which are just bit-shifts for integers), particularly modular division. These can take many cycles. The implementations of ALUs have many complex tricks to allow for very fast execution - I recommend reading more about them! :)
@nikoerforderlich71087 жыл бұрын
In this particular case it would! If you fetch d and e first, you can do the multiplication while a, b and c are being fetched.
@JoQeZzZ7 жыл бұрын
Guy Maor yeah, so he showed how the processor would use OOE to speed things up. If it had been done right, it would choose to do the multiplication first in most cases (since a multiplication consists of bit shifts and adds instead of just adding 2 numbers). This would mean that at the end of the line it would have to wait on an ADD instead of a MULT, which would speed the whole process up slightly.
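The power-of-two case mentioned in this thread is easy to check. The sketch below only demonstrates the arithmetic identity for integers. It makes no claim about whether a given compiler or ALU actually performs the substitution, and the two function names are made up for the example.

```python
def times_power_of_two(x, k):
    """Multiply an integer by 2**k using a left shift."""
    return x << k


def floor_div_power_of_two(x, k):
    """Divide an integer by 2**k, rounding towards negative infinity."""
    return x >> k


for x in (-7, 0, 5, 12):
    for k in (0, 1, 3):
        assert times_power_of_two(x, k) == x * 2**k
        assert floor_div_power_of_two(x, k) == x // 2**k
print("shift identities hold for the sampled values")
```

Note that the right shift matches floor division. Languages whose integer division truncates towards zero need an adjustment for negative values.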
@PrevName-h9v7 жыл бұрын
Having a link to the caching video would be pretty cool.
@gordonrichardson29727 жыл бұрын
It's from 2015: kzbin.info/www/bejne/bHvTfXdphbp0kM0
@Computerphile7 жыл бұрын
+Josh Hayes kzbin.info/www/bejne/bHvTfXdphbp0kM0
@nullptr.7 жыл бұрын
Thanks for explaining how that works! great editing
@Treviath7 жыл бұрын
Needs a follow up video on how the bugs work themselves
@qwmf05gcpt427 жыл бұрын
How will they make future CPUs?
@skyler1144 жыл бұрын
Literally programming a queue problem for an assignment as I'm listening to this
@ifell37 жыл бұрын
It's mind-blowing to think how much stuff is written and executed just for something easy that we all take for granted!!
@policyprogrammer7 жыл бұрын
At the end of this video he says something that I think is correct, but the entire tech media has gotten wrong about Spectre / Meltdown, perhaps because the people who wrote Spectre and Meltdown papers got it wrong themselves. Spectre is a class of attacks that takes advantage of speculative execution. The attack concept does NOT rely on out-of-order execution. It could very well be that OOO machines make it easier, or that only the OOO processors run far enough ahead into the speculative path to pull this attack off, but conceptually, Spectre is a speculation issue, not an OOO issue.
@gordonrichardson29727 жыл бұрын
Probably true, but AFAIK all CPUs that run speculative execution, also run out-of-order execution. The reality is likely to be messy...
@policyprogrammer7 жыл бұрын
Well, in PC-land, it all went OOO with the Pentium Pro, but the Pentium Classic and its variations had a branch predictor. But it also only had a 5-stage pipeline and dual issue, only one of which could handle a load. You know that to "surface" data, the Meltdown code example requires the ability to get "far enough ahead" to do a speculative load followed by a second speculative load whose address depends on the value loaded in the first. I don't think that's possible in a short pipeline without many execution units, so older processors probably are not subject to this exploit. OTOH, there may be modern in-order processors that have deeper pipelines and are superscalar, with an LS unit and two ALUs, that could be exploited. Some of the more modern ARM processors might qualify. ARM11 implementations are 8 and 9 stages deep. I think most (all?) of the modern ARM "A" cores are OOO, but I would not be surprised to see that some architectural licensees have built their own cores that are deep, SS, but not OOO. In MIPS-land, it may be similar.
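The "second load whose address depends on the first" pattern can be pictured with a toy model. This is a pure simulation, not exploit code: the cache is just a Python set, a "fast" access is modelled as set membership instead of a real timing measurement, and the names and the 64-byte stride are assumptions made for illustration.

```python
STRIDE = 64  # assumed cache-line size in bytes


def speculative_gadget(secret_byte: int, cache: set) -> None:
    """Model of the two dependent loads.

    The first load produced secret_byte; the second load's address is
    derived from it, so one specific line of the probe array gets cached.
    """
    cache.add(secret_byte * STRIDE)


def recover(cache: set):
    """Check all 256 candidate lines; a cache hit stands in for a fast read."""
    hits = [value for value in range(256) if value * STRIDE in cache]
    return hits[0] if len(hits) == 1 else None


cache = set()                    # start with the probe array uncached
speculative_gadget(0x2A, cache)  # the speculated loads leave a footprint
recovered = recover(cache)       # the footprint reveals the byte
```

The point of the sketch is only that the value never has to be read directly: which line ended up in the cache is enough to identify it.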
@avrohomhousman59584 жыл бұрын
is this the same as pipelining? It sounds very similar.
@isaak.studio7 жыл бұрын
Is it (a+b+c+d)*e or a+b+c+(d*e)?
@rafaelrui74577 жыл бұрын
Do you have a PATREON page to collaborate?
@velvetsniper7 жыл бұрын
you guys really should do a video together with level1techs
@nO_d3N1AL7 жыл бұрын
I thought it took less than 100 nanoseconds to get data from main memory, not 200. How can we calculate this? Basing it on 4200 MHz RAM.
@overwrite_oversweet7 жыл бұрын
For DDR4 4200 RAM with a CAS latency of 19 cycles, the time required to fetch the first word, assuming the appropriate row is already activated, is about 9.05 ns (19 cycles of the 2100 MHz memory clock). However, each _sequential_ word after that would only need 0.24 ns to fetch, meaning 4 contiguous words would only require about 9.8 ns and 8 would require only about 10.7 ns. Of course, if the next required word is in another column, you would have to wait the 9.05 ns again, and if it's in another *row*, then you'll need to wait even longer, as your RAM will need to be issued the Precharge command, and then the Active command on the correct row before the next Read command can be issued. The ALU, OTOH, would usually only need one CPU clock cycle to complete whatever it's doing, especially for a simple operation like addition or multiplication, which is on the order of 0.24 ns. Some ALUs can even do multiple such operations in a single cycle, and if you were using floating point instead of integers, it is relatively common to do multiply and add in one operation.
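The arithmetic behind figures like these can be sketched as follows. This is a simplified model under stated assumptions: the row is already open, one word arrives per transfer, the memory clock is half the transfer rate (double data rate), and everything outside the DRAM itself (memory controller, cache misses, bus) is ignored. The function name is hypothetical.

```python
def dram_read_ns(transfer_rate_mt_s: float, cas_cycles: int, words: int) -> float:
    """Approximate time in ns to read `words` sequential words from an open row."""
    clock_mhz = transfer_rate_mt_s / 2        # DDR: two transfers per clock
    cas_ns = cas_cycles / clock_mhz * 1000    # cycles / MHz gives microseconds
    per_word_ns = 1000 / transfer_rate_mt_s   # one word per transfer
    return cas_ns + (words - 1) * per_word_ns


first_word = dram_read_ns(4200, 19, 1)
four_words = dram_read_ns(4200, 19, 4)
eight_words = dram_read_ns(4200, 19, 8)
```

Real end-to-end latency seen by the CPU is higher than this, since the request also has to miss in the caches and pass through the memory controller.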
@HappyBeezerStudios7 жыл бұрын
For the code shown here, a second load/store unit would speed up the execution immensely.
@terrahertz52847 жыл бұрын
I didn't see any initial Clear Carry.
@ITR7 жыл бұрын
So you're saying they're not CPU aligned? Do we have to talk about parallel universes?
@StanislavPozdnyakov3 жыл бұрын
Did he say that ARM architecture is implied?
@EmilBozhilov7 жыл бұрын
are in order processors affected by spectre and meltdown then??
@FrodorMov7 жыл бұрын
Not necessarily. Spectre is because of speculative execution (branch prediction), not OOO execution.
@Tehom17 жыл бұрын
No, he asked if *in order* processors are affected. I expect not, since speculative execution is necessarily an out-of-order behavior.
@FrodorMov7 жыл бұрын
I'm not sure I agree, but this is a matter of definition. I had the same thoughts, but argued to myself that speculative execution isn't the same as OOO execution. Maybe it is. I mean, in speculative execution you're not executing instructions in a different order. Just sometimes you're 'backtracking' a bit, on a wrong prediction :p
@Tehom17 жыл бұрын
OK, fair enough.
@kevincozens68377 жыл бұрын
Nice explanation of "out of order" execution. I knew you were going to make one minor mistake, not that it matters for the point discussed in this video. You threw in multiply as the operation before that last variable. You didn't take into account the typical order of operations. The multiply would be executed first.
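A quick way to check the grouping, sketched in Python, whose precedence for `+` and `*` matches C. The sample values are arbitrary. Note that precedence only fixes how the expression is grouped; it does not by itself dictate the order in which the hardware fetches the variables.

```python
import ast

a, b, c, d, e = 1, 2, 3, 4, 5

grouped_by_precedence = a + b + c + d * e   # parsed as a + b + c + (d * e)
forced_left_to_right = (a + b + c + d) * e  # what the line would mean without precedence

# Inspect the parse tree: the top-level node is an addition whose
# right-hand operand is the multiplication d * e.
top = ast.parse("a + b + c + d * e", mode="eval").body
multiply_is_right_operand = (
    isinstance(top, ast.BinOp)
    and isinstance(top.op, ast.Add)
    and isinstance(top.right, ast.BinOp)
    and isinstance(top.right.op, ast.Mult)
)
```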
@alancurssow90307 жыл бұрын
I like this guy, thank you very much for your time - very informative
@colt45477 жыл бұрын
Excellent video. Thank you!
@mikeklaene43597 жыл бұрын
Speculative execution is NOT the problem. The fact that another process can access the results of the execution IS the problem. The WALL between separate processes is not being enforced.
@Johanniscool7 жыл бұрын
The best part is when he uses the computerphile paper in a radically different orientation
@Brutaltronics7 жыл бұрын
the whole freaking system is out of order!
@RoboBoddicker7 жыл бұрын
Cause when you stick your hand into a pile of goo that was your BEST FRIEND'S FACE, you don't know what to do!!
@Schnack217 жыл бұрын
Shouldn't we perform the multiplication first in this equation anyway?
@irenef226 жыл бұрын
Great explanation. Thanks.
@comradepeter873 жыл бұрын
Does CPU decide all this in real time? How does it do all this?! Isn't it just supposed to be an electromechanical part? If no software intervention occurs here, this might as well be black magic to me.
@gogokowai3 жыл бұрын
I have the same question. I'm having trouble imagining how it could possibly be faster to make a bunch of checks on multiple instructions and cache states than it would be to just perform the add/multiply.
@jonahansen6 жыл бұрын
Very well explained!
@asailijhijr7 жыл бұрын
Don't most languages evaluate expressions from right to left?
@ThorkilKowalski7 жыл бұрын
I think the 386 was the first commercial superscalar processor.
@RowenStipe7 жыл бұрын
7:35 We've gone from landscape to portrait !
@retop567 жыл бұрын
Great video.
@MatkatMusic7 жыл бұрын
man, talk about a fantastic breakdown of the topic!
@dichebach6 жыл бұрын
Interesting stuff!
@peterbustin26836 жыл бұрын
Really very interesting! Thank you..
@raykent32117 жыл бұрын
Never had the problem with an Atmel s1200,
@lucianodebenedictis60147 жыл бұрын
Take a shot every time he says "load store unit"
@KX367 жыл бұрын
You're out of order! You're out of order! The whole CPU is out of order! They're out of order!
@gbhall7 жыл бұрын
Hmm interesting, I was unaware of the actual implementation of orders.
@mrblue7287 жыл бұрын
This is such a relaxing stuff for my high-level language oriented brain.
@KipIngram9 ай бұрын
This all should be done by the compiler (or the programmer) - investing logic to "correct" a less than optimal code sequence that has to be present in EVERY CHIP and operate EVERY TIME you run your program? That's just clearly not the best answer. I recognize it gave the hardware designers all kinds of opportunity to feel like they're clever, but it's a waste of resources.
@antoineroquentin22977 жыл бұрын
jokes on me, i'm still using an in-order CPU (D2700)
@sicksock4354467 жыл бұрын
This video taught me how to play the game Silicon Zeros...
@Roxor1287 жыл бұрын
Thanks for reminding me I need to put in some more time on that. I'm still in the early piece-of-cake puzzles. Well, they certainly are compared to where I got up to in TIS-100 and Shenzhen I/O.
@-42-477 жыл бұрын
Interesting, though it sounded like CPU's were out of order rather than (still) being out of order.
@KaedennYT7 жыл бұрын
Video about a graduate-level computer science topic? Aimed at the general public? Thank you.
@jasondoe25967 жыл бұрын
Hardly graduate-level.
@rcookie51287 жыл бұрын
Super informative!!
@halistinejenkins52897 жыл бұрын
a man's man
@beamjohn97537 жыл бұрын
So basically the code in the CPU is out of order and is vulnerable to machine processor errors
@beamjohn97537 жыл бұрын
Peterolen lol I’m kinda nervous a bit because I never thought this code could get exploited
@ulinvega7 жыл бұрын
I didn't understand a thing but that's cool.
@ze_rubenator7 жыл бұрын
CrashCourse has a crashcourse on computer science, they go very in depth about how CPUs work and what assembly code does, but they still keep it brief and simple enough so it's very easy to follow for the layman. Provided you have the attention span and pay attention.
@FrodorMov7 жыл бұрын
The lines of code that a programmer writes, he expects to be 'executed' by the CPU sequentially. Turns out, though, that CPUs move them around and execute them 'out of order' because that's faster to do. Yet to any outside observer (the programmer) it still looks like the code (which is translated into machine instructions) happens sequentially.
@rykehuss34357 жыл бұрын
Fz Does this apply to, let's say, 3D applications? For example RTS video games, where some of the things need to be executed sequentially and multi-threading is of no help. Noob here.
@FrodorMov7 жыл бұрын
OOO happens in any program that runs on a CPU that supports it (pretty much everything). In the case of a video game, which consists of CPU + GPU parts, the CPU part will therefore be executed out-of-order. FYI, this isn't something that the programmer can control. It just happens in hardware.
@ACTlVISION7 жыл бұрын
Same and I had a final exam on this a month ago...
@flamencoprof7 жыл бұрын
Plea from a software user: - No matter how efficient the code, please ask yourself how usable the code is to the end-user of it. Elegant solutions to programming are so far removed from usability that I fear the connection gets lost sometimes.
@satannstuff7 жыл бұрын
Chances are you've never knowingly seen an elegant programming solution. You'd have to actually look at the source code for that which as a self declared software user you wouldn't and probably can't do. What you're talking about is most likely just bad UI design.
@flamencoprof7 жыл бұрын
Dohh! I agree and stand corrected. I sort of knew when I made the comment that it was off topic. Please pass my concerns to any UI designers you might know.