Arm vs RISC-V? Which One Is The Most Efficient?

129,637 views

Gary Explains

Day ago

Arm has been making power-efficient processors for decades. RISC-V is relatively new and many parts of its specifications aren't even ratified, but that hasn't stopped chip designers making RISC-V processors, including microcontrollers. Can RISC-V challenge Arm's power efficiency supremacy?
---
Let Me Explain T-shirt: teespring.com/...
Twitter: / garyexplains
Instagram: / garyexplains
#garyexplains

Comments: 434
@Matthigast 2 years ago
2010 does indeed seem 23 years ago
@GaryExplains 2 years ago
🤦‍♂️😜 Darn. That was a stupid mistake! But I think you are right it feels like soooo long ago!
@kurakuson 2 years ago
Apple's first iPad: April 2010
@ZDevelopers 2 years ago
@@GaryExplains Future proofing the video, that's all
@TechPill_ 2 years ago
@@ZDevelopers Yea that's what I was going to say
@NovasVilla 2 years ago
It’s so sad to hear that 😢
@rajivpalayan7028 1 year ago
For measuring relative performance, it is wrong to do a per MHz calculation. The only metric that should matter is the total time needed to run the same application on both processors. A more complicated ISA means clock speeds will be reduced (which gives better per MHz performance), but that does not mean the processor is faster
@marklewus5468 2 years ago
One more comment. Most processor manufacturers produce a spec called DMIPS/MHz, or millions of integer calculations per megahertz clock speed. This allows you to do a clock for clock comparison between parts.
@ralfbaechle 1 year ago
Let me wind back the clock to the mid-80s to point you at the horrors of the Dhrystone benchmark, which back then was more or less the canonical benchmark for integer performance. Even in the best case Dhrystone results didn't represent real world performance very well. Dhrystone wasn't only ignoring fp math entirely, its results also got more and more comically absurd as architectures got more sophisticated (caches and out-of-order made a giant difference) but also as compilers improved and started to "optimize away" part of Dhrystone. The peak was reached when certain compilers started to recognize Dhrystone and applied Dhrystone-specific optimizations for almost arbitrary benchmark results - whatever marketing orders ;-) It gives me headaches to see parts of the industry are still using DMIPS decades after it's been thoroughly proven to be rubbish. (It seems many folks don't know these days - the D in DMIPS stands for Dhrystone).
@marioprawirosudiro7301 1 year ago
@@ralfbaechle Thank you for this informative comment. One learns something new every day.
@mnomadvfx 11 months ago
It's an artificial value though and not very helpful for real world performance comparisons. I know this because Qualcomm quoted quite a high DMIPS for their Krait CPU core back during the ARMv7-A generation, and it routinely got thrashed by the lower DMIPS rated Cortex-A9 based SoCs in actual performance.
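For reference, the DMIPS/MHz figure debated in this thread is simple arithmetic: a Dhrystone score is divided by 1757 (the Dhrystones-per-second score of the VAX 11/780, which defines 1 DMIPS) and then by the clock speed in MHz. The Python sketch below shows that calculation; the sample score and clock speed are hypothetical and are not measurements from the video.

```python
# Dhrystones per second scored by the VAX 11/780, the machine that defines 1 DMIPS.
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips(dhrystones_per_second: float) -> float:
    """Convert a raw Dhrystone score into DMIPS."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


def dmips_per_mhz(dhrystones_per_second: float, clock_mhz: float) -> float:
    """Normalise DMIPS by clock speed for a clock-for-clock comparison."""
    return dmips(dhrystones_per_second) / clock_mhz


# Hypothetical part: 219,625 Dhrystones/s at 100 MHz (illustrative values only).
score = dmips_per_mhz(dhrystones_per_second=219_625, clock_mhz=100)
print(f"{score:.2f} DMIPS/MHz")
```

As the replies point out, a high DMIPS/MHz number does not guarantee real-world performance; the total run time of the actual application is the more meaningful measure.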
@markwarburton8563 2 years ago
I was surprised that the now somewhat venerable Black Pill did so well in these tests against the newer upstarts, especially in power consumption and power efficiency. Thanks Gary!
@dekus80 1 year ago
Not just "Black Pill" but STM32F401 or F411. Today there is one MCU on the black PCB, tomorrow another... And the F411 draws 12.7mA at 100MHz with the peripherals disabled. Not all peripherals need to be on. I have doubts about the 20mA measured in this video. The Chinese have a lot of STM32 analogues. For example, the CH32V203 (an F103 clone with a RISC-V core) draws 8mA at 144MHz, and the CH32V30x (RISC-V core with FPU) draws 12mA at 144MHz. They even come in a TSSOP20 package. As an F103 clone it has CAN on board, which the F411 doesn't have. The 307 has Ethernet, and the 208 has Bluetooth + Ethernet and costs $2.2 in my local store. I have not been interested in buying STM32 for a long time. I only buy the Chinese STM32-like parts: CH32, HK32, AT32, GD32 and so on.
@mementomori1868 1 year ago
It's not only about performance!!!! The biggest thing is that RISC-V is an OPEN SOURCE processor...
@GaryExplains 1 year ago
Really? You understand that only the document describing the instruction set is open source. What advantage does that give consumers?
@mementomori1868 1 year ago
@@GaryExplains Pls read (even in google) why riscv and open instruction set is so important.
@GaryExplains 1 year ago
@@mementomori1868 😂 Or please watch my videos as I have several about RISC-V and what it really is.
@GreySectoid 1 year ago
When I studied computer science, RISC-V was my favorite to program. Good to see they are now making a comeback.
@jamesmcintyre2747 2 years ago
I appreciate that Gary is right here that RISC-V is not yet *as* efficient, but I'm very impressed that RISC-V is already *almost* as efficient as ARM, with the same processes being run using 1.36mWh compared to the equivalent ARM board getting 1.31mWh, and even compared to the *much* more established Pi Pico, it's only 8% less efficient (than the Pico). Obviously being almost 89% less efficient than the Blackpill isn't ideal for RISC-V, but this is still early days for it compared to ARM, and with there being so many fewer RISC-V processors produced vs ARM, I don't think you can expect it to be beating the leaders of the pack in ARM just yet. Maybe when there are as many models of RISC-V processor as ARM processors the leader will be arm. Maybe with more time for tuning, the leader of the RISC-V pack will beat the leader of the ARM pack, even with fewer models out there. Encouraging stuff. Stating my bias: I want RISC-V to succeed as I think open source is the way forward, and guarding "intellectual property" like dragons over gold is holding humanity back. Thanks for the interesting video Gary!
@kayakMike1000 2 years ago
You're in the realm of potential compiler optimizations... And which process node these chips are made on...
@xade8381 2 years ago
ARM & RISC-V are nearly the same age. Sadly, only ARM got attention at that time.
@TheWallReports 2 years ago
I agree. RISC-V is not there yet but made a very good showing as the new kid on the block. ARM has been at this game for decades. It is unrealistic to expect the new kid to outperform the veteran. ARM has been optimized over decades. RISC-V has to pay its dues to take the crown. I am a strong RISC-V advocate. I look at this as there being plenty of room for RISC-V to improve. The ground to cover in some areas to close the gap is not that great.
@BruceHoult 2 years ago
@@xade8381 that's not correct. ARM started to be designed in 1983 and the first chips and boards were in 1986. ARM the company started in 1991, when there were already 100,000 ARM-based Archimedes PCs in use. RISC-V started to be designed in Berkeley university in 2010 (27 years after ARM), the initial frozen spec was published in 2014, the first board you could buy commercially from the first RISC-V company was in 2016 (30 years after ARM).
@TAP7a 2 years ago
@@xade8381 RISC-V was still an educational tool for years though, with zero plans for reaching any sort of market. Whereas ARM was made from the very beginning as a commercial ISA, and is 30 years older to boot. Not very comparable.
@LokiScarletWasHere 2 years ago
It’s always strange being reminded that people think RISC-V is inherently more efficient than ARM. That’s not why people like the architecture. It’s an open standard, whereas ARM is proprietary. Anyone who can make a chip can make and innovate a RISC-V design, not the case with ARM. That being said, this was nice to see. I’m sure it has a lot of people blackpilled now.
@GaryExplains 2 years ago
"Anyone who can make a chip..." Really? I wouldn't even know where to start.
@LokiScarletWasHere 2 years ago
@@GaryExplains I was referring to the legality, mostly. The architecture is open, you don’t need to pay for a license to make RISC-V chips.
@laci272 2 years ago
As I watch, I get questions, and as soon as they pop into my mind, Gary already responds to them. It's rare that a tech video is this well thought out and structured this well!
@gigihanmandarin 2 years ago
the one and only legendary Gary Explains.
@jacobrosen 2 years ago
A nice explanation as always. But I'm missing the sleep current for the different boards. It would be interesting to see how they perform compared to each other. It is more of a comparison between MCU brands than core architecture, but still! :D
@marklewus5468 2 years ago
Great work as always. Benchmarking is always a can of worms because it is as dependent on the application as it is on the processor. Do you need fast integer? Fast interrupt response? Floating point? DMA? If you used newer M3 and M4 parts they would have performed much better even in this integer-only test both with regard to processing speed and power consumption given that they’re built on *much* newer process nodes. And a recent STM32 M7 would’ve blown everything else out of the water.
@BruceHoult 2 years ago
Why are so many of the commenters here so obsessed with process node? It strikes me that many (not aimed at you in particular Mark, sorry) may just be reciting jargon without understanding it. Even a very old node such as 180nm is good enough for making a 300+ MHz chip (e.g. the SiFive FE-310 on many RISC-V microcontroller boards) which is plenty for anything in this test. Smaller process nodes do allow higher clock speeds, but if you're not USING that ability then they are not just a waste of money in the much more expensive design and manufacturing process, but they may actively be WORSE because of things such as higher leakage current when operated at low clock speeds or in low power sleep modes. It's also a complete waste when you're making a simple stand-alone chip such as a microcontroller with a small core and a small amount of SRAM because even with the old nodes you end up with the actual processor&memory being a tiny little square inside a huge bit of silicon with the I/O pin pads taking up 90% or 99% of the extremely expensive small process node chip area. The default assumption unless you're a real expert should be that the manufacturer has chosen the best process node to optimise what they want to achieve with their chip.
@adymode 2 years ago
We are familiar with needing to sample the test code many times to generate benchmark results which are not misleading, but it is also essential to sample different kinds of test code, to not be misled even by random compiler differences on each bit of code tested. With the performance results between the esp-c and the black pill coming within 1% of each other, that suggests the test was entirely memory bound on those systems and the systems share very similar memory systems. Multiple programs need to be benchmarked for a picture to emerge.
@DFPercush 2 years ago
@@BruceHoult In general I would think that a smaller feature size would mean less parasitic capacitance, but I didn't think about leakage current. Is that from quantum tunneling? I wonder where the sweet spot is for that. But there's also the matter of different topologies like finfet and gaa, that might reduce the switching current. Mostly I think it's an economic decision. Everybody wants better speed and battery life, but how much are they willing to pay for it? For a computer that only runs a single program continuously, all you need is "good enough". Microcontrollers often have external power anyway. The main concern vis a vis power consumption is cooling.
@BruceHoult 2 years ago
Crazy to use only a single RISC-V board as representing a whole ISA. Obviously not all ARM cores or boards are created equal, and neither are all RISC-V cores or boards. Espressif doesn't even say in their datasheet what RISC-V core it uses. Crazy also not to include Sipeed Longan Nano ($4.80, 108 MHz, been around for three years), some Bouffalo lab BL602 board (similar price to ESP32s, we know it uses a SiFive core) or even extend the price limit a fraction to include a K210 board (dual core 400 MHz 64 bit) such as Maix Bit. Still, it is interesting to see that from the same chip/board manufacturer the RISC-V does in fact give better performance per MHz and per Watt than what they were using before. A really interesting test would be the Longan Nano (GD32VF103 clone of an STM32 but with a RISC-V core) vs either a GD32F103 (same manufacturer STM32 clone with a real licensed ARM core) and/or a real STM32F103.
@Bibbatron 2 years ago
Literally searched this a few days ago with all the news about RISC V Vs ARM. And there was no video. Thank you for this one.
@GaryExplains 2 years ago
What news are you referring to? Also, did you see this video of mine? kzbin.info/www/bejne/faq6qpyhd5ebfNU
@Bibbatron 2 years ago
@@GaryExplains Talking about an efficiency specific comparison.
@magfal 1 year ago
A huge factor for efficiency is compiler quality, which grows with age. The major design differences around efficiency are things like dark silicon for common tasks and SIMD engine implementation, plus caches.
@marcusk7855 2 years ago
Isn't the manufacturing process (how many nm) a major factor in power consumption?
@rogerdeutsch5883 4 months ago
Fantastic succinct but thorough coverage. I can tell a lot of work went into this great video. Subscribed!
@ryan258147 2 years ago
You also need to consider the code density. The firmware binary size is usually smaller with ARM Cortex compared to RISC-V or ESP32.
@mrrolandlawrence 2 years ago
There is also a compact version of arm called thumb which offers higher code density.
@dead-claudia 1 month ago
update from the future: the code density is starting to change in risc-v's favor as compressed instruction support is maturing.
@Tapajara 4 months ago
You should put "power efficient" in the title.
@muha0644 2 years ago
1:10 I did, RV32I in fact! although i had a hard drive failure so now it's abandoned...
@Serhii_Volchetskyi 4 months ago
Consider plotting these chips on a chart of Power_consumption vs Time_of_execution. By doing that, we will see the best overall chip.
@rursus8354 1 year ago
Good video. 13:59: Board A uses 20mA·26s = 0.52 Coulomb = 3.2448·10²¹ electrons to accomplish the task, and Board B uses 51mA·18s = 0.918 Coulomb = 5.72832·10²¹ electrons, so Board A uses only ~57% of the electrons that Board B uses. Therefore A is more efficient.
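The arithmetic in this comment can be checked with a short Python sketch. It uses the currents and run times quoted above (20mA for 26s, 51mA for 18s); the 3.3V supply used to turn charge into energy is an assumption for illustration, and comparing charge alone is only equivalent to comparing energy when both boards run at the same voltage.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per electron (exact SI value)
ASSUMED_SUPPLY_VOLTS = 3.3  # assumption: the comment does not state the voltage


def charge_coulombs(current_ma: float, seconds: float) -> float:
    """Charge drawn at a constant current: Q = I * t."""
    return current_ma / 1000.0 * seconds


def energy_joules(current_ma: float, seconds: float, volts: float) -> float:
    """Energy at a constant voltage: E = Q * V."""
    return charge_coulombs(current_ma, seconds) * volts


def electron_count(coulombs: float) -> float:
    """Number of electrons that make up the given charge."""
    return coulombs / ELEMENTARY_CHARGE_C


board_a_c = charge_coulombs(20, 26)  # Board A: 20 mA for 26 s
board_b_c = charge_coulombs(51, 18)  # Board B: 51 mA for 18 s
ratio = board_a_c / board_b_c

for name, ma, s in (("A", 20, 26), ("B", 51, 18)):
    q = charge_coulombs(ma, s)
    e = energy_joules(ma, s, ASSUMED_SUPPLY_VOLTS)
    print(f"Board {name}: {q:.3f} C, {electron_count(q):.3e} electrons, {e:.3f} J")
print(f"Board A draws {ratio:.0%} of the charge Board B draws")
```

The electron counts come out slightly different from the comment's because the sketch uses the exact elementary charge rather than a rounded electrons-per-coulomb figure; the ratio between the boards is unaffected.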
@vikaspoddar001 2 years ago
I guess Gary, you really should put out a video series explaining the differences between ISAs, microarchitecture, process node etc. to the general public, as I have seen many people disagreeing with you on various issues. I think this video series will work as a prelude to the ARM vs RISC-V video. BTW I also felt that I need some more help 😅😅😅😅 on this. Thank you
@minecraftermad 3 months ago
you should also list the process node for the processor, it also really affects efficiency.
@GaryExplains 3 months ago
Yes it does, and what is shocking is that the Arm chips were on the older process nodes, making the RISC-V even worse.
@daniellewis984 1 month ago
So - I've implemented both ARM and RISC-V architectures "on paper", and RISC-V is simpler in ways that pay. There's only about a dozen basic choices that even *can* be optimized out in the core ISA, yielding an architecturally pure ISA. My less favorite parts: * The opcode, funct3, and funct7 not being unified in the decode step. * The LU opcodes not mapping to a truth table means LU operations are not simple 4BD decodes with add and mul being 2x4BD+1 decodes. AFAIK no commercially available ISA has ever achieved this, but it's been discussed widely in academic circles. ARM though has for example the Java bit which *halves* the available opcode range, and is AFAIK based on an earlier RISC platform with some commercial extensions. And sure, some of that makes it fast, but it's going to be less efficient per wire than a cleaner ISA. There's actually tons of details I don't like in it.
@psiah9889 1 year ago
As I see it: Arm's been around for a while. It's had an awful lot of work put into its efficiency, power, etc. over the decades. RISC-V is new, and there isn't a lot of money in perfectly optimizing it (yet). The fact that it is at all competitive now is a good sign for things to come, but it's gonna need more time, work, and support to be fully realized in this regard.
@Shrek_Holmes 9 months ago
Frequency scaling with power usage isn't linear, it's exponential. It's better to have all of them at the same clock frequency.
@GaryExplains 9 months ago
While I agree that it isn't necessarily linear, as far as I know that is only if the voltage changes with the frequency. In my testing I didn't only use extrapolation, I did clock them (where possible) at the same freq and the results correlated with my extrapolations.
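The disagreement above comes down to the standard first-order model of CMOS dynamic power, P ≈ α·C·V²·f: at a fixed voltage, power grows linearly with clock frequency, and it only grows faster when the voltage has to rise along with the frequency. The Python sketch below illustrates this; the activity factor, capacitance, voltages and frequencies are made-up illustrative values, not figures for any of the boards in the video, and leakage (static) power is ignored.

```python
def dynamic_power_watts(activity: float, capacitance_farads: float,
                        volts: float, freq_hz: float) -> float:
    """First-order CMOS dynamic power: P = alpha * C * V^2 * f."""
    return activity * capacitance_farads * volts ** 2 * freq_hz


# Hypothetical chip parameters, chosen only to make the ratios easy to read.
ACTIVITY = 0.1          # fraction of the capacitance switched each cycle
CAPACITANCE_F = 1e-9    # effective switched capacitance in farads

p_base = dynamic_power_watts(ACTIVITY, CAPACITANCE_F, volts=1.2, freq_hz=100e6)
p_fixed_v = dynamic_power_watts(ACTIVITY, CAPACITANCE_F, volts=1.2, freq_hz=200e6)
p_raised_v = dynamic_power_watts(ACTIVITY, CAPACITANCE_F, volts=1.5, freq_hz=200e6)

print(f"2x clock, same voltage:   {p_fixed_v / p_base:.3f}x power")
print(f"2x clock, 1.2 V -> 1.5 V: {p_raised_v / p_base:.3f}x power")
```

Under this model, doubling the clock at a fixed voltage doubles the power but also halves the run time, so the energy for a fixed amount of work stays roughly the same; that is consistent with the per-MHz extrapolation holding on microcontrollers that do not change voltage with frequency.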
@petermolnar6017 2 years ago
Thanks Gary for this wonderful comparison! Greatly appreciated!
@GaryExplains 2 years ago
My pleasure!
@repostor 2 years ago
Very interesting article. I have always wondered how RISC-V would compare to ARM. Do you have similar comparisons for enterprise chips too, comparing RISC-V with x86 (Intel/AMD) and perhaps also including ARM?
@autohmae 2 years ago
I think 'process node' probably has a huge influence
@dahlia695 3 months ago
How did you ensure that wifi and bluetooth radios did not affect the power measurements?
@nateb1804 2 years ago
The silicon fab processor node tech used to make the chips plays a huge role in their efficiency. It would be good to include fab node info in the comparison data.
@GaryExplains 2 years ago
Indeed, it is something I will note for future videos. As for this video the key is that the Arm Cortex-M4 is using 90nm and the RISC-V ESP32-C3 is on 40nm, which makes the performance of the RISC-V processor even worse.
@nateb1804 2 years ago
@@GaryExplains Wow that's very telling. Thanks Gary!
@geoemm 2 years ago
The area of the chip should also be a criterion.
@TheEulerID 2 years ago
I think it quite surprising that a 13 year old design stands up so well. I would suspect that if the power saving features of more modern ARM processor designs were to be exploited for a micro-controller SoC, then it might do better still. However, presumably the priority has switched to producing much more powerful, low-power architectures for use in servers, laptops and the like. Producing the ultimate in low power micro-controllers is probably not a priority as these things are rarely required to do heavy number crunching.
@ByteMeCompletely 10 months ago
I just created a NAS with a Raspberry Pi 4B and an external USB HDD. This would be a good application to verify an SBC is useful.
1 year ago
One detail that you missed is that the Pico and Pico W do not have a linear regulator; they have an on-board buck-boost switching power supply. Current consumption will not be constant; it will go up as voltage decreases.
@derekchristenson5711 2 years ago
That was very interesting to see, and I liked the different ways you picked to look at the question.
@minecraftermad 3 months ago
Clock speed scaling is definitely not linear enough to fix afterwards. You should downclock all of them to the same speed if you want to compare at the same speed...
@GaryExplains 3 months ago
Clock speed scaling is linear on microcontrollers. They are in-order and deterministic. Plus I did actually change the clock speed on many of the units to check that, and it is.
@adrianalanbennett 2 years ago
Hello from Tennessee, Mr. Sims. Love your channel. Thanks for the video.
@michaelkaercher 1 year ago
In general, the performance of RISC-V is not up to the standards of ARM. Full stop. But this battle does not stop today. ARM just announced that they will in future charge their customers based on device prices instead of for IP. That will drive up research in the area of RISC-V. I expect RISC-V to become a contender in the mobile phone space (low end) in about 3 years and in the high-end market in 6-7 years.
@GaryExplains 1 year ago
ARM has not announced anything of the sort. You are repeating a rumor published by the FT.
@michaelkaercher 1 year ago
@@GaryExplains It came from Softbank, the owner of ARM. Let us wait and drink tea. Maybe it is a hoax.
@GaryExplains 1 year ago
Again, nothing official has been said by Softbank or Arm.
@michaelkaercher 1 year ago
Let us wait and drink tea. Btw. Enjoying most of your content. Great channel.
@lepidoptera9337 1 year ago
@@michaelkaercher I am waiting and drinking my tea while the attention trolls on KZbin keep asking me for all the love they didn't get from their Moms. :-)
@georgeh6856 2 years ago
This is good for RISC-V. It is comparing ARM which has been around (and refined) for decades with RISC-V which is quite new.
2 years ago
Do you have thoughts about the potential of RISC-V? I was curious about any production difference, like applied node size.
@GaryExplains 2 years ago
I talk about RISC-V's potential in my RISC-V series.
2 years ago
@@GaryExplains indeed you did! Quite a few as well kzbin.info/aero/PLxLxbi4e2mYFTkLsNYqWLrSQZtLB94wnY
@Navhkrin 2 months ago
Unfortunately, there is no way to quantify which ISA is more efficient based on random boards from different manufacturers. There are too many variables in this equation to derive any meaningful data from these tests. One would need to custom engineer their own hardware while keeping the CPU designs really close to each other to be able to accurately quantify this.
@tetraquark2402 1 year ago
Just spent three months learning the wrong instruction set. I'm a bit miffed about it
@andrewsutton6640 2 years ago
How do these compare with x86 chips, specifically in running programs that are designed for x86?
@GaryExplains 2 years ago
x86 chips can only run Arm and RISC-V programs using emulation. The opposite is also true.
@Schutti73 1 year ago
I am waiting for a fullsize PC with RISC-V CPU.
@GaryExplains 1 year ago
Why? What will it give you that x86 or Arm don't do/have?
@Schutti73 1 year ago
@@GaryExplains A useful PC, instead of a developer board that cannot do my everyday work, with an open ISA AND an open source OS like Linux. x86_64 or the ARM cores are not free.
@GaryExplains 1 year ago
@@Schutti73 When you say free, what do you mean?
@ArniesTech 2 years ago
Both are amazing and exciting alternatives to X86 💪🙏
@fjgaston 2 years ago
It would be interesting to know also the idle power consumption, it would give an idea of how the boards would behave when powered with a battery.
@justinhall7819 2 years ago
I was just thinking the current measurements aren't very useful because of all the extra stuff on a lot of those boards. Plus the esp32 are not known for low power. You would have to compare active current with the idle current of each board.
@tails4e 2 years ago
Yes the delta power should show the true cpu energy used for the benchmark, maybe Gary can follow up?
@GaryExplains 2 years ago
The tricky thing with a delta number is that a CPU can never actually be idle. Even doing nothing is still looping and reading instructions waiting to no longer be "idle". To help in this situation there are two general solutions. 1. Lower the clock frequency and the voltage. This is something that smartphones and laptops do. 2. Put the CPU to sleep, this is a feature MCUs tend to have and it is similar to 1 but not dynamic.
@tails4e 2 years ago
@@GaryExplains thanks for replying. The motivation for the delta is to see the difference between the dynamic power consumption of the CPU architectures. I take the point that the CPU is never really idle, but in the case of MCUs, the cores should at least be idle, or running no-ops. I think the data would be interesting nevertheless. Idle power in itself would be interesting, so all 3 data points tell a story: idle, full load, and 'full load - idle'. It's quite surprising that a 22 year old design/process can still beat a 2 year old one.
@GaryExplains 2 years ago
I will look into this more and see if it is interesting enough for a follow up video...
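The three data points proposed in this thread (idle, full load, and full load minus idle) can be derived from current readings as in the Python sketch below. All currents, the supply voltage and the run time are hypothetical placeholders, not measurements from the video, and as noted above the "idle" figure depends on whether the core is spinning in a loop or sitting in a sleep mode.

```python
def energy_mwh(current_ma: float, volts: float, seconds: float) -> float:
    """Energy in mWh: mA * V gives mW, and mW * s / 3600 gives mWh."""
    return current_ma * volts * seconds / 3600.0


def benchmark_breakdown(idle_ma: float, load_ma: float,
                        volts: float, seconds: float) -> dict:
    """Return idle, full-load and delta energy for one benchmark run."""
    idle = energy_mwh(idle_ma, volts, seconds)
    load = energy_mwh(load_ma, volts, seconds)
    return {"idle_mwh": idle, "load_mwh": load, "delta_mwh": load - idle}


# Hypothetical board: 8 mA idle, 20 mA under load, 3.3 V supply, 26 s run.
result = benchmark_breakdown(idle_ma=8.0, load_ma=20.0, volts=3.3, seconds=26.0)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The delta figure isolates the extra energy spent on the benchmark itself, which helps when boards carry different amounts of always-on circuitry such as regulators, LEDs or radios.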
@todayonthebench 1 year ago
A decent video. And yes, instruction set architectures don't largely impact power efficiency. Hardware implementation however impacts efficiency far more. But there are nuances on the ISA level that set limits for actual implementations of the ISA. Be it limits on minimum transistor count, power efficiency, peak clock speed, etc. Sometimes one has to trade one aspect for another. As an example, a resource efficient architecture using few transistors will generally not offer all that great peak performance. While a more peak performance oriented ISA will tend to be hard to build with few resources. Power efficiency is meanwhile largely decoupled from this view of complexity, since power efficiency is more about how well a given piece of software can make use of the architecture provided. It is oftentimes better for efficiency to have dedicated instructions for complex tasks, but what tasks to choose is a debatable subject in itself. If one throws in everything but the kitchen sink, then it is often far from trivial to make an efficient hardware implementation of it in practice. In short, designing an ISA is all about compromises to reach a prespecified goal. And then make a good hardware implementation of that along the way. Then it is up to the market to find/make applicable software for it.
@Zhaymoor 1 year ago
great video, thank you
@gamerlucky 2 years ago
waited for this for so long ... thanks to mr.sims for making it happen finally. thank you
@LogioTek 2 years ago
Useful test, but not a good test on the topic of CPU core efficiency for several reasons: 1. likely system bus speed differences between these (system bus interfaces to on-chip SRAM) obfuscate differences between true CPU core performance/MHz/Watt unless you downclocked all of them to the lowest common denominator system bus speed, 2. differences in flash memory/prefetchers further obfuscate CPU core performance unless you ran the benchmark from RAM, and even then some like M3/M4 could use dual buses, 1 for data and 1 for instructions, making it unfair, 3. finally, at least some of these are probably manufactured on different process nodes
@GaryExplains 2 years ago
How would you suggest I resolve those issues?
@LogioTek 2 years ago
@@GaryExplains Actually I didn't finish watching when I commented. I see you ran all of them at 1MHz later to level the playing field, and I assume the system bus was dropped to 1MHz also, and that's a first important step. I would run all of these CPU cores at the lowest common denominator system bus speed. The second step is to link the code to run out of SRAM instead of flash on all of them. That's probably the best you can do to isolate core performance efficiency.
@hasanagera 3 months ago
11:20 how can the current stay the same? I have looked at many, many voltage regulators. They all have a no-load or quiescent current. If you don't use a voltage regulator, the board must use less current. This is not complicated.
@JohnnieHougaardNielsen 2 years ago
I'd say that when it comes to efficiency, a number of major interest is how much power the chip/board burns while idle. Typical MCU systems are not to crunch numbers, but for control purposes. Numbers when ready to respond to Wifi may be the most interesting, but of course there are also applications not needing wifi while waiting to do a bit of work. As usual, comparing MHz across architectures is not useful, a more realistic yardstick could be a "maximally trivial" task like how fast it can count.
@borbetomagus 1 year ago
Hopefully you look into purchasing the DeepComputing/Xcalibyte ROMA RISC-V laptop (or a related RISC-V laptop or desktop) for a future video, but much more refinement will probably be necessary for it to reach its full potential.
@robonator2945 9 months ago
Minor, but architecture can 100% be relevant for efficiency, speed, etc. Yes, a good x86 implementation can always sip power in comparison to a shit ARM implementation, but that doesn't mean implementation is all that matters. A slow algorithm on a super computer will outpace a fast algorithm on a microcontroller, but that doesn't mean picking the right algorithm doesn't matter, it just means that it's not the sole deciding factor. These architectures were invented to solve specific problems and to suggest that architecture is irrelevant is really just disingenuous. No, the differences won't be direct, but the architecture influences the implementation; different architectures lend themselves better or worse to different designs, and some designs are better in some functionality than others. Intel was *_far_* ahead of AMD for a good long while, but then AMD started going batshit and putting dozens of cores on their CPUs and now at the ultra-high performance they're pretty unmatched. In single core they still lag a tiny bit IIRC, but in multicore it's real hard to beat 16, 32, 64, 128 separate cores. Speed isn't just an RPG stat, there is a *_lot_* of nuance and 'speed' is really just the composite of how fast it can go and how easily it can go that fast. If your chip is the fastest thing in the world, but it takes 500x more work to develop for, it'll never take off. (outside of niche use cases of course) On the other hand, if your chip is 25% faster and a drop-in replacement, it'll spread like wildfire. One thing I really think RISC-V needs to work on is making sure that they go out of their way to make cross-compilation as easy as possible; that or invent a damn good emulation suite. (but only Apple has really ever pulled off performant cross-architecture support AFAIK. I hear a few projects are getting pretty good, but I've never heard of one *_really_* bridging the gap outside of Apple) A new architecture just can't demand people spend time porting their software unless they have something *_really_* good to offer, and really RISC is more of just an incremental improvement than anything.
@JamesFraley 1 year ago
Great video! I'd love to see one where you analyze just power efficiency. I use microcontrollers around my house to monitor just about everything. I'd love to know which would last the longest on a battery. They need WIFI so they can report in. But my requirements use very little processing. Just check the sensor and report in. Thanks!
@lepidoptera9337 1 year ago
You aren't doing anything around your house that requires more than a lemon battery's worth of power. What you would need, though, are low power drivers for your network, which are hard to get, it seems. Just use whatever works and plug it into the wall. Who cares about a couple of Watts of extra power consumption.
@gadlicht4627 2 years ago
It might be better to run multiple types of programs because different ones may draw different amounts of power.
@darssmare915
@darssmare915 2 жыл бұрын
Nice. You mentioned the importance of design but, to reiterate, the designer of the microcontroller is important here. I think your results show STMicro's expertise.
@peterschets1380
@peterschets1380 2 жыл бұрын
Thanks Gary, now I must think about an application that does a lot of calculations.
@adriancoanda9227
@adriancoanda9227 Жыл бұрын
In the efficiency test, what data was used and where was it stored?
@darthrainbows
@darthrainbows 2 жыл бұрын
May have already been mentioned, but amps != power. When you change the input voltage to 3.3V and the current doesn't change, that indicates a change in power. I'm not familiar with these boards, so IDK what the initial input voltage was, but if we assume 5V, and the current doesn't change when switching to 3.3V, then that is a 34% decrease in power.
@GaryExplains
@GaryExplains 2 жыл бұрын
Yes, of course, but that doesn't change the relative results, does it. What exactly is the point you are making?
@volodumurkalunyak4651
@volodumurkalunyak4651 Жыл бұрын
@@GaryExplains Yes, it does change the relative results. The RPi Pico has a switching regulator, not a linear one like the other boards have.
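To make the regulator point in this sub-thread concrete, here is a minimal Python sketch. It assumes an ideal linear regulator (input current equals load current, quiescent current ignored) and a switching regulator with a fixed efficiency. The 30 mA load and the 90% efficiency are hypothetical illustration values, not measurements from the video.

```python
def power_mw(voltage_v, current_ma):
    """Electrical power in milliwatts: P = V * I."""
    return voltage_v * current_ma


def ldo_input_current_ma(load_ma):
    """Ideal linear regulator: the load current flows straight through,
    so the input current is the same whatever the input voltage is."""
    return load_ma


def buck_input_current_ma(v_in, v_out, load_ma, efficiency):
    """Switching regulator: input power = output power / efficiency,
    so the input current drops as the input voltage rises."""
    return v_out * load_ma / (efficiency * v_in)


if __name__ == "__main__":
    load_ma = 30.0  # hypothetical MCU load on the 3.3 V rail

    # Same current at 5 V and at 3.3 V means less power at 3.3 V.
    drop = 1 - power_mw(3.3, load_ma) / power_mw(5.0, load_ma)
    print(f"power reduction at equal current: {drop:.0%}")

    # Current seen by a meter on the 5 V input for each regulator type.
    print("LDO input current (mA):", ldo_input_current_ma(load_ma))
    print("buck input current (mA):",
          round(buck_input_current_ma(5.0, 3.3, load_ma, 0.9), 2))
```

Under these assumptions, a current reading taken on the 5 V input understates the load of a board with a switching regulator relative to one with a linear regulator, which is why the regulator type can shift the relative results.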
@-Slade-
@-Slade- 11 ай бұрын
It's kinda wrong to average out the performance. The ESP32, ESP32-S2 and ESP32-C3 have an adjustable clock (80 MHz, 160 MHz and 240 MHz). The newer ESP32-S3 can go as low as 10 MHz. You can set the ESPs to 160 MHz to compare them to each other. You can also average the time it takes for a fixed set of operations, etc.
@GaryExplains
@GaryExplains 11 ай бұрын
But the point is the power efficiency per MHz, which is what I showed. I don't think you understood the video.
@samiam4039
@samiam4039 Жыл бұрын
The comparison is not with new hardware. The VisionFive 2 board looks to be a 4-core RISC-V design, and having a RISC instruction set allows for better parallel processing, making higher efficiency possible. The ability to boot from NVMe and the concurrent processing will need better coding to achieve faster processing.
@GaryExplains
@GaryExplains Жыл бұрын
What has booting from nvme got to do with the efficiency of RISC-V?
@samiam4039
@samiam4039 Жыл бұрын
@@GaryExplains Just a big improvement in the VisionFive 2 board's efficiency. Not RISC-V specific. Currently no SoC boards have NVMe boot-up and processing, not even the Raspberry Pi.
@GaryExplains
@GaryExplains Жыл бұрын
Nvme boot doesn't improve efficiency, it improves IO performance, which isn't related to RISC-V in any way.
@GaryExplains
@GaryExplains Жыл бұрын
Also, I have a VisionFive 2 board, and looking at it there doesn't seem to be support to boot from NVME.
@AndersHass
@AndersHass 2 жыл бұрын
I do wonder how much current ran through them at the same clock speed.
@aneeshprasobhan
@aneeshprasobhan 2 жыл бұрын
Top notch work ! Thanks for the video :)
@GaryExplains
@GaryExplains 2 жыл бұрын
Glad you liked it!
@kasperlhde7893
@kasperlhde7893 2 жыл бұрын
Interesting video :) I do not think it is enough to just power the 3.3V rail, since there are other onboard electronics which also require power (the USB-to-serial converter) on the ESP32 board. It would have been interesting to see it compared to the datasheet :)
@broccoloodle
@broccoloodle 2 жыл бұрын
Back in uni, as I remember it, the active power (total power minus leakage power) is proportional to the square of the frequency. Can we use that to extrapolate the power usage of the Pi to 160 or 240 MHz?
@GaryExplains
@GaryExplains 2 жыл бұрын
Or better still watch my previous video on this topic where I actually changed the clock speed of the Pico and measured the power usage.
@volodumurkalunyak4651
@volodumurkalunyak4651 2 жыл бұрын
Active power is proportional to Vcore^2 * frequency. Not frequency squared, just frequency multiplied by core voltage squared. You may get something close to frequency squared when cores are pushed harder than the above-mentioned microcontrollers (not as hard as the latest Intel or AMD chips at full boost; frequency still has to be supported by raising the core voltage).
@leonardosabino2002
@leonardosabino2002 Жыл бұрын
@@volodumurkalunyak4651 The formula I remember from university is proportional to voltage and to frequency squared (P ∝ V * f^2).
@volodumurkalunyak4651
@volodumurkalunyak4651 Жыл бұрын
@@leonardosabino2002 I literally wrote the very same formula: Vcore^2 * frequency. Power is proportional to frequency and to voltage squared. Power scaling does also resemble frequency squared over some part of the voltage-frequency curve (probably the 0.7 to 1V region for the latest chips).
@leonardosabino2002
@leonardosabino2002 Жыл бұрын
@@volodumurkalunyak4651 Not the same formula. Look again, it's the -frequency that's squared.- EDIT: I just looked up the formula, looks like voltage squared is correct. Sorry about that.
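The relationship this sub-thread settles on is the usual CMOS switching-power approximation, P ∝ C · V² · f. A minimal Python sketch of how it scales follows. The voltages and frequencies are hypothetical, and leakage (static) power is deliberately left out.

```python
def dynamic_power_ratio(v_old, f_old, v_new, f_new):
    """Ratio of new to old switching power under P ~ C * V^2 * f.
    The switched capacitance C cancels because the chip is the same."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    # Doubling the clock at a fixed core voltage: power scales linearly.
    print(dynamic_power_ratio(1.0, 100, 1.0, 200))

    # Doubling the clock when the core voltage must also rise from
    # 1.0 V to 1.2 V (hypothetical values): power grows faster than linearly.
    print(round(dynamic_power_ratio(1.0, 100, 1.2, 200), 2))
```

The second case shows why scaling can look closer to frequency squared on chips that have to raise the core voltage to reach higher clocks, while a microcontroller running at a fixed core voltage stays close to linear.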
@kayakMike1000
@kayakMike1000 2 жыл бұрын
Hmmm ... Efficiency is largely dependent on the implementation and which extensions are used...
@GaryExplains
@GaryExplains 2 жыл бұрын
Did I not say that?
@kayakMike1000
@kayakMike1000 7 ай бұрын
​@@GaryExplainsyeah, sometimes I type out my thoughts before I watch the whole video. You did great.
@angeldude101
@angeldude101 Жыл бұрын
You mentioned that your encryption algorithm doesn't use floating point or integer division, but does use bit manipulation. I'll ask if it also uses integer multiplication, because multiplication by default comes in the same extension as division, but was also made available on its own as Zmmul. Bit manipulation instructions beyond basic bitwise logic are also their own extension, B, and its parts. Did the RISC-V processors used support these extensions, and if so, did you tell the compiler to use them when compiling your code?
@Chris-wf2lr
@Chris-wf2lr 2 жыл бұрын
Why not use transistor count instead of energy used? There are too many variables. Assuming transistor count usually correlates with cost, ultimately it would show which architecture is more efficient for the theoretical cost of production (if they were made at the same fab on the same node).
@GaryExplains
@GaryExplains 2 жыл бұрын
Transistor count doesn't correlate in any meaningful way. It won't help you decide what size battery to use etc. Power usage is the most important thing, everything else is just statistics.
@toorero
@toorero 2 жыл бұрын
I would have loved to see more different benchmarks hitting different areas of the MPUs, since concluding based on one very specific crypto-benchmark not even using floats seems quite off to me...
@GaryExplains
@GaryExplains 2 жыл бұрын
LOL, other people complained when they thought I was using floats (as some MCU's don't have an FPU). I just can't win. KZbin comments for the victory! 🤪
@BruceHoult
@BruceHoult 2 жыл бұрын
Outside of very specialised areas, almost no software uses floating point on desktop computers, let alone on microcontrollers! I've been programming professionally for 40 years and 99% of C programs I work on don't even have the word "float" or "double" in them. Gary's previous "Primes by division" benchmark was quite unrepresentative of normal programs, but this one sounds pretty good (I don't know if the actual source code is available?) so I for one applaud this change.
@Winnetou17
@Winnetou17 2 жыл бұрын
@@BruceHoult "almost no software uses floating point on desktop computers" u wot mate ? Browsers and games are "almost nothing" ? Though to be fair, I don't know much about other software, but I'd be surprised if these would be the only major ones. Still, I'd also say it's kind of irrelevant what desktop-level software use and then compare to what MCU-level software uses.
@BruceHoult
@BruceHoult 2 жыл бұрын
@@Winnetou17 "outside of very specialised areas". Games and browsers are specialised. A lot of people run them, it's true, but they constitute a very small proportion of the lines of code written or programmers employed.
@GaryExplains
@GaryExplains 2 жыл бұрын
Bruce, the code to Oceantoo is in my GitHub repo, there is also an accompanying video here on this channel.
@fluiditynz
@fluiditynz 2 жыл бұрын
Gary, the M4 has an FPU specced on core. The C3 has cryptographic modules. I'm very impressed with the quite new C3's placing on the list, but do you know if the C3's cryptographic processing components were used in your compiled code? This would influence your results quite significantly.
@JustAnotherAlchemist
@JustAnotherAlchemist 2 жыл бұрын
The cryptographic co-processor in the C3 accelerates very specific algos (SHA and AES), and needs to be expressly enabled in code through C headers as well as through the NVM configuration. His crypto algorithm is very custom(?), so I doubt it can even take advantage of the co-processor, let alone the fact that putting code forward that used the co-processor on the C3 would kill compilation for all the other chips, since the header would have definitions for C3 specifics... unless of course Gary was a complete A-hole and put #if guards around that part of the code (which would absolutely give the C3 an advantage).
@jpjude68
@jpjude68 2 жыл бұрын
Isn't power consumption also a function of speed though? I wouldn't be surprised if the microcontroller's power consumption is directly proportional to the speed.
@GaryExplains
@GaryExplains 2 жыл бұрын
Of course it is proportional to clock frequency.
@adriancoanda9227
@adriancoanda9227 Жыл бұрын
Arm is a RISC chip too; RISC stands for reduced instruction set. Actually, you would need the same motherboard with a socket mount in order to exclude other factors in the testing. But even then, the fastest chip was at 240 MHz, and I don't see where those would be of use, maybe in remote controls. Elsewhere they are too slow, unless you use them in an insanely large cluster (999999999999x), but then you would need damn fast cluster management running within the firmware.
@stefandebruijn3167
@stefandebruijn3167 Жыл бұрын
First off, nice that someone takes the time to do benchmarks; we can really use some more of that. However, I also think any benchmark that leaves out the different basic types is inherently flawed. An int32 benchmark is nice for pure int32 operations, but it still tells me nothing about int64, float32 and float64. For example, the ESP32 has an FPU for float32, but not for float64. It also leaves out any peripherals, but that's okay (if you need a certain peripheral you should just select on that)... For example, I have a few ESP32-S2s here that use the TinyUSB stack. They are great, but whenever you feel like using the native USB instead of the hardware UART, it starts to eat up your CPU cycles like Cookie Monster... it'll be the same story for the RP, I suspect. Floats especially can give very nasty surprises; I suspect it will be the same in terms of power consumption / efficiency.
@GaryExplains
@GaryExplains Жыл бұрын
I think the general wisdom is that floating point code accounts for less than 1% of microcontroller code. So doing a test that focusses on floating point is inherently flawed.
@stefandebruijn3167
@stefandebruijn3167 Жыл бұрын
​@@GaryExplains Where did you get that "general wisdom"? I know I've never seen it in my 30+ years of professional software engineering... Not saying it's incorrect, but in my experience it very much depends on the application how much floats are being used... Source? But even if it is correct, I don't think you understand how bad it really is. I actually did some benchmarks on the esp32 a while back, because I couldn't make heads or tails of the performance numbers. It has roughly 600 MIPS and just 1 MFLOPS (!) for common operations. That means that even if only 0.2% of your code is using floating point, it will consume 50% of your cpu power. It's that bad...
@GaryExplains
@GaryExplains Жыл бұрын
When I say general wisdom, I mean general wisdom, there isn't a particular source. However over the years I have seen multiple presentations that analyze real-world code and FP code is minimal, certainly on microcontrollers. That is why some microcontrollers don't even include an FPU, not needed really.
@stefandebruijn3167
@stefandebruijn3167 Жыл бұрын
@@GaryExplains Right, and as I said, I'm no amateur, and I've seen a lot of issues with FP over the years. At the end of the day it doesn't matter what the exact percentage is: since FP is so much slower than integer operations (for obvious reasons), the effects on the application as a whole are still significant. Whether or not FP is required for applications at all is a totally different discussion. Again, such discussion is eventually irrelevant; the fact is that regardless if it's a good idea or not, people use it for everything from motion control to PID loops and from UI's to signal processing. That is why there's a tendency for vendors to add an FPU: because it is needed. ESP, STM32F4 seem to agree with me. The RP2040 does not have one.
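The "0.2% of the code, 50% of the CPU" estimate in this sub-thread can be checked with a few lines of Python. This is only a sketch: it treats the quoted 600 MIPS and 1 MFLOPS figures as comparable operation rates and reads "0.2% of your code" as 0.2% of executed operations, both of which are simplifying assumptions.

```python
def float_time_share(float_fraction, int_mops, float_mops):
    """Share of total run time spent in floating-point operations.

    float_fraction: fraction of executed operations that are floating point
    int_mops:       integer throughput, millions of operations per second
    float_mops:     floating-point throughput, millions of operations per second
    """
    t_int = (1.0 - float_fraction) / int_mops   # relative time in integer ops
    t_float = float_fraction / float_mops       # relative time in float ops
    return t_float / (t_int + t_float)


if __name__ == "__main__":
    share = float_time_share(0.002, 600.0, 1.0)
    print(f"time spent in floating point: {share:.1%}")
```

With those inputs the share comes out a little above one half, so the commenter's rough 50% figure is consistent with the numbers quoted.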
@etmax1
@etmax1 2 жыл бұрын
mA/MHz or mWh/MHz is a useful metric.
@GaryExplains
@GaryExplains 2 жыл бұрын
They shouldn't be too hard to calculate, I think all the data you need is presented.
@etmax1
@etmax1 2 жыл бұрын
@@GaryExplains absolutely, just saying that when comparing different chips they're a useful metric. I once did the OPS per MHz comparison for ARM to AVR/Microchip/MSP430 where the ops were x/+ - and I/O operations, it explains a lot of why all these architectures survive in the "32bit" world we now live in.
@GaryExplains
@GaryExplains 2 жыл бұрын
I agree. I will think about including that in any future videos. However, if you read the comments everyone has their preferred metric. People are asking for all kinds of variations. Some are even saying that per MHz anything isn't valid. It is a jungle out there!
@oidpolar6302
@oidpolar6302 2 жыл бұрын
It's never been about efficiency, was always about pcore license dependency
@GaryExplains
@GaryExplains 2 жыл бұрын
So the Raspberry Pi Pico is expensive at $4? 🤔
@TheShorterboy
@TheShorterboy Жыл бұрын
Your difference may be down to the compiler; you would need to check the assembly output with gcc -S.
@kychemclass5850
@kychemclass5850 2 жыл бұрын
V. Informative. Tq :)
@Andrew-rc3vh
@Andrew-rc3vh 11 ай бұрын
ESP32 also has an ultra low power processor.
@pnachtwey
@pnachtwey Жыл бұрын
4K fits in the cache. What about external memory access speed?
@DataSmithy
@DataSmithy 2 жыл бұрын
What do you mean by efficiency? There's performance efficiency, and then there's energy efficiency.
@GaryExplains
@GaryExplains 2 жыл бұрын
Did you watch the video?
@bobweiram6321
@bobweiram6321 2 жыл бұрын
Can't you measure the current directly from the VCC and GND pins?
@cheebadigga4092
@cheebadigga4092 11 ай бұрын
Damn the M4 is really nice!
@GaryExplains
@GaryExplains 11 ай бұрын
Indeed. I think it is my favorite Cortex-M processor!
@BrianKelsay
@BrianKelsay Жыл бұрын
Not sure if this is a valid question, but here goes. Based on these clock speeds, could one of these chips act as a processor in a micro DOS or Windows environment? Thinking of a kiosk that runs a corporate webpage and allows customer data entry or order entry on-site. Or a tiny web book, or a tablet just for the web, or an ebook reader where it's mostly text. I know that the Pi, which is more powerful and has a video decoder, is slow at video and graphics. Just thinking that if not much computing power was needed, you could pair it with a mid-power graphics chip for running the display and decoding video streams. Then maybe you get TVs with minor computing and networking power. Or is this how they are making smart TVs?
@El.Duder-ino
@El.Duder-ino 2 жыл бұрын
3:21 22/23 years ago? R u sure Gary?😂🤣🤣🤣 Anyway Gary, well done comparison, thx!
@daniahmed
@daniahmed 2 жыл бұрын
Gary, there was an article that I read about a week ago saying that Apple may be shifting away from ARM to RISC-V. Do you think Apple will switch to RISC-V or continue with ARM for the time being?
@GaryExplains
@GaryExplains 2 жыл бұрын
If we read the same article it says that Apple is using RISC-V for some of its small co-processors, that is all. It is a good engineering choice, if it has to design bespoke hardware blocks then RISC-V is a workable solution.
@daniahmed
@daniahmed 2 жыл бұрын
@@GaryExplains Maybe but that Article had some text about moving to RISC-V that Apple might be considering. Moving to RISC-V would benefit Apple in long-term as they wouldn't have to keep paying ARM for royalties or whatever deal they have with ARM. What's your take on this?
@GaryExplains
@GaryExplains 2 жыл бұрын
No, that part was just pure speculation because otherwise it would be a boring article and no one would read it.
@daniahmed
@daniahmed 2 жыл бұрын
@@GaryExplains ok, thanks for clarifying.
@TheLouKou
@TheLouKou 2 жыл бұрын
Gary, please, you're killing me! It's ESPRESSIF, there is no X in there! XD
@schizoidman9459
@schizoidman9459 2 жыл бұрын
It's not that surprising that the winners are the ones with faster clocks, especially when in dual-core configurations the second core is just left idle (why?). This test was designed to benefit single-core, higher-clocked processors. Multicore processors are known to have better performance at much lower clock frequencies and are consequently more energy efficient. I'm having a very hard time understanding what the motivation is here. However, obviously the performance is never about a difference in ISAs, especially if all of them are RISC architectures. Put some CISC ISAs in the mix and you will see huge differences, though. Also, chips that run at faster clocks generally consume more energy. That's the whole deal with multicore architectures: to have high performance at low frequencies. The whole deal of ARM processors and their use in mobile devices is exactly that. No surprise here either, since this is obvious. The only surprise is to see a processor running at 72 MHz consuming more than one at 160 MHz. I think this must come from the fact that you are measuring power at the board level, not at the processor itself; otherwise we would see this reversed. Now about RISC-V. There is no way RISC-V processors could compete at any level with ARM processors, which have been used far and wide in smartphones. RISC-V processors are the new kids on the block and are running far behind. RISC-V is still lacking the support needed to be better than ARM processors. But there is a huge advantage of RISC-V, though, that cannot be measured for now. It's potentially much cheaper to produce RISC-V than ARM processors, since it is an open and free ISA. However, we still cannot see the advantage in prices because they are not yet produced in high volume. Volume production is everything in chip prices. But we can expect to see much cheaper RISC-V processors in the future, to the point of beating ARM processor prices.
I think that's where RISC-V will position itself as a competitive ISA.
@GaryExplains
@GaryExplains 2 жыл бұрын
I am planning a dual core follow up video. Also the devices with a higher clock speed didn't "win".
@schizoidman9459
@schizoidman9459 2 жыл бұрын
​@@GaryExplains: Thanks for your comment, Gary. You are probably referring to the hypothetical comparison if the processors were all running at 1MHz. You know that just multiplying the time by the clock frequency is not a very accurate performance indicator. I am looking forward to see the comparison between dual-core and single core processors. For the kind of comparison (very repetitive and computing intensive tasks) you are doing, you would generally be better with dual-cores. However, that's not always true. As I stated in another comment in another video, modern architectures have lots of intrinsic parallelism (that translates into several instructions executed per cycle) that simply don't work when you impose atomic execution to synchronize threads. That benefits single cores better than multicores. In my estimate, to start having clear cut better performance in multicore you need at least 8 cores, unless you don't use atomic operations. That's the reason smartphones have dedicated cores for certain activities, because in this way you don't need synchronization. The advantage of these configurations is simplicity, you don't need load balancing. But the problem is that you will have most cores idle if their correspondent activities are not taking place.
@GaryExplains
@GaryExplains 2 жыл бұрын
Except for the raw performance test (i.e. how many ms to complete the task), none of the higher clock speed microcontrollers won. As for the clock frequency, in my previous video I actually changed the clock speed, and while performance isn't perfectly linear it is quite close, certainly close enough to make meaningful comparisons.
@schizoidman9459
@schizoidman9459 2 жыл бұрын
@@GaryExplains : Thanks. I didn't see your previous video, so I just assumed you multiplied the frequency by the time. It seems I will have to see this video again to understand what you mean with "raw performance". I probably overlooked that. Sorry.
@GaryExplains
@GaryExplains 2 жыл бұрын
Also, I don't think MCUs have much in the way of ILP, and certainly not out of order execution.
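The per-MHz normalization debated in this sub-thread amounts to converting a run time into an approximate clock-cycle count (time × frequency). A minimal Python sketch follows, using two made-up boards. The timings are hypothetical, not the video's results, and the conversion assumes performance scales roughly linearly with clock speed, which the replies above note is close but not exact.

```python
def cycles_for_task(time_ms, clock_mhz):
    """Approximate clock cycles used: (time_ms / 1000 s) * (clock_mhz * 1e6 Hz)."""
    return time_ms * clock_mhz * 1000


if __name__ == "__main__":
    # Hypothetical boards: name -> (run time in ms, clock in MHz)
    boards = {"board_a": (500, 240), "board_b": (900, 100)}

    for name, (time_ms, clock_mhz) in boards.items():
        print(name, time_ms, "ms,", cycles_for_task(time_ms, clock_mhz), "cycles")
```

In this made-up case board_a finishes first (raw performance) while board_b needs fewer cycles (better per-MHz result), which is the distinction between "winning" on time and "winning" per MHz.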
@lepidoptera9337
@lepidoptera9337 Жыл бұрын
Efficiency is not something you can achieve at the component level, kids. It's a system architecture issue.
@zizlog_sound
@zizlog_sound Жыл бұрын
Architecture is made by components on the micro and nano level.
@lepidoptera9337
@lepidoptera9337 Жыл бұрын
@@zizlog_sound Yes, and if you put the wrong algorithm on your CPU it performs like dog-poo on a stick. ;-)
@ps3301
@ps3301 Жыл бұрын
Risc v only has open source as their selling point.
@u9vata
@u9vata 2 жыл бұрын
I actually disagree that architecture does not decide efficiency and only implementation does. It is not so simple! You can say that one can implement both using steam engines and valves or silicon and make a HUGE difference, but you can also come up with instruction set architectures that are "easier" or harder to implement well on silicon. It is the same as saying that you basically cannot tell which programming language is faster because they are just a language, a specification, not an implementation of it. Yes, you can make a very dumb C compiler that makes C slower than Python, but for some reason all well-made C compilers are tons faster... Why? Because the specification was made that way (or ended up that way) so that it can be implemented more efficiently, both on average and at best effort! Trust me: if you don't see what I mean or the parallel, I can come up any day with an instruction set architecture that can only be implemented garbage-slowly... Also, it would be good to have RISC-V SoC variants that do not have the integrated Wi-Fi... it being asleep versus not existing likely makes a huge difference (if nothing else, it takes space on the die which could be used for lower-consumption solutions, for example).
@GaryExplains
@GaryExplains 2 жыл бұрын
True, but we aren't dealing with a specification that you "can come up with", we are dealing with Armv6 and RISC-V. That is the context, not "u9vata's amazing ISA".
@u9vata
@u9vata 2 жыл бұрын
@@GaryExplains Sure. That is an extreme example, but it shows well that ISA specs - just like programming language specs - do have a leverage on the possibility of implementation. I expect you to know that and don't get me wrong, this is all interesting data worth really measuring (I would even argue that if most ppl literally use whole boards among makers, whole board consumption / efficiency is also not bad to measure and makes it more real life stuff). Only saying that it could be misleading also to think that spec or ISA cannot affect performance of average, worst and best implementations efficiency range. It is not as simple to measure that like measuring a product and I guess it is actually too early for that but it is not just business randomness that all mobile devices have not x86 but arm for example and generally fare better. It is not just the implementation, but the spec can be better suited for this or that.
@GaryExplains
@GaryExplains 2 жыл бұрын
The reason why I emphasize the implementation over the ISA, is because people think that RISC-V is magic and that the ISA itself will somehow solve problems of power and efficiency. That is nonsense. Since the context is two well designed and well defined ISAs then I think my statement stands and isn't misleading.
@u9vata
@u9vata 2 жыл бұрын
@@GaryExplains I think, to be honest, it is too early to draw conclusions on which ISA fares better when it comes to efficiency. I am pretty sure there will be a difference, not yet sure in what direction. I have hopes and that is all. So just like you emphasize the implementation over the ISA so that people don't think the ISA is the only factor, I want to emphasize that it is indeed a factor and a defining aspect, but at a different granularity: implementation is more tied to the product directly; the ISA is more tied to the statistically relevant efficiency of a class of products overall. I think we do not really contradict each other here, to be honest.
@FranzzInLove
@FranzzInLove 2 жыл бұрын
Some feedback: - Current draw is not exactly directly proportional to clock frequency, for instance at lower frequencies, efficiency can be worse because there is some "idle current" that doesn't change much and becomes more important relative to the clock based current. So I think it would be better to set the clock frequency of the MCU at the same speed, and do the same tests at different clock speed (because they might have different sweet spots). - If the goal is to compare architecture and not simply the MCUs, I think this is only a fair comparison if the chips are manufactured using the same technology node, I do not know if it is the case. - I think measuring the board current instead of the MCU current is not great either, I don't know for those specific circuits, but there are many ICs which easily consume a few mA doing nothing, some of them even when they are "turned off" (shutdown current in datasheets is usually low, but not always). One way to measure just the MCU current would be to completely remove other circuits from the board (yes, it's more challenging, and destructive to the board).
@GaryExplains
@GaryExplains 2 жыл бұрын
Some feedback on your feedback: - I did that in the previous video on MCU power efficiency. - The goal was to show the current state of RISC-V MCUs and to debunk the myth that just because a processor is RISC-V, it somehow means it is inherently better. - I covered that in the video and made the same point myself, did you miss that segment?
@FranzzInLove
@FranzzInLove 2 жыл бұрын
@@GaryExplains Thanks for your reply, I had not seen the other video. Your graph at around 12 mins shows what I mean. For instance, at 240 MHz, the Pico consumes 0.16 mA/MHz, while at 50 MHz, it consumes 0.26 mA/MHz. Similar results are seen for the ESP32. If it were linear, it would be the same number. That's actually a larger difference than I thought it would be. It is counterintuitive, but I believe MCUs tend to be more efficient at higher clock speeds (likely up to a certain threshold). Hence, comparing the energy usage at different clock speeds seems to favor the boards running at higher clock speeds. If the goal is simply to show that a RISC-V chip can be less efficient than an Arm processor, it is achieved, but then IMO the title "Arm vs RISC-V? Which One Is The Most Efficient?" is a tad misleading. I was hoping to get a comparison of the efficiency of RISC-V compared to Arm, which would need to control the other parameters (especially the technology node, since it is likely a huge factor). Still an interesting video nonetheless. You did mention in the video that you measure the board current. Depending on what's on the board, this may have a huge impact. I now had a quick look at some schematics and it looks like the boards are quite bare (though I'm not sure which exact board you use in some cases), so it may not be that important in the end. One thing I noted though is that most boards use an LDO while the Pico apparently uses a DC/DC converter. Boards that use an LDO should indeed draw the same current at 5V as at 3.3V; however, this should not be the case for the DC/DC converter. The efficiency of those LDOs is 3.3/5 ≈ 66%, while the efficiency of the Pico's DC/DC converter is quoted as "up to" 90% (though this varies with consumption). This is an advantage for the Pico board that is not related to architecture.
If you indeed measure the same current when supplying the Pico from 3.3V, it is either because the efficiency of the DC/DC converter is actually about 66% as well, or because there is some leakage into the DC/DC converter when a voltage is applied to its output while its input is floating (which is possible since it is likely not an intended use case). Just to make it clear, I just wanted to provide some constructive feedback. I'm subscribed and enjoy watching some of your videos; I hope this doesn't come off as arrogant.
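The two Pico figures quoted in this comment (0.16 mA/MHz at 240 MHz and 0.26 mA/MHz at 50 MHz) are what one would expect if the current has a fixed "idle" part plus a part proportional to clock speed. The Python sketch below fits that simple two-parameter model, I(f) = I_idle + k · f, to the two quoted points. The linear model is an assumption made for illustration, and a fit through two points is only a rough estimate.

```python
def fit_idle_and_slope(f1_mhz, ma_per_mhz_1, f2_mhz, ma_per_mhz_2):
    """Fit I(f) = idle + slope * f through two (frequency, mA/MHz) readings.

    Returns (idle current in mA, slope in mA per MHz).
    """
    i1 = f1_mhz * ma_per_mhz_1   # total current at the first clock speed
    i2 = f2_mhz * ma_per_mhz_2   # total current at the second clock speed
    slope = (i1 - i2) / (f1_mhz - f2_mhz)
    idle = i1 - slope * f1_mhz
    return idle, slope


if __name__ == "__main__":
    idle_ma, slope = fit_idle_and_slope(240, 0.16, 50, 0.26)
    print(f"estimated idle current: {idle_ma:.1f} mA")
    print(f"estimated clock-dependent part: {slope:.2f} mA/MHz")
```

Under this model the fixed part is a few milliamps, which is spread over fewer megahertz at low clock speeds and so makes the mA/MHz figure look worse there.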
@TorbjrnViemNess
@TorbjrnViemNess 2 жыл бұрын
​@@FranzzInLove I agree; if the goal was indeed to compare the efficiency of Arm vs RISC-V, the best way to do it (aside from getting two different chips that are identical, apart from the CPU core - so same node, same class, same memory, same speeds etc.) would be to record the actual number of instructions executed for a given benchmark - i.e. the _dynamic instruction count_. This is the only meaningful number to look at when comparing one ISA vs another. Otherwise you're just comparing chip vs chip. And the direct comparison of cycle counts that was done in this video isn't realistic either, for the exact reason that Gary actually explained just before showing the comparison; memory systems are running slower than the cores themselves and often have a somewhat fixed latency when reading data (and instructions), so you'll typically waste more cycles waiting for memory when running the CPU at a higher frequency. So Gary: nice try and I really appreciate that you focus a bit on my field (MCUs) as well, but for this particular comparison it could've been a bit better - at least from a "comparing ISAs" point of view, from a "comparing MCUs" point of view it was great! :)
@marcwagner3762
@marcwagner3762 2 жыл бұрын
What about some more RISC-V Boards...
@bjarnenilsson80
@bjarnenilsson80 Жыл бұрын
Well, is it really fair to compare a first-gen dev kit (for RISC-V) to what is, I assume, a set of products that have had years of optimisation?
@GaryExplains
@GaryExplains Жыл бұрын
In fact that is the whole point.
@ole.petersen
@ole.petersen Жыл бұрын
But shouldn't the power (aka V*I) be more relevant than the current (I)?
@GaryExplains
@GaryExplains Жыл бұрын
Of course. That is why I present mWh towards the end. But when V is constant then I is important.
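For reference, the mWh figure mentioned in this reply is just voltage × current × time. A minimal Python sketch follows, with hypothetical readings rather than the values measured in the video.

```python
def energy_mwh(voltage_v, current_ma, seconds):
    """Energy in milliwatt-hours: V * I(mA) gives mW, times hours of run time."""
    return voltage_v * current_ma * seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical boards on the same 3.3 V supply:
    fast_but_hungry = energy_mwh(3.3, 40.0, 2.0)   # 40 mA for 2 s
    slow_but_frugal = energy_mwh(3.3, 25.0, 4.0)   # 25 mA for 4 s
    print(f"fast board: {fast_but_hungry:.4f} mWh")
    print(f"slow board: {slow_but_frugal:.4f} mWh")
```

With the voltage held constant, the comparison reduces to current × time, so a board that draws more current can still use less energy for the task if it finishes sooner.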
@ristekostadinov2820
@ristekostadinov2820 Жыл бұрын
Are all these microcontrollers fabbed on the same process node (and by the same manufacturer)? For example, an M4 fabbed on 40nm versus 20nm will differ in performance and power efficiency.
@GaryExplains
@GaryExplains Жыл бұрын
The Arm one is on 90nm, the RISC-V on 40nm.
@TheFlashPod
@TheFlashPod 2 жыл бұрын
I have to say that only the last plot (mWh for the task) makes at least some sense... But in general I would say that you cannot generalize these boards and compare them directly. MHz is not linear with power consumption. It's quite simple: the ESP32 boards can run at 240 MHz and are therefore the fastest. It does not matter if the M4 can "compute more per MHz" if it is capped at 100 MHz and therefore is still slower to do the task... If you are looking at power efficiency you probably do not need those high clock speeds anyway. You can power down the modem of the ESP and that will cut the power substantially. If you want to compare the ESP32 to the M4, you should clock the ESP down to comparable levels and run the tests again.
@GaryExplains
@GaryExplains 2 жыл бұрын
Hmmm... If you look at my previous video about microcontrollers you will see that I actually did change the clock speeds. While it isn't linear it is very close.
@NNokia-jz6jb 11 months ago
6502 is the best.
@PrivateSi 2 years ago
The ISA does make a small difference, and fetch-decode speed was a large factor until high MHz clock speeds and pipelined branch prediction arrived. A clean(ish) slate approach to both the ISA and the IMPLEMENTATION SPECIFICATIONS of RISC-V, working in tandem, is what gives RISC-VECTOR the edge. -- A proper Vector Processing specification instead of SIMD (an ISA DISASTER that should have stopped at SSE4 on the X86, and should never have been introduced into the ARM ISA). A Vector processor would have been vastly preferable and the tech was well proven. -- A major benefit is combining CPU + GPU programming into one (much more) bare metal ISA for both, eliminating a ton of API translations and JIT compilation. Short but quite efficient pipelines, or very long and perhaps more efficient ones, can be experimented with by LOTS MORE CHIP (PART) DESIGNERS, while developers get a STANDARDISED ISA. -- Bare metal GPU Compute will be much EASIER. Integrated graphics and general purpose Vector processing complement each other, but software-only graphics systems using just the vector processor and a few CPU cores could be more efficient and good enough for web + office.
@GaryExplains 2 years ago
You think microcontrollers have branch prediction?
@PrivateSi 2 years ago
@@GaryExplains .. not yet, and hopefully never! I agree that on the microcontroller front RISC-V is no better than the Pi Pico spec. It's also less RISC than the Pico. The low end RISC-V spec now includes basic, MMX level integer SIMD, and probably FP SIMD once it's finalised and then extended, so it is quite bloated compared to the Pi Pico ISA. -- I'm an ARM fan but think the High End RISC-V spec is a better idea (Vectors vs fixed sized SIMD). RISC-V is an ARM killer, X86 never was. ARM is still the most likely X86 killer, but Intel and AMD will probably race to replace X86 with native RISC-V and emulated X86. 10s to 100s of smaller SOC designers and manufacturers will obviously also prefer RISC-V. -- Sadly ARM's days are numbered. It may well have to abandon its ISA and many core implementation details when it too goes RISC-V. Open Standards are very powerful forces. Look at IBM PC, HTML + CSS, Unicode. For better or worse, these royalty-free technologies always dominate. -- I actually prefer 2 byte opcode ISAs using a few tricks, and vector processing over SIMD. It de-bloats the cache and pipeline. RISC-V is getting more bloated despite its lack of SIMD. Too many cooks spoiling the broth will be the reason RISC-V fails, if it does, which it probably won't. A (US) Big Boy could buy out the project I suppose, and ruin or bury it, but that's unlikely too.
@SlugCatLife 2 years ago
I don't feel like I got an answer. Also you did not list what the architecture (risc or arm) was in the graphs.
@GaryExplains 2 years ago
The processor is shown along the x axis.
@AndersHass 2 years ago
But which is more efficient in Minecraft, RISC-V or ARM? lol. Still, an important point: the hardware used to implement the instruction sets matters way more than the instruction sets themselves.