CPU Micro Architecture Levels Are Not Real

16,681 views

Brodie Robertson

1 day ago

Comments: 202
@thomasburette9129 1 month ago
What a nightmare. Imagine being the bedrock of the entire software architecture, then looking down and realising the hardware under you is quicksand.
@bigpod 1 month ago
Have you ever seen ARM? It's even worse.
@oserodal2702 1 month ago
@@bigpod For all its shortcomings, ARM actually handles this specific situation marginally better.
@bigpod 1 month ago
@@oserodal2702 Really? Take 3 random ARM CPUs: what's the chance that their instruction sets are completely different, with just a common base?
@leftybot7846 1 month ago
@@bigpod If things continue to go in this direction we'll soon be back in the days when code could not be executed on another machine. True cyberpunk dystopia.
@bigpod 1 month ago
@@leftybot7846 Well, I foresee even worse compatibility on RISC-V; at least ARM kinda mandates a common base.
@Archbtw_ 1 month ago
At this point I'm surprised devs can even agree on the meaning of the word "standard".
@jamesbeebe2870 1 month ago
What are words even in this day and age
@autarchprinceps 1 month ago
That's because it isn't a standard, it is a grouping for simplification.
@hubertnnn 1 month ago
They can't. It reminds me of when GitHub replaced the keyword "master" with "main" to not upset some minority. The effect was that our entire automation chain broke down, and I had to spend 2 days figuring out why 20 applications had started failing at the same time for no apparent reason. It took me so long because "main" and "master" look so similar that neither I nor 3 other developers spotted the difference for a few hours. A lot of money was lost that day.
@nectarinetangerineorange 1 month ago
We know what 'standard' means.... Whatever we pretend to do in front of management
@jamesbeebe2870 1 month ago
@@nectarinetangerineorange I like this definition lol
@nintendoeats 1 month ago
A point: you are talking about Instruction Set Architecture, not microarchitecture. The ISA defines what instructions and registers the CPU has. The microarchitecture is how a specific CPU implements the ISA that it supports.
@stefanalecu9532 1 month ago
To give a concrete example relevant to the video: the ISA is x86_64, while a microarchitecture would be Skylake or Nehalem. Similarly, AMD implements the same ISA, but an example of a microarchitecture would be Zen 4 or Excavator or K6.
@MechMK1 1 month ago
I don't think CPU microarchitecture levels are a good idea, precisely because of what Linus said: They're not a strict linear progression. There exist concurrent CPUs by the same manufacturer, which include different feature sets. The only correct way to handle this situation is to query specifically which features an individual CPU supports and use these features on-the-fly. This also does not cause worse performance. Querying CPU ID on startup is quick, and setting pointers to certain calls making use of such functions is fast too.
@olnnn 1 month ago
That was literally what these levels were first designed for: to be used with the glibc-hwcaps mechanism, which loads a different version of a library depending on the CPU. The levels add some simplified baselines instead of having a million different files for each variation of CPU instructions.
@kuhluhOG 1 month ago
@@olnnn The only problem is that the levels give the impression it's linear, that features are added along a line. In reality it's a tree (or something even more complicated), and even more so when modern CPUs remove certain extensions.
@nobodyimportant7804 1 month ago
@@olnnn Leave it to the GNU project to overcomplicate things, but if it was "designed" for glibc, then the kernel has no reason to follow suit.
@jonathanbuzzard1376 1 month ago
Compiling to v3 gives a 5 to 10% improvement in performance. I work in the HPC space, and there are substantial performance benefits to getting your compiler options right. For most people not flogging the CPU at 100% for days on end over hundreds of nodes it is not worth it though.
@kuhluhOG 1 month ago
@@jonathanbuzzard1376 The argument is not that these optimizations aren't desirable, but that the way they are being done (in "levels") is misguided, since the levels imply some sort of linear progression even though there isn't one.
@ninele7 1 month ago
x86_64v3 is mostly fine. The only problem is that somewhat recent mobile CPUs, as you mentioned, don't support AVX, so they fall into x86_64v2. There are no CPUs stuck between x86_64v2 and v3: all hybrid CPUs support x86_64v3, and on all hybrid CPUs the advanced instructions aren't available when E-cores are enabled (and since they aren't advertised, no one is using them).

The problems start with v4. While AVX512 was being developed, the consensus was that all cores would sooner or later include 512-bit vector support. But Intel couldn't implement 512-bit vectors in small cores in an efficient way, so they came up with a new spec, AVX10, which makes 512-bit vector support optional. Now v4 is only universally supported on Zen 4+ and Intel server CPUs (which is quite a lot). The problem is that there will be new CPUs which support lots of great instructions above x86_64v3 but won't support x86_64v4, and no one knows what to do about it. Maybe we should just drop the v4 level and create something in its place that will be universally supported sooner or later.
@jamesbeebe2870 1 month ago
No, we need AVX512; I need it for my PS3 "backups" to run right lol
@oscarsmith3942 1 month ago
v4 is fine as long as you don't care that Intel releases new chips that suck. Sure there will be new CPUs that fall in between V3 and V4 (just like Atom falls between V2 and V3), but it's not Linux's fault that Intel doesn't know how to implement an instruction set that they designed.
@No-mq5lw 1 month ago
I guess we could create minor point revisions (e.g. x86-64v3.1) like what ARM does, but I don't expect that to make anything clearer. Walking back AVX-512 from its own major version would 100% help greatly.
@ninele7 1 month ago
@@jamesbeebe2870 With fast enough CPUs, RPCS3 will run fine even without AVX512. So while it improves performance now, in 4-5 years it won't be necessary.
@niteriderevo9179 1 month ago
Sorry, but I agree with the Linux kernel's creator: the whole "x86 levels" thing is a hard no, confusion at best. The CPUID register already tells you what's available and what isn't. The "levels" thing implies that there is a whole generational subset of instructions that has to be fully present, and that really isn't a thing with how both Intel and AMD pick and choose which x86[-64] ISA add-ons are supported.
@MonochromeWench 1 month ago
E-cores and P-cores having different instruction sets is not a problem with feature levels but an unnecessary complication caused by Intel. Intel wants to say they have AVX512 when only half the CPU actually supports it. Get rid of feature levels and the problem still exists, as those CPUs only support AVX512 sometimes. This is an Intel problem that screwed the v4 feature level. We can just say that all Intel CPUs with E-cores are v3 and the problem goes away.
@autarchprinceps 1 month ago
I think Arch with ALHP provides a pretty good solution for most of this. The base libraries of Arch are v1 (or, in special cases complex enough to select the vector extension themselves, largely CPU video encoding and the like), and if your CPU supports it you can add each feature level on top with ALHP. Whatever package still compiles properly will be available at that feature level, and you use the newest feature level allowed by both the package's and your CPU's limitations. That way no one with old hardware gets left out, and you still get the speedup for pretty much 99% of packages. Win win.

The primary remaining issue is indeed the big.LITTLE feature-difference situation you describe. But that is just plain a stupid idea. It even breaks compiling locally with -march=native, or from a JIT. You'd have to pin software using the new features to P-cores, or compile each program for each core type and then pin each process to a core type. What a nightmare of a design. Even if they are not properly faster on them, the efficiency cores should fundamentally support the same features, or else I don't know how Intel expects any software to make proper use of them. big.LITTLE is a pretty good feature on ARM, but Intel seems to have issue after issue with it.
@mohammedgoder 1 month ago
Dude, this isn't microarchitecture. This is about the ISA. I was thinking you were going to do a deep dive on microarch. Fix the title. Microarch is the physical layout of the chip; the ISA is the interface.
@somenameidk5278 1 month ago
The sources shown in the video call it microarchitecture levels.
@mohammedgoder 1 month ago
@somenameidk5278 I'm not surprised. People in the industry misuse words all the time.
@YouSeeKim 1 month ago
This seems like a reincarnation of the browser-detection versus feature-detection debate from back when a lot of emerging features landed in some browsers while older tech like Internet Explorer 6 still had a not-so-insignificant market share, and polyfill libraries could only get you so far. Browser detection is seen as bad practice, but that is easier to say when it can all be done at runtime. So it will be interesting to see what best practice prevails in the Linux community with the constraints they have to work with.
@MixMastoras 1 month ago
Guess what! There was a time when some PCs didn't have all the latest SSE instructions! This stuff happens all the time but the burden is always on the developers to strictly limit or widely support all architectures and their instruction sets!
@IbilisSLZ 1 month ago
There was a time when not everyone had a math coprocessor ;P
@AccSwtch50 1 month ago
Now is the time when not everyone has all the AVX extensions.
@elmariachi5133 1 month ago
@@IbilisSLZ There was a time computers ran using water! Can you go earlier? xD
@IlluminatiBG 1 month ago
This reminds me of the time when we were writing JavaScript for specific versions of a browser. At some point you realize: why not check for features rather than versions? Of course, for CPUs to do that they need to augment the CPUID instruction to provide that information; hopefully they do, so compilers can use that for optimization instead of levels.
@uis246 1 month ago
-march=native uses CPUID
@Daktyl198 1 month ago
Distros have to maintain large repos of compiled software. The fewer copies of a program they have to compile, the better. Runtime checks are never going to have the same performance as being optimized during compiling/linking. It makes far, far more sense for distros to check CPU features at install time and then assign a V2/V3 repo and mirrors depending on the result than to compile all of their packages with a hundred runtime flags and thus codepaths, none of which are particularly fast and all of which balloon the size of the binary.
@mk72v2oq 1 month ago
@@uis246 No it doesn't. It just sets the march to whatever CPU model you are currently running:
$ gcc -Q --help=target -march=native | grep 'march'
  -march=                     znver4
@uis246 1 month ago
@mk72v2oq it also sets cache sizes and supported instructions.
@Linuxdirk 1 month ago
A 4-year-old chip is v2... Looking at my 10-year-old CPU. [chuckles] "I'm in danger"
@AnEagle 1 month ago
If it's an Intel CPU it may well be v3; the 4000 series supports it.
@SlinkyD 1 month ago
8:10 Found that out the hard way 5 years ago. Still dealing with it now on a different machine. I just wanna compile a program to run cus I got the code, not wait until "hopefully" it makes it into a repo cus somebody else built it, or a damn flatpak. Good code I know compiles on somebody else's machine won't on mine cus of that one/four things that my CPU ain't got but "should" have. I see why mom used to get computers from her job in the 90s and NEVER bought a new one. She "need it to work, not waste my fucking time". Best computer lesson after "garbage in, garbage out".
@forivall 1 month ago
Torvalds is still making Monty Python references... What a champ
@RandomGeometryDashStuff 1 month ago
05:14 Does a program that uses AVX2 sometimes work and sometimes crash depending on what the CPU scheduler decides?
@hubertnnn 1 month ago
Kinda, yes. That's exactly what was happening on older versions of Windows when Intel first released those CPUs.
@Daktyl198 1 month ago
V3 and prior are really good to have for distros to compile against a "group" of CPUs. It's trivial to query a CPU during install time to determine if the CPU contains the feature-set required for V3, V2, or V1 and then assign the proper repositories/mirrors based on the results as CachyOS does and what Fedora is planning on doing. CachyOS proves that there is performance to be gained (sometimes quite a bit) simply by compiling a program with support for newer instructions. V4 is a mess, but even just using V3 compiled packages on a V4 CPU is far, far better than compiling the program with no newer instructions than basic SIMD. Using V3 as baseline and V2 repos for "classic" CPUs detected on install seems fine as well in Ubuntu's case.
@nobodyofconsequence6522 1 month ago
CPU microarch levels are a good compromise between never deprecating anything and letting old hardware hold us back, versus building a unique package for every sodding CPU family. I like them. I think we should create a new one roughly every 5 to 10 years, based on "every CPU released in the last decade supports this instruction, right? Yes? Yes! Great, add it to v6 or whatever".
@fireztonez-teamepixcraft3993 1 month ago
Have you watched the video? Because he clearly explains what the issue is with the microarchitecture levels as implemented. I'm not an expert, but in my opinion the best way would be to simply check if the technology in question is supported and, if it is, enable it; if not, just disable it. CPU architecture and technology change with time: new technology is implemented, older technology is deprecated or completely abandoned. To me, microarchitecture levels seem more limiting than beneficial in the end, especially when exactly what is supported changes from CPU to CPU and not linearly. It could work if Intel and AMD had a hard-set standard for what will and will not be supported by their next CPU generation, but that is not the case at all. Both brands also have a tendency to rename older low- or mid-tier CPUs, so you can get a brand new CPU with a new naming convention built on a 5-year-old architecture, just to make things even more complicated.
@thewhitefalcon8539 1 month ago
You can define compatibility levels, but they have nothing to do with the CPUs, only your package builds. You can say we build four versions of the package: one for the latest consumer desktop generation, one that supports all CPUs back to 2019, one that supports all CPUs back to 2005, and one that supports all CPUs back to the 386. That's fine.
@stefanalecu9532 1 month ago
If x86_64-v4 is such a shitshow already, wait until we get to x86_64-v5; that will be so much fun.
@nobodyofconsequence6522 1 month ago
@@fireztonez-teamepixcraft3993 "I'm not an expert, but in my opinion the best way would be to simply check if the technology in question is supported and if it is, enabled it, if not just disabled it."

This works if you compile your own packages. Most people don't. I fucking don't. Compiling is a heavy workload. A little package like ffmpeg takes 2 minutes and 52 seconds, and I have a Ryzen 9 3900. A browser like Chromium will take 2 hours and eat 32 gigs of RAM in the process. You do not build your own browser. It's not worth it. Any time saved by the more optimized binary over the 1-2 weeks the binary stays current will easily be eaten by the compiling process. So that leaves anyone with a brain stuck with a precompiled binary.

And nobody is going to build a separate binary for every CPU family. Why the fuck would anyone build "chromium for haswell" and "chromium for broadwell" and "chromium for skylake" and "chromium for coffeelake" and "chromium for bulldozer" and "chromium for zenv1" and "chromium for zenv3", each build taking 2 hours of compute on a decent CPU, when they could just build "chromium for the lowest common denominator, who cares about your cpu's fancy features nerd?". The people who've even heard of the difference number in the hundreds. x86_64_v3 is the next lowest common denominator. It would be ridiculous to have 20 different versions of the chromium binary taking a total of 40 hours to build. Anyone who cares is probably building their own packages anyway.

But x86_64_v3? That's most CPUs from the past few years! You may find a distro willing to build its entire set of packages just one more time, like they did when i386 was a supported architecture. That's a much more reasonable ask. Yes, I watched the video. I've also been through Gentoo and back, and know from that experience exactly why your suggestion is pie in the sky.

Nobody is offering to build you a special snowflake package designed to run perfectly on your specific hardware. Your options are x86_64 baseline, x86_64_v3 if you're on a really forward-thinking distro (which is still like the common denominator of 10 years ago), or build your own damn packages and enjoy never using your computer.
@hoefkensj 1 month ago
Gentoo level: does -march=native and -O3 count lol
@Winnetou17 1 month ago
Gentoo FTW! I like how so many problems/issues are not a problem/issue in Gentoo.
@guildpilotone 1 month ago
Forgive my ignorance here: can user apps still be compiled to use "advanced" instructions (given that your CPU has them) independently of what the kernel is compiled for?
@atiedebee1020 1 month ago
Yes
@guildpilotone 1 month ago
@atiedebee1020 Tanx. I knew that used to be true for PPC Macs, but wasn't sure with Linux.
@nivayu 1 month ago
I first thought that the CPU levels were defined similarly to Vulkan's: basically, define a common set of feature flags and minimum available resources (like minimum memory cache size), and over time, as hardware and software advance, add new levels with the now generally available modern features that software developers can then target. Having a level clearly defined as a feature set would mean it's just a mapping from the standard to the list of feature flags it contains. In the kernel, for example, it would internally only use feature flags as it does today, and the CLI at the very beginning would translate the chosen feature set into the defined list of feature flags. But in the case here, the feature sets aren't clearly defined, if I understood it correctly, which makes them pretty much useless.
@esra_erimez 1 month ago
I'm glad you asked, E5-2699 v4
@volodumurkalunyak4651 1 month ago
5:28 NO, you don't. Alder Lake, Raptor Lake and Arrow Lake are 100% x86-64v3 (official Intel specs; some Alder Lake parts could have AVX-512, aka x86-64v4, unlocked if E-cores aren't enabled). Intel currently gives NO WAY to have an asymmetric ISA across desktop/laptop CPUs. Windows doesn't support mixed x86-64 v2/v3 or v3/v4 CPUs, so Intel doesn't allow that configuration.
@onceuponaban 1 month ago
We can now add "CPU architectures" to the list of things that don't actually exist, next to fish, adjectives, and trees.
@youcefg9760 26 days ago
x86 is not a microarchitecture; it is an instruction set architecture (ISA), specifically a CISC ISA. Zen, Haswell, Broadwell, Kaby Lake... are CPU microarchitectures.
@sjzara 1 month ago
I assumed that high and low efficiency cores on the same CPU would have the same instruction sets.
@alexturnbackthearmy1907 1 month ago
Problem is that they are basically 2 different processors in one. Even different architectures.
@13thravenpurple94 1 month ago
Brilliant video! Thanks a ton 👍
@lesh4357 1 month ago
I'm sure someone (maybe the processor manufacturer) could produce a config (feature definition) file for each individual type/model/release of processor. If they feel that is too much work for the multi-billion-dollar chip they are selling, then they could have a base file for features that all processors so far support, plus a feature set per processor model with + or - entries for both 64-bit and 32-bit modes (or any x-bitness that may come along). They should release this file at the same time as they release a new processor. If it's not a makefile itself, it could be used to automate the production of anything else needed. This "level" thing is creating confusion, and I'm sure it will lead to situations where features of a particular processor are not utilized because it's BETWEEN categories/levels!
@GrzesiekJedenastka 1 month ago
It exists. That's CPUID, and it isn't a "config", just something you can get from your CPU if you ask it for that. This levels talk is not because you can't optimize for a specific CPU, it's because most people DON'T optimize for a specific CPU, as that'd require recompiling the entire OS on every computer. Having well defined levels would allow developers and distributions to create builds that would work on all CPUs supporting a given level, and can be easily downloaded.
@lesh4357 1 month ago
@@GrzesiekJedenastka What I was talking about is a file in a standard format that can be used to automate the process of building optimized kernels for any specific CPU type. You would not need to be on a machine with that type of CPU. If you want or need an optimized kernel, it could be used during the install process, by choosing, building, or linking to get an optimized kernel.
@GrzesiekJedenastka 1 month ago
@@lesh4357 What would that be for? You can (in theory, though probably in practice as well) target any capability set when compiling, not only your own. The help of CPU makers is not needed here at all. Thing is, other than some specific use cases, this is completely useless. It doesn't solve the problem of vendors wanting to offer optimized builds, because you would *not* build and distribute one program a thousand times, for _every CPU ever released_, separately.
@vilijanac 1 month ago
In the future there will be a probisto just to figure out what hardware you have, so an adequate kernel can be built. Then you can choose to download any distro.
@complexacious 1 month ago
There are even issues amongst the supported features. I have an ancient Atom that supports SSE2/3 and some extensions but when I run ffmpeg with certain encoders it tells me that it's specifically avoiding using certain instructions because they are slower than the code path without them. Runtime evaluation just seems to be a necessary evil. Us lazy programmers dream of being able to add some compiler flags and get significant speedups across the board, but more features just always winds up meaning more testing and more if_this_then_that overrides to fix specific edge cases.
@GnBst 1 month ago
"Tis a silly place" sums up the entire concept. CPUID has been around forever. Packages, kernels, etc. get compiled with these requirements included, and that has worked for decades (Intel MMX comes to mind). No MMX, you don't run that compiled package; that's all there is to it. The AVX extensions and their hit-and-miss implementation through the last few generations make this a problem. The whole i386/i686 architecture differentiation was pretty clear-cut in comparison (although it did suffer the same problem, with CPUs that were i386 being sold well after most i686 offerings were no longer in production, like the AMD Geode). I would also add that with both Intel and AMD being involved in the classification process, this sounds like an easy way for them to assist in the unnecessary demise of old hardware in order to force sales of new stuff. They need to stop forcing the old stuff to become obsolete and instead entice me to buy newer hardware by actually bringing something better to the table. Running Rocky 9.5 on an Ivy Bridge E5-2670v2 already gives me a warning during boot that it "may not be supported in a future version".
@mercuriete 1 month ago
I use Gentoo BTW... -march=native
@PanduPoluan 1 month ago
Hell, yea! "-march=native" squad, all the way!
@mercuriete 1 month ago
@PanduPoluan One question. I saw you use Gentoo. If gcc is doing loop unrolling and vectorization with -O3, could this end up creating AVX512 instructions? Could this end up with your system only executing on Intel P-cores? Or does -march=native not report AVX512 unless it's available on all cores? (I use AMD Ryzen but I am curious.)
@PanduPoluan 1 month ago
@mercuriete Ehehe I'm on AMD as well so I'm not sure how it will end up on Intel hybrid.
@Wkaelx 1 month ago
Just to know: does the v1, v2, v3, v4 in the Xeon line mean the microarch version? Like, a Xeon 2666 v2 only supports v2 stuff and a Xeon 2666 v3 supports the v2 and v3 stuff?
@marcosmagalhaes6174 1 month ago
Not at all. The v's in the Xeon line are something else entirely.
@Wkaelx 1 month ago
@@marcosmagalhaes6174 Thank you bro, I didn't find any relevant info online at all.
@beatadalhagen 1 month ago
Same fun in the 32-bit era, 'optional' instructions and all.
@DelticEngine 1 month ago
What CPU do I have? My main machine has a pair of AMD Opteron 6380 processors in it. I know it's archaic, but it works. For the most part it does the job, even if not as quickly as a current system might. One of the stumbling blocks is the AVX512 extensions: it seems in some situations there are significant performance gains to be had, but otherwise the gains are marginal at best. I would be very interested in a video exploring CPU instructions and extensions so that an informed decision could be made.
@crayzeape2230 1 month ago
You have FMA4 instructions on the 6380 too, they won't be turned on by any of the version levels. It's a crazy mess.
@Winnetou17 1 month ago
I remember when Rocket Lake, aka Intel's 11th gen, appeared with AVX512: in the applications that were able to use AVX512 it was literally 5-6 times faster than the rest of the CPUs. But those apps were quite niche, things you'd use at a specific workplace or for some simulation I think.
@rars0n 1 month ago
x86 v(-1): CPUs that support MMX and 3DNow!.
@hubertnnn 1 month ago
The versions you are mentioning are not called v1, v2, ... They are called extensions; the ones I can remember off the top of my head are MMX, AVX, SSE. Edit: I see, they added some weird standard, and Linus Torvalds said exactly what I think about it.
@KeefJudge 1 month ago
I like the levels thing as a concept, but it has got very messy, because AMD/Intel don't subscribe to it and expose each CPU feature support separately, mixing them up for each CPU model, so it's much more of an abstraction than a standard. I remember coding for D3D back in the day where you checked the caps bits for whether a particular GPU supported each feature, except back then the driver would often lie to you and say "Yeah, we do this", when in reality the driver was sometimes emulating the feature slowly in software, or would say the GPU had full support for something when it would only work in certain cases. On modern GPUs you just test for D3D level 12.0 or 12.1 (or the Vulkan equivalent), and it's much easier for the programmer cos you know (barring driver bugs) that it'll work, though this only works because AMD/Nvidia/Intel are all on board with it as a standard, unlike the CPU microarchitecture levels.
@rawrrrer 1 month ago
I remember Raymond Chen's book "The Old New Thing" has a portion talking about GPU drivers inappropriately implementing D3D. It dives into detail about how these drivers attempt to cheat WHQL.
@complexacious 1 month ago
I suppose it's not that different really, when "supported" and "fast" are two very different things. It's fine for broad-brush things like RT where you can just offload the decision onto the player to enable it or not, but for the most part the old "if gpu == atiragexl then dothis(); else if gpu == tnt2 dothat(); else dosomethingelse();" only really went away due to a two-party system where one party is far more equal than the other, meaning that most devs simply stopped caring if stuff was slow on AMD GPUs and Intel was simply not supported. Now that Intel is a bigger player again and nVidia is slowly losing dominance by outpricing the market, you'll start to see players taking gamedevs to task when things are slow on other GPUs, leading to a return to board-specific optimisation and all the trouble that causes.
@the_real_bitterman 1 month ago
You missed openSUSE also providing v3-optimized packages for Tumbleweed, while openSUSE Aeon (or just Aeon) is the only openSUSE variant which will automatically install them on supported hardware.
@jouniosmala9921 1 month ago
Architectural levels are a good idea poorly executed. Hardware manufacturers should AGREE to keep designing new stuff to fit new levels instead of abandoning them. Basically, Intel should have kept ISA-level support such that future mainstream CPUs would be a superset of the previous generation's: either keep big-only cores, or add wider registers (not execution units) to the little cores. Walking back AVX-512 support seems an absolutely horrendous decision.

Just to be clear: you could get the main benefit of AVX-512 with 128-bit execution units and 512-bit registers. The main advantage of AVX-512 is conditional execution per lane, combined with scatter and gather, with scatter/gather implementations capable of handling an element per cache port per cycle and the conditional bit checked BEFORE issuing the load or store to the memory subsystem. Those three things combined are what's really needed to expand vectorization of algorithms, and all of them exist even in Skylake-X. (A minimal reasonable implementation of the standard wouldn't harm the little cores, as the goal isn't to add maximum flops per cycle with wider vectors but to add ISA support at the existing width; the downsides of AVX-512 are mostly around the power-management effects of very wide execution involving multiplication.)
@mathgeniuszach 1 month ago
What's the point of even having new CPU features if you don't make it possible for developers to consistently rely on whether all recent chips will have those features? They're not gonna build software for it, so it may as well not exist.
@GrzesiekJedenastka 1 month ago
I mean, you can still query for support at runtime, and programmers do that for tasks that are performance-critical. But it does bar the compiler from optimizing the code as it sees fit, so yeah.
@bigpod 1 month ago
So the amd64 architecture has the same problem as ARM, just per generation rather than per CPU.
@zhongj 1 month ago
What level does the Core 2 Duo P7350 fall under? Is it the same for GPUs as well?
@russjr08
@russjr08 Ай бұрын
No clue about the core 2 duo, but GPUs don't have an ISA level in this manner (to my knowledge that is) - rather usually you'd query and check which OpenGL/Direct3D/Vulkan version the GPU/Driver supports which is the equivalent for this situation.
@3lH4ck3rC0mf0r7
@3lH4ck3rC0mf0r7 Ай бұрын
Maybe the correct thing to do here would've been a "compile for the given CPU feature dump" flag, something like -march/mtune=cpuid.bin. If you wanna pick what feature set you wanna support and to what extent, nothing would be more flexible. If you wanna come up with any kind of profiles or generalizations, you can. If you wanna compile code that would only work on a specific CPU, like -march=native, but don't wanna do the compile _on_ that CPU, you could. But that's on GCC to implement, not the Linux kernel.
@hubertnnn
@hubertnnn Ай бұрын
I think it's already supported; at least I remember seeing something similar in Gentoo's compiler documentation. The whole discussion was about what the default should be, not what users can set themselves, and whether there should be a few default presets.
@MonochromeWench
@MonochromeWench Ай бұрын
That is pretty much what -march does when you give it a specific CPU to use, or you can go and specify all the individual instruction extensions you want to enable in long form.
@tagKnife
@tagKnife Ай бұрын
Something needs clearing up here. Intel and AMD were not involved in the creation of the levels. In fact, levels were not even created for x86-64: levels were created by ARM, as their instructions are cleanly separated into these level subsets. It was GCC/glibc that decided to take ARM's levels and apply them to x86-64.
@mercuriete
@mercuriete Ай бұрын
So if I understand correctly... if you have a modern Intel CPU and your GCC is doing loop unrolling and vectorization, you could end up with a system that only works on P cores? That's a little bit crazy, because Gentoo is defaulting to march=native -O3.
@mohammedgoder
@mohammedgoder Ай бұрын
It's a feature not a bug. Although, I heard that newer Intel E cores are getting vector extensions.
@nullplan01
@nullplan01 Ай бұрын
Why does the kernel even care about AVX or AVX512? It is compiled with -mgeneral-regs-only, meaning those extensions can never be used.
@complexacious
@complexacious Ай бұрын
It's two different discussions smashed into one. The kernel part was just about how compiling a 32-bit kernel takes on compiler flags you might not want due to host-specific feature sets; setting a -march= by default was proposed, which led to Linus saying it's a mess and let's not, but then partially backing down and saying maybe we should at least use generic x86_64, which is reasonable. AVX is just a good example of why x86-64 feature sets are confusing, but it's not specifically related to the kernel.
@VarriskKhanaar
@VarriskKhanaar Ай бұрын
I have the Zen1/Zen2 options in CachyOS. I imagine those are more targeted based upon AMD's generational architecture.
@TheUAoB
@TheUAoB Ай бұрын
It was always a terrible idea. It seemed to all begin when I was on Phoronix criticising the practice of assuming specific extension support instead of writing code with conditional support, and pointed out how it wasn't as much of an issue on ARM due to the architecture level support. I suggested what was happening was de facto x86-64 architecture levels, but didn't think that was a good idea. That seemed to trigger this all to start. Sorry!
@patw1687
@patw1687 Ай бұрын
One of my desktops is an old i7 3rd Gen. It works like a champ.
@Steeeved
@Steeeved 24 күн бұрын
I don't see this madness going anywhere good any time soon, if ever. The industry in general is moving towards all these advanced mixed hybrid processors, as you see with P and E cores, and that's only going to get even wilder with some of the ideas the industry is looking at in the coming decade. Feature testing is always going to be superior in every way. And more flexible! SO much more flexible! Both for developers and for CPU architects in general. It's as dumb and crazy as suggesting a webdev target a browser user-agent to detect features: that has never worked reliably in the history of browsing, with the exception pretty much being targeting OLD browsers well after their release (like back when people had fixes for IE6/7/8 using those IE conditional comments or other methods). Then you look at the nightmare that is "Living Standards" throwing a wrench in their nice perfect attempts at versioning. Just detect features, guys; don't waste time with silly versioning when you can't even agree with your own standards...
@FAYZER0
@FAYZER0 Ай бұрын
Levels would be great if they weren't defined in retrospect. For them to work you would have to rid the world of the current ones and instead set up levels agreed upon going forward. The problem is, that would basically make them useless until all the non-compliant CPUs die, and we don't want everything to become e-waste. So yeah, it's just generally a bad idea, or at least a flawed one. I do understand why we want them, as it is nice to make sure that new features actually get taken advantage of in cases where they do speed things up (even though benchmarks show that is not an easy calculus). But hey, I use CachyOS now that it's stable on NVIDIA, so I can use v3 on my Alder Lake while still using the AUR. Gentoo is just frustration.
@stephenreaves3205
@stephenreaves3205 Ай бұрын
This makes Gentoo with `-march=native` look sane. EDIT: Since you asked, I'm still rocking my i7 4790k
@asmod4n
@asmod4n Ай бұрын
Can't they just make an installer which picks the right thing and downloads it? You can detect all of that at runtime.
@GrzesiekJedenastka
@GrzesiekJedenastka Ай бұрын
If distros targeted all CPUs then there would be way too many "things" for it to make sense. So no, they can't. That's what we need some standardization for.
@confushiarch3421
@confushiarch3421 27 күн бұрын
Feels like this can only work for static, well-known hardware like a Raspberry Pi or a Steam Deck...
@kevinpaulus4483
@kevinpaulus4483 Ай бұрын
I used to try to build the smallest Linux kernel for Slackware that I could (got it down to 1.2 MB back in the day...) to get the best performance I thought I could, but it took a lot of time. And sorry Gentoo guys (or whoever's still left), ricing is bad, mmmkay. That was the consensus and the joke: have fun spending weekends for a 0.5-2% performance improvement. However, that was then and this is now, and there are a lot of new SIMD/vector and virtualisation instructions I've heard of since that time, and not using them gives away an edge that has real costs (performance, electricity, ...). There should, in my opinion, be easier distro tools to recompile the desktop part of a distro, just like the HPC guys have different toolchains with exotic enabled extensions and hardware here and there, and even proprietary compilers (icc for example) for some of the differing nodes. And what can be done with good and fast runtime CPU detection?
@GegoXaren
@GegoXaren Ай бұрын
I run an FX-8350... works well enough in most games. Though having my games on a spinny boi (HDD) is not great; the CPU waits more than it should in some games, causing stuttering.
@needsLITHIUM
@needsLITHIUM Ай бұрын
I have a laptop with an Intel N4020, which is v2; the chip is from 2019, in an ASUS laptop from 2021. It came with Windows 11 (ugh), and just getting to the BIOS from W11 is a PitA. The splash screen is completely hidden by default, so you HAVE to set up an online Microsoft account user in Windows just to be able to get to the damn boot menu by holding Shift as you reboot; staying offline, it just tells you to try again when you have a network connection and prompts you to shut down. And as soon as I did that, I put MX Linux KDE on there. My desktop has better specs, so I really just need the laptop for watching movies on flights/in hotels on vacation, or for using amp sims to play guitar in scenarios where my actual amps aren't viable, which the laptop can do in Linux. On Windows, I'd imagine the same tasks are doable, save for the amp sims; my fiancée has the same laptop, and I tried. Neural Amp Modeler kinda works, but Audio Assault Amp Locker/Bass Locker, Neurontube Debut, and ToneLib GFX all crap out on the N4020 on Windows. I can't even get Guitarix to work in WSL on my main machine. She had various distros of Linux on her laptop for 2 years, trying Feren OS, then Kubuntu, then finally settling on MX Linux KDE, same as me, then put Windows 11 back on it just because she was curious how bad it is. She kept forgetting to update her packages to the point that things would nag at her or break, and she kept forgetting to set up auto-update. Now she rarely uses the laptop, lol. She doesn't hate Windows 11 as much as she thought she would, but she still doesn't really like it, especially compared to Debian- and Ubuntu-based Linux or Windows 7/10, because even when she sets scheduled updates it just does them whenever it feels like it, and Windows is slower than Linux on that hardware.
Now she's stuck in a spot where she doesn't like Windows because 11 is a hassle, but Linux requires too much attention for her, and she's just annoyed.
@cheako91155
@cheako91155 Ай бұрын
I'm surprised there isn't an intel/amd split... are they working together?
@alexturnbackthearmy1907
@alexturnbackthearmy1907 Ай бұрын
Always has been. And then they implement different instructions in "AVX-512"...
@SlyEcho
@SlyEcho Ай бұрын
cmpxchg = compare and exchange
@deadeye1982a
@deadeye1982a Ай бұрын
Fun fact: the definitions of Industry 1.0, 2.0, 3.0 and 4.0 came after the inventions. Same with World War 1, 2, [3]...
@Rohambili
@Rohambili Ай бұрын
14:43 Bruda, i have all of them you can imagine...
@JonBrase
@JonBrase Ай бұрын
Really, if manufacturers are going to implement big.little architectures, they need to make sure that all of the cores used have the same feature support. If the P cores support AVX-512, the E cores had better do so too. The E core implementation can be dog💩, it can work at one uop per cycle and take 100 uops to implement one AVX-512 instruction and have a 1-port register file for the 512 bit registers and whatever you need to shave space, but it had better run the instructions, however slowly. And when an AVX-512 instruction is retired, it needs to set a bit in some control register so that the scheduler can see that this is an AVX-512 process and only schedule it on P-Cores after the current timeslice.
@serras_
@serras_ Ай бұрын
At the end of the day, all I want is a binary that uses my hardware to the utmost of its ability, without feeling like I have to (potentially) leave performance on the table to support 'legacy' (for lack of a better term?) CPUs. Is the naming convention bad? Kinda. Is manufacturers still putting out 'modern' CPUs with cut-down feature sets stupid? Absolutely. Do I want to compile everything with --native instead of using a precompiled binary, because the above 2 points are stupid? Absolutely fuckin not.
@foznoth
@foznoth Ай бұрын
Nice to see Linus is a Monty Python fan.
@KeinNiemand
@KeinNiemand Ай бұрын
AVX10 is going to make this even more of a mess than AVX-512: not every CPU that supports AVX10 will support the 512-bit instructions, since AVX10 instructions come in 128/256/512-bit widths. I guess AVX10 and APX will be v5, but most CPUs probably won't support all of v5, and we could get CPUs with AVX10 256-bit support and APX but no AVX-512.
@NFvidoJagg2
@NFvidoJagg2 Ай бұрын
Unless AMD and Intel come together and get strict about what the different levels are, it doesn't do any good.
@elalemanpaisa
@elalemanpaisa Ай бұрын
The idea in general would be great, but only if (and really only if) the full distro embraced it; otherwise it is just... meh. The biggest chunk is still userland, not the kernel; on modern machines, who would care?
@ai-spacedestructor
@ai-spacedestructor Ай бұрын
My CPU is an Intel i7-8700, because that is good enough for VR, will probably be enough for a few more years, and is more powerful than some of the newer CPUs that cost more.
@husanaaulia4717
@husanaaulia4717 28 күн бұрын
Because of no avx support, I am being forced to use V2 😞
@medicalwei
@medicalwei Ай бұрын
Meanwhile RISC-V... RV64IMAFDC
@linuxguy1199
@linuxguy1199 Ай бұрын
CPU microarchitecture levels sound awful. Here's a better idea: just provide an argument like amd64-num, where num is substituted with the hexadecimal value of the required CPUID bits. So, say, something that requires SSE (0x0080), AVX-512 (0x1000), and MMX (0x0020) would be in the amd64-10A0 packages.
@blinking_dodo
@blinking_dodo Ай бұрын
Optimize for local machine WOULD be nice...
@Poldovico
@Poldovico Ай бұрын
you can just do that if you want to. The downside is you have to compile your own stuff.
@rashidisw
@rashidisw Ай бұрын
I'm in favor of [Supported CPUID] list.
@GrzesiekJedenastka
@GrzesiekJedenastka Ай бұрын
So basically software vendors define their own "levels"? I mean yeah that could work.
@StephenMcGregor1986
@StephenMcGregor1986 Ай бұрын
On Intel it doesn't make sense because Intel
@DrewWalton
@DrewWalton Ай бұрын
Like many things in the tech world: good concept, horrendous implementation.
@bleack8701
@bleack8701 Ай бұрын
Someone call the code of conduct crew. Linus called something idiotic without provocation
@GrzesiekJedenastka
@GrzesiekJedenastka Ай бұрын
Something. Not someone.
@siyiabrb8388
@siyiabrb8388 Ай бұрын
Completely broken is right. Breaking "legacy" systems and maintaining different repos for v2... v3... for a 1% performance gain is not worth it.
@GrzesiekJedenastka
@GrzesiekJedenastka Ай бұрын
It can be significant, and it can be worth it. If I bought the whole CPU why can't I use the whole CPU? Whether older systems will still be supported depends on the vendor - I am sure Debian will support base x86-64 for the next 20 years, remember that Debian 13 will still support 32-bit x86, and who knows, maybe 14 will too.
@_Jayonics
@_Jayonics Ай бұрын
I went down the same rabbit hole around compiling for Intel P- and E-core architectures over multiple weekends, asking whether code compiled for instructions present on the P cores but not on the E cores would run on those CPUs, and what happens if you enforce thread affinity... There's still no answer on my Stack Overflow post. Another poop 💩 from Intel.
@jonathanbuzzard1376
@jonathanbuzzard1376 Ай бұрын
You need AVX2 for v3, AVX is not enough
@medicalwei
@medicalwei Ай бұрын
The CPU actually using: Apple A17 Pro (sorry)
@jan_harald
@jan_harald Ай бұрын
hey stupid I love you also lol, even though my cpu is reported to be v4 compatible, enabling the unofficial v4 repos for arch made every program segfault but v3 works fine, from same maintainer
@somenameidk5278
@somenameidk5278 Ай бұрын
Segfault? Odd, i would assume they would terminate with SIGILL (illegal instruction)
@knghtbrd
@knghtbrd Ай бұрын
I have all four levels. 😅
@nicholasbrooks7349
@nicholasbrooks7349 Ай бұрын
interesting
@TheRedFoxPlayz
@TheRedFoxPlayz Ай бұрын
The 10th-gen Celeron and Pentium CPUs are 'v2' because these chips lack the AVX/AVX2 instruction set. Even though they are recent enough, they cannot run an Ubuntu compiled for x86_64-v3, which is ridiculous.
@Winnetou17
@Winnetou17 Ай бұрын
I wouldn't say it's ridiculous. Not all new chips really need to have support for all the instructions that appeared. At least theoretically, having fewer things included (or, to rephrase it, to remove things that aren't really needed) will make the chip smaller, cheaper and more efficient. In practice this might be on the cents level but, hey, I still say it's a decision that Intel or whomever should be allowed to make. In ARM-land, it's like making Cortex-M0 (which is REALLY barebones) chips. Those are still manufactured today.
@TheRedFoxPlayz
@TheRedFoxPlayz Ай бұрын
​@@Winnetou17 Intel's decision is understandable; that wasn't what triggered me. It's the 'Let's optimize for x86_64-v3' approach, given the number of devices out there. x86_64-v2 would offer a performance improvement over regular x86_64 without excluding chips that were released in the past five years or so. They are effectively turning those chips into e-waste (at least for desktop use cases).
@Winnetou17
@Winnetou17 Ай бұрын
@@TheRedFoxPlayz Ooh, sorry, my bad. But isn't Ubuntu also shipping non-v3 ISOs ? I thought they only offered that as an extra option. Though come to think of it, I think Fedora also did something like this and without the non-v3 part. Frankly I'm baffled why is Fedora so recommended for new users. To me it seems a distro for people which already know what Fedora's limitations are. Which are quite subtle. Anyway, rant off.
@NikoNemo
@NikoNemo Ай бұрын
Really...
@VerbenaIDK
@VerbenaIDK Ай бұрын
i have a Ryzen 7 4800HS
@aavvironalex
@aavvironalex 27 күн бұрын
i9-10980XE Cascade Lake-X
@gr33nDestiny
@gr33nDestiny Ай бұрын
Why can’t a code be use it v3.2 or v4.2.1 etc. I’m guessing they thought of that?
@rj7250a
@rj7250a Ай бұрын
Linus says it's unofficial, when these arch levels were developed with help from Intel and AMD. I don't get it.
@polinskitom2277
@polinskitom2277 Ай бұрын
Linus is getting a bit too old, and should be replaced so that Linux can survive the future to be honest.
@JEM_Tank
@JEM_Tank Ай бұрын
He's saying they are unofficial because the hardware people haven't stuck to them as they should, thus they aren't following their own standard they created
@Poldovico
@Poldovico Ай бұрын
Linus probably made an assumption that they were unofficial (possibly based on how poorly the CPUs being sold map onto the levels) and was mistaken. The levels are official, just ill-defined.
@rj7250a
@rj7250a Ай бұрын
@JEM_Tank oh, I understand. I mean, it's Intel's fault for not creating a 256-bit encoding of AVX-512 earlier; now they will create AVX10.
@alex-oc1wo
@alex-oc1wo Ай бұрын
Meanwhile, CachyOS repo users on the inside be like 😅😅😂
@degenincel
@degenincel 28 күн бұрын
AMD Ryzen 7 5700G (16) @ 4.67 GHz
@realmwatters2977
@realmwatters2977 Ай бұрын
intel atom 32-64
@DJDocsVideos
@DJDocsVideos Ай бұрын
Architecture levels have been stupid from the get-go, something only a marketing drone could have come up with. Also, your explanation is not completely correct, as it's perfectly possible for software to detect CPU features at compile time, and there is no technical reason to come up with more or less dumb groups. You can actually do it at runtime too, albeit with bigger binaries, and for little gain.
@JassonCordones
@JassonCordones Ай бұрын
AMD Ryzen 5 7600
@majoraxehole
@majoraxehole Ай бұрын
Any real Linux enjoyer is going to be on AMD processors. Will they ever do weird shit like Intel? No idea, hope not
@floppa9415
@floppa9415 Ай бұрын
Are you high? AMD Bulldozer had nonsense like an FPU shared between two cores.
@Poldovico
@Poldovico Ай бұрын
Intel and AMD both participated in defining the levels, and they also both release CPUs that don't map neatly onto a level.