A few things cause the slightly lower performance in Native AOT apps right now. First (in apps using the web SDK) is the new DATAS Server GC mode. This new GC mode uses far less memory than traditional Server GC by dynamically adapting memory use to the app's demands, but in this first generation it impacts performance slightly. The goal is to remove the performance impact and enable DATAS for all Server GC apps in the future. Second, CoreCLR in .NET 8 has Dynamic PGO enabled by default, which allows the JIT to recompile hot methods with more aggressive optimizations based on what it observes while the app is running. Native AOT has static PGO with a default profile applied and by definition can never have Dynamic PGO. Third, the JIT can detect hardware capabilities (e.g. CPU intrinsics) at runtime and target them in the code it generates. Native AOT, however, defaults to a highly compatible target instruction set that won't have those optimizations, but you can specify them at compile time based on the hardware you know you're going to run on. Running the tests in the video with DATAS disabled and Native AOT configured for the target CPU could improve the results slightly.
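For anyone who wants to reproduce the DATAS comparison mentioned above: a minimal csproj sketch, assuming the .NET 8 MSBuild property names (`ServerGarbageCollection` and `GarbageCollectionAdaptationMode`) from the GC configuration docs; treat it as an illustration, not an official recipe.

```xml
<!-- Sketch: opt a .NET 8 app out of DATAS while keeping Server GC,
     so classic Server GC numbers can be measured for comparison.
     Property names assumed from the .NET 8 GC configuration docs. -->
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
  <!-- 0 = DATAS off, 1 = DATAS on -->
  <GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>
</PropertyGroup>
```

The same toggle can reportedly also be flipped at run time with the `DOTNET_GCDynamicAdaptationMode` environment variable.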
@parlor3115 · 11 months ago
My thoughts exactly
@proosee · 11 months ago
Same old, same old: you trade startup time, executable size, and memory consumption for performance, like in almost every piece of software before. But it was nice to see the details; in particular, I didn't know that the CLR is able to recompile some code paths with different settings, so thank you for sharing. It's quite smart, actually.
@SlackwareNVM · 11 months ago
I'm curious, would it be possible in the future for a JIT application with Dynamic PGO that has run for a while and has made all kinds of optimizations to then create a "profile" of sorts that could be used by the Native AOT compiler to build an application that is both fast in startup time _and_ highly optimized for a given workload?
@proosee · 11 months ago
@@SlackwareNVM you said it yourself - you need to create a profile, keep it updated, save it and load it on startup - there is always a trade-off, but you can always make it smarter, for sure.
@terjeber · 11 months ago
I also think this type of performance testing is not particularly good. For example, the client hits the same endpoint every time, which gives the JIT compiler ample opportunity to radically tune performance for that specific code path. In theory the JIT might stop executing most of the code, since it doesn't change at all along the way. It would be far more interesting to put a DB in there, fill it with a few million records, and vary the test so that it retrieves a different (or random) dataset each time. That would remove an opportunity for the runtime to optimize the code; having a server respond with the same data on the same URL over and over is nowhere near realistic.
@jimmyxu3819 · 11 months ago
Your Docker base images are different, so your results can't be compared to each other. Make sure you use the same Linux version.
@FatbocSlin · 11 months ago
When comparing Docker performance, you are comparing apples to oranges. The Docker base image does make a difference: you use Ubuntu 20.04 as the base for your native image, while the .NET 8 SDK uses Debian 12. I have compared standard .NET Docker images against one based on Clear Linux, and there was a 9% difference, more than the difference you found in your test. .NET depends on libraries included in the Docker image.
@wknight8111 · 11 months ago
JIT is interesting because on one hand you're starting an unoptimized application and expecting it to compile and optimize at runtime, so people think it's going to be slower. BUT the JIT has access to all sorts of runtime statistics and runtime type information that the AOT compiler does not have. This enables some very interesting and aggressive optimizations, in theory. I don't know the full details of everything Microsoft's CLR JIT attempts to do, but the possibility is there for the JIT to perform better, especially for long-running applications. AOT will always win for startup time and short-lived applications, but for long-running applications it's not as clear, and JIT often has some advantages.
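For anyone who wants to measure how much of the JIT's edge comes from Dynamic PGO specifically, it can be switched off for a test run; a sketch assuming the `TieredPGO` MSBuild property (the `DOTNET_TieredPGO=0` environment variable should be the equivalent runtime switch):

```xml
<!-- Sketch: Dynamic PGO is on by default in .NET 8; set it to false
     only to compare against plain tiered JIT compilation. -->
<PropertyGroup>
  <TieredPGO>false</TieredPGO>
</PropertyGroup>
```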
@_iPilot · 11 months ago
So if we share those statistics with the AOT compiler, it will produce even more efficient application code, won't it?
@NicolaiSkovvart · 11 months ago
@@_iPilot it seems extremely likely that static PGO + AOT would be competitive with, if not better than, Dynamic PGO + JIT. Sadly the static PGO experience is pretty poorly supported.
@wknight8111 · 11 months ago
@@_iPilot The problem is that you can't get runtime statistics until runtime. Everything else is just a guess, and if you guess wrong the AOT may optimize for the wrong types and make the situation worse.
@modernkennnern · 11 months ago
@@wknight8111 you could theoretically run the app in JIT mode and then use that metadata to compile for AOT.
@_iPilot · 11 months ago
We are in the age of telemetry, so runtime data can be uploaded somewhere like logs (it actually is logs, btw) to be analyzed by an external application.
@robwalker4653 · 11 months ago
This is my go-to channel for all things .NET. Gets to the point straight away!
@caunt.official · 11 months ago
A 4% performance loss doesn't really matter. What is interesting here is the actual point of the bottleneck. Does Native AOT perform better or worse with encryption algorithms? Does it perform better or worse with heap allocation? What exactly affects the performance?
@BlTemplar · 11 months ago
AOT will always perform slower than the CLR because it doesn't have a JIT and can't optimise hot paths. But it will consume less memory, because the code is already compiled and optimised to some extent by default. The CLR needs to do all that work at runtime, which is why it will also consume more memory and some extra CPU resources until the code is optimised.
@nocturne6320 · 11 months ago
@@BlTemplar AOT should absolutely be faster than JIT. If AOT is performing slower, then the compiler is garbage. If a program written in C++ is slower than one in Java, it means the C++ code is bad, not that Java is faster than C++.
@BlTemplar · 11 months ago
@@nocturne6320 I am not talking about C++, I am talking specifically about AOT in C#. It's a highly dynamic, object-oriented runtime which is hard to AOT-compile. It won't be faster than the CLR in the near future.
@nocturne6320 · 11 months ago
@@BlTemplar True, but with smarter compilation it definitely has the potential to outclass JIT, I wonder how much the MethodImpl attribute affects the performance currently
@maxdevos320 · 9 months ago
Yes, it does! 4% matters a lot! This type of thinking is why software bloat has managed to completely undermine the hardware advancements of the last 30 years
@astralpowers · 11 months ago
I really want to use Native AOT in our AWS Lambdas. In my testing using the .NET 7 AOT Lambda template, the startup is faster and the performance is more stable. For one application, in the normal non-AOT Lambda, the performance deltas are all over the place, ranging from 2ms to 400ms, but the AOT version had performance between 1.2ms and 4ms, all while using less memory.
@Denominus · 11 months ago
We are doing early experiments with .NET 8 AOT. So far the latency stability, lower resource consumption, and startup time improvements, even in long-running apps, dramatically swing the cost/performance ratio in AOT's favor (in our tests). Sacrificing some theoretical TechEmpower peak performance for perf that actually matters is completely worth it. We have some services that were rewritten in Go some time ago. The .NET AOT side has a way to go before it can match that cost/perf ratio, but it's looking promising.
@viko1786 · 11 months ago
AOT might be a great idea for something like Lambda in AWS. Spawn quickly, do the work, and kill the process.
@BozCoding · 11 months ago
I'm interested in using it within chiseled docker containers :) I'm sure that more changes will happen in the future to improve these too, especially as we'll see less memory usage and probably less CPU usage.
@dimitris470 · 11 months ago
I have a feeling that any such difference is going to be swamped by I/O latency IRL anyway
@emjones8092 · 11 months ago
Where did we land on memory and CPU consumption comparisons? A smaller distribution already conserves lots of resources, which is one of the big points: scale to zero, cold boots, better memory efficiency, and smaller binaries are what I'm after.
@raduncevkirill · 11 months ago
I am wondering if the comparison is consistent given the different base images for the two APIs: the default one running on Debian slim and the Native AOT one running on Ubuntu. It shouldn't make a significant difference, though, as Microsoft's benchmarks yield the same results.
@nickchapsas · 11 months ago
It doesn't matter. The biggest difference is at the OS level. The only real difference between the slim or Alpine versions is image size, which doesn't play a role in runtime performance.
@tarun-hacker · 11 months ago
Hey Nick, you should probably check profile-guided optimisation for AOT in .NET for better results 😅
@MatteoGariglio · 11 months ago
Hi Nick, thanks for your nice work and videos, very instructive and helpful. Could you do one about the JIT compiler and the CLR? THANKS!
@protox4 · 11 months ago
How does it compare with ReadyToRun? It's a mixture of AOT + JIT so you should get the best of both worlds in terms of speed (maybe not file size).
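For reference, ReadyToRun is a single publish-time switch; a minimal sketch using the documented `PublishReadyToRun` property (the published app still carries IL, so the JIT can re-optimize hot methods at runtime):

```xml
<!-- Sketch: precompile IL to native code at publish time; startup
     improves while tiered compilation can still re-JIT hot paths. -->
<PropertyGroup>
  <PublishReadyToRun>true</PublishReadyToRun>
</PropertyGroup>
```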
@_iPilot · 11 months ago
It looks like Microsoft is focused on reducing container startup time, including delivery to registries and to the host machine. Some huge applications can have container sizes of several GB, and when split into microservices they inevitably have duplicate container layers, which leads to huge data-transfer overhead during deployment.
@psaxton3 · 11 months ago
The runtime also changed from Windows to Linux when you ran containerized. Would be interested to know the numbers on a Windows container.
@magashkinson · 11 months ago
You can drag and drop the csproj file from the explorer onto the editor tab to open it.
@ChristofferLund · 11 months ago
Big smarts
@nickchapsas · 11 months ago
You can also press F4.
@zoltanzorgo · 11 months ago
That was interesting! I am currently working on a project that has one component running on PLCs. Yes, it is a PLC with an embedded RT Linux on top of a 600 MHz(ish) single-core ARM. The flash is somewhat limited, and it is also cumbersome to install the runtime, because there is no app repository like you have for mainstream distributions. Hence I decided to publish to linux-arm with AOT. As it will also run the CoDeSys 3.5 PLC runtime alongside, I need to be careful not to stress the resources. I was very curious what difference I could expect. It is a somewhat different workload, but still, it is good to know that I might have to consider installing the runtime anyway.
@FraserMcLean81 · 10 months ago
Thanks Nick. What's your terminal plugin that shows different file types in different colors?
@T___Brown · 11 months ago
I think it's a new thing and MS will make it super fast with each new release. But they want to see us using it before they put effort into it.
@wangshuo8619 · 11 months ago
Does Native AOT support reflection? Some docs say no, some say only part of reflection. I am not sure if I should migrate my code, which heavily uses MediatR, to Native AOT. The docs are confusing.
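A note on the reflection question: Native AOT does support reflection over metadata the compiler keeps; what it cannot do is runtime code generation (Reflection.Emit), and trimming can remove types that are only reached via reflection. One documented escape hatch is a runtime directives (rd.xml) file that roots such types; a sketch, with `MyApp` as a hypothetical assembly name:

```xml
<!-- rd.xml (referenced from the csproj via an RdXmlFile item):
     keep all of the hypothetical MyApp assembly available to
     reflection even if the trimmer finds no static references. -->
<Directives xmlns="http://schemas.microsoft.com/netfx/2013/01/metadata">
  <Application>
    <Assembly Name="MyApp" Dynamic="Required All" />
  </Application>
</Directives>
```

Whether a package such as MediatR then works still depends on it avoiding Reflection.Emit internally.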
@TheAzerue · 11 months ago
Will Native AOT create any issues if it is used with NuGet packages like FluentValidation, MediatR, Serilog, etc.?
@VoroninPavel · 11 months ago
If a library is not marked as trim- or AOT-friendly, you'll get warnings from the trim analyzer when publishing the application. Unless those warnings are disabled, like they currently are with Blazor in .NET 8.
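For library authors, .NET 8 also added a property that enables those analyzers for the library itself; a sketch assuming the documented `IsAotCompatible` switch:

```xml
<!-- Sketch: mark a library as AOT-compatible and turn on the trim/AOT
     analyzers so incompatibilities surface at build time (.NET 8+). -->
<PropertyGroup>
  <IsAotCompatible>true</IsAotCompatible>
</PropertyGroup>
```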
@mauriciobarbosa3875 · 11 months ago
I'm wondering, is the performance hit the same in a non-WSL environment? WSL is known for slow I/O with Docker; what if the Docker images are run on a full-blown distro? Just thinking. Also, I think you used `dotnet publish` to publish the AOT version for the Docker image and `dotnet publish -c Release` for the non-AOT one; isn't the default publish configuration Debug for AOT? I have not coded in dotnet for a while, so sorry if I misunderstood.
@nickchapsas · 11 months ago
To your first question: it doesn't make any difference; it aligns with MS's full-environment performance delta. Both of them are published using "dotnet publish", because in .NET 8 -c Release is the default.
@mauriciobarbosa3875 · 11 months ago
@@nickchapsas I've tried running the same benchmark on my machine, an M3 Pro base model (11-core/18GB/512GB), and the results are actually surprising:
M3 Pro, no Docker: AOT 139596.985677/s, normal 139472.800011/s
M3 Pro, Docker (colima on vz): AOT 45329.935323/s, normal 44474.530778/s
So running outside of WSL did impact the result; on my machine AOT is still slightly faster 🤔 EDIT: (using the stress test with 100 VUs for 60s as well)
@nickchapsas · 11 months ago
@@mauriciobarbosa3875 Were your tests hitting over 100% CPU utilization at the container level? Was your MacBook's CPU utilization less than 100%? There are many variables. Native AOT for this particular example will always be slower if run correctly.
@mauriciobarbosa3875 · 11 months ago
@@nickchapsas I've run the test again, now on my M1 from work, and got similar results 🤔
@lylobean · 11 months ago
@Nick Did you check whether these differences still hold when your project uses the setting OptimizationPreference Speed for its AOT compilation? I think it defaults to Size.
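For anyone wanting to test that, the knob is a single property; a sketch using the documented `OptimizationPreference` setting (when it is unset, the AOT compiler reportedly uses a blended default rather than pure Size):

```xml
<!-- Sketch: bias the Native AOT compiler toward runtime speed
     over binary size. -->
<PropertyGroup>
  <PublishAot>true</PublishAot>
  <OptimizationPreference>Speed</OptimizationPreference>
</PropertyGroup>
```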
@warrenbuckley3267 · 11 months ago
I'm also wondering if you can specify what CPU instruction sets are available for a given target in the build settings (like you can for a C/C++ application), e.g., AVX2 or AVX512 etc.
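You can; Native AOT exposes an instruction-set property analogous to `-march` in C/C++. A sketch, assuming the `IlcInstructionSet` property from the Native AOT optimization docs (the exact list of accepted values depends on the SDK version):

```xml
<!-- Sketch: allow AVX2 codegen in the AOT-compiled binary; "native"
     targets the build machine's own instruction set instead. The
     binary then requires matching hardware to run. -->
<PropertyGroup>
  <PublishAot>true</PublishAot>
  <IlcInstructionSet>avx2</IlcInstructionSet>
</PropertyGroup>
```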
@another_random_video_channel · 11 months ago
I noticed that the base images are not the same: one is Ubuntu while the other is Debian. Also, the running containers may have different resource constraints.
@nickchapsas · 11 months ago
It doesn’t make any difference, feel free to grab the code and check for yourself
@VoroninPavel · 11 months ago
What about comparing with ReadyToRun/CompositeReadyToRun mode?
@jwbonnett · 11 months ago
Unfortunately a lot of the NuGet packages I need use reflection and will never be reflection-free, so I will not be able to use AOT. Personally I would use AOT even with a slight drop in performance, but it's just not there for me.
@zwatotem · 11 months ago
I would love to hear how exactly these JIT optimizations work. Right now it sounds like black magic to me.
@msafari1964 · 10 months ago
Hi, which CLI do you use for publishing and so on?
@gregoirebaranger1696 · 11 months ago
Performance is good enough in all cases; if you see this kind of requests per second in prod, I doubt you should be running a serverless / cloud container. I'm much more interested in the reduced resources required to run the app; that's the big selling point of AOT in my opinion.
@patfre · 11 months ago
Fun fact: the change in the csproj was bugged. It should only have been in the Native AOT template but was in all API templates; I reported it and got it fixed. I'm talking about InvariantGlobalization.
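For context, this is the setting in question; a minimal sketch using the documented `InvariantGlobalization` property:

```xml
<!-- Sketch: drop ICU-based culture data. Smaller (especially AOT)
     binaries, but culture-aware formatting, comparisons, and casing
     are disabled in favor of invariant-culture behavior. -->
<PropertyGroup>
  <InvariantGlobalization>true</InvariantGlobalization>
</PropertyGroup>
```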
@BigYoSpeck · 11 months ago
Requests per second is obviously useful for an application you expect to process lots of simple requests, but I would find it a more useful benchmark to see how fast computationally and memory-intensive requests can be processed. I currently work on an application that gets a relatively small number of requests per day, but those requests involve huge data models that then go through a lot of very time-consuming processing, somewhere in the region of 15 minutes for datasets in the tens of thousands. So how does AOT compare with JIT when the responses aren't simple pieces of data but there is actually some heavy computation performed on large data?
@simonegiuliani4913 · 6 months ago
The benchmark he's using is just really bad, and he shouldn't generalise the results so much. If that is the benchmark we should refer to, then using .NET doesn't even make sense and we should all switch to Go.
@maxpuissant2 · 11 months ago
Is AOT somewhat safe, or at least safer, for delivering DLLs to clients without fear of decompilation?
@souleymaneba9272 · 11 months ago
Yes. Blazor WASM already got its AOT (WASM AOT, not Native AOT). These technologies are very good, especially for .NET developers, because IL code is easily decompiled.
@Hoop0u · 11 months ago
What about when hosted in IIS?
@dukefleed9525 · 11 months ago
OK, interesting, but WHY is it happening? I suppose the JIT can better keep track of *register pressure*, and in a resource-constrained environment this makes the difference; is this the reason? It would be interesting to see what happens for a single-threaded application (or apps with a different code path for each thread).
@yoanashih761 · 11 months ago
Any reason for switching from Postman to Insomnia?
@nickchapsas · 11 months ago
I prefer the UX, it is correctly responsive, and I hate Postman's forced account stuff.
@Quique-sz4uj · 11 months ago
@@nickchapsas Insomnia changed and now it's quite shit, like Postman. It doesn't let you save your collections as files and is pushy about the account too. I prefer Bruno, which is a fork of Insomnia and saves the collection files on the file system as markdown files, which is good if you want to version control them.
@raykutan · 11 months ago
Bruno isn't a fork of Insomnia; it's a completely different project. It also doesn't store requests in markdown but in a special ".bru" format.
@mad_t · 11 months ago
You wanted to ask if there's any reason for NOT switching from Postman to anything else, right?
@IncomingLegend · 11 months ago
@@nickchapsas why delete my comment? I didn't say anything bad, wtf? you're on their payroll or something?
@cwevers · 11 months ago
You did the warmup call after the k6 test started
@nickchapsas · 11 months ago
It doesn't change the results, k6 takes that into account
@simonegiuliani4913 · 6 months ago
Your corollary only applies to application endpoints that are not computationally intensive. It's really wrong to say "it's faster, it's slower"; it should be contextualized by the type of workload.
@BlTemplar · 11 months ago
AOT isn't supposed to be faster. It offers lower memory consumption and fast startup, but not better performance.
@jimmymac601 · 11 months ago
Just here for the comments from the Microsoft apologists.
@the-avid-engineer · 11 months ago
I'm sure the 1% of devs who are affected by the loss of 85k RPS are pushing MS to address the issue. Possibly a way to sample the JIT optimizations once they stabilize and then apply them to AOT at compile time. Kinda sounds like a form of ML.
@testg4andmmm129 · 3 months ago
A 5% performance reduction and you're making a clickbait cover... waste of time. Plus: compiles to native code. Minus: 5% perf reduction. 5% is nothing.
@sushileaderyt1957 · 5 months ago
So does that mean if p >= q and both p and q are integers, then p!/q! = Π_(n=q+1)^(p) n?
@sikor02 · 11 months ago
I have the same CPU :) It's hard to saturate this beast
@IllidanS4 · 11 months ago
To be fair, I don't get why MS tries so hard to make Native AOT the "modern thing" everything has to revolve around. Sure, you might run .NET in constrained environments or on architectures where the JIT cannot run, but I really feel that without the JIT, .NET loses so much of its "power". For quick startup they have had ngen for ages, so even that point is moot. How often do you need to run a .NET program that changes so often and needs to be restarted so quickly that ngen or JIT are actually the bottleneck? Without a JIT, some code has to be interpreted, which has huge performance downsides. I don't see much point in pushing Native AOT when it breaks as soon as you use full .NET features like Linq.Expressions, reflection and MakeGenericMethod, or the DLR.
@ByronScottJones · 11 months ago
In Lambda and other on-demand invocation environments, it can make a huge difference. It's not for routine Windows desktop apps.
@IllidanS4 · 11 months ago
@@ByronScottJones That indeed sounds like a very constrained environment, but I am still not convinced that all of these improvements are just due to ditching the JIT. There could still be a way to use ngen to pre-compile a lot of what is used, or other tricks. For example, you can run .NET in WebAssembly, where I have seen people run warm-up code that JITs a few important methods, runs some static constructors, etc., and then takes the whole memory image, so you essentially end up with a pre-compiled image without any effort on .NET's part.
@AhmedMohammed23 · 11 months ago
CPU buddies
@LukasPetersen-bm4ep · 11 months ago
First :D
@ilanb · 11 months ago
I don't think Native AOT will be adopted quickly. It's like nullables: too much of a pain in the ass to use, and the benefits aren't worth the work IMO.
@juliendebache8330 · 11 months ago
How are nullable references a pain in the ass? They're pretty straightforward to use, and not having to worry about NREs anymore is quite nice.
@EraYaN · 11 months ago
I mean if you are using serverless it might be worth it pretty quickly.
@protox4 · 11 months ago
@@juliendebache8330 It's a pain in the ass to convert huge older projects. It's just fine for new projects.