UE5 - Rendering Path Performance Overview (Targeting 8th gen and Older Platforms)

2,943 views

1 day ago

Comments: 82

@WildOxStudios 3 days ago

I basically cover what is discussed in this video: kzbin.info/www/bejne/iauQkIGtba-Bm5Y There seems to be a misconception that we have only just started making massive open-world games or high-fidelity (high-res texturing) games, and that couldn't be further from the truth. Depending on which "optimization" workflow and "tech" you want to use, you simply have to go about it differently. Nanite can do "open world" and LOD/HLOD can do "open world"; they just go about it differently, using newer or older technology to get you there. Nanite costs you up front and scales well as you add to your scene. However, the tech requires DX12 SM6, and the software rasterization comes with a huge up-front cost, which forces most cards to push texture resolution and overall screen resolution down (hence the need for upscalers like TSR). Older rendering pipelines cost you very little up front and cost you more as you add to your scene, so you manually handle culling, draws, LOD groups and HLOD groups. Forward pays for lighting in each material during the base pass, while deferred pays for it in the "deferred pass"... Deferred essentially makes "more dynamic" lights cheaper as you scale, but costs a bit up front for the G-buffer. Because forward is so cheap and doesn't use much memory, you have the extra memory headroom to "bake lights", which can look very realistic and run on very, very old cards and even mobile. kzbin.info/www/bejne/ioSppoajn9Gdq68 ... I go over each of these in this video and explain how to get back down to the cheapest up-front cost possible.
DX11 uses less overall memory while supporting SM5. DX12 uses power extensions and caches memory, which "can" hurt performance on older cards that either don't have a lot of VRAM, or that do have 16 GB of VRAM but lack the bandwidth (memory clock speed) to move that data around quickly. That "increases" your overall frame time (lower FPS) simply by going from DX11 to DX12, even though the shader model [SM5] hasn't changed (most GTX cards and the lower-end RTX 20 series suffer from this). Optimization is give and take: some tech works in better harmony with older cards and can give you results that cost less yet look as good as or better than just checking the Nanite box and requiring everyone to have 16 GB of GDDR6/7 memory ready to eat SM6 shaders with DX12.
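The "pay up front vs. pay as you add" argument can be pictured with a toy linear cost model. This is a minimal sketch, assuming each path costs a fixed amount per frame plus a constant amount per object; the class, the function names and every millisecond value are hypothetical placeholders, not profiler data, and real GPU costs are not this linear.

```python
from dataclasses import dataclass


@dataclass
class RenderPath:
    """Toy linear cost model: a fixed up-front cost plus a cost per object."""
    name: str
    fixed_ms: float       # paid even when the scene is nearly empty
    per_object_ms: float  # marginal cost of each extra object in view

    def frame_ms(self, objects: int) -> float:
        return self.fixed_ms + self.per_object_ms * objects


def break_even(a: RenderPath, b: RenderPath) -> float:
    """Object count where a and b cost the same (inf when the slopes are equal)."""
    slope_gap = b.per_object_ms - a.per_object_ms
    if slope_gap == 0:
        return float("inf")
    return (a.fixed_ms - b.fixed_ms) / slope_gap


# Placeholder numbers, not measurements from the video.
nanite_style = RenderPath("Nanite-style", fixed_ms=6.0, per_object_ms=0.0005)
lod_hlod = RenderPath("LOD/HLOD", fixed_ms=1.0, per_object_ms=0.002)

print(round(break_even(nanite_style, lod_hlod)))  # 3333
```

With these made-up numbers the LOD/HLOD path is cheaper below roughly 3,333 objects and the Nanite-style path is cheaper above it. Only profiling your own scene (stat unit, stat gpu, Unreal Insights) tells you where the real crossover sits.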

@saisnice 3 days ago

@@WildOxStudios Thanks for the great explanation. Your video is interesting, because if you want to do a simple project (for example, a puzzle game), it's frustrating when your fps hits the cap while your scene is almost empty. When you explained that by using this new tech in Unreal you're "paying" up front, it all suddenly made sense! But at the same time it makes me curious how far you can push it, how many things you can disable, and what the bare minimum is before the engine itself stops working 😂 So I think it would make an interesting video to see how many total fps you can squeeze out of the engine by disabling feature after feature (in the context of specific games that don't even need proper lighting and stuff like that). In the end people would be able to decide exactly what they need in their project and disable everything else. But in any case, thanks again for the great video and great explanation!

@WildOxStudios 3 days ago

@@saisnice Technically, Unreal Engine 5 can't scale back as far as UE4, as some options are just not available to you anymore. However, this is primarily on the mobile side, and I talk about that in the video I posted before this one on my channel. UE5 can go back to forward rendering, which is the "raw base pass"... lowest cost (if you bake lights). Forward can also do dynamic lights, but they cost a tad more than they do in deferred once you hit a few hundred in the scene. (Again, optimization isn't cut and dried: forward is cheaper until you hit the magic number where deferring the cost scales better.)
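The "magic number" where deferring the lighting cost starts to win can be sketched with the same kind of toy model. Minimal sketch, assuming forward pays a fixed amount per light inside the base pass while deferred pays a fixed G-buffer cost plus a smaller per-light cost; the function names and numbers are hypothetical, chosen only so the crossover lands in the "few hundred lights" range mentioned above.

```python
from typing import Optional

# Toy forward-vs-deferred lighting model; every number is a placeholder.


def forward_ms(lights: int, base_ms: float = 2.0, per_light_ms: float = 0.01) -> float:
    # Forward: lights are evaluated inside each material during the base pass.
    return base_ms + per_light_ms * lights


def deferred_ms(lights: int, base_ms: float = 2.0, gbuffer_ms: float = 1.55,
                per_light_ms: float = 0.004) -> float:
    # Deferred: pay once to write/read the G-buffer, then each light is cheaper.
    return base_ms + gbuffer_ms + per_light_ms * lights


def crossover_lights(max_lights: int = 10_000) -> Optional[int]:
    """Smallest light count at which deferred becomes cheaper than forward."""
    for n in range(max_lights + 1):
        if deferred_ms(n) < forward_ms(n):
            return n
    return None


print(crossover_lights())  # 259
```

Below the crossover, forward's missing G-buffer cost wins; above it, deferred's cheaper per-light cost wins. Baking lights keeps you at the far left of this curve, which is why forward pairs so well with baked lighting.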

@IndietyV2 5 days ago

This was a great watch, thank you for the explanations! I highly recommend UE5 devs learn about the sorts of options we have available to us.

@mercerwing1458 17 minutes ago

I'm really happy to see this. There are so many videos about how Unreal Engine can be used better, but nearly none that actually show how.

@abdullaheta 3 days ago

For those who want a summary: The first optimization technique discussed is changing the anti-aliasing method from TSR to TAA. This can improve performance by up to 120 frames per second. The second optimization technique discussed is changing the Shader Model from SM6 to SM5. This can improve performance by up to 7 milliseconds. The third optimization technique discussed is changing the rendering path from deferred to forward. This can improve performance by up to 200 frames per second. The fourth optimization technique discussed is enabling early Z culling. This can improve performance by reducing overdraw. The fifth optimization technique discussed is using a separate translucency layer. This can improve performance by reducing the number of draw calls.
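The summary above mixes frame-rate gains ("up to 120 frames per second") with frame-time gains ("up to 7 milliseconds"), and the two only compare once converted. A small sketch of that arithmetic; the baselines of 60, 180, 300 and 420 fps are illustrative values, not figures from the video.

```python
def fps_to_ms(fps: float) -> float:
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


def ms_saved(fps_before: float, fps_after: float) -> float:
    """Milliseconds of frame time saved when going from fps_before to fps_after."""
    return fps_to_ms(fps_before) - fps_to_ms(fps_after)


def fps_after_saving(fps_before: float, saved_ms: float) -> float:
    """Frame rate after shaving saved_ms off the frame time at fps_before."""
    return 1000.0 / (fps_to_ms(fps_before) - saved_ms)


# The same "+120 fps" is a very different saving depending on the baseline.
print(round(ms_saved(60, 180), 2))          # 11.11
print(round(ms_saved(300, 420), 2))         # 0.95
# A 7 ms saving starting from 60 fps:
print(round(fps_after_saving(60, 7.0), 1))  # 103.4
```

Because frame rate is the reciprocal of frame time, milliseconds are the unit that adds up across optimizations; fps gains look biggest in nearly empty scenes.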

@Capewearer 2 days ago

The only controversial thing is changing deferred to forward. It only works well in one case: your lighting is baked, so there are few dynamic lights.

@RiggsVoice 3 days ago

Best video about optimizing UE5

@simonmalicek 1 day ago

I'm working on a PS1-style game; "Fully Rough" is a nice feature I didn't know about. Thanks!

@Toetech0 2 days ago

Excellent video. I've tried testing stuff like this in the past, but this summarises everything perfectly.

@meanderingdev 2 days ago

Awesome video, thanks for this! This was massively useful for my project.

@donduck1 3 days ago

19:27 lol, I'm working in UE5 on a GTX 970, funny you mentioned it :D On medium scalability I get 120 fps in the Third Person template map without touching anything, but I'm definitely going to try everything you show in this video. Good vid 👍

@WildOxStudios 3 days ago

@@donduck1 Yes, you can reduce visual fidelity to save memory bandwidth and shader cost as well, and scale "in a different way", but this is where you sacrifice fidelity (texture resolution, mip quality, shadow quality, render distance) instead of "rendering features". Essentially you're paying up front and then dropping the quality of the very things you're paying up front for, and in some ways you end up looking "muddy", or worse than if you had just gone back to DX11 at Epic settings. This is why I said optimization is a relative term… Deferred costs more up front but less as you add lights; forward costs less up front but more as you add lights; DX12 eats more memory and hurts more with fewer objects in the scene, but scales better over larger spaces if you have the total VRAM budget and bandwidth (a 970 wouldn't; it's a DX11 card at heart).

@donduck1 3 days ago

@@WildOxStudios I don't understand these things at all. I have an open-world project, so I guess from what you said DX12 is better in that case? And btw, I have 12 GB of VRAM.

@WildOxStudios 3 days ago

@@donduck1 It depends on whether you want to use old-school or new-school optimization. HLODs and DX11 can do open world.

@donduck1 3 days ago

@@WildOxStudios I tried the tips from the video on my project. I had around 40 fps and like 18 ms on average on medium scalability, and now I'm hitting 100 fps / 10 ms in the editor, and while playing in the editor I get 100 fps / 10 ms inside buildings on Epic quality (no scalability). That's crazy, and I would even say it looks better than before, maybe because I'm using low-poly stylized models. The only problem I found is that when I look far out into open space my fps drops to 45; when I turn the camera toward buildings, for example, I get around 90. I don't have my map fully built yet, so I can see the whole map from some places; I wonder if I can hide this with fog or something like that. Anyway, thanks for this video, it helps me so much. I was searching for something similar about a year ago and didn't know where to look because I have no clue about these things :D I was testing the mobile shaders, but they looked really bad :D

@WildOxStudios 3 days ago

@@donduck1 This is where draw calls come in (stat rhi, toward the bottom). If you create LOD groups for the objects off in the distance, you'll reduce your quad overdraw (it's a view mode in the editor, similar to Shader Complexity). Then, to take it a step further, you can create HLOD groups for the distant actors in your scene, which will further reduce draw calls and get you closer to the 100 fps you get when viewing fewer objects.
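LOD groups work by swapping to cheaper meshes as an object shrinks on screen. A minimal sketch of distance-based LOD picking, assuming screen size is approximated as the bounding sphere's projected diameter relative to the view. This is a simplification of the idea behind Unreal's per-LOD "Screen Size" values, not the engine's code, and the function names, 90° FOV and thresholds are hypothetical.

```python
import math
from typing import Sequence


def screen_size(radius: float, distance: float, fov_deg: float = 90.0) -> float:
    """Approximate fraction of the view covered by a bounding sphere's diameter."""
    half_fov = math.radians(fov_deg) / 2.0
    return radius / (max(distance, 1.0) * math.tan(half_fov))


def pick_lod(size: float, min_sizes: Sequence[float]) -> int:
    """min_sizes[i] is the smallest screen size still allowed to use LOD i.

    Anything smaller than every threshold gets the coarsest LOD, len(min_sizes).
    """
    for lod, min_size in enumerate(min_sizes):
        if size >= min_size:
            return lod
    return len(min_sizes)


thresholds = [0.5, 0.25, 0.1]  # hypothetical LOD0..LOD2 cut-offs; LOD3 below that
for d in (150, 300, 800, 5000):
    print(d, pick_lod(screen_size(radius=100.0, distance=d), thresholds))
```

With a 100-unit radius, this picks LOD0 at 150 units, LOD1 at 300, LOD2 at 800 and LOD3 at 5000. Coarser meshes far away mean fewer tiny triangles, which is what drives quad overdraw up; the draw-call savings come separately, from HLOD merging many actors into one proxy.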

@joseglm 3 days ago

Huge thanks for this! I've been looking for a way to optimize my solo project for Steam Deck, and this is the best help I've been able to find in months.

@arudenka 1 day ago

Nice video! Personally, I think Lumen vs. forward shading + MSAA is the hardest optimization decision, since it shapes everything related to the game's visuals in drastically different ways. Other stuff like DX11/12 and SM5/6 can be left to end users to decide. Thank you for a good channel!

@saisnice 3 days ago

Finally a good video that actually helps. Thank you very much! If you have more tips and tricks about optimization please share them!

@simonmalicek 2 days ago

I was actually excited there was a different way to compile shaders for better performance :D

@ezelominish3767 4 days ago

Thank you for making this Video!!!

@verendale1789 2 days ago

This is just a request for if you ever have the time or interest: a proper guide on Lumen optimization is, I think, sorely lacking in the community (the docs have just a few basic cvars, and the Unreal forums are a haphazard mishmash of information). Specifically, something I am struggling with is making Lumen look nice (not noisy or ghosty) without costing too much, i.e. a good price-to-performance ratio. I find it needs to be set closer to Epic quality to really look good in my projects (standard High Lumen gets too much noise for me in AO spots, and tiny fast-moving objects ghost really badly).

@WildOxStudios 2 days ago

Unfortunately, you're dead on with your assessment. Good Lumen costs a lot. Global caching on High creates too much noise; Detailed does a better job, but it eats performance in the gathering cost once you lean into really good-quality "update rates". Even the best in the business, "Fortnite", struggles here. The tech is nice, but it's still very pricey and doesn't have fine-tuning controls "like a reflection probe". I wish we had per-light settings for Lumen update rates, gathering probes, etc. Currently I simply modify my lighting to match what Lumen wants instead of having ultimate freedom... If I can't clean something up, I slap a post-process volume in the area I'm fighting and usually tweak it locally using the "bound" settings for that area. That really is the best advice I can give right now, unfortunately. I will, however, definitely consider making a video on Lumen settings and performance, kind of an overview as you requested :)

@mahkhardy8588 2 days ago

Your videos are great, man. Learned a lot from your multi-player vids, and this topic, in particular, is so important. I hate the Lumen side effects, the dancing pixels. I also hate the blurry rendering of TSR & TAA. Taming the beast that is Unreal Engine.

@Vanderer11 2 days ago

And Nanite: r.Nanite 0 plus the checkbox in Project Settings may buy some frames, at least it used to a while ago. Nanite shouldn't work with those settings anyway, but it's still worth trying, I guess.

@TheAnimeLibrary- 3 days ago

Amazing thank you very much

@castamaggic5322 4 days ago

Do you plan to make videos on how to use (and get working) multi-user editing in Unreal Engine 5.5? The official guide is out of date, and it's not clear how to use it.

@Pancake.Rnold35 3 days ago

thank you so much

@420420smokeup 2 days ago

Very informative video, thank you

@n1lknarf 2 days ago

I thoroughly tested everything in your video in my prototype level, which is complete: a full blockout, 30 AI characters roaming, 100 physics assets enabled on top of another 100 dynamic actors, and several transparency/additive layers. The only noticeable difference I get is from disabling SM6 and going back to DirectX 11 (about 20 fps). I had already optimized the engine as much as I could with several of the options you talk about, but I had to recreate the lighting manually to truly optimize it. PCG only casts dynamic shadows from its meshes, regardless of the light being stationary and the meshes being static, so that could also be costing some performance.

@WildOxStudios 2 days ago

@@n1lknarf You're likely hitting draw call limits then; you'll need to run stat rhi to look at your total draw call count. Also see if you can reduce quad overdraw with LOD groups or HLOD actor groups. You'll want to instance as many materials as you can as well (a draw call is essentially 1 per unique mesh and 1 per unique material... if you instance meshes, each additional one is almost free, and if you instance materials, each additional one is also almost free). Example: a chair placed 20 times would be 1 mesh call + 1 material call, and then each mesh could be an ISM or batched. If your base pass is choking, then you're right: swapping rendering methods won't give you anything back... Fewer calls for the same rendering will. Less geometry overdraw with better LODs will. Lower memory via cheaper materials very far away for LOD4 will. If you're using PCG, you're dynamic anyway, so you'll likely want to stay on SM5 deferred DX11 and use HLOD for the best results. If you're using Megascans, then you'll need to go Nanite SM6 and TSR and lower scalability to get some VRAM headroom back. You can also look at enabling automatic instanced batching and testing your scene. In DefaultEngine.ini:
[/Script/Engine.RendererSettings]
r.MeshDrawCommands.DynamicInstancing=1
You can also just run the above as a console command and watch stat rhi to see if instance batching will help you. Another lesser-known hack for fast instancing is to use the foliage instance placer for non-foliage objects like chairs and other static level objects. This essentially builds an ISM for you and reduces the overall call count per mesh object.
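The "chair placed 20 times" example can be turned into a rough draw-call count. Minimal sketch, assuming one draw per material section per placed copy when nothing is instanced, and perfect batching of identical mesh/material pairs when it is (ISM, foliage-placed instances or dynamic instancing). Real batching has more conditions, and the asset names are hypothetical; compare against the draw count stat rhi reports for your own scene.

```python
from typing import Iterable, Tuple

Placement = Tuple[str, Tuple[str, ...]]  # (mesh name, material slot names)


def draw_calls(placements: Iterable[Placement], instanced: bool) -> int:
    """Toy draw-call count, assuming one draw per material section per copy."""
    placements = list(placements)
    if not instanced:
        # Every placed copy issues one draw per material section.
        return sum(len(materials) for _, materials in placements)
    # Perfect batching: copies sharing a mesh + material collapse into one draw.
    return len({(mesh, mat) for mesh, materials in placements for mat in materials})


chairs = [("SM_Chair", ("M_Wood",))] * 20
table = [("SM_Table", ("M_Wood", "M_Metal"))]
scene = chairs + table

print(draw_calls(scene, instanced=False))  # 22
print(draw_calls(scene, instanced=True))   # 3
```

The table still costs two draws even when instanced because it has two material sections, which is why sharing materials (and merging them into fewer slots) matters as much as instancing the meshes.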

@WildOxStudios 2 days ago

@@n1lknarf Also, I talked about stat gpu and stat unit and how your GPU and CPU impact each other. If you're dealing with 100 AI and simulated objects, you're likely pushing against your frame time on the "game time", not the GPU. If your CPU is bogged down with Chaos simulations and AI behavior trees, you'll also be hurting: you'll take longer to complete draw calls, which also lowers your fps and raises your ms (total frame time)... In this case you'll want to use the "is recently rendered" check and hide the actor (your AI) if it's not rendered, or swap it with a dummy class that simply updates its location and runs very little other logic... You can also use the skeletal mesh optimizations on each AI to reduce the animation tick cost by letting it tick less based on whether it's rendered or drawn. You can use cull distance volumes for this as well... The approach really is game specific, based on what your AI is doing or how you expect it to behave... Personally, I never leave AI in my levels. I use a spawner class and only spawn AI when my player enters a zone; this spawner class manages the total count, respawns, waves, etc., keeps my overall AI budget in check and acts as a state manager for the level/world/save system. Having 30 AI roaming around is fine, but you'll need to factor that cost into your total budget and ask whether you need "all of the logic" each AI is running when the player may not even be able to see them (I'm not aware of the details of your project, so I'm not sure here). This is where CPU "Game Thread" optimization can impact your rendering thread: your CPU could be dealing with too much to handle draw calls effectively, and you'll need to look into how to get these back in harmony.
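The zone-spawner approach described above (spawn on entry, cap the total, hand state to the save system) can be sketched as a small budget manager. Minimal sketch with hypothetical names. In a real project this would be an Actor or subsystem reacting to trigger overlaps, and despawned AI would serialize their state rather than simply disappear.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ZoneSpawner:
    """Hypothetical zone-based spawner that keeps live AI under one budget."""
    budget: int
    live: Dict[str, int] = field(default_factory=dict)  # zone -> active AI count

    def total(self) -> int:
        return sum(self.live.values())

    def on_player_enter(self, zone: str, wanted: int) -> int:
        """Spawn up to `wanted` AI for a zone without exceeding the budget."""
        allowed = max(0, min(wanted, self.budget - self.total()))
        self.live[zone] = self.live.get(zone, 0) + allowed
        return allowed

    def on_player_exit(self, zone: str) -> int:
        """Despawn everything the zone owns and report how many were freed."""
        return self.live.pop(zone, 0)


spawner = ZoneSpawner(budget=30)
print(spawner.on_player_enter("village", 20))  # 20
print(spawner.on_player_enter("forest", 15))   # 10, budget is full
print(spawner.on_player_exit("village"))       # 20 freed
print(spawner.total())                         # 10
```

Keeping the cap in one place means the game-thread cost of AI has a known upper bound, which is easier to budget against stat unit than AI placed freely in the level.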

@n1lknarf 2 days ago

@@WildOxStudios Yeah, I understand this part of the process, and it's completely covered. There's nothing else to push without losing the minimum visual quality I'm aiming for 🥲 (PS3)

@n1lknarf 2 days ago

@@WildOxStudios I seem to already be at the ceiling with the AI too. I'm using the behavior tree's tick exclusively to run 1 task, and the rest of the logic is done in Blueprints with no tick involved... my project needs to have all the AI characters alive and interacting with each other in real time. Your tips did help me get back to 100% screen percentage plus about 10 fps (I was running at 75%). There's also an fps increase when running packaged projects, but I still have to test how much I actually get when packaged.

@WildOxStudios 2 days ago

@@n1lknarf Yeah, if you're "event" driven you can't really save much on anything but skeletal mesh "bone ticking" optimizations if you need them moving around and interacting to some degree. Behavior trees will save you a bit as long as you call into your "task", which can be event driven with Blueprints, as BT is async and threaded whereas BP in the character class itself really isn't. Usually, if your CPU is being bogged down and really is the bottleneck, I'd move over to Insights, profile functions, and start thinking about moving the slower ones to C++, as that really is the final "major cost-cutting factor" for game time. Also, for your PCG stuff: after you place it, do you intend for it to remain static? You could also be paying more if the generated content is still marked "Movable" in your World Outliner, so you might want to ensure that the final result of the content is ultimately "baked", i.e. turned into final assets you'll use instead of the PCG tool placing them. You can kind of think about these like BSPs... they "can" be used, but just leaving them as part of the tool isn't the most efficient way.
Adjust Bone Tick Settings (Details panel on the skeletal mesh of your AI class):
Tick Animation Pose: You can enable or disable ticking for the animation pose. Disabling this can save performance if the skeletal mesh doesn't need to update every frame.
Update Rate Optimization: This setting allows you to control how often the skeletal mesh updates. You can set it to update less frequently when the mesh is far from the camera or not visible.
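Update Rate Optimization and the visibility-based tick options trade animation accuracy for game-thread time. A minimal sketch of that kind of policy, assuming a made-up distance table; it is not Unreal's URO implementation, and the function names and thresholds are hypothetical.

```python
def anim_update_interval(recently_rendered: bool, distance: float) -> int:
    """Hypothetical policy: update the pose every N frames (0 = not at all)."""
    if not recently_rendered:
        return 0      # off-screen: skip pose updates entirely
    if distance < 1_000:
        return 1      # close to the camera: every frame
    if distance < 3_000:
        return 2
    if distance < 6_000:
        return 4
    return 8


def should_update(frame: int, interval: int) -> bool:
    return interval > 0 and frame % interval == 0


def updates_per_second(recently_rendered: bool, distance: float, fps: int = 60) -> int:
    interval = anim_update_interval(recently_rendered, distance)
    return sum(should_update(f, interval) for f in range(fps))


print(updates_per_second(True, 500))    # 60
print(updates_per_second(True, 4_000))  # 15
print(updates_per_second(False, 500))   # 0
```

For 30 always-alive AI, most of the saving comes from the far and off-screen cases. That is why the "is it rendered" check pairs naturally with the update-rate setting.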

@TheAnimeLibrary- 3 days ago

You talked about how Nanite was bad for displacement, etc., but if I have a lot of objects using displacement, for example a forest, should I disable Nanite for the entire project or just for the trees?

@emrice6485 3 days ago

You either use Nanite or you don't, nothing in between, because otherwise you pay the base costs of both Nanite and deferred rendering.

@WildOxStudios 3 days ago

@@emrice6485 Exactly... why would you pay up front and then pay again for an older technique? I wish Epic did a better job of actually marketing the "options" and what the limitations of each "truly are", instead of just fueling misunderstanding. You can't just cram a combination of these different techniques together and expect them to perform well. If you're on DX12 with SM6 and VSM, your only savings at that point come from muddying your textures, reducing your resolution (TSR) and upsampling, which then makes gamers ask why things don't necessarily look better (because honestly, when was the last time someone playing COD or Modern Warfare stopped and "cared" whether every single light source cast a shadow?). Forward rendering at 1080p with MSAA and baked lighting can run on old GTX 800 series cards, and the savings you get let a 1080p image actually be sharper and cleaner than TAAU or UE5 TSR with all the bells and whistles running at 600p upscaled to 1080p or even 4K. Now you simply need a 16 GB GDDR6/7 card to run it. It's really tech that gives devs/studios cost savings by passing the cost off to the consumer's GPU to achieve close to the same quality (because it's real-time and takes a whole lot less time to develop). The problem is that we are throwing the baby out with the bathwater and in some cases "losing the options completely" to still provide optimal experiences for the "majority" of GPU owners. If you want dynamic lighting, you can still achieve it with normal DX11 SM5 deferred methods and "stationary" lighting without eating up your VRAM and GPU/CPU. Ark Survival, Conan Exiles, Days Gone, Throne and Liberty, Hogwarts Legacy... the list goes on and on.
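The 600p-versus-1080p comparison is mostly pixel arithmetic. A small sketch, assuming a 16:9 aspect ratio; it only counts shaded pixels and ignores the extra samples MSAA stores per pixel and the cost of the upscaler itself, so it is an illustration rather than a performance prediction.

```python
def pixels(height: int, aspect: float = 16 / 9) -> int:
    """Pixel count for a given vertical resolution (16:9 by default)."""
    return round(height * aspect) * height


def shaded_fraction(internal_height: int, output_height: int) -> float:
    """Share of the output pixels that are actually shaded before upscaling."""
    return pixels(internal_height) / pixels(output_height)


print(pixels(1080))                          # 2073600
print(round(shaded_fraction(600, 1080), 3))  # 0.309
print(round(shaded_fraction(600, 2160), 3))  # 0.077
print(shaded_fraction(810, 1080))            # 0.5625 (75% screen percentage)
```

A 600p internal image shades under a third of a 1080p frame and under a tenth of a 4K frame; the upscaler has to reconstruct the rest, which is where the "muddy" look comes from. The same arithmetic explains why dropping to 75% screen percentage, as mentioned earlier in the thread, saves so much.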

@VIJITHRAMvk 2 days ago

awesome

@DevGods 1 day ago

Let's say I'm not making a VR game, but I do want to target those older cards and lower-end hardware (Steam Deck, Switch, ROG Ally, etc.), and the game I'm making is essentially a Paragon clone. What would be some things to look out for when adopting all of these improvements?

@WildOxStudios 1 day ago

You just need to make sure you have a "target machine" to test on that gives you the closest result to the machine you're targeting. Then apply profiling improvements based on Unreal Insights data and keep the tips I've provided as a guide. Keep draw calls low, keep shader cost cheap, keep quad overdraw and masked/translucency overdraw low. Develop for the AA method you decide to use (MSAA or TAA). Steam Deck and Switch can handle SM5 deferred or forward, but I'd stay away from DX12 SM6 and Lumen unless you're using SWRT (software Lumen) and are really on top of your draws and shader complexity.

@DevGods 22 hours ago

@ Okay, thanks. My main bottleneck will be shader cost, since I'm using layered materials. But I can bake all that stuff down to reduce draws and shader complexity at the same time. I'm just hoping I don't lose a ton of quality in the process.

@Pancake.Rnold35 2 days ago

Hello, just asking: what is Stale UI? I'm getting 3-4 on average in stat GPU. Fresh project from the Third Person template.

@WildOxStudios 2 days ago

Are you perhaps on 5.5? And is it "Slate UI" or actually "Stale UI"?

@Pancake.Rnold35 2 days ago

@@WildOxStudios My bad, it is state UI. I'm using 5.4.4.

@gihanrx 3 hours ago

❤❤❤

@ugurinanc5177 3 days ago

Will it work if you use these settings on bigger maps? Like the Matrix example map?

@WildOxStudios 3 days ago

@@ugurinanc5177 You'll need Nanite, and thus the Nanite workflow (SM6 and DX12), if you want to do something that large and dense within a short dev cycle. When you use that workflow, you've paid to scale larger with Nanite, which is why it's a bit heavier up front. You'll just need to keep WPO materials in check and cut back on tessellation usage. If you move back to DX11 with stationary lighting in deferred, you can still pull this off, but you'll spend a little more time profiling and creating LODs and HLOD groups (which is what Far Cry, GTA and many other 8th-gen open-world games did). We didn't just start being able to create massive open worlds or cities. Spider-Man games were doing this well before DX12, SM6 and Nanite. UE4 games were able to do this (Ark Survival); it's just that the industry is flexing these new technologies so hard that it makes them seem like the only options, when in fact they simply let you reach your goal faster and pass the cost off to the graphics card (consumer) instead of the optimization team (developer/studio).

@WildOxStudios 3 days ago

I added a pinned message here that gives a really good summary of what is currently happening in tech/optimization within engines and graphics card technology.

@TheAnimeLibrary- 3 days ago

Me again: you talked about bottlenecks and how it's important to keep the CPU and GPU at the same rate. Does that mean that if my GPU takes, for example, 14 ms to render the scene and my CPU takes 3 ms, adding heavy code that only affects my CPU won't change performance that much?

@WildOxStudios 3 days ago

Correct. Your total time is the slower of the two, so if you're GPU-bottlenecked, adding CPU time doesn't hurt your total frame time much until it catches up with the GPU cost.
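A rough model of that answer: with the game thread, render thread and GPU working in parallel on different frames, the slowest one decides the frame time. Minimal sketch, assuming perfect overlap (real frames also pay for synchronization, and a busy game thread can delay draw submission, as discussed earlier in the thread). The three inputs loosely correspond to the Game, Draw and GPU rows of stat unit, and the function names are hypothetical.

```python
def frame_ms(game_ms: float, draw_ms: float, gpu_ms: float) -> float:
    # Simplified pipelining model: the slowest stage sets the frame time.
    return max(game_ms, draw_ms, gpu_ms)


def fps(ms: float) -> float:
    return 1000.0 / ms


before = frame_ms(game_ms=3.0, draw_ms=3.0, gpu_ms=14.0)     # GPU-bound
heavier = frame_ms(game_ms=11.0, draw_ms=3.0, gpu_ms=14.0)   # still GPU-bound
too_heavy = frame_ms(game_ms=17.0, draw_ms=3.0, gpu_ms=14.0)  # now CPU-bound

print(before, heavier, too_heavy)  # 14.0 14.0 17.0
print(round(fps(too_heavy), 1))    # 58.8
```

In the 14 ms / 3 ms example from the question, about 11 ms of extra CPU work is "free" in this model. Past that point every additional millisecond shows up directly in the frame time.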

@douglasgomes6794 1 day ago

Unfortunately, right now on 5.5.1, disabling SM6 also disables Lumen and all ray tracing features. It's a shame, because it really gives a performance boost.

@WildOxStudios 1 day ago

@@douglasgomes6794 Correct. I have a post on the Epic dev forum and have been in contact with the Lumen engineer. A hotfix has been requested, and they have a PR in main on GitHub.

@douglasgomes6794 1 day ago

@@WildOxStudios Well, that's good news, thank you for the effort I'm hopeful now.

@DevGods 1 day ago

Just saw the source engine version is at 5.5.2, so maybe it's coming soon.

@douglasgomes6794 19 hours ago

@@DevGods That's great, hope so, I want to use this method soon as possible!

@DevGods 18 hours ago

@@douglasgomes6794 Same! I'm holding off on going to forward rendering until they release it. I could probably test to see if it has changed, but I'm too busy right now to recompile the source engine and start running tests.

@puccionicolas7763 3 days ago

Which of these features could be tuned without losing MegaLights on 5.5?

@TheAnimeLibrary- 3 days ago

MegaLights is useless, you should bake your lighting.

@b0rbLmao 3 days ago

If you switch to forward rendering, you'll lose MegaLights.

@b0rbLmao 3 days ago

@@TheAnimeLibrary- MegaLights is terrible, but baking the lights isn't a good solution. If you decide to have a day and night cycle, baking goes out the window. Many games did baked day and night cycles, but this is not possible in Unreal Engine, which really sucks. This is due to Unreal Engine not being able to lerp between baked maps. What's the solution to this? Lumen? Hell no, Lumen is really bad for performance too. I wish RTXGI worked in UE5.5, or that there was at least probe-based lighting.

@WildOxStudios 3 days ago

MegaLights is nothing more than "super" deferred lighting. Deferred rendering already defers the light calcs to a "deferred pass", meaning you pay for it in the G-buffer after the base pass. MegaLights just lets you scale larger at even less cost by making you pay up front for it... no differently than Nanite lets you scale geometry further by making you pay up front for it. I would say that "if" you are ready to pay the overhead of HWRT, then you have to stay on DX12 SM6, and if that's true there aren't any "moving back" options for you. This is where you're going to start pushing your resolution lower and lower and cutting your VRAM cost by reducing texture resolution to get the computational headroom to process the light calcs and trace calcs. This is where TSR really comes into play. It's also where people start complaining that the game doesn't "look as good" as the previous gen, because your overall quality starts to suffer from this "muddy" look or "highly compressed smear". At the end of the day, gamers don't care if 5k lights cast dynamic shadows if they can't keep 60+ FPS or the assets start to look like smeared mud.

@WildOxStudios 3 days ago

@@b0rbLmao This was exactly my small "rant" at the end, where I said I'm starting to have feelings about Epic gating optimization options behind areas that make you pay extra, or just ripping options out completely in favor of these heavier "next-gen" tech directions. Lighting probes do still exist for volumetric data for bakes, but like you said, you will need to bake lighting scenarios or modify source to lerp between baked light data. We also don't have voxel-based GI, which was an 8th-gen technique that cost a lot less than Lumen. Currently precomputed visibility is broken in UE5, and mobile/VR dev has been really hard, which is why I did the mobile optimization video covering UE5 vs UE4.27.
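For context on "lerp between baked light data": per the thread, stock UE5 does not do this; you would bake each lighting scenario separately and modify source to blend them. A minimal sketch of just the blend step, assuming both scenarios were baked with identical lightmap UV layouts and stored as flat lists of texel intensities. The function names and the noon/midnight weighting are hypothetical, and a real implementation would do this on the GPU per texel.

```python
from typing import List, Sequence


def blend_lightmaps(day: Sequence[float], night: Sequence[float], t: float) -> List[float]:
    """Hypothetical per-texel lerp between two baked scenarios (t=0 day, t=1 night)."""
    if len(day) != len(night):
        raise ValueError("both scenarios must share the same lightmap layout")
    t = min(max(t, 0.0), 1.0)  # clamp so out-of-range weights can't overshoot
    return [d + (n - d) * t for d, n in zip(day, night)]


def night_weight(hour: float) -> float:
    """Hypothetical weighting: 0 at noon, 1 at midnight, linear in between."""
    return abs(((hour % 24.0) - 12.0) / 12.0)


# Two texels, blended halfway between day and night at 18:00.
print(blend_lightmaps([1.0, 0.8], [0.1, 0.0], night_weight(18.0)))
```

The memory cost is the catch: every extra scenario is another full set of lightmaps resident at once, which eats into the VRAM headroom that made baking attractive in the first place.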