Batch Rendering - An Introduction

97,357 views

Series Playlist ► thecherno.com/opengl
#OpenGL #BatchRendering

Comments: 118

@laurensscheldeman4121 4 years ago

"Hey! what's up guys, my name is The Cherno and welcome..." Now I can press play and continue like nothing has changed

4 years ago

Just finished text rendering in my own library and FPS dropped from 5k to 500 just by drawing a paragraph. This series hit just in time.

@RedCraftRPG 4 years ago

If you are using OpenGL, have a look at glDrawElementsInstanced or glDrawArraysInstanced. I wrote a text renderer that renders quads using triangle strips, so I don't need indices and save on memory.

@RedCraftRPG 4 years ago

@Svetlana V If you write them all into the same vertex buffer and use a divisor of 0 then yes, that's true. Instanced rendering allows attributes to be "per instance", so you can have one vertex buffer with 1 attribute (position, probably 0,0 0,1 1,0 1,1) with a divisor of 0 and then another vertex buffer with multiple attributes which store information about the position and the UVs, but with a divisor of 1. That way these are used per quad and not per vertex. In the shader you can then position the quad by offsetting the position by the values stored in e.g. attribute 1, and the UV coordinates can be calculated by linearly interpolating between the two UV coordinates that make up the glyph using the position as the factor (e.g. vec2 uv = mix(corner1, corner2, position)). Since position, corner1 and corner2 are all vec2s, this is supported by GLSL. Corner1 and corner2 would be stored as vec2 in attributes 2 and 3, or as a vec4 in attribute 2 that you can swizzle when mixing.
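The mix-based UV calculation described above can be checked on the CPU. This is only a minimal sketch: `Vec2`, `mix`, `GlyphInstance` and `glyphUv` are hypothetical names that mirror what the GLSL vertex shader would do, and the corner order is the one suggested in the comment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 2D vector type standing in for GLSL's vec2.
struct Vec2 {
    float x;
    float y;
};

// Component-wise linear interpolation, mirroring GLSL's mix(a, b, t) with a vec2 factor.
inline Vec2 mix(Vec2 a, Vec2 b, Vec2 t) {
    return {a.x + (b.x - a.x) * t.x, a.y + (b.y - a.y) * t.y};
}

// The four unit-quad corners shared by every instance (divisor 0),
// in triangle-strip order: (0,0) (0,1) (1,0) (1,1).
const std::array<Vec2, 4> kUnitQuad = {{{0.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 0.0f}, {1.0f, 1.0f}}};

// Per-instance data (divisor 1): where the glyph sits in the font texture.
struct GlyphInstance {
    Vec2 uvCorner1;  // UV at quad corner (0,0)
    Vec2 uvCorner2;  // UV at quad corner (1,1)
};

// What the vertex shader would compute for corner `i` of one instance.
inline Vec2 glyphUv(const GlyphInstance& glyph, std::size_t i) {
    assert(i < kUnitQuad.size());
    return mix(glyph.uvCorner1, glyph.uvCorner2, kUnitQuad[i]);
}
```

Because the unit-quad position is always 0 or 1 per component, the interpolation just selects one of the two stored corners on each axis, which is why two vec2s per glyph are enough.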

4 years ago

@@RedCraftRPG I know about the existence of instanced render calls, but I have no idea how to render stuff when they change textures. Hoping this series will shed some light on the topic.

@RedCraftRPG 4 years ago

@ Why would a text change its texture? If the text changes you just generate a new text model and delete the old one. Libraries like NanoVG even generate new models every draw call because it is very likely that if you draw a text again it has changed, although I'm not very sure how true this is for game development.

4 years ago

@@RedCraftRPG Well, let's say I'm rendering "hello". I first need to render a quad using h, then another quad using e, and so on. So the texture of each quad in my text must change within the draw call. I guess I need to upload all the letter textures as a uniform and select the correct texture in the fragment shader. I couldn't set this up.
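One common way around per-letter textures is to put every glyph into a single texture and let each quad pick its letter through its UV coordinates. A rough sketch of that lookup, assuming a made-up atlas layout of 16 x 16 equal cells indexed by character code (`UvRect`, `glyphRect` and `layoutText` are hypothetical names):

```cpp
#include <cassert>
#include <string>
#include <vector>

// UV rectangle of one glyph inside the atlas, in the 0..1 range.
struct UvRect {
    float u0, v0, u1, v1;
};

// Assumed layout: a square atlas with 16 x 16 equally sized cells,
// where the cell index equals the character code (0..255).
inline UvRect glyphRect(int code) {
    assert(code >= 0 && code < 256);
    const float cell = 1.0f / 16.0f;
    const float u0 = static_cast<float>(code % 16) * cell;
    const float v0 = static_cast<float>(code / 16) * cell;
    return {u0, v0, u0 + cell, v0 + cell};
}

// One UV rectangle per character. Every quad samples the same texture,
// so the texture binding never has to change inside the draw call.
inline std::vector<UvRect> layoutText(const std::string& text) {
    std::vector<UvRect> rects;
    rects.reserve(text.size());
    for (char c : text) {
        rects.push_back(glyphRect(static_cast<unsigned char>(c)));
    }
    return rects;
}
```

For "hello" this yields five rectangles, with the two l quads pointing at the same cell; real fonts would use variable-sized cells read from the font's metrics instead of a fixed grid.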

@asmaloney 4 years ago

Been following the OpenGL playlist. Did I miss something or did you just jump to a totally different codebase from the last video?

@Hector-bj3ls 4 years ago

I have to say, I really like this newer format. I've been watching your videos for a long time now, and when I was first learning OpenGL I liked the full line-by-line, from-scratch type videos. Now that I know some things, however, I think this is better for me. I just watch your stuff on the bus on the way to or from work. Thanks :)

@lucarox1818 4 years ago

Going through the theory with a pen and paper before diving into the code worked very well. Also, having the sandbox code as a starting point to work off of is fantastic. As always, well done mate.

@wireghost897 7 months ago

?? he literally wrote 3 lines on paper. wtf are you talking about?

@allanrocha4647 3 years ago

It's a trade-off between memory and performance; for the case where we have the same geometry, instanced rendering is the way to go.

@rickyspark3364 4 years ago

Great explaining! Loved the on paper theory / code demo duality

@thehambone1454 2 years ago

I can't thank you enough for this, implementing this soon!

@harshrathod50 4 years ago

Watched all the Hazel videos but this was the one that I actually understood everything till the last minute.

@vamidicreations 4 years ago

Amazing!! Can't wait for next video! You actually applied the feedback (pieces) from the community. At least in my opinion !

@user-jn9pg3lx7g 4 years ago

Like your series! I would love to see your approach to ECS :)

@zaaimhalim6212 4 years ago

the best channel so far THANK YOU man

@PrepStorm 1 year ago

Listened to you talk for a few minutes, put on some coffee in the meantime with my headphones on and the music in the middle of the video scared the shit out of me when it started

@StaticSchematic 4 years ago

Yo Cherno, I like the way you are presenting this subject. You explain verbally what it is you're trying to accomplish, show diagrams to reiterate what you're talking about and to help form a picture of what's going on, and then you show the code while explaining how the code relates to the concepts you were describing earlier. :thumbs up: I think you should apply this same approach to your other videos, it's a lot easier to follow what you're saying and doing. Also, I second the idea of you covering an entity system and how one would go from building up entities --> managing them --> rendering them.

@rokkos 4 years ago

+1 Totally agree with you! Really quality content, and I like that the "new" style of explaining is more condensed and straight to the point. You can see that Cherno put a lot of work into it and pre-rehearsed rather than just jumping into code and trying to recreate stuff from his dev branch. I also think that coding and talking at the same time can be really hard... So all in all really good content, keep it up Cherno!!!

@Wimachtendink 4 years ago

I'm glad to see that he has good taste in pens.

@gregorybell4175 4 years ago

I'm glad someone else noticed it lol

@feschber 4 years ago

I heard about this before but this is just an awesome explanation - makes it seem like such an obvious thing to do. Basically trading memory for cpu time.

@thijsjansen2052 4 years ago

*trading memory and cpu time for gpu time

@befikerbiresaw9788 3 years ago

Hey man, can you make a series about cameras and lighting in OpenGL? Especially camera rotation and looking around. I've found that you explain things well, and if you have the time for it, it would be great. These are topics that are very important and under-explained. Thanks for explaining things so simply and easily.

@Corgamos 4 years ago

I like this series :) just wondering when will you implement it in hazel? And also I would love to see a video about ECS, that would be great!

@ayoubbelatrous8080 4 years ago

You don't have to worry about ECS, it's just a technique to organize memory.

@JayAnAm 3 years ago

Ok nice, really simple for the positions, but will you also come up with an example for dynamic colors for the vertices in the batch? - ah ok, found it, thx!

@totallynuts7595 4 years ago

That guitar hang at the end tho

@bies_moron4404 4 years ago

Thank you Cherno, batch rendering is awesome !

@monchytales6857 2 years ago

Literally clicked on this to get a better understanding of how to make games, only to realize I've already started doing a rudimentary batch system for my fonts by having my system determine which quads need to be drawn and just adding them to a large vertex array. Instant speed boost.

@AlexZillion 7 months ago

Great video!

@TheWhitde 3 years ago

For my engine (Goblin 2D+) I had been using a scene manager to sort by mesh and distance, then render instances of the same mesh. Worked not too bad, as each mesh was usually used multiple times. Thing is... because of shadows I have to do 2 passes per frame, so any inefficiencies are doubled. Still not too bad, but there were some objects that were "1's". These "1's" are meshes that are only rendered once (as an example, my turret is made up of 12 different meshes, of which some are rendered 4+ times and some just once). First step to solve this was combining a lot of 256*256 textures onto a single 1024*1024 texture with a good texture atlas. This by itself didn't do much, as each mesh was still rendered in an instanced batch. Took about a week of reading and research, but I finally decided on a "hack" with structured buffers (which I had never heard of at that stage) that had multiple meshes in a fixed-length buffer. Each mesh could in theory be accessed, as the instance buffer now had a "model_index" field that was used to access the correct vertex for the model and vertex number. The problem I had was that vertex counts were between 200 and 2000, and a fixed size meant the count/fixed size had to allow for the largest (2000 in this example). Any mesh with a size < 2000 would be padded with degenerate triangles, which would not be passed to the pixel shader. Phase 1 took a few days, but I got it working with a small subset of models as a proof of concept. Ended up creating a class that combined any meshes matching given criteria and that had access to my model manager class. Instead of 1 batch for all sizes I decided on 3 batches of varying sizes, which meant 30+ batches became 3 efficient batches rather than a single inefficient batch. Batch 1 is for meshes with 0-499 vertices, 2 is for 500-999 vertices and 3 is for 1000-1500 vertices. Have now cut the number of batches per frame from 50ish down to 16 (myGui is responsible for 5 of these).
This now means I can add more variants of models without adding to the batch count. Another big thing for me was swapping from qsort to a radix sort for sorting the scene. The radix sort is lightning fast, as it's on a single 64-bit integer that's made up of a 32-bit mesh number and the integer distance of model to camera * 1000.
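The 64-bit sort key mentioned above could look roughly like this. It is a sketch, not the Goblin 2D+ code: the function names are hypothetical, the mesh number is assumed to occupy the high 32 bits, and the distance is assumed to be non-negative and small enough that distance * 1000 fits in 32 bits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack the mesh number into the high 32 bits and the scaled distance into the
// low 32 bits, so sorting by key groups by mesh first, then by distance.
inline std::uint64_t makeSortKey(std::uint32_t meshId, float distance) {
    assert(distance >= 0.0f);
    const std::uint32_t scaled = static_cast<std::uint32_t>(distance * 1000.0f);
    return (static_cast<std::uint64_t>(meshId) << 32) | scaled;
}

// Least-significant-digit radix sort: eight stable counting passes, one per byte.
inline void radixSort(std::vector<std::uint64_t>& keys) {
    std::vector<std::uint64_t> scratch(keys.size());
    for (int shift = 0; shift < 64; shift += 8) {
        std::array<std::size_t, 257> offsets{};  // zero-initialised histogram
        for (std::uint64_t k : keys) {
            ++offsets[((k >> shift) & 0xFF) + 1];
        }
        for (std::size_t i = 1; i < offsets.size(); ++i) {
            offsets[i] += offsets[i - 1];  // prefix sum: start of each bucket
        }
        for (std::uint64_t k : keys) {
            scratch[offsets[(k >> shift) & 0xFF]++] = k;
        }
        keys.swap(scratch);
    }
}
```

The sort does a fixed number of linear passes with no comparisons, which is where the speed-up over a comparison sort such as qsort would come from for large scenes. In practice the key would be stored next to an object index so the sorted order can be mapped back to the models.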

@LuisAngel-qm6vl 4 years ago

Hey Cherno! I'm loving your videos. I've noticed that in this new base library for OpenGL you use the OpenGL function glDebugMessageCallback() when debugging. You mentioned back in "Dealing with Errors in OpenGL" that you would explain these kinds of functions. Could you do it soon? I'm quite interested in this topic.

@temych02 4 years ago

Thanks for your videos! The most useful videos could be found only on your channel!

@dr3d3d 2 years ago

What happened? I was following the entire playlist from the beginning and now your codebase is suddenly different. Edit: I figured it out. The video he mentions in the middle is the missing video from the playlist, and you need to switch to that framework... Since time has passed it's not a perfect 1:1, but it's close enough.

@hrvojebolesic9516 4 years ago

I think that some kind of blackboard/whiteboard would help you explain the concepts that you are currently trying to explain by waving your hands :). Great job, keep up the good work!

@had1t1x85 4 years ago

"Hey what's up guys...."

@zuhail339 3 years ago

I'm so thankful

@tomascabrerizo647 4 years ago

great video, thanks!

@rastko614 4 years ago

I don't really know how or why I got here, but this guy is really interesting to listen to.

@rs4artha 4 years ago

Love your videos brother......

@nikolaiarsenov1595 4 years ago

This Batch Rendering technique works perfectly for sure! However, WebGL, for example, has a limit on the number of vertices per draw call (~64k points). Nice video! 😊

@Girugi 4 years ago

You realize that is still thousands of quads per draw right? :D probably more than you want anyway. For other reasons. If you talk real 3d environment batching though, 64k can easily be hit. But also easy to design the system around :) though might of course not be as optimized as you can get it for other platforms.

@nikolaiarsenov1595 4 years ago

Kakure this is true, fully agree, it's a great hit. I was developing data-visualization software in production and we didn't know what this technique was called, we just started to chop the data into groups :D I would say these thousands are not enough in the era of Big Data.
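To put numbers on the ~64k limit discussed in this thread: assuming it comes from 16-bit indices (so one draw call can address at most 65,536 distinct vertices), the quad budget per batch and the number of draw calls for a larger data set work out as below. The constant and function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: indices are 16-bit, so one draw call can address at most 65,536 vertices.
const std::size_t kMaxVerticesPerDraw = 65536;
const std::size_t kVerticesPerQuad = 4;

// Largest number of quads that fits in one batch.
inline std::size_t maxQuadsPerBatch() {
    return kMaxVerticesPerDraw / kVerticesPerQuad;
}

// Number of draw calls needed for `quadCount` quads (ceiling division).
inline std::size_t drawCallsNeeded(std::size_t quadCount) {
    const std::size_t perBatch = maxQuadsPerBatch();
    assert(perBatch > 0);
    return (quadCount + perBatch - 1) / perBatch;
}
```

So 16,384 quads fit in one call, and 100,000 quads would need 7 calls instead of 100,000, which is the "chop the data into groups" approach described above.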

@Neran280 3 years ago

Hm, I have a few questions about this. Are glDraw* calls really that expensive, or is setting the uniforms for each geometry the slow part? When loading geometry from a file, must the positions be multiplied with the model matrix before loading them into the vertex buffer? This example was about static geometry batching, so these do not move and the buffer does not need to be updated, correct?

@hashcash7340 4 years ago

thanks for the great work. more c++ series videos for beginners pls

@rickarmbruster8788 4 years ago

I like both the advanced and the beginner stuff, but as I am an advanced programmer I take more from the advanced stuff.

@MiRRoRRek 4 years ago

I love u bro

@jamesrush6219 4 years ago

YES!!!!!!

@dudezuper3240 4 years ago

Waiting for more videos on your game engine

@sam_is_people1170 2 years ago

thanks!

@lastyhopper2792 1 year ago

I can't seem to understand that outro of yours... The batch rendering explanation though, I can :)

@KennyTutorials 3 years ago

Is it possible to group other types of geometry such as a line or triangle, or maybe even a single point in a single batch? Just for quads, we always use 6 indexes and repeat this process for all the maximum quads in the batch, and this is not particularly suitable for a line or point, because a point should not have any indexes at all, since it is a just point, a vertex. The line is also just 2 vertices and maybe we will need one index to connect them, but then again there is a problem, because by default we load 6 indexes for each primitive, there is no other way out, since we also want to use our quads. Perhaps we can somehow skip 5 indexes for a line and 3 indexes for a triangle, and already knowing what type of geometry we have, we can simply add vertices and fill them with data (positions, uvs, etc.). Or would it be better to just create another batch specifically for lines, points, triangles?
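For reference, the "6 indices per quad" pattern from this question can be generated once up front. A minimal sketch, with a hypothetical function name and an assumed 0-1-2, 2-3-0 winding:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build the index buffer for `quadCount` quads: 4 vertices and 6 indices each
// (two triangles, 0-1-2 and 2-3-0, offset by 4 vertices per quad).
inline std::vector<std::uint32_t> makeQuadIndices(std::size_t quadCount) {
    assert(quadCount <= 0xFFFFFFFFu / 4);  // keep every index inside 32 bits
    std::vector<std::uint32_t> indices;
    indices.reserve(quadCount * 6);
    const std::uint32_t pattern[6] = {0, 1, 2, 2, 3, 0};
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::uint32_t base = static_cast<std::uint32_t>(q * 4);
        for (std::uint32_t offset : pattern) {
            indices.push_back(base + offset);
        }
    }
    return indices;
}
```

Since a single OpenGL draw call uses one primitive mode (for example GL_TRIANGLES, GL_LINES or GL_POINTS), lines and points cannot share a call with triangle-based quads anyway, so a separate batch per primitive type, each with its own vertex and index layout, is a reasonable default. Triangles could go into the same GL_TRIANGLES batch as quads if the index buffer is built per shape (3 indices for a triangle, 6 for a quad) rather than from a fixed quad pattern.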

@shavais33 3 years ago

I'm looking for a way to apply many different model matrices to a handful of different quads in a single draw call. I thought about using a vbo with just position, rotation and object type, and having a geometry shader produce the quad and the associated texture coords, which would map into an image atlas that is in a single texture. But I've read that geometry shaders are really slow. And I've read that using arb instanced method can also end up slower than just plain unrolled vertex data. But unrolling means applying all those model matrices on the cpu side which just does not seem like it can possibly be faster.

@santitabnavascues8673 10 months ago

Give a unique ID to each quad and use a uniform buffer with the array of transforms you want to apply to each quad, referencing them through the quad ID

@shavais33 10 months ago

@@santitabnavascues8673 So each vertex buffer element has the quad id and the transform to apply to it, and the shader has hardcoded knowledge of all the quads' dimensions? Is there a way to get that knowledge into the shader without hardcoding it?

@santitabnavascues8673 10 months ago

@shavais33 it is a technique similar to shader skinning, but in this case, each vertex buffer element has a unique ID, a number. Then, on the uniform buffer you have an array with all the transformations, which you can access in the vertex shader through the quad ID. That being said, I suggest you to take a look at the Ocornut IMGUI gui library. It builds a large buffer of quad elements, all of them of the proper screen size, that renders UIs with seemingly very low overhead. It batches all the elements into a vertex and an index array that you can stream per frame, achieving very notable framerates, each quad also uses a font texture, in a similar fashion as what you ask in the first comment, so you can render text like that, among many other things.

@shavais33 10 months ago

@@santitabnavascues8673 I didn't know you could pass a buffer as a uniform? And access it like an array in the shader? That's very interesting, that has a lot of potential uses, if it performs well. Even small buffers could be very useful. I have little spaceships and shots flying around, explosions, things spawning and despawning and going on and off screen, so it's a bit of a different use case than UI in general (although I also have a UI). I have to do a bunch of updating between frames, but I'm trying to minimize that and do it with SIMD as much as possible, and do as much as possible in the shader.. I think.. well, I'll have to see which way works better. That's an interesting thought, to pass the quad dimensions in a buffer as a uniform to the shader. But you'd have to do it every frame, right? It would be smallish I guess, compared with, like, texture data, but.. I would not expect it to perform as well as just having the dimensions hardcoded in the shader. It's tempting to create a build step that generates the shader code from a template that has most of the code in it, but has a replaced string for the quad dimensions, which would come from the image atlas data, which is already getting exported to JSON files by my atlas generator. I'm somewhat familiar with IMGUI. Historically it has been too slow for my purposes, but I've heard that its performance has improved in recent years. I do have a UI with thin borders and such, maybe a little bit of text, and one of my difficulties is that the user can resize the window, so.. I'm facing the need to maybe use SDFs for the UI elements. I would like to continue our conversation, do you have a Discord? Mine is shavais_zarathu.

@santitabnavascues8673 10 months ago

@shavais33 uniforms have their limits too, but they can store quite a lot of data, at least 16KB guaranteed by OpenGL standard, I can't tell about DirectX. I suggest you take a look at how uniform buffers work, but really, even on old DirectX 9 grade hardware I could pass 5000 particles as a quad array, so, I think that even on the CPU, you can have lots of quads generated and rendered in a single go, the gist of it is that you draw the whole set as a single draw call. And no, I don't have discord, sorry 😅
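As a back-of-the-envelope check on the 16 KB figure cited above, this sketch counts how many per-quad transforms fit into one uniform block of that size. `Mat4`, `PosScale` and the function names are hypothetical, and 4-byte floats are assumed; real hardware usually reports a larger limit that can be queried at runtime.

```cpp
#include <cassert>
#include <cstddef>

// Minimum uniform block size cited in the comment above (16 KB).
const std::size_t kGuaranteedUniformBlockBytes = 16 * 1024;

// A 4x4 float matrix as it would be stored in the buffer: 16 floats.
struct Mat4 {
    float m[16];
};

// A compact alternative for quads that only translate and scale uniformly.
struct PosScale {
    float x, y, z, scale;
};

static_assert(sizeof(Mat4) == 64, "sketch assumes 4-byte floats");
static_assert(sizeof(PosScale) == 16, "sketch assumes 4-byte floats");

// How many full matrices fit into one guaranteed-size uniform block.
inline std::size_t maxMatricesPerBlock() {
    return kGuaranteedUniformBlockBytes / sizeof(Mat4);
}

// How many position+scale entries fit into the same block.
inline std::size_t maxPosScalePerBlock() {
    return kGuaranteedUniformBlockBytes / sizeof(PosScale);
}

// Draw calls needed if each call can look up at most `perBlock` transforms by quad ID.
inline std::size_t drawCallsFor(std::size_t quadCount, std::size_t perBlock) {
    assert(perBlock > 0);
    return (quadCount + perBlock - 1) / perBlock;
}
```

With full matrices that is 256 quads per block, or 1024 with the compact entry, so 5000 quads would take 20 or 5 draw calls respectively under this assumed minimum.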

@sajalgupta1255 4 years ago

Please put a video on cpp storage classes..

@viko1786 4 years ago

I was wondering. Did you import the colors of the text in VS or did you make it yourself? And is there a way to get your colors, because they help so much at making a difference what is what

@ianjames5940 1 year ago

I'm not a fan of this moving off all of the code I worked on up until this point and into something much bigger and more complex, with a lot of things I didn't write and don't necessarily understand. I was really enjoying the "build it yourself one line at a time" approach, and to suddenly switch over and abandon all the old code is a bit frustrating.

@raduhabinyak6860 4 years ago

Is there any way to create a pattern that just draws a prespecified quad at specified points or something like that, so you only have to feed it one quad, something like a tile map?

@totallynuts7595 4 years ago

You could make a quad object that is just a bunch of data and a pos variable, and keep adding the same quad to the batch while changing the pos variable between batches. Might be a little slow though

@ari6448 4 years ago

Hi, I'm quite interested in game programming, but I don't really know what to do. I have been learning C++, but I'm not sure how or what to use for game development. I don't really like using engines like Unity because they're quite awkward and limiting to use, but I'm not sure where to go. Should I use OpenGL for this (like a simple 2D game), or is there something better?

@unsafecast3636 2 years ago

Hello! I know I'm kind of late to the party, but we have glMultiDrawIndirect as well. Is this any different? And I'd love if you would be able to cover this too, there is a complete lack of documentation for these functions. Thanks!

@lengors1674 4 years ago

How do I know if I should use GL_DYNAMIC_DRAW or GL_STREAM_DRAW? I mean, I know the difference, but I still don't know how to decide which one to use... What would be your approach to this?

@feschber 4 years ago

Just commenting to see answers :)

@patrickwildschut5750 4 years ago

Maaaannn no intro D: (Loved the rest of the vid though)

@mclegendxtra2942 3 years ago

I thought he meant rendering in a .bat file. There goes all my dreams. oof

@dragonminz602 4 years ago

Very interesting

@satyammishra5582 4 years ago

ok so isn't it same as Instancing technique? btw, awesome vid as always❣

@hjups 4 years ago

This is different from instancing. Instancing repeats the same geometry over and over, while batching can be used for different geometry (i.e. if you wanted to draw different shapes and not all quads). I believe that you would have a similar effect with the examples given in the video though. It's been a while since I have done this stuff, but from what I recall, with instancing you would keep the original list of 4 vertices and 6 indices. Then your draw call would tell the GPU to repeat the stream N times (for N instances), giving each instance a unique ID. You would then create an array of transform matrices to upload to the vertex shader, which would do the transformations for you based on the ID. Obviously that wouldn't work if each shape was different.

@satyammishra5582 4 years ago

@@hjups yehh, instancing is different technique, and you've mentioned it right, thnx for clarifying

@robertmoats1890 11 months ago

I know you probably get a lot of comments from viewers who are not interested in building game engines, so I just wanted to let you know that there are still some of us engine developers out here. We really appreciate these videos. There are not a lot of us out here these days. Most of the sane developers have hopped over to Unreal or Unity at this point, but there are still some of us out here building our own engines. Those with questionable sanity and stubborn fortitude.

@hjups 4 years ago

What about the overhead of transforming all of the vertices in the array, as well as culling? By sending all of the vertex data at once, you must first transform the quads using the transformation matrices, which would need to be multiplied through on the CPU (effectively doing all of the work that the vertex shader does). If you generate your quads once at load time, then that won't have a significant impact, but if you are doing it every frame, the overhead of the matrix transforms may be larger than multiple draw calls. Furthermore, if you have a large array of quads that you are drawing, for multiple objects, then your aggregate object will become larger, and you will be unable to break it up to do things like frustum culling, which will result in unneeded GPU work. What about using instanced rendering, where you upload a large packet of transform matrices to a vertex shader? How would that compare? You would still have the culling issue, however, you would allow for the parallel nature of the GPU to compute the new transforms. In fact, if all of your instances are only translated and scaled uniformly, then you could reduce the memory footprint to a single 4-vector per quad (isn't that how billboarded particles typically work?). Obviously instancing wouldn't work for objects with different geometry, but the examples you gave were simple quads. I wonder if you could combine some sort of hybrid method, where you instance a quad (or a triangle), and use a method similar to skinning with bones to re-assemble the instances into multiple objects on the GPU. That wouldn't work for smooth models, but could work for the faceted low-poly style.

@Girugi 4 years ago

Most of it is not an issue. For a sprite/tile map you usually only work with position offsets. That is cheap and your CPU can do it no issue. And you can do it in chunks of 1k sprites at a time, that way you can cull them as well. And for static or rarely changing environmental stuff you don't have to do it every frame. As for text, it's very similar and only needs to update when you change the text. And even if you go with a simple 2D transform matrix for thousands of sprites, your CPU can do it quickly, and if you want to optimize you can vectorize that math. Perfect with 4 vertices per quad and 4-float SIMD. One draw call is very expensive for the CPU. So you can justify a lot of work in minimizing the calls. If you reeeaally want high performance sprites, you can do it with geometry shaders or point sprites. But they have other limitations, and then it's probably also more efficient to simulate and update positions and such on the GPU too. Like for GPU particles. It's a big subject though, and your questions are valid. But it's a non-issue in most cases as it's always better than the single draw calls, and if you want really good speed there are a lot of related techniques. Batch rendering is the best for more static/rarely updating things like text and backgrounds.
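The "position offsets on the CPU, in chunks" idea from this reply might look something like the sketch below. The `Vertex` layout, the function names and the full-texture UVs are assumptions made for illustration, not code from the video.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex {
    float x, y;  // position, already offset on the CPU
    float u, v;  // texture coordinates
};

// Append one axis-aligned sprite to the batch. The "transform" is only a
// position offset and a size, applied while filling the vertex buffer.
inline void appendSprite(std::vector<Vertex>& batch, float x, float y, float size) {
    assert(size > 0.0f);
    batch.push_back({x, y, 0.0f, 0.0f});
    batch.push_back({x + size, y, 1.0f, 0.0f});
    batch.push_back({x + size, y + size, 1.0f, 1.0f});
    batch.push_back({x, y + size, 0.0f, 1.0f});
}

// Build one chunk of a tile map: `columns * rows` sprites laid out in a grid.
// A chunk like this can be built once, culled as a unit, and drawn with one call.
inline std::vector<Vertex> buildTileChunk(std::size_t columns, std::size_t rows, float tileSize) {
    std::vector<Vertex> batch;
    batch.reserve(columns * rows * 4);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < columns; ++c) {
            appendSprite(batch, static_cast<float>(c) * tileSize,
                         static_cast<float>(r) * tileSize, tileSize);
        }
    }
    return batch;
}
```

A 32 x 32 chunk gives roughly the 1k sprites per batch mentioned above. The camera transform would still be applied in the vertex shader through a single uniform, so the buffer only needs rebuilding when the tiles themselves change.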

@hjups 4 years ago

@@Girugi For a sprite / tile map, as well as text, would you not be better off with instancing still? Especially if the quads exist in a grid, then you don't even need the transform. I'm not convinced that the overhead from a draw call would necessarily outweigh the cost of doing the matrix-vector multiplication every frame (to update a dynamic list). I'm sure there is a break even point, but it may be much smaller than you think (for example, a batch list size of 100 particles could be the break-even point, where you start spending more time on transforms than you would function calls of multiple sets of 100 particles). Another issue to consider, is that depending on the hardware, there will actually be a limit to the number of triangles that can be drawn in a single call. If you end up creating a batch list larger than that limit, then the driver will have to do extra work to split up the draw call (which in the end will probably be far more expensive). Also, you can't always do the sprites in the geometry shader, since not all platforms support it. For example, many embedded GPUs only support vertex and fragment shaders (those are also the ones which will have a much lower triangle per draw call limit, since the entire stream will have to fit into an attribute FIFO in the GPU).

@Girugi 4 years ago

@@hjups you are not tackling the issue the right way. You need to consider the real use cases. You would not update a tile map every frame. You would only build up chunks close to the camera and then let them be, using a camera matrix to transform the whole chunk as a whole on the GPU. Also, instancing is not without its own issues and overhead. It's not free. You would never use instancing on such low-poly objects as a 4-vertex sprite. And letters/text rendering is not that simple at all. Real text rendering needs formatting, kerning, line breaks and so on. You will absolutely need every letter to have unique coordinates and even take into account the letter before it. And sending that in an instance array buffer is a lot of data to write up and send as well. Trust me. I've been working on a professional game engine and written a font and text rendering system, as well as a 2D batching system for sprites that was pretty much free from a performance standpoint. Also been writing 3D batching systems and used instancing for various purposes. As for geometry shaders. Sure, not all platforms support geometry or compute shaders. And then your best option is absolutely point sprites for small particles. But due to some limitations, larger particles would absolutely be batching based. There are still many options for the details though. Like you could do sprite expansion in the VS, and just set up the data in a big batched vertex buffer. But you would absolutely not do 2D particles with instancing. If the platform is low end enough to not have GS, it will surely have a bad enough overhead on instancing to not be worth it for 4 verts too. As for limits on vertices. For sprites you will never realistically run into it. Even if you just have 16k verts. You get maaaany sprites for that. And you would probably not batch more than about 1k per batch anyway. Of course depending on your individual use case.

@hjups 4 years ago

@@Girugi While what you said is true about tilemaps, it wouldn't be true for general geometry or sprites / particles. I'm not convinced that instancing for a 4-vertex sprite would cause a significant performance impact. That would depend on how the GPU is designed. Keep in mind that typically, they will do instancing in a scatter-gather mode, so the overhead would be identical to drawing them in a batched operation. Although, the difference would be the extra compute time to create and multiply through the transformation matrices for each vertex. Depending on the hardware, it could potentially be more efficient to draw them via instances, due to the limited size of the attribute buffer in the GPU. I would have to see benchmarks comparing the different methods on different hardware to trust that conclusion. As for formatting, kerning, leading, etc. You could implement that via an array of multi-parameter structures in a constant buffer. Two 4-vectors would be less memory than the four 4-vectors needed for a single object. As for anecdotal evidence, it may be free from your perspective, but all that means is that the impact was not noticeable. You would have to benchmark the different options to say for certain, although that's probably not worth the time / effort if you do not see a performance impact right now. Without a geometry shader, you could still do large particles using instancing, in fact that would probably be far more efficient, since it can use the camera matrix to align the billboard in the GPU. You wouldn't be able to simply feed points to the GPU though, and expand those into quads in the geometry shader. What is a "point sprite" in this context? Are you referring to drawing single-vertex points with a size? That probably wouldn't work in most cases, since they would lack detail. I think you over-estimate the low-end embedded GPU hardware like what's in the Raspberry Pi 2/3.
I don't believe those can handle more than a few K verts at once, certainly not 16K. Going back to the overhead. I have been working on designing / implementing a shader-based GPU for configurable hardware (for an FPGA), which has required a deep dive into how the devices are architected. There is no significant additional overhead for instancing vs batching. It's possible that some additional overhead was introduced when the architecture transitioned to unified shaders, however, I doubt that is the case. Perhaps the overhead you are referring to is in the driver? Though the hardware should be capable of implementing instancing directly.

@matthewexline6589 5 months ago

So in Batching, you are in fact sending a NEW vertex buffer to the gpu for each draw call, is that correct?

@Mystixor 4 years ago

Will you be using a Texture Atlas? It might be the strategy I should go for in my project

@QckSGaming 4 years ago

Texture atlases are pretty industry standard. Use it.

@Mystixor 4 years ago

@@QckSGaming Yeah, I guess that's why it seemed like the solution suggested by a quick search on the web. A video by him would be really neat, but I should do it one way or another.

@xeridea 2 years ago

The issue isn't the GPU, it is the CPU, and API. Before DX12, and Vulkan, due to abstraction, the APIs were, in practice, mostly single threaded, with heavy API overhead. This greatly limits draw calls you can do per frame to keep good performance. Batching things can significantly reduce draw calls. Batching reduces GPU workload somewhat, but mostly it is the CPU. Either way, it is definitely good to do.

@santitabnavascues8673 10 months ago

Even though vk/dx12 remove driver overhead and have support to parallelize command recording, batching definitely helps, and more, you can defer the batching to a compute kernel and generate the batching entirely on the GPU

@Guopeiran 2 years ago

Is drawing a QUAD a kind of batch rendering two Triangles?

@meanmole3212 4 years ago

So you do all the position transformations on CPU then?

@MakuDraw 3 years ago

I wanna know this too :(

@pelegini50 4 years ago

How can I get your VS settings? (Colors etc.)

@animania6947 4 years ago

When ???

@homematvej 4 years ago

2:48 did he say the bottleneck is the GPU in the case of large numbers of draw calls? Because almost all the time the bottleneck in this case is the CPU.

@thijsjansen2052 4 years ago

Yes, when drawing large amounts of quads without batching, the GPU has to run for every quad separately and wait till the previous quad is drawn, whilst with batch rendering the GPU only has to run once per frame, and that is a lot faster because the GPU is very good at drawing multiple triangles in parallel.

@homematvej 4 years ago

@Genosse Mendes I don't. You can ask some rendering engineer, or do tests yourself. Even on devices with unified memory, the CPU processes draw calls at least two times slower than the GPU. Something might change with the DX12/Metal/Vulkan APIs, but with older APIs drivers had to make huge amounts of preparations on the CPU before the call.

@hjups 4 years ago

@Genosse Mendes I did a quick search and was unable to find a good reference describing the overhead. However, you can look through the MESA3D driver in Linux, tracing the glDrawElements function, and see the rabbit hole of function calls + extra overhead. Aside from the function overhead, you also have the memory transfer overhead to the GPU, to issue the drawing commands. While it's difficult to saturate the PCIe links in modern devices, they are not instantaneous, and the commands must be copied before the GPU can start drawing. For the above, you could look at the latencies, where for N draw calls of M vertices each, you have t = N*(driver_overhead + dma_overhead + gpu_startup + M*vertex_cost), vs t = driver_overhead + dma_overhead + gpu_startup + M*N*vertex_cost. If it's a dynamic list, you also have dma_cost*M*N in there (which is the same for both). Notice that M*N*vertex_cost happens in both, so unless M is very large, there will be a noticeable speedup from combining the draw calls (though that's assuming the vertex list is already generated and that doesn't add extra overhead).
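The two latency formulas above are easy to play with in code. This sketch just transcribes them; the struct and function names are hypothetical and the cost values used with them would be arbitrary units, not measurements.

```cpp
#include <cassert>

// Fixed and per-vertex costs in arbitrary time units; the field names follow
// the terms used in the comment above.
struct DrawCosts {
    double driverOverhead;
    double dmaOverhead;
    double gpuStartup;
    double vertexCost;
};

// N separate draw calls of M vertices each:
// t = N * (driver_overhead + dma_overhead + gpu_startup + M * vertex_cost)
inline double unbatchedTime(const DrawCosts& c, double n, double m) {
    assert(n >= 0.0 && m >= 0.0);
    return n * (c.driverOverhead + c.dmaOverhead + c.gpuStartup + m * c.vertexCost);
}

// One draw call covering all N * M vertices:
// t = driver_overhead + dma_overhead + gpu_startup + M * N * vertex_cost
inline double batchedTime(const DrawCosts& c, double n, double m) {
    assert(n >= 0.0 && m >= 0.0);
    return c.driverOverhead + c.dmaOverhead + c.gpuStartup + m * n * c.vertexCost;
}
```

Subtracting the two gives (N - 1) * (driver_overhead + dma_overhead + gpu_startup): the saving grows with the number of calls and does not depend on M, which is why batching pays off most when there are many small draws such as individual quads.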

@silvalgalewalker503 1 year ago

I have a question: how many texture slots should I use for a game? My laptop has 32 texture slots, but I don't think using all of them is good practice.

@_EnVyUs 10 months ago

the smaller the better, but don't worry if there are 6 or more

@bluesillybeard 2 years ago

What if instead of using a uniform for the transformation, I use a layout? It seems Inefficient though, having a whole transformation matrix for every point in the geometry. Also, I just realized how obscenely inefficient my current engine is. It binds the shader, then the vertex array, then sets the transforms, then renders, unbinds them, repeat for each object.

@milo20060 1 year ago

So basically. How I would see it be useful if the objects are static. Otherwise the vertexbuffer needs to be set somehow dynamically?

@Netryon 9 months ago

I want to make them not only for me but for everybody in camp so this is what I do, I do batch/bulk rendering in a car of your divine selection. Should you try to play a game and like companies that are not hiring you? This is so better then your designed in on a sheet you once slept on and had no debugger, because i'm not carrying a thousand dollars worth PC. I have a priceless pen. That UML drawing is not machine made. Don't you dare to try it, unless it's C# for your mobile game and not longer than ten minutes.