Doubling the speed of my game's graphics [Voxel Devlog #18]

Рет қаралды 15,373

Күн бұрын

Online demo: github.com/DouglasDwyer/octo-...
Additional voxel models: drive.google.com/drive/folder...
With some clever tricks, I've managed to halve the frame times in my voxel game engine! Join me as I explain three essential optimizations for voxel rendering. These optimizations include using DDA for traversal, bitwise masking to filter out potential intersections, and performing a low-resolution depth prepass. In addition, I talk about ray marching in general and discuss the other new features that I added to my engine this month.
Music used in the video:
Chris Doerksen - Perfect Parry
Corbyn Kites - Orbit
White Bat Audio - Athena
White Bat Audio - Chroma
Chris Doerksen - New Groove

Пікірлер: 133

@DouglasDwyer 19 күн бұрын

Want to learn how to write powerful, high-performance code? Then be sure to check out CodeCrafters using the link below, and help support my channel: app.codecrafters.io/join?via=DouglasDwyer They have one project which is completely free to complete during their beta, and you can begin any of their projects for free! Get 40% off if you upgrade to a paid account within three days.

@dleiferives 19 күн бұрын

It’s amazing how much you’ve grown as a programmer. Keep up the good work

@DouglasDwyer 19 күн бұрын

Thanks! I intend to do so!

@GabeRundlett 19 күн бұрын

Great video as always! A couple of suggestions: Regarding using DDA on multiple levels aka a "hierarchy"- HDDA, Ken Museth, implementation in OpenVDB. The basic idea is that the DDA process is pretty much restarted when the hierarchy level changes. An alternative (if you had a shallow tree) would be to have a stack of the DDA traversal state, one copy for each level. This alternative may be too register heavy (and might benefit from the stack being in shared mem instead). Regarding the voxel memory layout, you show that first comes the 64-bit bitmask and then the data segment. This means that when you fetch a brick, you always load 8 bytes of bitmask data, as well as an additional 120 bytes of data, to fill the cache line. This means that 93% of your cache during traversal will be filled with these data segments!!! try to pack at least 16 bricks together so that you fill all 128 bytes of the aligned cache line with bitmasks!!

@andreirost4696 19 күн бұрын

Eyooo hello

@oamnog 19 күн бұрын

Ohh I get it, so that the cache isn't wasted on data segments that may not even be used as the ray may skip the voxel brick, and also the bitmask data of the surrounding voxel bricks are loaded ahead of time to be used spontaneously, and reducing updating the cache frequently. Determining the neighbouring 15 voxel brick bitmask data to store could be determined using 2x2x4, 1x2x8, 1x4x4 and 1x1x16 bounding boxes that can be offset and rotated based on the ray general direction and position. The ray's position in the bounding box would prob be in the far corner, and the shape of the bounding box (1x1x16, 2x2x4, etc) is determined whether the ray is moving closely axis aligned, or sharply in two axis, etc. You would prob need to take into account the relative hierarchical voxel size that the ray is currently at. But that approach sound complicated, and you could prob just pack 64 bricks' bitmasks together into 4 128 bytes aligned cache line (but idk how cache works so i may be wrong).

@DouglasDwyer 19 күн бұрын

Thanks for the suggestions! I'll be sure to check out the OpenVDB implementation. As for cache lines, that's a very good point. I'll have to experiment with alternate data layouts in the future. But there's one nuance that I didn't mention - data segments are variable length. So if a brick has a single non-empty child, then the data segment will only be 4 bytes. But coming up with a way to pack the bitmasks (maybe I could store the bitmasks inline alongside each child pointer) might also help with cache coherency! That said, I don't really think that I'm memory bound right now :)

@pusab2064 17 күн бұрын

This is some crazy low level thinking i aspire to be like this

@JoeTaber 16 күн бұрын

@@DouglasDwyer You can be limited by memory bandwidth separately from being limited by memory capacity. :)

@frozein 19 күн бұрын

Great stuff, the bitmasking optimization was really clever. As for you hierarchical traversal question, it is definitely possible to avoid the ray-cube intersection test. The basic idea is at every step, you first descend as far down the tree until you reach a non-empty node. Then, perform your DDA step within that node, and if the ray exits the current tree node, ascend back up the tree until you are in bounds again. You do need to reinitialize DDA whenever you change levels though. I actually just implemented this for octree traversal in my engine and it works great!

@DouglasDwyer 19 күн бұрын

Thanks for watching! I do the same thing in my engine that you describe. What you call reinitializing DDA is what I'm referring to as a ray-cube intersection test, since you use the cube and ray position data to determine the side of closest hit :) Edit: speaking of your engine, can't wait to see your next video!

@frozein 18 күн бұрын

@@DouglasDwyer Ah I see, makes sense. I avoid the ray-cube intersection by subtracting position within the node and multiplying by 2 when descending. I keep a stack of node positions to ascend. It would be slightly more complicated for a 64-tree like you have though.

@yiwmsh4393 6 күн бұрын

The mountain top looks so distant to me, down here at the start of the trail, so I'm very thankful you're posting the pictures you took up there.

@nitaki_ 19 күн бұрын

This is a great video, but you should refrain from using rotating or moving gradients as a background for your explanations. The youtube compression does not like it at all

@franzusgutlus54 19 күн бұрын

You are so right!

@DouglasDwyer 19 күн бұрын

Good suggestion. I was curious to see how it would turn out, but perhaps KZbin's compression butchers it...

@NathanNahrung 19 күн бұрын

Would you see further improvements by expanding on on the beam technique? For example, what if your initial low resolution image was based on the displays aspect ratio? For example, a 16:9 ratio screen would generate a 17x10 square grid of rays (rays at screen edges so grid x & y are +1 the screen aspect ratio) for the first low resolution image, the next image which would start rays further from the camera could have 16x more samples (a 68x40 square grid of rays) as this would force to the next brick layer, and then this pattern could continue until the final resolution image which is at the displays ultimate resolution (including any multiple samples per pixel). Aternativly you could determine the initial low resolution by working backwards from the displays ultimate resolution, rounding up fractions of pixles by overdrawing the field of view for the lesser resolution images, and perhaps going all the way down to a 16 pixel image as the initial low res image.

@linksword7110 19 күн бұрын

holy, i thought you said you were going to reduce quality for more episodes. But this is amazing

@DouglasDwyer 19 күн бұрын

I am not planning to reduce quality for discrete graphics cards. The lower quality will just be an option for the engine to run on iGPUs

@linksword7110 19 күн бұрын

i meant on the videos :p

@DouglasDwyer 19 күн бұрын

@@linksword7110 Oh gotcha haha. I'm glad to hear you thought the quality is still good. This video only took 8 hours total to produce!

@BalintCsala 19 күн бұрын

What I do in my own system to solve the hierarchical issue is to first separate the ray origin into an integer and fractional part. This is mostly to combat precision errors (since you move up and down the scales a ton), instead of getting 23.999 as the hit position I get 24 as the integer part and -0.001 as the fractional one, this proved way less destructive for me. Then what you need is a way to tell how to convert this position from one octree level to a different one. Going down towards smaller cubes is easy, you always go down exactly 1 level. I do this the following way: // The size of one voxel is half as big as before, so fractPos should be doubled fractPos *= 2; // If the fractPos is (1.5, 0.5) that means we should be in the child with coordinates (1, 0) // This can be calculated with a simple step cellOfset = step(1.0, fractPos); // Adjust the integer position accordingly integerPos = integerPos * 2 + ivec2(cellOffset) // And return fractPos to be a in [0, 1] fractPos -= cellOffset; Going up is a bit harder since you might go up multiple levels, but it's still doable, if "levels" is how many levels you want to go up, then it's: // Store the current int pos for later oldIntegerPos = integerPos; // This divides the position by 2^levels integerPos >>= levels; // Bit complicated, this calculates the relative offset between the previous voxel and the (0,0,0) coordinate // of the ancestor we go to fractPos += vec2(oldIntegerPos - (integerPos

@DouglasDwyer 18 күн бұрын

Thanks for the suggestion! I use similar math to go up and down trees in my engine as well. However, the Shadertoy you sent performs what I would call a ray-box intersection test rather than DDA. The DDA algorithm is characterized by the fact that you *don't* recompute side distances for all cardinal ray directions each step. Rather, you update a single side distance each step in the direction that you moved (see this page lodev.org/cgtutor/raycasting.html). The suggested shadertoy does `vec2 dists = -ro * invRd + max(invRd, 0.0)`, which I would call a full ray-box intersection test

@djordjepepic1656 19 күн бұрын

You might want to check out Nvidia's paper 'Raytracing Sparse Voxel Database Structures on the GPU', they outline a hierarchical DDA approach for efficient empty space skipping that you might be able to implement in your engine. You should read the paper for a better explanation, but the basic idea is to do DDA for all levels of the hierarchy at the same time, going down a level on collisions and going up a level when leaving a 'chunk' of the current level.

@DouglasDwyer 19 күн бұрын

Interesting - I'll be sure to check it out :)

@stevegta6830 7 күн бұрын

I've always loved voxel engine graphics. Great work on this :)

@phylliida 18 күн бұрын

Ur videos are so good!! Always excited to see another one

@stormy_vox 19 күн бұрын

your optimizations are really cool! i don’t fully understand the bitmask stuff bit it’s neat

@adricklynn8882 3 күн бұрын

Amazing work! Can't wait to see more

@Flobbled 7 күн бұрын

Really cool stuff! The second optimization was genius!

@simongido565 19 күн бұрын

I do something similar in my engine. I call it compressed and normal cells. Compressed cell is the one that contains subgrid of selected size with the same voxels. The first advantage is that instead of storing for example 16x16x16 voxels I store only one that represents all of them if they are the same. Then about raymarching. I raymarch compressed grid as if it was normal voxel grid. If i get inside cell that contains only one voxel which means it is compressed, I just return color of the voxel that cell holds. If I get inside cell which contains 16x16x16 voxels it means it was not compressed, I start new raymarching that starts where compressed cell raymarching ended and I raymarch voxel subgrid which has dimensions 16x16x16, then if not hit happend I just continue raymarching compressed cells again and so on. It increased performance a lot and memory wise it is also great. I am thinking about implementing multiple levels but for now I am okay with just two levels.

@simongido565 19 күн бұрын

Maybe I could draw a picture if you wanted 😆 it might make more sense then.

@sti3167 18 күн бұрын

kind of like palette compression?

@simongido565 18 күн бұрын

@@sti3167 more like grid compression? I do not know what exactly is pallete compression :D. Imagine grid 256x256x256 and each voxel has size 1.0. So to compress it and traverse faster I take some number by which is 256 divisible. Lets say 8. So I create new grid with dimensions 32x32x32 ( 256 / 8 ) and voxel size 1.0 * 8. Now I go through original grid by checking 8x8x8 subgrids. If that subgrid has uniform voxel I copy in top compressed grid just single voxel instead all of them. If they are not uniform I copy all of the voxels. When raymarching I raymarch compressed grid as if it was regular voxel grid. But once I get inside voxel (cell) which has more voxels inside, I switch to traversing that subgrid with dimensions 8x8x8.

@simongido565 18 күн бұрын

So in the end it is grid of smaller grids. Which I create from denser grid.

@sti3167 18 күн бұрын

@@simongido565 Sounds like you are generating and using LoDs/mips for voxels on multilevel grid(or nested grid, sometimes called brick-map) depending on distance/required details. I would love to see some code of how you generate it/store/traverse, but its up to you : ) P.S Afaik palette compression is about compressing data that belongs to voxels instead of storing it in voxels themselves - for example unique colors or normals etc in a chunk of voxels is being written to a lookup-table(or something like that) and then every voxel indexes into this table instead of directly defining it, that way you can minimize memory overheads using less bits, even without doing quantization/lossy compression. This might improve performance too, because that way you can ensure you are not memory bound and there are less cache misses because it can fit into shared memory, cache lines etc

@MatejVancoCG 17 күн бұрын

Really cool. Thanks for such detailed explanation!

@c0pi 16 күн бұрын

I recognize the place of the demo!! I had a pizza in that place. Nice!

@tommycard4569 19 күн бұрын

I always look forward to your videos, keep up the great work!

@DouglasDwyer 19 күн бұрын

Will do!

@platinumsun4632 9 күн бұрын

Dude I really love the beam optimization. Looks so cool. Like out of an old 2.5d game.

@247flashgames 19 күн бұрын

You are insane, and I love the quality of your videos & algorithms! You explain your strategies well, and I can’t wait to see more videos! ❤

@DouglasDwyer 19 күн бұрын

Thank you :)

@owendavies8227 17 күн бұрын

Wow, the beam optimization is absolutely genius.

@empireempire3545 19 күн бұрын

Okay i'm drunk but hear me out: beam-tree You initially cast a single square 'macro-beam' from the camera and advance it forward as usual until you hit first voxel - this is the root node. The moment you hit the voxel, you split the macro-beam into sub-beams - ideally as few as possible, these are the 1st gen children of the root. The sub beam which hit the intiial voxel gets handled more or less as you do atm with low and high resolution beams. The rest of the camera projection proceeds onwards - but at worst case you still need only 4 macro beams to surround the beam (X) with the voxel which you hit (consider how to divide the area of M's): MMMMM MMXMM MMMMM MMMMM Looking from the side, the macro beams are truncated square-based pyramids, subdividing into smaller truncated square-based pyramids. The idea of course is to reduce the number of beam-steps you have to make. Smaller number of beams means smaller number of steps you would have to calculate - and even if the calculation is slightly more complicated (square vs point) then the reduction of 2 orders of magnitude should more than make up for it.

@RyDawgE 12 күн бұрын

This is AWESOME!!!! Very inspirational

@CSPciccionsstibilepippons 10 күн бұрын

I tried the native demo today: with my Ryzen 3500u integrated GPU it runs at around 50 ms with default settings, but with 2x downscale it runs under 20 ms even with maximum lod bias and I can't even notice the difference in resolution 🎉🎉🎉

@beholdergamedesign 19 күн бұрын

I kind of wonder if we need another name for ray marching determined by grid boundaries or bounding volume intersections vs by distance fields.

@procedupixel213 19 күн бұрын

"Grid traversal" algorithms is the name that I learned for some of these.

@BalintCsala 19 күн бұрын

Ray marching is just the process of stepping the ray forward by some amount instead of using analytical formulas, if you visit the wikipedia page it tells you the name of those two specific variations, sphere tracing for SDF-s and cube-assisted raymarching for voxel traversal, although I personally see that referred to more as "voxel tracing"

@0memega 17 күн бұрын

Before this, I didn't know you could even use DDA for raytracing. As I only used it for raycasting. Amazing video.

@HerrDoktorWeberMD 19 күн бұрын

Doing god's work, stuff I wish I had the spare time to achieve. I just pray you don't pull a John Lin by dropping the most gorgeous voxel world renderer anyone has ever seen and then vanishing.

@DouglasDwyer 19 күн бұрын

Well, I would love to "pull a John Lin" by dropping the most gorgeous voxel world renderer. But I don't intend to disappear!

@hexadeque1101 18 күн бұрын

Here is my DDA on an octree implementation if it is helpful: I have basically a stack data structure holding the current node I'm looking at and its parents so that I can walk up the tree. The stack is initialized to just hold the root octree node. At the beginning of each DDA step in the loop, the current position (rayOrigin + rayDirection*distance) is guaranteed to be within the current octree node on the top of the stack, but the current node is not guaranteed to be a leaf node (the current position is inside one of its children). while (rayDepth < RENDER_DISTANCE) { Get the bottom most node containing the current position (the current node is now guaranteed to be a leaf node containing the current position) Check if the current node is a filled voxel. If so, the ray hit something Step the ray with DDA (the current position is no longer within the bounds of the current node) Walk up the nodes in the stack until the current position is within bounds of the current node (sets up for the first part of the loop for the next iteration) } Hopefully that made sense. If not, my actual code is in the replies.

@hexadeque1101 18 күн бұрын

struct OctreeNode { uint data; // Basically an enum using the following constants uint[8] children; }; // Different materials are not implemented yet so full and empty are the only two states a voxel can be const uint EMPTY = 0; const uint FULL = 1; const uint PARTIAL = 2; // For parent nodes who have some full (or partial) children and some empty children struct Ray { bool hit; uvec3 position; ivec3 normal; float depth; }; layout(std430, binding = 1) buffer World { OctreeNode world[]; }; ... Ray castRay(vec3 rayOrigin, vec3 rayDirection) { ... // Ray box intersection on the size of the octree in case the ray starts outside the octree is omitted ray.position = uvec3(floor(rayOrigin)); ray.depth = 0; vec3 stepSize = vec3( sqrt(1 + (rayDirection.y*rayDirection.y + rayDirection.z*rayDirection.z) / (rayDirection.x*rayDirection.x)), sqrt(1 + (rayDirection.x*rayDirection.x + rayDirection.z*rayDirection.z) / (rayDirection.y*rayDirection.y)), sqrt(1 + (rayDirection.x*rayDirection.x + rayDirection.y*rayDirection.y) / (rayDirection.z*rayDirection.z)) ); ivec3 direction = ivec3(sign(rayDirection)); vec3 rayLength = mix(rayOrigin - ray.position, 1 - rayOrigin + ray.position, step(0.0, rayDirection)) * stepSize; // The stack is stored as an array indexed by currentLevel which is incremented every time we descend down the octree // It stores uints which are indices into the world array uint nodes[OCTREE_LEVELS + 1]; uint currentLevel = OCTREE_LEVELS; // Level 0 is an individual voxel nodes[currentLevel] = 0; // Root node is always index 0 uvec3 currentNodePosition = uvec3(0); while (ray.depth < RENDER_DISTANCE) { while (world[nodes[currentLevel]].data == PARTIAL) { currentLevel--; uint voxelWidth = uint(pow(2, currentLevel)); uvec3 center = currentNodePosition + uvec3(voxelWidth); uvec3 subnode = uvec3(greaterThanEqual(ray.position, center)); currentNodePosition += subnode * voxelWidth; nodes[currentLevel] = world[nodes[currentLevel + 1]].children[subnode.x + subnode.y * 2 + subnode.z * 4]; } if (world[nodes[currentLevel]].data == FULL) { ray.hit = true; return ray; } // DDA step + store hit info for if we hit something // Possible optimization: instead of taking one unit long steps, take steps proportional to the size of the current node. i.e. if the current node is empty and 4 voxels wide, step 4x the distance. I could not get that working at the moment, though, as it brings extra complications. :/ if (rayLength.x < rayLength.y && rayLength.x < rayLength.z) { ray.position.x += direction.x; ray.depth = rayLength.x; rayLength.x += stepSize.x; ray.normal = ivec3(1.0, 0.0, 0.0) * -ivec3(sign(rayDirection)); } else if (rayLength.y < rayLength.z) { ray.position.y += direction.y; ray.depth = rayLength.y; rayLength.y += stepSize.y; ray.normal = ivec3(0.0, 1.0, 0.0) * -ivec3(sign(rayDirection)); } else { ray.position.z += direction.z; ray.depth = rayLength.z; rayLength.z += stepSize.z; ray.normal = ivec3(0.0, 0.0, 1.0) * -ivec3(sign(rayDirection)); } // Bounds check if (ray.position.x < 0 || ray.position.y < 0 || ray.position.z < 0 || ray.position.x >= worldSize || ray.position.y >= worldSize || ray.position.z >= worldSize) { return ray; } // I was lazy and went all the way up to the root node every time and left only going up as far as necessary for "later" currentLevel = OCTREE_LEVELS; nodes[currentLevel] = 0; currentNodePosition = uvec3(0); } return ray; }

@shadow_blader192 19 күн бұрын

Amazing video

@DouglasDwyer 19 күн бұрын

Thank you!

@Z_Z.t 16 күн бұрын

for DDA in different sized pow2 boxes just replace multiplication with bit shifts and integer arithmetics

@WlliamBrown 14 күн бұрын

This looks really awesome! I stumbled across your videos yesterday, and have viewed most of them. I have also been researching voxels with marching cubes, but it destroys precise environment manipulation. Have you considered applying marching cubes only within each of your 8x8x8 voxels?

@DouglasDwyer 14 күн бұрын

Thanks for watching! I prefer the look of smooth cubic voxels, so I'm not planning to integrate marching cubes at this time (it would also make ray traversal more complicated)

@bytesandbikes 19 күн бұрын

Very nice! 😀

@moonshot3159 19 күн бұрын

holey moley this guy's good! Have you also considered similar optimizations that Vercidium (yt channel) did with his voxel engine? Or are those a different beast all together.

@DouglasDwyer 19 күн бұрын

Thanks for watching! I think that most of Vericidium's optimizations are very good, but they mostly apply to rasterizing voxels. Since I'm ray marching, they don't apply.

@thomasmcbride2287 19 күн бұрын

THREE DIFFERENT OPTIMIZATIONS??? YOU’RE A WIZARD!

@DeDaft 19 күн бұрын

That’s amazing progress

@UnofficialFoneE 19 күн бұрын

Yes, this video made me "bricked" 👀

@matanon8454 19 күн бұрын

Its amazing! Is water flow simulation something you consider to add at some point? Would love to dig some rivers and see water flowing 😍

@DouglasDwyer 18 күн бұрын

I haven't studied fluid simulation yet, so unfortunately I probably won't get to it for a while. But I would love to have water like John Lin's!

@Lantalia 19 күн бұрын

Neat! Have you done anything with multiple, unaligned, grids, so you can, for instance, have a moving ship that is itself a voxel grid? It seems like your rendering solution may make that a lot easier than I would normally expect

@DouglasDwyer 18 күн бұрын

While the renderer supports unaligned objects, I haven't re-added voxel entities on the game logic side. The functionality is there - hoping to take advantage of it in a future episode!

@JosephGleespen 16 күн бұрын

@@DouglasDwyer How do you ray trace the unaligned objects, or rather shoot rays through/past the unaligned obejcts? Do you trace them in separate passes and the composite them into a single image? I'm mostly trying to conceptualize how one would perform DDA while encountering things that are off grid.

@victorwidell9751 Күн бұрын

The low resolution pre rendering pass was interesting. Would it help to do it multiple times? I imagine it could be done with each level of an octree.

@Octal_Covers 19 күн бұрын

Which DDA algorithm are you using? Different algorithms have difference performance characteristics so it might be worthwhile to look at some other options. As for stepping through various sized voxels, would it be possible to multiply the amount you add to the ray position? Since it's a line, it should be the same relative positions

@DouglasDwyer 18 күн бұрын

I am using a similar technique to the one described on this page: lodev.org/cgtutor/raycasting.html The problem is that when you step between grid levels, the distance from the ray to the current X/Y/Z planes changes in a non-uniform way. You need to update that distance somehow, and as far as I can tell updating the distance and recomputing it are pretty similar computationally. So every time I switch grid levels, I need to recompute the DDA variables. This is what I refer to as performing a box-intersection test, since it's the same thing mathematically.

@Octal_Covers 18 күн бұрын

@@DouglasDwyer you could attempt updating it each time the voxel size changes, and if it's not as performant as box intersection tests then you could just stick to box intersection tests. The algorithm I'm using for my voxel game is the Fast Voxel Traversal Algorithm, which doesn't use as much trig functions

@guillaumemagniadas2472 19 күн бұрын

When you're speaking of sparse tree, are you speaking of sparse voxel octree ? I wrote an alternative of DDA algorithm that would compute a different step depending of the current level of depth in the SVO

@owendavies8227 17 күн бұрын

I guess for preserving DDA calculations between scales, if you're just going up or down a single level, the numbers would be 4 times greater or less with no other changes, and that could be performed just with bitshifting and nothing else, which would be pretty fast.

@owendavies8227 17 күн бұрын

I guess you could keep track of how many levels you are stepping between and do a bitshift of double the number of difference in levels of hierarchy, but it would probably be more computationally efficient just to do a double bitshift each time so you don't have to keep track of anything.

@DouglasDwyer 17 күн бұрын

Unfortunately, I don't quite think that this works. With DDA, you store the distance along the ray to each plane of the current voxel. But if you move up/down tree levels, the planes may shift by non-uniform amounts. There are perhaps ways to update the plane positions based upon the initial and final voxel, but I don't know of any that are more efficient than just reinitializing DDA from scratch (which is equivalent to a ray-box intersection test).

@owendavies8227 17 күн бұрын

@@DouglasDwyer I am willing to believe I made some kind of mistake.

@owendavies8227 12 күн бұрын

@@DouglasDwyer I don't know if this would necessarily be faster, but you could, if nothing else, theoretically update the plane positions in linear time. If the axes are aligned to the voxel grid (which could be done with a matrix multiplication for the camera and ray just once in the beginning), it would mostly just be addition, subtraction and bitshifts to move the planes around.

@Wizarth 17 күн бұрын

You mention that you ensure the beams never go far enough into the geometry hierarchy to miss a voxel that is smaller than the grid/pixel size. Do you adjust the depth based on distance from camera? I.E. does it go deeper into the hierarchy when closer to the camera?

@DouglasDwyer 17 күн бұрын

Yes, that's correct. The maximum distance that a ray can travel depends upon the size of the voxel (i.e. hierarchy depth) and the distance between adjacent rays. Adjacent rays "spread out" as they progress with a perspective camera, so that needs to be accounted for.

@Z_Z.t 16 күн бұрын

10:48 isnt using simple rasterization with depth gonna give better depth values? (im aware about greed meshing)

@amaryllis0 18 күн бұрын

Game looks great, i gotta second that the rotating backgrounds make me feel nauseous tho

@Harald723 19 күн бұрын

YES MORE...

@steluste 17 күн бұрын

Are you using any engine/framework such as Monogame or it's all from scratch? Also do you have a public repository of the project?

@DouglasDwyer 17 күн бұрын

The project is written entirely from scratch in Rust and uses WebGPU as the graphics API. The project is not open-source at present, but many components of it (like the event system are networking system) are open-source on my GitHub!

@EndroEndro 19 күн бұрын

Sadly i'm getting error panicked at 618 in web-d2eddb7ca2dff036 js file (Could not load GPU adapter) and 602:21 in opera other browsers and exe the same issue.

@DouglasDwyer 19 күн бұрын

Thanks for trying the demo! It sounds like your computer doesn't support Vulkan or WebGPU. I will add an error message in the future for unsupported devices.

@Deniil2000 17 күн бұрын

you mentioned u32 and u64 in your video. are you writing this in Rust?

@DouglasDwyer 17 күн бұрын

The project is written entirely in Rust and uses WebGPU as the graphics API.

@lumiey 19 күн бұрын

There is an LOD 'leaking'? artifact when I fly a bit high up

@DouglasDwyer 19 күн бұрын

Yep, the LOD generation isn't perfect. The LOD leaking shouldn't happen as much with procedurally generated terrain and other solid models. It happens because the church has a lot of thin surfaces and so the engine isn't sure which surface to put on top when generating the higher LODs.

@lumiey 19 күн бұрын

It runs super smooth on my PC, but I can't see how fast it actually runs because I can't disable vsync in online demo

@lumiey 19 күн бұрын

My GTX 1060 can handle 1500x1500 resolution at 60fps (2.25 mega pixels (1080p is 2.0736MP), there is no performance difference between web/native

@shadow_blader192 19 күн бұрын

1:55 isn't it called raycasting?

@sjoerdev 19 күн бұрын

raymarching is raycasting in small steps

@shadow_blader192 19 күн бұрын

@@sjoerdev raymarching is raycasting based on minimum distance to objects?

@sjoerdev 19 күн бұрын

@@shadow_blader192 thats sphere tracing which is a form of raymarching

@shadow_blader192 19 күн бұрын

@@sjoerdev oh those names

@DouglasDwyer 19 күн бұрын

No idea, everyone seems to have their own names for these algorithms lol

@Midazc 19 күн бұрын

Did you slow down the audio track? Sounds much more natural on 1.25x

@DouglasDwyer 19 күн бұрын

Speaking slowly and clearly is important for public speaking, so that's what I aim to do. But you are more than welcome to enjoy the video at 1.25x if you'd like!

@R.Daneel 18 күн бұрын

Wow. KZbin compression isn't doing your grey background any favours! I can't tell from the video. Have to fallen down the SIMD rabbit-hole yet?

@paxcoder 18 күн бұрын

So down to 60ms? But it would have to be ⌊1000/60⌋ = 16ms to run at 60fps right?

@DouglasDwyer 18 күн бұрын

Yep! As noted in the video, the benchmarks presented were on a very bad GPU (Intel UHD 750H). On any discrete card, the engine runs at a buttery-smooth 60 FPS - for example, the scenes in the video run at 4 ms (or 250 FPS) on my NVIDIA 1660 TI. It's possible to make the game run in realtime on the Intel GPU as well by lowering the resolution. But for benchmarking I wanted the frame times to be as big as possible, so I chose a worst case.

@paxcoder 18 күн бұрын

@@DouglasDwyer Too bad a discrete card is a requirement, as I'm afraid I only use integrated cards, but that is a whole lot of voxels there, so I guess it makes sense.

@DouglasDwyer 18 күн бұрын

Discrete card is only a requirement for 1080p. If you play the game at a lower resolution, you can definitely hit 60 FPS on an iGPU. But don't take my word for it - try the demo and evaluate for yourself! There is an option in the settings menu to downscale the image resolution for better performance :)

@xezvcikgames1191 18 күн бұрын

@@DouglasDwyer can you please show in pseudo-code how you do ensure low res wont step too far 11:06 ? i didnt quite get it, also what it would be like to trace more coarse lods in pre-pass, for example on CPU side? still could avoid a lot of traversal iterations. How do you even make sure its done in conservative manner? wont it require supercover dda or smth, hard to imagine tbh

@StylishHobo 18 күн бұрын

Rather than your custom data structure. Have you considered using a k-d tree instead?

@jamesoneill4607 19 күн бұрын

Fewer rays

@mohammadazad8350 19 күн бұрын

Premier!

@mohammadazad8350 19 күн бұрын

So apparently commenting "First" in French earned me a heart... my studying is paying off!

@timmygilbert4102 18 күн бұрын

@@mohammadazad8350je suis là pour tester ton français 👀 voyons voir si ça paye tant que ça 😂

@mohammadazad8350 18 күн бұрын

@@timmygilbert4102 "I am ??? for test your (familiar) French. Let's see see if that pays ??? that that" By interpolating, we get: "I am going to test your French, let's see if it paid off". I can't believe a KZbin comment just gave an exam on something I do for fun :).

@mohammadazad8350 18 күн бұрын

*gave me Also, I genuinely had to resist the urge to look up "tant" in the dictionary.

@timmygilbert4102 17 күн бұрын

@@mohammadazad8350 good job 👍😁

@beholdergamedesign 19 күн бұрын

A voxel brick has 8 64 bit occupancy masks?

@DouglasDwyer 19 күн бұрын

Since a voxel brick has 4^3 = 64 voxels, with one bit per voxel, each brick has 1 64-bit occupancy mask.

@gsestream 19 күн бұрын

try to render your engine nightmare scenario, ie the worst case geometry. multiplication is always cheap operation. division might not be, but will usually be also cheap on modern hardware. no op required is better than have to do op tho. also if you use sphere collision volume overlapping, then you need only one test per cell, not 6 for all sides of a cube cell plane boundaries. but you are testing coordinate plane boundaries for the DDA universally I think. so instead of storing stuff in the data structure, why not use the 64-bit as a pointer to the voxel data, which could be anything, not limited to the 64-bits. pointer to data structure is very fast. some optimizations are not worth it and make the logic a mess. order of magnitude optimizations should be the focus always. oh you try to compress the data using bit storage. if you make the assumption of large empty areas, then you can compress the data with RLE ones and zeros (voxel no voxel) for traces. so you are not using BVH to traverse top-down the voxel tree, that would compress the data also. bvh top-down traversal will work like DDA but processes the tree in the occupancy order, and stops if there is no stuff voxels in the tree leaf. usually you can just do the old style draw distance capping to limit amount of voxels drawn, even if you have billions. the old trick. even when doing true 3d traces. ie only render closest to a draw distance, even for lighting bounce traces. less is more. to get an idea of actual performance, try 4K resolution and see how badly it chucks or not. making an off-line renderer to test any features at high res and lighting rendering gives directions. you know what to do. maybe use an off-line pre-calc sdf for the ray marches, at least for static parts of the scene. maybe separately test dynamic meshes. instead of complex hit detection, you would just raster all voxels. as triangles. difference is not significant in general, not in the order of magnitude margin range. try rendering a voxel cloud, which is spongy ultra sparse voxel mesh. sdf can be updated on fly locally where voxel mesh edits happen.