Raytraced fluid simulation on Intel Arc A750 GPU

2,180 views

Dr. Moritz Lehmann

A day ago

Comments: 35

@III_three a year ago

I can't even fathom the work you have put into coding FluidX3D. And you even made it open source!

@ProjectPhysX a year ago

A lot! I have to correct you: FluidX3D is not open-source, but source-available. My license is very similar to open-source (free to use, public source code, redistribution allowed etc.), with one small difference: I don't allow commercial and/or military use. Such limitations on usage are not formally compatible with the open-source definition.

@III_three a year ago

@ProjectPhysX Yeah, I did see that when I was on your GitHub page. Needless to say, it's still very awesome that you made the source code available.👍

@oskarelmgren a year ago

Awesome to have more videos again! :)

@MenkoDany a year ago

God bless you for putting up with Intel's nonsense. I had heard that the performance was terrible initially; has it gotten better?

@ProjectPhysX a year ago

Haha, I work at Intel :) Arc had 2 problems in the beginning: game performance for DX9 and DX11 titles was awful, and there were a lot of bugs/crashes. Both are fixed by now with much better drivers, and today the cards actually deliver the value they promised. If you still face any issues, let me know!

@MenkoDany a year ago

@ProjectPhysX No, I meant specifically OpenCL

@MenkoDany a year ago

I remember there was an issue like a year ago where OpenCL performance was like half of what it should be

@ProjectPhysX a year ago

@MenkoDany In the beginning it was plagued with bugs and crashes, but now it's quite stable. In memory-bound OpenCL applications like FluidX3D, performance of the A750 is similar to an RTX 4070.

@MenkoDany a year ago

@ProjectPhysX I'm a dev, not a gamer :D That's really cool. I hope Intel pulls through. Or at least that Battlemage won't be a dud. What's it like working at Intel? I hope they pay you well, you're extremely extraordinary. I'd love to see a video with more of a math deep dive on this project

@matorsoni45 a year ago

My friend, your work is beyond amazing. As a physicist about to start my master's degree in CS, I'm getting so much inspiration from this project. If you were to recommend books for someone interested in building a simulation system like this, which would you choose? From system design, algorithms and data structures for massively parallel workloads, to the physical models and 3D rendering.

@ProjectPhysX a year ago

I didn't actually read too many books about all this. At a certain stage, the only source for further information is research papers, and you have to puzzle together the information and fill in the gaps yourself. The only data structure you need in GPU computing is 1D arrays with Structure-of-Arrays data layout; the others (linked lists etc.) are a million times slower. Some starting resources for you:
- Book: OpenCL Programming Guide: ptgmedia.pearsoncmg.com/images/9780321749642/samplepages/0321749642.pdf
- Book: The Lattice Boltzmann Method: link.springer.com/book/10.1007/978-3-319-44649-3
- Video: How do Video Game Graphics Work? kzbin.info/www/bejne/eWm8pZd5bdKrirc
- Slides: Memory Coalescing Techniques: homepages.math.uic.edu/~jan/mcs572/memory_coalescing.pdf
- Website for searching research papers: Google Scholar: scholar.google.com/
- To start with OpenCL programming: github.com/ProjectPhysX/OpenCL-Wrapper
Good luck with your studies!
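To make the Structure-of-Arrays remark concrete, here is a minimal C++ sketch of how a multi-component field can be indexed in one flat 1D array. The names `index_soa`, `index_aos` and `VelocityField` are made up for illustration and are not taken from FluidX3D; the point is only that in SoA layout, consecutive cell indices `n` map to consecutive memory addresses, which is what allows coalesced access on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Structure-of-Arrays (SoA): component c of all N cells is stored contiguously,
// so neighboring cells (consecutive n) sit at neighboring memory addresses.
inline std::size_t index_soa(std::size_t n, std::size_t c, std::size_t N) {
    assert(n < N);
    return c * N + n;
}

// Array-of-Structures (AoS), shown only for contrast: the C components of one
// cell are adjacent, so consecutive cells are C elements apart in memory.
inline std::size_t index_aos(std::size_t n, std::size_t c, std::size_t C) {
    assert(c < C);
    return n * C + c;
}

// Hypothetical helper: a 3-component velocity field for N cells in one flat array.
struct VelocityField {
    std::size_t N;
    std::vector<float> data; // layout: [x0..xN-1 | y0..yN-1 | z0..zN-1]
    explicit VelocityField(std::size_t cells) : N(cells), data(3 * cells, 0.0f) {}
    float& at(std::size_t n, std::size_t c) { return data[index_soa(n, c, N)]; }
};
```

With 4 cells, for example, `VelocityField u(4); u.at(1, 2) = 7.0f;` writes to flat index 2*4+1 = 9, the z-component of cell 1.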

@zelinzhao2863 a year ago

Have you considered adding gas-liquid two-phase functionality (boiling, condensation, etc.)? Currently it is single-phase.

@ProjectPhysX a year ago

I've experimented with that during my Master's and PhD; it's called the Shan-Chen or phase-field model. I've even coauthored a detailed comparison study between phase-field and Volume-of-Fluid LBM (what FluidX3D uses): www.researchgate.net/publication/361502271_Comparison_of_free_surface_and_conservative_Allen-Cahn_phase_field_lattice_Boltzmann_method TLDR: Phase-field LBM computes both fluid and gas phases, can do boiling/condensation, and can handle splitting and merging of bubbles easily. VoF-LBM only computes the fluid phase and treats the gas phase as vacuum. Handling bubble splitting/merging with VoF is extraordinarily difficult and inefficient. Why choose VoF then? In phase-field LBM, the fluid-gas interface is diffuse, about 3 cells thick. In VoF the interface is sharp, always 1 cell thick. To resolve a droplet, phase-field needs 3³ times as many cells. And per cell, the memory demand is double (extra memory for the phase-field). So phase-field needs ~54x more VRAM!
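The ~54x figure follows from the two factors named in the comment. Written out as a single line, using only the numbers given there:

```latex
\[
\frac{\text{VRAM}_{\text{phase-field}}}{\text{VRAM}_{\text{VoF}}}
\approx \underbrace{3^3}_{\text{cells}} \times \underbrace{2}_{\text{memory per cell}}
= 27 \times 2 = 54
\]
```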

@Sean74218 8 months ago

Can you please do a simulation of the SpaceX Starship upper stage's reentry profile?

@ProjectPhysX 8 months ago

Here you go: kzbin.info/www/bejne/jnzLlmCodq90apo

@salmiakki5638 a year ago

Question: is there a way for non-developers (i.e. those without access to the source code) to know if a specific workflow is compute- or memory-bound on the GPU (and likewise, single-thread, multi-thread, cache etc. bound on the CPU)?

@ProjectPhysX a year ago

Yes: you measure the utilization of both the GPU core and the GPU memory interface over time. If bandwidth utilization is close to 100%, the application runs memory-bound on this particular GPU, otherwise compute-bound. Unfortunately, no third-party tool out there can do this. All the common tools (Windows Task-Manager, GPU-Z, HWInfo, Precision X1, ...) only measure GPU core utilization, and Windows Task-Manager does even that very inaccurately. But there is hope: coincidentally, I've been writing such a tool for a few months: mast.hpc.social/@ProjectPhysX/111747962627994603 The first-party interfaces from Intel (IGCL) and Nvidia (NVML) allow you to read out bandwidth utilization, and you can then show how many GB/s an application is pulling. I'm now working on AMD support for Windows and Linux.
For CPU workloads, you can already check per-core utilization in Windows Task-Manager or htop on Linux, to see if an application utilizes only one core at 100% for a sustained time, or if load is balanced across multiple cores. There are no tools/interfaces for measuring utilization of CPU memory bandwidth though, so you're in the dark about memory-/compute-boundness. But almost all software on CPU runs highly memory-bound.
And ideally, for the most accurate assessment, you run a profiler over the assembly of the algorithm to count arithmetic operations and memory transfers. The Flops/Byte ratio (= arithmetic intensity) tells you, for the specs of any new hardware (GFLOPs/s / GB/s), whether the algorithm will run compute- or memory-bound.
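The Flops/Byte comparison at the end can be sketched in a few lines of C++. This is the standard roofline-style check, not code from the tool mentioned above: an algorithm is memory-bound if its arithmetic intensity is below the hardware's ratio of peak compute to peak bandwidth. The function names are hypothetical, and any numbers you pass in are your own measurements or spec-sheet values.

```cpp
#include <cassert>

enum class Bound { Memory, Compute };

// Hardware ratio from the spec sheet: how many floating-point operations the
// device can perform per byte it can move (GFLOPs/s divided by GB/s).
inline double machine_balance(double peak_gflops_per_s, double peak_gbytes_per_s) {
    assert(peak_gbytes_per_s > 0.0);
    return peak_gflops_per_s / peak_gbytes_per_s;
}

// Compare the algorithm's arithmetic intensity (Flops/Byte, e.g. counted with
// a profiler) against the hardware ratio.
inline Bound classify(double flops_per_byte,
                      double peak_gflops_per_s,
                      double peak_gbytes_per_s) {
    return flops_per_byte < machine_balance(peak_gflops_per_s, peak_gbytes_per_s)
               ? Bound::Memory   // memory interface saturates first
               : Bound::Compute; // arithmetic units saturate first
}
```

For a placeholder device with 10000 GFLOPs/s and 500 GB/s (made-up round numbers, not a real GPU), the hardware ratio is 20 Flops/Byte, so an algorithm doing 2 Flops/Byte would be classified as memory-bound.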

@loona5530 a year ago

How would I go about using a multi gpu setup for rendering?

@ProjectPhysX a year ago

FluidX3D can do that: distribute both simulation and rendering across multiple GPUs via domain decomposition. You can even "SLI" together Nvidia + AMD + Intel cards, as long as they have similar VRAM capacity and bandwidth. To do this, tell the LBM constructor to use a grid of domains, like LBM lbm(512u, 128u, 256u, 4u, 1u, 2u, ...); The first three numbers are the total grid resolution, and the next three numbers mean you split the box into 4x1x2 = 8 equal domains across 8 GPUs. If you have at least 8 identical GPUs installed, bin/FluidX3D will automatically assign them to domains. If you have different GPUs, run bin/FluidX3D 0 1 2 3 4 5 6 7 with the numbers being the OpenCL device IDs. You can also assign one device to several domains.
How multi-GPU simulation works: The big simulation box is split up into equally sized domains. Each GPU holds 1 domain in memory, and cannot see the others. At the boundaries where neighboring domains touch, a small amount of data is copied between GPUs via PCIe in every time step.
How multi-GPU rendering works: Each GPU knows only its own domain, and also renders only its own domain. The domain is shifted by the 3D offset of where it is located in the big box, then rendered according to where the camera is positioned. Each GPU generates one frame, only showing a part of the whole simulation box. All frames are copied to the CPU, and are overlaid by the CPU pixel-by-pixel, based on which frame's pixel is closest to the camera. This information is provided by the z-buffer accompanying the frames.
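The CPU-side overlay step described above can be sketched as follows. This is only an illustration of z-buffer compositing in general, not FluidX3D's actual code: the `Frame` struct and `composite` function are hypothetical, and the sketch assumes that a smaller depth value means closer to the camera.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-GPU render result: one color and one depth value per pixel.
struct Frame {
    std::vector<std::uint32_t> color; // packed pixel colors
    std::vector<float> depth;         // z-buffer; assumption: smaller = closer
};

// Merge the frames of all domains into one image: for every pixel, keep the
// color from whichever frame has the pixel closest to the camera.
inline Frame composite(const std::vector<Frame>& frames) {
    assert(!frames.empty());
    Frame out = frames[0];
    assert(out.color.size() == out.depth.size());
    for (std::size_t f = 1; f < frames.size(); ++f) {
        assert(frames[f].color.size() == out.color.size());
        assert(frames[f].depth.size() == out.depth.size());
        for (std::size_t p = 0; p < out.color.size(); ++p) {
            if (frames[f].depth[p] < out.depth[p]) { // this frame's pixel is closer
                out.color[p] = frames[f].color[p];
                out.depth[p] = frames[f].depth[p];
            }
        }
    }
    return out;
}
```

Because each pixel is decided independently by its depth value, the order in which the frames arrive from the GPUs does not matter, except for exact depth ties.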

@zenith360 a year ago

he's back at it bois! also, would this work well with an RTX 4060 OC?

@ProjectPhysX a year ago

It works on a 4060, but it will be slow. The 4060 has a super slow 128-bit memory interface, usually only to be found on cheap 50-class cards. The Nvidia Ada generation in general has moved 5 years backward regarding the memory bandwidth.

@username8644 a year ago

@ProjectPhysX Glad someone is finally talking about that. An RTX 3060 12GB has the same memory bus width as an RTX 4070 Ti, at 192-bit. Meanwhile a 16GB 4060 Ti only has a 128-bit bus width. It doesn't even make sense.

@zenith360 a year ago

@username8644 😭 ok

@imstupidbut1356 a year ago

How do I download it?

@ProjectPhysX a year ago

Here is the download link: github.com/ProjectPhysX/FluidX3D/archive/refs/heads/master.zip And here are the instructions for how to get started: github.com/ProjectPhysX/FluidX3D/blob/master/DOCUMENTATION.md Have fun!!

@imstupidbut1356 a year ago

Thank you!

@deltachanger714 8 months ago

Hey man, when I'm trying your sim, for some reason this line of code: lbm.graphics.visualization_modes = VIS_FLAG_LATTICE|VIS_FLAG_SURFACE|VIS_Q_CRITERION; is giving me errors saying that the lbm class has no graphics member. I'm using VS 2022, maybe that's why. I will downgrade to 2019 and see then.

@ProjectPhysX 8 months ago

Edit src/defines.hpp: comment out #define BENCHMARK (so that it reads //#define BENCHMARK) and uncomment #define INTERACTIVE_GRAPHICS. More details here: github.com/ProjectPhysX/FluidX3D/blob/master/DOCUMENTATION.md#3-go-through-sample-setups
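The reply suggests the error comes from a compile-time switch, not from the Visual Studio version. Below is a minimal sketch of how such a switch can make a class member appear or disappear. Only the two macro names come from the comment above; `Solver`, `Graphics` and `set_modes` are hypothetical stand-ins, and the real layout of the LBM class is an assumption here.

```cpp
#include <cassert>

// Stand-ins for the switches in src/defines.hpp, set as the reply recommends:
//#define BENCHMARK            // commented out: benchmark mode is off
#define INTERACTIVE_GRAPHICS   // uncommented: graphics code is compiled in

struct Graphics { int visualization_modes = 0; }; // hypothetical

struct Solver { // hypothetical stand-in, not the real LBM class
#ifdef INTERACTIVE_GRAPHICS
    Graphics graphics; // this member only exists when the define is active
#endif
};

// Sets the visualization modes if graphics support was compiled in.
inline bool set_modes(Solver& s, int modes) {
    assert(modes >= 0);
#ifdef INTERACTIVE_GRAPHICS
    s.graphics.visualization_modes = modes;
    return true;
#else
    (void)s; // without the define, Solver has no graphics member to write to
    return false;
#endif
}
```

If `INTERACTIVE_GRAPHICS` were left commented out in this sketch, any direct access to `s.graphics` would fail to compile with a "no member named graphics" style error, whatever compiler version is used.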