Good morning sir, great work on all of this. I'm a hobbyist who runs various scientific simulations in my free time to learn. I have a basic question (I think, haha): judging by the names of the binary files on your GitHub, you seem to focus on the FP32 and FP16 performance of GPUs. Do you have any plans to run these sims using the FP64 capability of GPUs? I ask because an old GPU I've ordered has an oddly high FP64 TFLOPS figure, even compared to today's top consumer flagship. I also know FP64 is used in some real applications where precision really matters. It would be great if there were a way to tap into that performance!
@ProjectPhysX10 сағат бұрын
FluidX3D does all the math in FP32 (which is supported with high performance on all GPUs), and for memory storage it can use either FP32 or FP16. FP16 storage means 1.8x the grid cells fit into the same VRAM capacity, and simulations run 2x as fast. This doesn't even come with a loss of accuracy, as the truncated digits are just non-physical noise anyway. I did try FP64 for math + memory storage too. It blows up the memory footprint by 4x and makes runtime 4-10x slower, for no benefit in accuracy at all. Read more here: www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats That numerical behavior - low precision being sufficient - is quite fortunate, as it makes LBM a lot more accessible on cheap gaming GPUs that can't really do FP64 and have small VRAM capacity. The modern FP64-capable GPUs (V100/A100/H100/MI200/MI300) are all super duper expensive.
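The split between 16-bit storage and higher-precision arithmetic described above can be sketched in a few lines. This is only an illustration, not FluidX3D code: it assumes plain IEEE-754 half precision from Python's `struct` module as a stand-in (the linked paper uses customized 16-bit formats), Python floats stand in for the FP32 arithmetic, and the helper names `store_fp16`, `load_fp16`, `store_fp32` and the sample values are hypothetical.

```python
import struct

def store_fp16(values):
    """Pack floats into IEEE-754 half precision: 2 bytes per value."""
    return struct.pack(f"<{len(values)}e", *values)

def load_fp16(buf):
    """Unpack half-precision bytes back into Python floats for arithmetic."""
    return list(struct.unpack(f"<{len(buf) // 2}e", buf))

def store_fp32(values):
    """Pack floats into IEEE-754 single precision: 4 bytes per value."""
    return struct.pack(f"<{len(values)}f", *values)

# Hypothetical sample values, all within the normal range of half precision.
values = [1.0 / 3.0, 1.0 / 18.0, 1.0 / 36.0]

fp16_buf = store_fp16(values)
fp32_buf = store_fp32(values)

# Only the stored copy is 16-bit; values are widened again before any math.
restored = load_fp16(fp16_buf)
max_rel_err = max(abs(r - v) / v for r, v in zip(restored, values))

print(len(fp16_buf), len(fp32_buf))  # bytes needed for the same values
print(max_rel_err)                   # rounding error introduced by 16-bit storage
```

The stored buffer is half the size, and the rounding error of each value stays within half-precision rounding. Whether that error is harmless for a given simulation is the question the linked paper addresses.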
@lab00042 күн бұрын
Hey! Very cool software, I'm glad you made it open source, otherwise I'd still be spending hours looking for fast and free CFD software, haha. I don't know if you still read the comments on this video, but in case you do, I have a question. I've been using the software for the past few days and have been messing with the settings here and there: how does the VRAM occupation work? I'm using a laptop RTX 3070 with 8 GB VRAM, but every time I compare simulations run on 100 MB VRAM and 1000 MB VRAM, the simulation speed seems to slow down even though my framerate is relatively smooth on both. Is that because the simulation needs more processing power from the GPU, completely separate from the framerate? I'm a complete noob on CFD and simulation stuff by the way, haha, so I'm sorry if I said some dumb things. I came across the software and wanted to see how the aerodynamics of my car models would perform.
@ProjectPhysX2 күн бұрын
It's actually source-available, not open-source. This is very similar to open-source, with the difference that commercial and/or military use is excluded. Larger VRAM occupation means more grid cells to compute, meaning the simulation will take disproportionately longer: runtime ~ (VRAM occupation)^(4/3), as you also need more time steps at larger resolution. At some point the rendering will also get slower, but frame time is not exactly proportional to cell count.
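To make the scaling above concrete, here is a small sketch of the arithmetic for the 100 MB vs. 1000 MB comparison from the question. It assumes the cell count is proportional to VRAM occupation and that the number of time steps grows with the linear resolution (the cube root of the cell count), which is what the 4/3 exponent implies; the function name `runtime_ratio` is made up for illustration.

```python
def runtime_ratio(vram_small_mb, vram_large_mb):
    """Estimate how much longer the larger run takes, using runtime ~ VRAM^(4/3)."""
    cells_ratio = vram_large_mb / vram_small_mb  # cell count scales with VRAM
    steps_ratio = cells_ratio ** (1.0 / 3.0)     # time steps scale with linear resolution
    return cells_ratio * steps_ratio             # equals cells_ratio ** (4/3)

ratio = runtime_ratio(100.0, 1000.0)
print(f"{ratio:.1f}x")  # prints 21.5x
```

So 10x the VRAM occupation means roughly 21.5x the runtime under this estimate, not 10x.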
@lab0004Күн бұрын
@@ProjectPhysX I see, thanks for the explanation! I forgot to ask: how do I visualize air pressure? So far I've only been able to switch between air velocity and air density in the field visualization mode.
@therealdeemz_5 күн бұрын
Hahaha! Came from your comment on Iceberg Tech's Titan Xp video! Glad to see it was put to great use in its time
@copilotlover6 күн бұрын
How do we do this with your GitHub code?
@LGDUDE6 күн бұрын
First of the month
@adrianpop39279 күн бұрын
I managed to achieve the same thing.. there are bugs but it's better than black and white
@btbb372614 күн бұрын
Mooooove out of my way!
@DeafMan198315 күн бұрын
Nice! This is OpenCL 1.1, correct? I would like to understand how to get started with OpenCL. I'm on Ubuntu 24.04.1, and it works fine there; I tested it with a triangle.
@ProjectPhysX14 күн бұрын
OpenCL 1.2, but it's backward-compatible with 1.1. Here is my OpenCL starter pack:
* OpenCL-Wrapper to remove the OpenCL API boilerplate code and patch device-specific driver bugs: github.com/ProjectPhysX/OpenCL-Wrapper
* Technical Talk: OpenCL GPU Programming for HPC Applications: kzbin.info/www/bejne/rWWrdqqapcmHpNU
* Catalogue of OpenCL C built-in functions: www.khronos.org/files/opencl30-reference-guide.pdf
* Book: OpenCL Programming Guide: ptgmedia.pearsoncmg.com/images/9780321749642/samplepages/0321749642.pdf
* Technical Talk: Combined scientific CFD simulation and interactive raytracing with OpenCL: kzbin.info/www/bejne/pnWbe4p3j5eZbtE
@DeafMan198314 күн бұрын
@ProjectPhysX Wow, excellent explanation. I will work on it. I am very happy because my graphics card, an RX 480 8GB, is quite old and has no problems with OpenCL 1.1. Thank you for saving my old graphics card, because I don't want it to go to waste. It also works fine with Vulkan 1.1 and OpenGL 4.6. 👍💪
@xcloudx01alt19 күн бұрын
"How much vram do you have?" "yes"
@exhilex24 күн бұрын
device? (cpu and gpu); resolution??
@ProjectPhysX24 күн бұрын
Intel i7-8700K and Nvidia Titan Xp, 1080p
@marcoomau26 күн бұрын
You need to consider that when thrown by a catapult the position relative to the direction may vary.
@lucamagni9928 күн бұрын
Congratulations, your work on this project gets better and better! May I know how soon a nice tutorial on configuration and use will be available?
@ProjectPhysX27 күн бұрын
@@lucamagni99 thanks! Please see here: github.com/ProjectPhysX/FluidX3D/blob/master/DOCUMENTATION.md
@NewtonInDaHouseYoАй бұрын
Dear Moritz, would it be possible to simulate an aircraft (Re 10^7) with reasonable fidelity in real time on a modern GPU like an RTX 4090?
@ProjectPhysXАй бұрын
@@NewtonInDaHouseYo yes, at ~centimeter scale resolved. It's good enough for bulk aerodynamics but cannot resolve the turbulent boundary layer.
@hoangminhdo1259Ай бұрын
I want to ask how it compares to the 4060, 4060 Ti 16GB, and 3070.
Do you use the immersed boundary method or a body-fitted mesh?
@ProjectPhysX23 күн бұрын
Mid-grid bounce-back boundaries
@bjornfischer3614Ай бұрын
Hello Moritz, I'm currently trying my hand at simulations as well. I render my data with OpenCL functions directly on the GPU, but I haven't yet found a way to actually get my finished uchar array of RGB values onto the screen. Right now I'm trying it with OpenGL, but so far that has only led to black screens and reboots. In your 2022 talk you said that you send the images back to the CPU and then display them. That raises two questions for me. How do you then display the image from the CPU? And wouldn't it be faster to leave the image on the GPU and somehow display it from there? I mean, if I want to display an image, that goes through the GPU anyway, doesn't it? I've already tried to make sense of your code, but with my limited experience I honestly can't find my way around it. It would be great if you could give me a small hint. Best regards, Björn
@ProjectPhysXАй бұрын
Hi Björn, figuring that out cost me a lot of nerves too. In the end you need external libraries, <windows.h> on Windows and <Xlib.h> on Linux, to draw the image to the screen with SetBitmapBits()+BitBlt() or XPutImage(), respectively. Around that you need a whole bunch of extra stuff to create a fullscreen window that the image gets painted into, and for mouse and keyboard input. You can find all of it as a minimal implementation here: github.com/ProjectPhysX/FluidX3D/blob/master/src/graphics.cpp#L440-L749 The external libraries could do more, e.g. draw lines and triangles, but I implemented all of that myself, also on the CPU side, so that it runs faster (or works properly at all) and so that I don't have to do everything twice for Windows/Linux. The only things I really use from the libraries are drawing the bitmap and mouse/keyboard input. Drawing the image directly from VRAM to the screen via OpenCL-OpenGL interoperability would probably give somewhat lower latency, but in the end that doesn't matter because it's already fast enough this way. The advantage of going through the CPU is that rendering also works when the display is driven by the iGPU while a dGPU does the compute/rendering. And it also works with e.g. an A100 that has no display outputs at all. Getting the X11 implementation to work was particularly awful, because X11 segfaults at every opportunity, e.g. when you poll a key press with XNextEvent() while the frame is being drawn. That requires extra XLockDisplay()/XUnlockDisplay() calls. And for different keyboard layouts you have to load an extra X11 extension. Finding out the size and offset of the primary monitor when there are multiple screens also requires an extra extension. And nothing is properly documented... Best regards, Moritz
@bjornfischer3614Ай бұрын
@@ProjectPhysX Thanks a lot for the answer. I'll definitely give that a try.
@pawelczubinski6413Ай бұрын
what a setup lol, your work is very impressive!
@fadoobabadplАй бұрын
How is PyTorch performance for training physics informed neural networks on B580?
@PhenxАй бұрын
How does the code perform on a “normal” computer (like a $4k Xeon workstation, or even a Ryzen gaming PC - a computer an undergrad/grad student could afford) compared to the traditional methods? Let's say, for a 15-20 million cell model of a car.
@ProjectPhysXАй бұрын
@@Phenx it's around 2000x faster; 15-20M cells complete in seconds, and you can watch that in real time. Example: kzbin.info/www/bejne/aIOVkoWFZrKUp6M
@chaos3088Ай бұрын
Is your software only meant to assist rendering? What other functions does it have?
@ProjectPhysXАй бұрын
@@chaos3088 it's a physically accurate computational fluid dynamics software, for research and engineering, with a huge range of applications from microfluidics to aircraft aerodynamics. The rendering is just a small part of it.
@chaos3088Ай бұрын
@ProjectPhysX that's really cool! So it's more meant for simulation studies eh?
@pierro281279Ай бұрын
Do you know how PCI Express speed affects performance? I ask because there are a lot of cheap used crypto mining computers with many GPUs, but on x1 links.
@ProjectPhysXАй бұрын
Don't use these, PCIe x1 is too slow. Should be at least PCIe 3.0 x4.
@tetraktys6540Ай бұрын
Cool. Can you run a known model side by side to see similarities? Automotive?
@engineeringwithsyauqyАй бұрын
Dr. Lehmann, I may start planning a GUI for your CFD code.
@RAYY_WILDАй бұрын
bro calls me broke in 3 different languages, love the sim tho 👍
@RafaGmodАй бұрын
its so good to not rely entirely on CUDA!!
@stimpyfeelinitАй бұрын
is that a wheelchair or something
@ProjectPhysXАй бұрын
Santa's sleigh with some X-wing and laser cannon modifications. Merry Christmas! The CAD model is from Zannyth / Kevin Piper: www.thingiverse.com/thing:2632246/files
@hannosoloАй бұрын
@@ProjectPhysX Are the laser cannons for the naughty?
@O-cDxAАй бұрын
Dr. Lehmann, please consider creating an easy-to-use GUI, or a port to Blender, so that those who lack your intelligence can use your gift to the world! I have tried over and over, but get stumped at a certain point. Happy Christmas!
@TheDarkrider551Ай бұрын
Still waiting for the basic tutorial for those of us who can't code. We just want a basic setup with a walkthrough, and maybe some examples for moving parts or extracting forces. Just some basic functionality from this for us normies... :(
@jeeves-2Ай бұрын
Would you be able to combine an integrated GPU with a discrete GPU? Amazing to see this, hope you get a few MI300 under your Christmas tree!
@ProjectPhysXАй бұрын
Yes, I actually tested this with (Intel UHD 630 + Titan Xp) and it worked. But the slower iGPU is then the bottleneck and it will all be super slow. With faster iGPUs in the future, pairing with an older dGPU could be a viable option. What I also tried was pairing CPU and iGPU. This also works, but is not practical as it's slower than using either one alone. CPU+iGPU share the same memory interface and only slow each other down when they fight for bandwidth.
@_ThriАй бұрын
Haters will say this isn't what Santa's sleigh looks like
@leerman22Ай бұрын
The only good Netflix adaptation.
@RobMiami787Ай бұрын
The 767-400 has larger windows than the 763, close to or equal to 777-size windows. It also has a 'glass cockpit'. The plane is also 18" taller than the 763. And finally there's the distinctive wingtip, which we see later on the 777.
@RobMiami787Ай бұрын
Very nice video. I flew MUC-EWR on a UA 764 and IAH-MUC on a UA 764, also ATL-AMS on a DL 764 in BCRF livery. One of my favorite planes tbh; 2x seating at the windows is perfect.
@andymaurice7693Ай бұрын
Looks like a 787 wing tip to me...
@detriticАй бұрын
Why does the Munich airport have a ziggurat
@principedelapaz3593Ай бұрын
Can I test this on my Intel i3 CPU? 💀💀
@ProjectPhysXАй бұрын
Yes! You'll need to install the OpenCL CPU Runtime for that, see here: github.com/ProjectPhysX/FluidX3D/blob/master/DOCUMENTATION.md#0-install-gpu-drivers-and-opencl-runtime
@rickross5299Ай бұрын
is this a simulation or actual footage? It looks so real.
@SphaeroXАй бұрын
I'm waiting for the quantum computer age, then you could also simulate this including interactions with the sun etc. 😍
@PatrickButcherineАй бұрын
I don't know if you have contacts at the company's you get the various processors you use, anyone who could actually build custom prototype processors and mother boards, but I think I have a way to get faster processors which means faster processing times... SUPER SILVER.... Some university created that alloy a while back, I want to alloy it with copper and reinvent the wire with a hexagonal strand of super silver/copper alloy instead of the normal round strand of copper. The electric travels on the outside of the wire so a hexagonal shape will have more area in the same gauge... Super silver is a super conductor... CRYO whatever you make out of the new super conducting alloy so the molecules align to add to the efficiency of transferring power. On the PCB, replace the copper traces with the new alloy, increase the thickness slightly so the trace can have a half hexagon shape, more area for electrons to flow on. Super silver is super conducting so a hexagonal wire made of it should be able to flow a denser signal way faster than normal 99.9% oxygen free round wire can.... In theory anyways... But I think it will work. I'm usually never wrong with this type of stuff. I just wanted stronger solder joints and was looking into silver solder and found SUPER SILVER.... if the processors internals were made with a super silver alloy then you'd have super fast processors. I really wanna make black glass OC81D germanium transistors with an alloy, either just the leads or alloy some internal parts... Can it be added as impurity to the wafer so I can have some new flavors of fuzz for my guitar??. I really feel I have a stock option plus cash type idea here with this whole SUPER SILVER ALLOY. I'm just a nobody so no "business" people will listen to me and take me seriously. Super conducting is just that... Conducts shit super quick... Hence the word super conductor. I feel you have to cryo whatever is allowed so every single molecule in it aligns perfectly straight... 
It's like a super highway for electrons. Kind of like greasing a slip n slid before you slide down it.... I play guitar so I really wanna remake tube innards with new super conducting alloys for new flavors of tone. The turrets need alloyed, all component leads need reinvented with super conducting leads already built into them. I feel the quantum computing world would really benefit from it... What if they alloyed all the magnets at CERN... Each one has tons of PCBs and wire connecting it all.... Imagine making just the wire super conducting. Then imagine if the wire/PCBs AND THE SOLDER WAS SUPER CONDUCTING...... You got any contacts at Intel I can talk too?? I think, at the very least,,,, I'm really on to something with hexagonal stranded super silver/copper alloy wire...
@ProjectPhysXАй бұрын
@@PatrickButcherine this is just nonsense. A hexagonal shape won't make cables conduct better, just harder to manufacture and more expensive. Wires are typically already many strands of copper rather than a single strand. There is a very good reason cables are copper and not copper-silver alloy: silver is super expensive, and an alloy would be a lot more brittle and even a worse electrical conductor. And room-temperature superconductors don't exist.
@PatrickButcherineАй бұрын
@ProjectPhysX the super silver alloy is like 1000 times stronger than normal silver... Hence the name.... SUPER SILVER. and it's not nonsense, that's a lack of common sense and sheepism, meaning youll follow the herd even if they're jumping off a bridge to their death. Electrons flow on the outside of the wire, they don't flow inside the wire. So more surface area per gauge and if alloyed with a super strong super conducting material that in theory means it should conduct a more dense signal faster than normal 99.9% oxygen free copper. Yes, they do shit a certain way for a reason but electricity has never ever been so pure. All metals that made anything went thru a less pure refining process, the electricity back in the 20s wasn't like todays electricity, we need a super conducting wire, super conducting traces, super conducting solder.... Super conducting tube/transistor/IC innards.... The nonsense is that you ignorantly and blondly follow the establishment and you don't think for yourself, everything they've ever told us was theoretically wrong. Troy exists... Witches are real life flesh and blood beings... That's why the pope HITLERED all of them I'm rivers by tying chains and rock to women's feet, if she floated she wasn't a witch.... EVERY ONE FUCKING DROWNED BRO, NOT ONE HUMAN FLOATED WITH HUNDREDS OF POUNDS OF ROCK TIED TO THEM.... THEY USED THE SCIENCES AGAINST THE ONES THEIR IMIGINARY GOD SUPPOSEDLY LOVES SO FUCKING MUCH.... WAKE UP BRO
@toothrottingkandyАй бұрын
this might be your most ambitious simulation yet
@Jay-sr8geАй бұрын
Those turns to align the plane for final approach are so aggressive
@kotukuwhakapiko467Ай бұрын
its faaaaat flllatttt
@rasimbotАй бұрын
What does coloring represent?
@ProjectPhysXАй бұрын
Velocity magnitude - red is faster, blue is slower.
@xn_edits2779Ай бұрын
I wish someone could implement this in Blender (3D software) for creative projects...
@chaos3088Ай бұрын
Love your work n your triple GPU setup!
@browskieАй бұрын
Hi, can this CFD do conjugate heat transfer analysis?
@dickmason100Ай бұрын
What motherboard are you using?
@ProjectPhysXАй бұрын
Not sure what manufacturer the mainboard is, but this is what the server looks like: kzbin.info/www/bejne/p3mYlIxolpt_Y5o
@AndyRRR0791Ай бұрын
Hey Moritz, how do you get the rotors to revoxelize without the bounding boxes overlapping and destroying the fuselage/body geometry?
@ProjectPhysXАй бұрын
@@AndyRRR0791 tricky problem with a pragmatic solution: within the bounding box, only voxels that have the same velocity in the current as in the previous voxelization step are cleared. That solution works even for cases like counter-rotating gears.
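One possible reading of that rule, as a minimal sketch rather than the actual FluidX3D kernel: each solid cell remembers the velocity it was written with, and when a moving body is revoxelized, only cells inside its bounding box whose stored velocity matches what that body wrote in the previous step get cleared. Everything here is an assumption made for illustration: the 2D grid stored as a dict, the uniform rotor velocity, and the names `revoxelize`, `old_velocity_at` and `new_velocity_at`.

```python
def revoxelize(grid, bbox, old_velocity_at, new_cells, new_velocity_at):
    """Update one moving body in `grid`, a dict {(x, y): velocity} of solid cells."""
    (x0, y0), (x1, y1) = bbox
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            cell = (x, y)
            # Clear a solid cell only if its stored velocity equals the one this
            # body wrote there in the previous voxelization step.
            if cell in grid and grid[cell] == old_velocity_at(cell):
                del grid[cell]
    # Write the body's new solid cells together with their velocity.
    for cell in new_cells:
        grid[cell] = new_velocity_at(cell)
    return grid

# Hypothetical 2D setup: a resting fuselage cell and a rotor moving at (1.0, 0.0).
rotor_velocity = lambda cell: (1.0, 0.0)
grid = {(2, 2): (0.0, 0.0), (1, 2): (1.0, 0.0), (3, 2): (1.0, 0.0)}

grid = revoxelize(grid, ((0, 0), (4, 4)), rotor_velocity, [(2, 1), (2, 3)], rotor_velocity)
print(sorted(grid.items()))
```

In this toy case the fuselage cell at (2, 2) survives even though it lies inside the rotor's bounding box, because its stored velocity differs from the rotor's. Under this reading, two bodies that both have the same (for example zero) velocity could not be told apart.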
@AndyRRR0791Ай бұрын
@@ProjectPhysX Thanks for the response, Moritz. My problem is that when operating at a large AoA, my propellers create a very large bounding box that overlaps the (also larger) fuselage bounding box. I can't even get them to the start position of the voxelization without them intersecting. Your rotors here seem quite angled too, so I wondered how you got it to work. I was curious whether there are any bits left in the cell memory that could be used to flag the ownership of each cell, to determine whether to overwrite or flip as necessary? I haven't got my hands dirty in the code yet.
@AndyRRR0791Ай бұрын
I just figured out that since I'm currently loading both models as stationary for testing, the little trick you describe above may not be working. I will have to adjust things and test.
@hollyhall1429Ай бұрын
I can hear that cpu scream from here...😂
@renejams4124Ай бұрын
missed that ● 300 km/h airspeed, 10° angle of attack ● Reynolds number = 51M, y+ = 538 /// your fucking probe dynamic data is missing also no surface interaction or the lack of specified are the engines ON or OFF / is this a wind tunnel test e.g. ? WTF it is we are looking at
@walkerjian2 ай бұрын
This is great! I like how you think. A long time ago I combined Bresenham and Z-buffer to do hidden line removal! I was also working on including dithering (for shading and signal processing) in it. I also worked with badly rounded data and the effects it had on the FFTs being used to process it. I kept going and unrolled all the FFT loops flat, and successively rounded the FFT butterfly LUTs into zero crossings, producing an amazingly simple algorithm that recovered Walsh-Hadamard sequency space. And it was surprisingly good at doing the signal analysis/processing. Effectively it took the matrix multiplications inherent in the FFT down to simple additions (sign bit for negatives). Sound familiar? I suggested a similar thing could be done for the matrix multiplications in modern AI inferencing, and voilà! We have 1.58-bit 'bitnet' solutions that could possibly be implemented on *simple* optical processors, no mults needed ;)