Brute Force Processing

155,294 views

javidx9

1 day ago

In this video, I examine a sequence of strategies to try and improve the rendering speed of a Mandelbrot Fractal. Starting off with naive assumptions, I explore optimising the algorithm, then the use of vector co-processing, and finally threads and thread-pools in order to squeeze as much out of my CPU as I can.
Source: github.com/OneLoneCoder/Javid...
Patreon: / javidx9
YouTube: / javidx9
/ javidx9extra
Discord: / discord
Twitter: / javidx9
Twitch: / javidx9
GitHub: www.github.com/onelonecoder
Homepage: www.onelonecoder.com

Comments: 571
@wessmall7957 · 4 years ago
Arm-chair fractal geometry expert here. Was expecting to be bored, but was actually thrilled to see how your optimizations would perform. However I'm disappointed you didn't go on a 45-minute tangent about the existential implications of the fractal nature of life, the universe, and everything.
@daedreaming6267 · 4 years ago
Well you see... What you forgot to take into account is that the fractal nature of things? Really boils down into one thing. And that thing? Is 42.
@SpaghettiToaster · 3 years ago
@@daedreaming6267 What he actually forgot to take into account is that there is no such thing as the fractal nature of things, because things are not fractal in nature.
@Capris4EveR · 3 years ago
@@SpaghettiToaster In my point of view everything in nature is parallel (mirrored to infinity). We can't find the start since it's infinite, but we will be able to predict the future and change the lane of our evolution. In my mind a higher being developed us for God knows what purpose, and in parallel with that, someone else created them too; when our turn comes we will create others, if we didn't already (animals).
@Kenjuudo · 3 years ago
@@emjizone I'm pretty sure that if you were to iterate through every single color component of every single pixel in every single frame of every single video at every single resolution as they are stored on the Google servers, you'd pretty much find the number 42 in all of them. And for the off chance a video's video stream doesn't contain that number, there are also the audio streams to sift through... :)
@austinwoodall5423 · 2 years ago
Fractals are so named for their FRACTionAL dimensionality. For instance, the Mandelbrot set is neither 1D nor 2D but 1.72-dimensional. Fractals are characterized by their infinite perimeter but 0 area (or the analogous measures in other dimensions).
4 years ago
34:00 there's enough interest.
@Pogosoke · 4 years ago
Yes please!
@3Triskellion3 · 4 years ago
Yes, please (or at least a blog post?)
@Gustvoasbezerra · 4 years ago
Interest there is
@EIYEI · 4 years ago
Yes please!!!!!
@p003872 · 4 years ago
Ohh yes there is!
@dartstone238 · 4 years ago
The SIMD conversion would be super interesting, I think. I do like the low-level part of things.
@Agatio17x · 4 years ago
I would love to watch a SIMD video too!
@ElGuAcA11 · 4 years ago
Me too, I'd love to see it!
@byrontheobald6663 · 4 years ago
From my own testing with other problems (N-body), you don't actually need to do much. If you look here: gcc.gnu.org/projects/tree-ssa/vectorization.html you'll see that GCC will automatically emit SIMD instructions if you compile past -O2; just make sure your code is in a vector/array or otherwise contiguous in memory. I'd imagine MSVC will do something very similar.
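As a rough illustration of the kind of loop the comment means (function name and shapes are mine, not from the video): a contiguous, dependency-free loop like this is a prime candidate for GCC's auto-vectorizer, enabled by -O3 or -O2 -ftree-vectorize, and you can ask the compiler which loops it vectorized with -fopt-info-vec.

```cpp
#include <cstddef>
#include <vector>

// Contiguous data, no aliasing between iterations, simple arithmetic:
// exactly the shape the auto-vectorizer can turn into SIMD mul/add lanes.
std::vector<float> scale_add(const std::vector<float>& a,
                             const std::vector<float>& b, float k) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] * k + b[i];   // independent per-element work
    return out;
}
```

Whether the vectorizer fires still depends on the compiler proving there is no aliasing, which is what the replies below discuss.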
@mileswilliams527 · 4 years ago
Yes PLEASE!!! I'm at this point in my coding journey where I need to start getting the most performance I can and have started looking at parallelism so this would be very interesting
@ScottFCote · 4 years ago
I also would very much like to see this.
@dragan38765 · 4 years ago
More SIMD/AVX videos? Yes, please. I'd love to learn more about those, I only know the basics sadly. I pray to the compiler gods that they have mercy upon my soul and vectorize it automatically.
@douggale5962 · 4 years ago
In most cases, they can't auto-vectorize. Usually it's because the compiler can't be certain your pointers aren't aliasing: it thinks your stores may be changing the values of subsequent loads, so it is afraid to use SIMD in case that changes the results. You can help a lot with "restrict" pointers (__restrict in g++). The compiler can be absolutely sure there is no aliasing when you declare them restrict. Putting restrict on the signature's pointers means the function cannot deal with those pointers referring to the same memory, and that restriction sets the compiler free to assume its stores won't feed back into loads, allowing vectorization.
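A minimal sketch of the restrict idea described above (names are mine): __restrict is the g++/clang/MSVC spelling of C99's restrict, promising the compiler that the two pointers never overlap.

```cpp
#include <cstddef>

// __restrict tells the compiler dst and src never alias, so a store
// through dst cannot change a later load through src. That guarantee is
// often exactly what the auto-vectorizer needs to emit SIMD loads/stores.
void scaled_accumulate(float* __restrict dst, const float* __restrict src,
                       float a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += a * src[i];   // now provably safe to do several lanes at once
}
```

The promise is one-way: if you do pass overlapping pointers, the behaviour is undefined, which is why the compiler cannot assume it on its own.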
@Bravo-oo9vd · 4 years ago
Instead of praying, you can profile and inspect the compiler output. It's hard to speed up your code when you don't know where the bottlenecks are, and you can help the compiler autovectorise if you provide it with some hints. Writing code with intrinsics directly is not very maintainable; a better approach is to look into parallel-processing libraries that manage this complexity for you. Also, by looking at the output assembly you can confirm that other optimisations are working, such as instruction-level parallelism.
@GregoryTheGr8ster · 4 years ago
I hate to have to be the one to bring you the news, but there are no compiler gods. I SWEAR THAT I AM NOT MAKING THIS UP!
@daklabs · 4 years ago
The AVX sounds interesting, I'd like to see more on it!
@gideonmaxmerling204 · 4 years ago
There is a YouTuber named "What's a Creel" who has some videos on AVX in assembly and also some on VCL, a vector class library for C++.
@glitched_code · 4 years ago
very interesting, i'm rooting for the SIMD conversion video to come out :)
@javidx9 · 4 years ago
thanks and noted!
@geotale95 · 4 years ago
I'm a simple man. I see brute forcing and the Mandelbrot set, and I click on the video.
@kepeb1 · 4 years ago
I'm a simpleton, I see overused formulaic cliche comments, and I join in.
@geotale95 · 4 years ago
@@kepeb1 Yeah, sorry, I didn't see any other comments and wanted to comment quickly, and that's the best I could do :p
@luisgeniole369 · 4 years ago
I think the Mandelbrot set and fractals in general are so engaging because they're the closest thing we have to visualizing eternal cosmic inflation, and that pokes at a very powerful memory of our collective consciousness. Also the pretty colors.
@RogerBarraud · 3 years ago
One does not simply brute-force into Mandelbrot...
@Omnituens · 4 years ago
I'm in for a dedicated video on the SIMD conversion.
@ScramblerUSA · 4 years ago
30:02 - should be "bit 255".
@antipainK · 4 years ago
Could you make a video about dynamically resizing variables (ones that double their length when they run out of precision)? That would of course require defining all the calculations and make the program slower, but would enable us to zoom "indefinitely".
@marcodebruin5370 · 2 years ago
One way to avoid the "waiting for the slowest thread" problem and keep every thread busy till the end (thus shortening the total time needed): have the workers calculate the iterations for one pixel, then ask for the next pixel not yet started, feeding pixels to the threads until you've run out of pixels to do. All threads stay busy until the last few pixels. That should still give a significant improvement (especially on screen areas that are unbalanced).
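A minimal sketch of that scheme, at row granularity rather than per pixel to cut contention on the shared counter (all names here are illustrative, not from the video's code):

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Workers claim the next unprocessed row from a shared atomic counter, so
// a thread that finishes early immediately grabs more work instead of
// idling until the slowest thread is done.
void render_rows(int height, int thread_count,
                 const std::function<void(int)>& render_row) {
    std::atomic<int> next_row{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t)
        workers.emplace_back([&] {
            for (int row; (row = next_row.fetch_add(1)) < height; )
                render_row(row);   // iterate every pixel in this row
        });
    for (auto& w : workers)
        w.join();
}
```

fetch_add hands each row to exactly one thread, so no mutex is needed for the scheduling itself.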
@jordanclarke7283 · 4 years ago
I wish to register interest in the SIMD code. Also I wish to acknowledge that the part where you talked about running in release mode versus debug mode was for my benefit after the (stupid) question I asked in your last video regarding the speed of decal rendering. Keep up the good work! Awesome channel! 👍🏻
@atrumluminarium · 4 years ago
Yes please I would be interested in seeing more about the intrinsics and performance advantages. I've been programming for years and had no idea that was even a thing you can do in C/C++
@davidbundgaard · 4 years ago
It's such a joy to see you program this fractal generator, and I would love to see more about AVX, GPU calculations on this problem, and threads. Very nice walkthrough. Keep it up as always.
@Collinoeight · 4 years ago
Definitely interested in seeing that conversion. This is one of the best CS videos I've seen in quite a while.
@cocorinow · 4 years ago
Javid, you angel! Just as I'm starting my master's thesis, having to write a program that will crunch through a lot of complex numbers, you post this. Thanks!
@nawdawg4300 · 2 years ago
I watched a Numberphile video about the Mandelbrot set (and the key fact that any value past a radius of 2 explodes to infinity) and I managed to write the code for a multithreaded implementation (not thread-pooled) without any external resources. Seeing the Mandelbrot pop up on the screen for the first time was an amazing feeling haha. Anyway, now that I've stumbled across your video I can't wait to learn more about and implement the thread pool/vector co-processing (I just got my BS CS degree and am disappointed I didn't get to learn more about this stuff). This is definitely one of the better videos I've come across when it comes to Mandelbrot implementation, so thanks for that!
@GregoryTheGr8ster · 4 years ago
This is a totally riveting video! Not only do you dive deep into the Mandelbrot fractal, to the point where a mighty 64-bit double is inadequate, but you also dive into unearthing the incredible power of the Intel CPUs. I completely fell in love with SSE and AVX when I finally decided to learn how to make use of them, but I have not yet ventured into threads and multi-cores. I know that I need to. It's on my bucket list. No, actually it is much more imminent than that. I will get into multi-core programming soon. This is a *must*.
@antipainK · 4 years ago
I would love to see how to do the conversion to SIMD! :) Btw, my first C++ project was a Mandelbrot set generator and I made most of the mistakes that you pinpointed here. Now I'm gonna rewrite it after final exams. :D
@taraxacum2633 · 4 years ago
maybe real time ray tracing??? 😄 I am trying to do it rn but I'd love your explanation and tricks!
@javidx9 · 4 years ago
It's a popular request for sure!
@wesleythomas6858 · 4 years ago
Awesome video. Great job touching on the different techniques available; I never even knew about SIMD! 👍
@DiamondShocked · 4 years ago
This is exactly the type of computation that GPUs are designed for, and you would likely see a huge performance improvement there. In addition, while this approach generalizes fractal computation somewhat well, there are many techniques for optimizing the rendering of the Mandelbrot set specifically. For example, there is no need for the computation to run to the max iteration count for pixels which are clearly in the middle region and would iterate forever if allowed. There are also ways to scale the floating-point values based on the zoom so that precision is not a limitation, which would allow you to zoom indefinitely.

Next, it does not make sense that 32 threads should improve performance over 8 threads on your machine, since your 4-core processor can only effectively execute instructions from 8 threads simultaneously (two instruction streams per CPU core).

I would also avoid busy-waiting in the thread pool. Simply deschedule threads that have no work to do, and make them runnable when they should run. It is perhaps possible that this is handled somehow by the OS/compiler (if a loop is seen to jump to the previous instruction and the condition is not met, the processor might dynamically see that no useful progress can be made and yield the thread to another?), although that is unlikely. While the rule of thumb is to avoid busy-waiting at all costs on a single-core machine, there are reasons why busy-waiting may be appropriate on a multi-core system: conditions can change while a thread is busy-waiting due to the behavior of other threads, and the system call to make the thread runnable can be avoided. So the tradeoffs between these options should be weighed. Also, your thread synchronization (an atomic counter) could be replaced more elegantly with a barrier or another synchronization primitive.

An interesting experiment for the approach you have taken would be to disable the compiler's optimization flags, so that the effect of simple code optimizations such as loop unrolling or strength reduction can actually be measured.
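The "deschedule idle threads" alternative mentioned above is what a condition variable gives you; here is a minimal sketch (the class and names are mine, not the video's thread pool): workers blocked in pop() consume no CPU until work arrives.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Instead of spinning on a flag, idle workers sleep inside cv.wait and
// are woken only when a job is pushed.
class JobQueue {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::function<void()>> jobs;
public:
    void push(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(m); jobs.push(std::move(job)); }
        cv.notify_one();   // wake exactly one sleeping worker
    }
    std::function<void()> pop() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !jobs.empty(); });  // blocks, no busy-wait
        auto job = std::move(jobs.front());
        jobs.pop();
        return job;
    }
};
```

The predicate form of wait also handles spurious wakeups, which a hand-rolled spin-on-flag loop typically does not.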
@AusSkiller · 4 years ago
With the way he was doing things it absolutely makes sense that 32 threads would improve performance over 8. What is probably tripping you up is that the number of threads isn't itself the reason it's faster (more threads doesn't really help, as you correctly concluded). Rather, it is faster because he divides the area up based on the number of threads, reducing the time it takes to process the most computationally expensive section, which with only 32 sections is likely still a major bottleneck. With just 8 threads and 8 sections, whenever a thread completes its section that hardware thread sits idle until the others are all complete. With 32 threads and 32 sections, whenever a section is completed another software thread can utilise the hardware thread that was freed up, and the maximum time to complete a section drops to a quarter of what it would take with 8 sections, allowing it to be up to nearly 4 times faster in the worst-case scenario for 8 threads. However, it is entirely dependent on the complexity of the sections and how well balanced they are: if all 8 sections are equally complex there will be no performance benefit to running 32 threads/sections, which may even be marginally slower in that case. Basically, the more sections there are, the better it can load-balance and keep the CPU fully utilised, at least up to the point where the overhead of more threads starts negating the better utilisation.
@luisendymion9080 · 4 years ago
AusSkiller is right. You're describing a better but far more complex implementation. Javi's goal is a minimalist implementation that still makes good (although not optimal) use of the cores. For his algorithm it makes sense to use more threads, because it reduces the overall time to join them.
@DiamondShocked · 4 years ago
@@AusSkiller I see your point. The load balancing is the key issue here. For each thread to have roughly equal work, each pixel should have an estimated cost and the total cost of the pixels assigned to each thread should be similar. At least that's one way you could do it off the top of my head. Also, the pixels chosen for each thread should be picked with memory access locality in mind.
@Splatpope · 4 years ago
Yep, got 60 fps all the way to the float precision limit in my OpenGL implementation of the Julia set.
@Splatpope · 4 years ago
Can you tell me how to overcome float precision?
@Zartymil · 3 years ago
Such a fun video! Didn't even feel like it was almost an hour long. Thanks for the content! Subscribed :)
@javidx9 · 3 years ago
Cheers!
@rhydermike · 4 years ago
Highly enjoyable video there. I first came across the Mandelbrot set on my BBC Micro, as there was a viewer included on a Disc User Magazine cover disc. It wasn't *quite* as fast as your routine in execution. You've made me want to break off from my regular programming project to render the Mandelbrot set, something I have never done. Just one for the nerdy bucket list. Great stuff, and thanks for making it.
@Gordonfreems · 4 years ago
The value of this video is incredible, thank you
@Endede · 3 years ago
In addition to the great walkthrough of optimizations, and the fantastic explanations of successes and failures of optimizations, I immensely enjoyed your near-giddiness when playing with your Mandelbrot Set Explorer. Clearly, you love coding, but I feel like in this video, it was extra-apparent and frankly infectious.
@ogr2 · 4 years ago
Awesome video man. Actually I was watching another video about the "Mandelbrot Conjunction" and I found it quite interesting to analyze.
@javidx9 · 4 years ago
Cheers Oscar, they just never get boring - I really find Mandelbulbs quite interesting
@Kenjuudo · 3 years ago
Awesome video. It seems I've struck gold by finding your channel. I can already tell I will never regret subscribing to you! Don't ever change your format! You hear???
@zikomo8913 · 4 years ago
I just started learning programming and I was dreaming of making a Mandelbrot fractal plotter and then improving it bit by bit (going lower and lower toward the hardware) to optimize it. And now you did a video on it, amazing!
@jsflood · 4 years ago
Amazing as always. This is great! Really interesting. I would love to see a video explaining the vector co-processing (immintrin/SIMD/AVX) further. Thank you. Great work :-)
@sanyasanders9580 · 2 years ago
Wow! It's incredible! Thanks for this video.
@chrismingay6005 · 4 years ago
Great video, thank you. I really liked the pacing (as always), and while not a C++ programmer in any capacity I learned quite a lot as well.
@Hacker-at-Large · 4 years ago
I’d also love to see this extended into using the GPU.
@gilleswalther5964 · 4 years ago
Great performance! Really impressive what you can get out of the CPU.
@DFPercush · 4 years ago
+1 just for the profile pic. Very apropos.
@andersonklein3587 · 4 years ago
I absolutely loved this AVX implementation! I might just try to implement that in a game engine I'm building, for parallel processing inside a thread pool. Thank you so much for your amazing video on optimization! PS: I'm actually ashamed that I hadn't used .wait in my code for the pool. I literally had my threads polling constantly and wasting cycles. lol
@battosaijenkins946 · 2 years ago
Amazing. I was using intrinsics to see if I could outperform the Quake 3 square root trick a while back, and this brought back wonderful memories. I can't wait to try this Mandelbrot concept using OpenGL! Thank you~ 👏👏👏
@benrex7775 · 4 years ago
This was really interesting. Thank you for making the video.
@rachelmaxwell4936 · 4 years ago
I enjoy and appreciate all of your content. I would really like a video on intrinsics.
@moritzmessner1459 · 4 years ago
Your voice is so calm and nice to listen to!
@morphx666 · 4 years ago
Absolutely! I'd love to see a video explaining the conversion to SIMD data types.
@SamClayMold · 4 years ago
Hey javidx9, love this video and your channel. Commenting here to say I am definitely interested in that video about the 256 bit registers and parallelism
@cameraforchristmas · 4 years ago
What a terrific find! I'm working on a project based on the Mandelbrot set. I fell in love with it when it was an assignment in college many years ago. What is crazy is that I came across your channel not for the Mandelbrot part but when trying to find a good C++ development environment for the Raspberry Pi. I chose wxWidgets, Code::Blocks and MinGW, so I found the channel video on wxWidgets setup. Now here I've run into you twice on the same project. How cool is that? I've already compiled my Mandelbrot starting point in C#, my favorite environment. My end point is a Raspberry Pi and a 7" touchscreen in a permanent case. It should allow the user to zoom in (and out). I want it to behave as a purpose-built, single-function appliance. No desktop, no multiple windows, no choosing the app to run, just plug it into the wall power. For extra credit, I thought I'd see if I could output the display on the HDMI connector at the same time so someone can enjoy it on a monitor or the living room TV. This would be the Mandelbrot-set-top-box. Haha. So, thanks twice for the great content! Dave, San Francisco
@DavidWeiss2 · 4 years ago
Please do more like this. I learned a lot!
@duality4y · 4 years ago
Thanks for this! It answers a lot of questions I had regarding SSE and SIMD.
@christianantfeld3827 · 4 years ago
I'm shocked by how smart you and the other commenters are. Absolutely incredible if anyone else can follow along.
@carsont1635 · 4 years ago
Thank you for the excellent video! I have a course next semester specifically regarding parallel processing and this was a great introduction into some of the techniques involved. And I would love to see the video mentioned about how to use a cpu's vector processor!
@CrystalMusicProductions · 4 years ago
Wow this video and in general your channel are a great motivation to keep programming and exploring the field :)
@javidx9 · 4 years ago
Thank you Crystal Music!
@patrickc1659 · 4 years ago
Really enjoyed this, thanks. I'm all for hearing about how the AVX-256 works.
@muckiSG · 4 years ago
For everybody interested in the Mandelbrot set and not deep in the subject, I would suggest searching for "numberphile mandelbrot". Really nice videos... have fun!
@homomorphic · 3 years ago
Definitely your best video.
@DestroManiak · 3 years ago
This is fascinating, and immensely helpful for scientific computing coders.
@portlyoldman · 4 years ago
Blimey. Really enjoyed that 😀 I'm not even a C++ programmer, only comfortable with C#, but I learned a lot from this. Also enjoyed the style and presentation. Thanks. Subbed 😁
@javidx9 · 4 years ago
Thank you Jim, much appreciated!
@DiveInsideTech · 4 years ago
Hi David, the vids are awesome, and I loved them! You deserve more subs!
@anakimluke · 4 years ago
THE INTRO SOUND is finally better!! thanks! hahah :)
@fernandomartins5451 · 4 years ago
Awesome video! I'd love to see more about SIMD and vectorization.
@emanuelebonura783 · 4 years ago
Javidx9 brute-forcing YouTube's compression algorithm, great video!
@stephenbradshaw6727 · 4 years ago
A splendid video! Thank you very much. It might be interesting to insert a generator function to be called in place of the Mandelbrot-specific code, with arguments that will create different kinds of fractals for exploration. I am certainly interested in seeing more specifics w.r.t. implementing the AVX extensions. Amusingly, the Mandelbrot Set is based on iterating quadratic polynomials, similar to those which describe population growth and are being used to gain some insights into our current global predicament.
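The pluggable-generator idea above could be sketched like this (names are mine and purely illustrative): the escape-time loop is generic, and the fractal-specific step is passed in as an argument.

```cpp
#include <complex>
#include <functional>

// Generic escape-time iteration: only `step` knows which fractal we are
// drawing (z*z + c gives the Mandelbrot set; other maps give other sets).
int escape_time(std::complex<double> c, int max_iter,
                const std::function<std::complex<double>(
                    std::complex<double>, std::complex<double>)>& step) {
    std::complex<double> z = 0;
    int n = 0;
    while (std::norm(z) <= 4.0 && n < max_iter) {  // |z| <= 2, no sqrt
        z = step(z, c);
        ++n;
    }
    return n;
}
```

For a Julia set, the same loop works with c held fixed and the starting z varying per pixel.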
@gyokzoli · 3 years ago
Excellent video, thank you!
@tomay3000 · 4 years ago
Man, this is awesome: the math performance calculations. I didn't know about that header file before. Yes, I do want to see more videos about it and how you did the conversion. TIA.
@kris030 · 4 years ago
please do more of this, optimization is my jam
@mattbland2380 · 3 years ago
Loved this. I remember entering Basic programs to plot the Mandelbrot set back on 8 bit computers and tweaking them myself without understanding them properly and getting wonderfully strange results.
@AstrosElectronicsLab · 3 years ago
New sub, man. Love your stuff.
@ryanlyle9201 · 3 years ago
"quite advance for this channel". My god, he's holding back his full power.
@ghollisjr · 3 years ago
I had a lot of fun making a Mandelbrot fractal Web app with WebGL by putting the fractal calculations in a fragment shader. Highly recommend it as a learning project.
@mendonisstudios · 4 years ago
Great video. Thanks for the insights!
@undersquire · 4 years ago
luv ur vids very informational :) i've learned a lot i would love to see a series on possibly making your own basic coding language in C++
@javidx9 · 4 years ago
It's certainly an interesting idea! If I find a need for such a language then it may make a good series
@porkmancer · 4 years ago
Thanks for the great video, very interesting stuff.
@dhiahassen9414 · 3 years ago
Best fractal programming video on YouTube.
@Padeir0 · 4 years ago
That was a very enjoyable video! And I'm really hoping for a follow-up video working around the precision issues :D
@mohamedmoatyhassan5125 · 4 years ago
I absolutely love this channel
@cumulonimbuscapillatus9312 · 4 years ago
The accidental video recording lag near the end though lmao. Great video as always!
@javidx9 · 4 years ago
lol, hey it's an interesting and very real side effect, so I thought it had value and kept it in :D
@oscill8ocelot · 4 years ago
Consider this my shout-out for more vector processing instruction videos :3
@sachinambetkar3637 · 4 years ago
Thank you so much, sir... I always follow your tutorials; they have helped me a lot... 🙂
@javidx9 · 4 years ago
Glad to hear that Sachin!
@rednike500 · 4 years ago
It would be nice to see a video on parallelism and AVX, but your videos are always super interesting. Please continue to do as you like in your videos; you are an amazing programmer.
@ComposingGloves · 3 years ago
If you had just kept reading from the wiki, you would be my microprocessor professor, who seems to think reading slides counts as teaching. Thanks for actually teaching! Your channel is amazing!
@colinbrown7947 · 4 years ago
Very nice video! I would love to see you implement more mandelbrot specific optimization, perhaps even looking into perturbation or series approximation.
@s-sugoi835 · 4 years ago
Subbed since 10k, and your videos still motivate me to not give up on C++ even though I mainly program in Python nowadays because of university. C++ sure is beautiful 😊.
@lifeisgameplayit · 4 years ago
Big thumbs up here. I find this channel's content rather elegant, with some funky twist to it. ;}
@kyleeames8229 · 4 years ago
I too would be interested in the SIMD conversion video. I'm always fascinated by low level stuff and would happily watch the video.
@matthewpalermo4957 · 4 years ago
Loved the demonstration of how good the compiler optimization is! Question: is there an abs-squared method for std::complex values? That could be used instead of your implementation and still avoid the square root.
@johngreen1060 · 4 years ago
That would be a dot product of the complex number against itself or multiplication of the number by a complex conjugate of itself.
@fredg8328 · 4 years ago
Compilers are not always good at optimizing. They are smart about small optimizations, but you can generally do better with higher-level ones. And there are a lot of differences between compilers; the Visual Studio compiler is known to be one of the worst at optimizing. But that's the same story as whether you should use the STL or not: Visual Studio follows the standards more tightly and has to add a lot of checks to the generated code.
@wtk300 · 4 years ago
std::norm(std::complex)
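Indeed: std::norm returns the squared magnitude (on common implementations computed directly as re·re + im·im, with no square root), so the bailout test |z| > 2 can be written as norm(z) > 4. A small sketch:

```cpp
#include <complex>

// std::norm(z) gives |z|^2, so the Mandelbrot escape test needs no sqrt.
bool escaped(const std::complex<double>& z) {
    return std::norm(z) > 4.0;   // equivalent to |z| > 2
}
```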
@reik2006 · 3 years ago
Very satisfying!
@aim__freakz8499 · 4 years ago
This is super interesting!
@lordskeletorde · 4 years ago
Awesome video, and I definitely want to know all about SIMD!
@dioneto6855 · 4 years ago
Funny, because when you started using threads I wondered what would happen if you went to a region of the fractal that's stressful for the processor while recording. Thanks, you satisfied my curiosity!
@oblivionronin · 4 years ago
Awesome long video, love it! Fractals are indeed hypnotizing! Also, yes, I definitely want to see that SIMD optimisation. I was thinking of a couple of things we could do to optimise the display of this: 1. Cache the world position, zoom level and iteration count; if those haven't changed, we can just re-render the frame that's already baked in. 2. Once those are calculated once, you can use that data (time for each thread to finish) to guess where and how you should divide up your thread grid. Maybe some areas (like that big half-screen with very little calculation) should be handled by only one thread, and that one sliver of fractal should be calculated by the other 31 threads. (i.e. you track the time-to-finish for each thread and the average of them all. On each frame or iteration, if a thread's finish time is shorter than the average, it should take some screen space from other threads (vertically or horizontally); if it's higher, it should give some up. That way, threads dynamically exchange their screen space depending on their own finish time and the average.) Sorry for the long comment, just had to put that out there! :P Cheers!
@antipainK · 4 years ago
Yeah, dividing the screen among threads by bitmasks should do the trick. Didn't think of it. Thanks, gonna implement it in my own Mandelbrot generator :)
@maybenexttime37 · 3 years ago
Great video! An improvement would be to divide the screen into many thin rows instead. Rows are better than columns for two reasons: 1. Better cache locality for threads when they're working on a particular job. 2. You can increase the number of rows arbitrarily, whereas the number of columns has to be tuned so that you're as close to a multiple of 8 pixels per column as possible (otherwise you give up potential AVX benefits). This matters because, in order to reduce bubbling, you want to increase the number of jobs so that the workload is more balanced among threads, and to do that you need to decrease the size of each row/column. Maybe even down to a single pixel row per job, because at the given width (1280) the thread-signalling overhead will likely be negligible even then.
@unknownuser8044 · 4 years ago
And now, Implemented on the GPU? :)
@javidx9 · 4 years ago
Well, that is indeed the next phase I suppose, though I reckon there is a lot to squeeze out of the CPU yet!
@Gelikafkal · 3 years ago
@@javidx9 I think the communication between CPU and GPU over the PCI bus will eat up most of the performance that a GPU implementation would give you, as long as you don't have a way to directly hand a device pointer to the graphics library/engine.
@mariokart6309 · 3 years ago
@@Gelikafkal GPU compute is very much viable for anything that can take advantage of mass parallelisation; otherwise why would that functionality even exist?
@lucasgasparino6141 · 3 years ago
To anyone interested: the CUDA C book has a bare bones implementation of the Julia/Mandelbrot sets, written in C. It's very intuitive to use, and the difference is GIGANTIC if your GPU is reasonably modern.
@TenshiSeraph7 · 3 years ago
Yup, I reimplemented the Mandelbrot set using recursive CUDA (following a tutorial), although it was not a live-updating viewer. Indeed the transfer of data between host and device (GPU) is usually costly, but it should be possible to move fractal data directly to the frame buffer without transferring everything back to the CPU. There is also an optimization for parallel implementations of the Mandelbrot set which takes advantage of the fact that all points in the set are connected. You can divide the set into different rectangles which are computed by different threads. First you compute only the pixels on the border of the rectangle. If none of the border pixels contain any part of the fractal, then you can skip the calculation of the pixels inside the rectangle (assuming you are not zoomed out too far).
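A hedged sketch of that rectangle test (helper names are mine, not from any of the implementations mentioned): this is the check behind Mariani-Silver style subdivision, where a rectangle whose entire border shares one iteration count can be flood-filled with that value instead of computing its interior.

```cpp
#include <functional>

// If every pixel on the rectangle's border has the same iteration count,
// connectedness of the escape-time level sets implies the interior shares
// it too, so the caller can fill the inside without iterating each pixel.
// iterate(x, y) is a hypothetical per-pixel escape-time callback.
bool border_is_uniform(int x0, int y0, int x1, int y1,
                       const std::function<int(int, int)>& iterate,
                       int& value) {
    value = iterate(x0, y0);
    for (int x = x0; x <= x1; ++x)
        if (iterate(x, y0) != value || iterate(x, y1) != value) return false;
    for (int y = y0; y <= y1; ++y)
        if (iterate(x0, y) != value || iterate(x1, y) != value) return false;
    return true;
}
```

When the border is not uniform, the rectangle is split into quarters and each quarter is tested recursively, which is where the per-rectangle threading the comment describes comes in.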
@skilz8098
@skilz8098 4 жыл бұрын
I found this to be an excellent video and very enjoyable. As for the possibility of you making a video strictly based on the intrinsics library for the CPU's extensions, I would truly be interested in it. I've always used Intel CPUs and I'm decently familiar with the x86 architecture, but never got to work with that part of the CPU's instruction set. Something else to consider... One possible optimization could be to calculate each iteration of the Mandelbrot set as a recursive function template, so the compiler can precalculate values for you; you can then cache them for each iteration. Another technique would be similar to how rendering pipelines work with a back buffer and a swap chain. You would write to a texture image that is the back buffer first, calculate the next iteration of values, and write it to the next texture to be rendered. Then have the swap chain present the image to the screen as a single texture mapped to a screen-sized quad using basic vertex and fragment (pixel) shaders. After that, have a compute shader working in its own thread to calculate the next iteration based on where the zoom took place within the image, the zoom amount, and the zoom level. This would create the appropriate texture for the next iteration and feed it into the swap chain's back buffer. This ought to make nearly infinite zooming possible with minimal overhead, with a 64-bit unsigned integer representing which zoom level we are on. That zoom-level variable would, of course, be incremented or decremented depending on the direction of the next zoom provided by the user. Another thing to consider is that we do have double-precision floating-point variables to work with, but they shouldn't be needed.
It should be possible to work with basic floating-point types; however, we have to know what level of scale we are at, since this algorithm will generate a unique set of floating-point values for each iteration and zoom level. There may be a small amount of overhead, as we would have to keep an extra floating-point value around to hold the state of the previous zoom level, so that if the user zooms out, we can easily use that previous state for the next set of calculations. It's almost the same as what you've already done in this video, but it adds a few different sets of features: template metaprogramming to precalculate values, and a graphics pipeline that uses not just the basic shaders, swap chain, and back buffer, but compute shaders as well, giving us the ability to perform both pre- and post-process calculations of the data for the next iteration to be rendered.
@furyzenblade3558
@furyzenblade3558 4 жыл бұрын
Great Video!
@ratchet1freak
@ratchet1freak 4 жыл бұрын
About the condition variable wait: you should wrap the unique_lock and the wait in a block to avoid holding that lock for too long (it doesn't really matter in this case because each worker has its own mutex), and you should guard against spurious wakeups. You could also try other primitives for launching the threads, like a semaphore or a barrier. The barrier is the superior method for this kind of waiting, as it can actually wake up multiple threads at a time (which you can't with condition variables, even with broadcast, due to thundering-herd issues), and its completion function can both signal completion to the main thread and then wait for the next task. (edit) Also, if you divided tasks by row instead of column you might get some gains from cache prefetching and false-sharing avoidance.
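The scoped lock and spurious-wakeup guard described above might look like this minimal sketch (the flag, mutex, and worker here are illustrative, not from the video's code). The predicate overload of wait re-checks the flag on every wakeup, so a spurious wakeup just goes back to sleep.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool workReady = false;
int result = 0;

void Worker()
{
    // Scope the unique_lock so it is released as soon as the wait
    // returns; the predicate guards against spurious wakeups.
    {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [] { return workReady; });
    }
    result = 42; // ...the actual work happens outside the lock...
}
```

The launcher sets the flag under the mutex, then calls cv.notify_one(); setting the flag without the lock risks the worker missing the notification.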
@d97x17
@d97x17 4 жыл бұрын
Very nice explanation. I would be interested in the conversion process :) If possible, it would be awesome if it also included a small introduction to assembly itself. I have a question regarding the parallelisation: would you recommend using the thread functionality from the standard library (as you did in this video), or would you advise using the OpenMP API?
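For comparison, the OpenMP version of a per-row parallel render is a single pragma. This is a sketch, not from the video; compile with -fopenmp (without it the pragma is ignored and the loop simply runs sequentially, with identical results).

```cpp
#include <complex>
#include <vector>

// With OpenMP the thread-pool machinery collapses into one pragma;
// schedule(dynamic) hands out rows like a work queue.
std::vector<int> Render(int width, int height, int maxIter)
{
    std::vector<int> out(width * height);
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            std::complex<double> c(-2.0 + 3.0 * x / width,
                                   -1.5 + 3.0 * y / height);
            std::complex<double> z(0.0, 0.0);
            int n = 0;
            while (std::abs(z) < 2.0 && n < maxIter) { z = z * z + c; ++n; }
            out[y * width + x] = n;
        }
    return out;
}
```

std::thread gives you full control over the pool and signaling (as in the video), while OpenMP trades that control for far less code; for an embarrassingly parallel loop like this, either works.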
@tarikeljabiri
@tarikeljabiri 4 жыл бұрын
As always, amazing things. That's why C/C++ is called low level. Thanks. Amazing and complex.
@seanvinsick5271
@seanvinsick5271 4 жыл бұрын
Very interested in the SIMD conversion. Also would love to see more multithreading.
@TheMrNaatan
@TheMrNaatan 4 жыл бұрын
Oh yes, we would like to see this video on SIMD/AVX! :D
@cosaga29
@cosaga29 4 жыл бұрын
As someone taking a parallel computing course, I love seeing the raw performance gains with the various parallelization methods. Thank you for showing the gains in a nice visual manner! When you were developing this, did you toy with adding OpenCL/CUDA implementations?
@gustavoandre8866
@gustavoandre8866 4 жыл бұрын
Perfect, continue with these videos!
@RogerBarraud
@RogerBarraud 3 жыл бұрын
Very nice indeed - thanks!
@BuxleyHall
@BuxleyHall 4 жыл бұрын
Thanks for the video! Add my name to those interested in learning more about AVX and SIMD. On a different topic, but still related to rendering the Mandelbrot set, I’d love to see how to get past the resolution limit of 64-bit doubles so one could explore even deeper into the set. Thanks again!
@jayasribhattacharya2048
@jayasribhattacharya2048 4 жыл бұрын
A Work Of Art !!!
@RogueShadowTCN
@RogueShadowTCN 4 жыл бұрын
This may be the most C++ video in a while.