-mno-fma: “FMA tells the compiler to allow generation of fused multiply-add (FMA) instructions, also known as floating-point contractions. NOFMA disables the generation of FMA instructions.”
@AZisk · 2 years ago
Nice! Thanks for that.
@edmondhung6097 · 2 years ago
Doesn't the M1 (or aarch64 in general) have FMA anyway?
@Zwiebelgian · 2 years ago
If you use g++ on macOS, it actually uses clang++. To test this, run g++ --version and see if it mentions clang. If you want to actually use g++, you have to (on my machine at least) type g++-11.
@markp8418 · 2 years ago
I would like to see a video comparing build times for a significant C/C++ project such as Yocto (this will require running a VM with arm Ubuntu). My M1 MacBook Pro took about 2.5 hours with all performance cores at 100%; the machine I now use for builds, an Intel Xeon with 16 cores, takes about 45 minutes.
@quimblyjones9767 · 2 years ago
Cheers for everything you do mate.
@AZisk · 2 years ago
Thanks! Glad you enjoy
@danny_the_K · 2 years ago
Wow, that’s a huge jump in performance for C++ devs. I wonder what else will benefit from it. Good test Alex.
@robgreen13 · 2 years ago
Yes, it will help me when developing in C++ ... when testing my code. But it will help the customers that run my code ... they're normal people, not C++ devs! :) So it isn't just me that benefits, or me that needs to buy an Ultra, customers will too (or a MacPro when it comes). (Message to my boss if you're reading this ... please can I have one?)
@gempio2634 · 2 years ago
@robgreen13 Won't it help you with compilation time?
@robgreen13 · 2 years ago
@gempio2634 Yes, it will make everything faster for me, which is great. It will also be better for customers running my code... faster computers benefit everyone who needs/notices the speed, and more efficient computers save energy (although my daughter would remind me to look at the entire lifecycle of the computer, from manufacture to disposal, before making a firm judgement on that aspect!)
@quimblyjones9767 · 2 years ago
Thanks for putting in the baby air 😁 Nice to see it getting some love
@mateuszmikoajczyk2069 · 2 years ago
I suppose if you seeded the pseudo-random number generator with a known value, you would get a deterministic sequence of numbers, right? That means each instance of the program would be sorting the exact same array, so we could compare apples to apples :)
@hrissan · 2 years ago
Nice suggestion!
@johningham1880 · 2 years ago
I came down here to find out if someone had already said this. I do not know C++, but when running simulations in, say, MATLAB, with which I am familiar, it is easy to seed the random number generator; there is even a command specifically for it.
@lesfreresdelaquote1176 · 2 years ago
Hello, you should use -Ofast; it is even faster, since it lets the compiler apply aggressive floating-point and vectorization optimizations. It works with clang, of course...
@lashlarue59 · 2 years ago
Just for fun I ran those same tests on my M1 Mac mini vs. a Dell laptop with an 8-core i7, and the results were within a second of each other.
@benygh911 · 2 years ago
Alex my Bro, Technically You ARE CORRECT..! 😁✌ For MANY years and years the Processor [or CPU] WAS *only 1 CORE,* so having many Cores IS in FACT having Many Processors (at a Tiny Physical Scale ON the Silicon). GreeTs and G0D Bless "U" ALL 😀👋✝
@avask17 · 2 years ago
The -march=ivybridge flag specifically optimizes for the Intel Ivy Bridge architecture. To let the compiler do at least some optimization for the M1, you'd be better off writing -march=native, which, as the name implies, optimizes for the native architecture.
@AZisk · 2 years ago
yep, tried that, but got a complaint about march not being recognized
@avask17 · 2 years ago
@AZisk Hmm, okay, that's kinda weird that the M1 clang doesn't support it; I definitely remember using -march=native on x64 :) It looks like it's going to be added in clang 15.0 though.
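For reference, a sketch of the flag situation discussed here. Exact flag support varies by clang version, and `main.cpp` stands in for the benchmark source, so treat these invocations as illustrative:

```shell
# Apple's clang on arm64 rejected -march=native at the time;
# -mcpu selects the local core instead:
clang++ -O3 -mcpu=apple-m1 main.cpp -o mandelbrot

# Newer upstream clang (15+) also accepts -march=native on AArch64:
clang++ -O3 -march=native main.cpp -o mandelbrot
```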
@SimplicityForGood · 2 years ago
Which IDE, like Atom or Sublime Text, has shown the most promise working well with the MacBook Pro Max?! Thanks for getting back to anyone who has tested this and figured out whether one IDE is actually a clear winner to use with these new MacBook Pros! 😎
@AZisk · 2 years ago
I prefer VS Code as my editor generally. Unless I'm doing .NET.
@SimplicityForGood · 2 years ago
@AZisk Have you tested Atom?
@AZisk · 2 years ago
@SimplicityForGood Never have.
@SimplicityForGood · 2 years ago
@AZisk Alright, I really enjoy it! 😃 Just got my MacBook Pro with the Max chip! About to create a home office now 😎🤘🏻
@很好-h8e · 2 years ago
I wish that one day Visual Studio on Windows would be compatible with arm64, because the poor translation from x86 to arm sometimes makes the fan of my MacBook Pro 13 kick in.
@bhushanladhe9816 · 2 years ago
The new Rider works perfectly for my .NET Core solution. I am using .NET 6 on a MacBook Pro M1 Max.
@很好-h8e · 2 years ago
@bhushanladhe9816 Actually, I am learning how to develop Windows applications; that's why I have to run Visual Studio on Windows.
@platin2148 · 2 years ago
Btw, why are you using the old gcc alias? macOS doesn't use gcc anymore; it uses clang.
@pwhv · 2 years ago
Thank you so much, man; you are one of the best doing this. You've won a subscriber. Keep up the great work!
@mateosking · 2 years ago
The B. in Benoît B. Mandelbrot stands for Benoît B. Mandelbrot. And inside that, the B. stands for Benoît B. Mandelbrot. And inside tha…
@AZisk · 2 years ago
yess!
@MeinDeutschkurs · 2 years ago
Alex, why is your BMD ATEM ISO so heavily flickering?
@richardyoung3036 · 2 years ago
To build an arm64 version using the SSE optimizations: find the header named sse2neon.h on the web, modify the Mandelbrot main.cpp to include sse2neon.h, and define __SSE__.
@ramialkaro · 2 years ago
Wonderful and brilliant topic as always. Thanks
@nomasprime · 2 years ago
Surely test runners would be one of the biggest general gains for more CPU cores?
@ahashem · 2 years ago
What is the mic you are using in your videos?
@AZisk · 2 years ago
sennheiser mkh50
@mohmedezzet5217 · 2 years ago
Thank you for this amazing effort, but I have a question. I need to buy a laptop for deep learning, neural networks (CNNs), and image processing tasks. Which is faster: a MacBook Pro M1, or a Windows laptop with an RTX 3060/3070/3080?
@AZisk · 2 years ago
Just did a video on this recently. RTX for now.
@AliAhmed-dh3bp · 2 years ago
Nice audio (mic) quality
@kbaeve · 2 years ago
Are there any real practical reasons for using these new M1s? I mean, if you have to get down to this level to find a performance boost, it looks kinda, hm, waaaay over the top for any normal to mid-range nerd user.
@kbaeve · 2 years ago
To me, 2019 MacBook Pros just seem like the best bang for the buck out there. Tons are getting sold cheap, and they are still super good computers for normal users for years, no?
@JP-js8jr · 2 years ago
low power consumption, no fan noise, no heat
@86400SecondsToLive · 2 years ago
My current machine (RTX 2080 Ti & Ryzen 5600X) eats electricity worth €60 each month. I only do statistics, stochastic simulation, and play WoW. If the performance is remotely close, saving €50-54 each month might be worth the switch. Well, I might subtract a few bucks, because I'll have to actually turn on the heater during the winter months ;-)
@kevinxin1545 · 2 years ago
Hey Alex, just curious what camera gear you're using. Are you shooting in 4K or 1080p? Thanks!
@AZisk · 2 years ago
i’m shooting mostly 1080, but upscale to 4k during production
@JustinHiggins · 2 years ago
FWIW I ran the Mandelbrot test on an M1 Mac mini running Asahi Linux with 100000 as the parameter, and it took 25 seconds compared to 42 seconds as per this video. Not as fast as the Mac Studio, but 40% faster than the same M1 processor running macOS. I tried both g++ and clang++ and got the same result. Not sure why it's so much faster; I might be missing something.
@seanogorman6940 · 2 years ago
awesome
@synen · 2 years ago
As more compilers get updated to fully utilize the cores of the Pro, Max, and Ultra, the more we will benefit from the new architecture. Sounds obvious, but this really is a new architecture; it takes time.
@platin2148 · 2 years ago
I think arm is already pretty well optimized. The difference here is mostly the core count.
@fenrisler · 2 years ago
What are you going to do with all this saved time? Run some tests!
@tuskig7017 · 2 years ago
Hi, I hope next time you can test the source compilation times of Go and Rust on the M1 and M1 Ultra; they both make good use of multiple cores.
@kenvererer · 2 years ago
What's that 747 cockpit keyboard above the magic keyboard?
@AZisk · 2 years ago
oh that’s for my 747 :) haha. no, it’s the ATEM mini extreme, which i use to live stream
@josejimenez7502 · 2 years ago
Holy smokes, that is a huge difference in performance. How would these tests stack up with Java? I guess I need to test this out now lol. Amazing.
@enitalp · 2 years ago
Set the seed for the random number generator and the runs will be deterministic.
@elgreengroo · 2 years ago
Did I notice Fortran there? What about it?
@donpeleas4780 · 2 years ago
You can get the same performance with a PC at half the price.
@Ericerikerich · 2 years ago
Great stuff as usual, Alex! Could you try the following comparison? M1 Pro, Max and Ultra running a lengthy macro in Excel via Parallels? Would love to know how much multi-core workloads in Parallels would benefit from the extra hardware.
@ramialkaro · 2 years ago
Hey Alex 👋, is that you who replies to people "claiming gifts 🎁", or someone else pretending to be you?
@AZisk · 2 years ago
scammers
@Criteria12 · 2 years ago
Too many comments in the code, if I were doing a code review for you.
@KoenKooi · 2 years ago
Kudos for saying sea-lang instead of klang :)
@AZisk · 2 years ago
I've been guilty of saying it both ways before 😅
@andrewgrant788 · 2 years ago
I always pronounce it klang, I didn’t know there were people who pronounced it sea-lang. I am shocked.
@wokecults · 2 years ago
I have a question. Did developers already figure out what the Neural Engine cores are for, and whether they could be used to train models? Thanks.
@harrytsang1501 · 2 years ago
Yes and no. The Neural Engine cores are only exposed via Apple's API, which constrains the structure of the model to be static. They are good at low-power inference, but for training, the GPU is still more flexible and powerful, with a higher TDP.
@wokecults · 2 years ago
@harrytsang1501 Thanks Harry. When you say the APIs constrain the structure to be static, do you mean the cores can only be used to run models, not to train them? And even if the Neural Engine cores are not as fast as the GPU, wouldn't it be good to add them for training anyway? Currently I have TensorFlow installed, but it only works with the CPU and GPU. If I were able to add the Neural Engine cores, training should go faster. Did anyone figure out if that's possible? Does Create ML use these cores?
@harrytsang1501 · 2 years ago
Even if you managed to do so, it is safe to say it would likely be slower using the Neural Engine cores than not. First, it takes a specific CoreML format, similar to TorchScript in that it has to be quantized, optimized, and built for the architecture. This format is foreign to TensorFlow, ONNX, and PyTorch; converting the models is a separate step before being able to use the Neural Engine. Second, the inputs and outputs have to be very strictly defined, whereas image-processing models can sometimes take a scaling-factor parameter with varied frame sizes that can be adjusted during convolution. This is simply not possible with the Neural Engine. It's similar to the memory-copying overhead of using the GPU, but the gains are minuscule at best, and you lose the flexibility of TensorFlow and the ease of using Python. The GPU has a notable speedup over the CPU for training because you can train the model with large batches of data using the many stream-processing units. The same cannot be said about the Neural Engine, since there's so little transparency. Guessing from the documentation, the CoreML format is optimized for the specific hardware architecture in areas like memory locality and caching, which makes it possible to be faster at inferring a single input per iteration, but that's not exactly what you need for training.
@wokecults · 2 years ago
@@harrytsang1501 Oh, now I get it. I didn't consider model conversion. Still, it looks like not even Apple uses these cores for training. I just checked Create ML specifications and they say "Train models blazingly fast right on your Mac while taking advantage of CPU and GPU.". It looks like only the CPU and GPU are used for training, and the Neural Engine cores were not designed for training at all.
@420247paul · 2 years ago
We finally found it, something this piece of junk shines at.
@platin2148 · 2 years ago
The program would actually be significantly faster if it didn't use all of these crappy std containers.
@hasin9669 · 2 years ago
Do a giveaway. I want to learn programming but I can't afford a laptop. So do a giveaway, so maybe I can win it and become a programmer. 🙂