-mno-fma: “FMA tells the compiler to allow generation of fused multiply-add (FMA) instructions, also known as floating-point contractions. NOFMA disables the generation of FMA instructions.”
@AZisk · 2 years ago
Nice! Thanks for that.
@edmondhung6097 · 2 years ago
Doesn't the M1 (or aarch64 in general) have FMA anyway?
@Zwiebelgian · 2 years ago
If you use g++ on macOS, it actually uses clang++. To test this, run g++ --version and see if it mentions clang. If you want to actually use g++, you have to (on my machine at least) type g++-11.
@markp8418 · 2 years ago
I would like to see a video comparing build times for a significant C/C++ project such as Yocto (this will require running a VM with arm Ubuntu). My M1 MacBook Pro took about 2.5 hours with all performance cores at 100%; the machine I now use for builds, an Intel Xeon with 16 cores, takes about 45 minutes.
@quimblyjones9767 · 2 years ago
Cheers for everything you do mate.
@AZisk · 2 years ago
Thanks! Glad you enjoy
@danny_the_K · 2 years ago
Wow, that’s a huge jump in performance for C++ devs. I wonder what else will benefit from it. Good test Alex.
@robgreen13 · 2 years ago
Yes, it will help me when developing in C++ ... when testing my code. But it will help the customers that run my code ... they're normal people, not C++ devs! :) So it isn't just me that benefits, or me that needs to buy an Ultra, customers will too (or a MacPro when it comes). (Message to my boss if you're reading this ... please can I have one?)
@gempio2634 · 2 years ago
@robgreen13 Won't it help you with compilation time?
@robgreen13 · 2 years ago
@gempio2634 Yes, it will make everything faster for me, which is great. It will also be better for customers running my code... faster computers benefit everyone who needs/notices the speed, and more efficient computers save energy (although my daughter would remind me to look at the entire lifecycle of the computer, from manufacture to disposal, before making a firm judgement on that aspect!)
@quimblyjones9767 · 2 years ago
Thanks for putting in the baby air 😁 Nice to see it getting some love
@mateuszmikoajczyk2069 · 2 years ago
I suppose if you seeded the pseudo-random number generator with a known value, you would get a deterministic sequence of numbers, right? That means each instance of the program would be sorting the exact same array, so we could compare apples to apples :)
@hrissan · 2 years ago
Nice suggestion!
@johningham1880 · 2 years ago
I came down here to find out if someone had already said this. I do not know C++, but when running simulations in, say, MATLAB, with which I am familiar, it is easy to seed the random number generator; there is even a command specifically for it.
@lesfreresdelaquote1176 · 2 years ago
Hello, you should use -Ofast; it is even faster, since it lets the compiler apply aggressive floating-point and vectorization optimizations. It works with clang, of course...
@lashlarue59 · 2 years ago
Just for fun I ran those same tests on my M1 Mac mini vs. a Dell laptop with an 8-core i7, and the results were within a second of each other.
@benygh911 · 2 years ago
Alex my Bro, Technically You ARE CORRECT..! 😁✌ For MANY years and years the Processor [or CPU] WAS *only 1 CORE,* so having many Cores IS in FACT having Many Processors (at a Tiny Physical Scale ON the Silicon). GreeTs and G0D Bless "U" ALL 😀👋✝
@avask17 · 2 years ago
The -march=ivybridge flag specifically optimizes for the Intel Ivy Bridge architecture. To let the compiler do at least some optimization for the M1, you'd be better off writing -march=native, which, as the name implies, optimizes for the native architecture.
@AZisk · 2 years ago
yep, tried that, but got a complaint about march not being recognized
@avask17 · 2 years ago
@AZisk Hmm, okay, that's kinda weird that the M1 clang doesn't support it; I definitely remember using -march=native on x64 :) It looks like it's going to be added in clang 15.0 though.
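For reference, a sketch of the flag situation discussed here. Exact flag support varies by clang version, and `main.cpp` stands in for the benchmark source, so treat these invocations as illustrative:

```shell
# Apple's clang on arm64 rejected -march=native at the time;
# -mcpu selects the local core instead:
clang++ -O3 -mcpu=apple-m1 main.cpp -o mandelbrot

# Newer upstream clang (15+) also accepts -march=native on AArch64:
clang++ -O3 -march=native main.cpp -o mandelbrot
```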
@SimplicityForGood · 2 years ago
Which IDE, like Atom or Sublime Text, has shown the most promise working well with the MacBook Pro Max?! Thanks for getting back to anyone who has tested this and figured out whether one IDE is actually a clear winner to use with these new MacBook Pros! 😎
@AZisk · 2 years ago
I prefer VS Code as my editor generally. Unless I'm doing .NET.
@SimplicityForGood · 2 years ago
@AZisk Have you tested Atom?
@AZisk · 2 years ago
@SimplicityForGood Never have.
@SimplicityForGood · 2 years ago
@AZisk Alright, I really enjoy it! 😃 Just got my MacBook Pro with the Max chip! About to create a home office now 😎🤘🏻
@很好-h8e · 2 years ago
I wish that one day Visual Studio on Windows would be compatible with arm64, because the poor translation from x86 to arm sometimes makes the fan of my MacBook Pro 13 kick in.
@bhushanladhe9816 · 2 years ago
The new Rider works perfectly for my .NET Core solution. I am using .NET 6 on a MacBook Pro M1 Max.
@很好-h8e · 2 years ago
@bhushanladhe9816 Actually, I am learning how to develop Windows applications; that's why I have to run Visual Studio on Windows.
@platin2148 · 2 years ago
Btw, why are you using the old gcc alias? macOS doesn't use gcc anymore; it uses clang.
@pwhv · 2 years ago
Thank you so much, man; you are one of the best doing this. You've won a subscriber. Keep up the great work!
@mateosking · 2 years ago
The B. in Benoît B. Mandelbrot stands for Benoît B. Mandelbrot. And inside that, the B. stands for Benoît B. Mandelbrot. And inside tha…
@AZisk · 2 years ago
yess!
@MeinDeutschkurs · 2 years ago
Alex, why is your BMD ATEM ISO so heavily flickering?
@richardyoung3036 · 2 years ago
To build an arm64 version using the SSE optimizations: find the header named sse2neon.h on the web, modify the Mandelbrot main.cpp to include sse2neon.h, and define __SSE__.
@ramialkaro · 2 years ago
Wonderful and brilliant topic as always. Thanks
@nomasprime · 2 years ago
Surely test runners would be one of the biggest general gains for more CPU cores?
@ahashem · 2 years ago
What is the mic you are using in your videos?
@AZisk · 2 years ago
sennheiser mkh50
@mohmedezzet5217 · 2 years ago
Thank you for this amazing effort, but I have a question. I need to buy a laptop for deep learning, neural networks (CNNs), and image processing tasks. Which is faster: a MacBook Pro M1, or a Windows laptop with an RTX 3060/3070/3080?
@AZisk · 2 years ago
Just did a video on this recently. RTX for now.
@AliAhmed-dh3bp · 2 years ago
Nice audio (mic) quality
@kbaeve · 2 years ago
Are there any real practical reasons for using these new M1s? I mean, if you have to get down to this level to find a performance boost, it looks kinda, hm, waaaay over the top for any normal to mid-range nerd user.
@kbaeve · 2 years ago
To me, 2019 MacBook Pros just seem like the best bang for the buck out there. Tons are getting sold cheap, and they are still super good computers for normal users for years, no?
@JP-js8jr · 2 years ago
low power consumption, no fan noise, no heat
@86400SecondsToLive · 2 years ago
My current machine (RTX 2080 Ti & Ryzen 5600X) eats electricity worth €60 each month. I only do statistics, stochastic simulation, and play WoW. If the performance is remotely close, saving €50-54 each month might be worth the switch. Well, I might subtract a few bucks, because I'll have to actually turn on the heater during the winter months ;-)
@kevinxin1545 · 2 years ago
Hey Alex, just curious what camera gear you're using. Are you shooting in 4K or 1080p? Thanks!
@AZisk · 2 years ago
i’m shooting mostly 1080, but upscale to 4k during production
@JustinHiggins · 2 years ago
FWIW I ran the Mandelbrot test on an M1 Mac mini running Asahi Linux with 100000 as the parameter, and it took 25 seconds compared to 42 seconds as per this video. Not as fast as the Mac Studio, but 40% faster than the same M1 processor running macOS. I tried both g++ and clang++ and got the same result. Not sure why it's so much faster; I might be missing something.
@seanogorman6940 · 2 years ago
awesome
@synen · 2 years ago
As more compilers get updated to fully utilize the cores of the Pro, Max, and Ultra, the more we will benefit from the new architecture. Sounds obvious, but this really is a new architecture; it takes time.
@platin2148 · 2 years ago
I think arm is already pretty well optimized. The difference here is mostly the core count.
@fenrisler · 2 years ago
What are you going to do with all this saved time? Run some tests!
@tuskig7017 · 2 years ago
Hi, I hope next time you can test the source compilation times of Go and Rust on the M1 and M1 Ultra; they both make good use of multiple cores.
@kenvererer · 2 years ago
What's that 747 cockpit keyboard above the magic keyboard?
@AZisk · 2 years ago
oh that’s for my 747 :) haha. no, it’s the ATEM mini extreme, which i use to live stream
@josejimenez7502 · 2 years ago
Holy smokes, that is a huge difference in performance. How would these tests stack up with Java? I guess I need to test this out now lol. Amazing.
@enitalp · 2 years ago
Set the seed for the random number generator and the runs will be deterministic.
@elgreengroo · 2 years ago
Did I notice Fortran there? What about it?
@donpeleas4780 · 2 years ago
You can get the same performance with a PC at half the price.
@Ericerikerich · 2 years ago
Great stuff as usual, Alex! Could you try the following comparison? M1 Pro, Max and Ultra running a lengthy macro in Excel via Parallels? Would love to know how much multi-core workloads in Parallels would benefit from the extra hardware.
@ramialkaro · 2 years ago
Hey Alex 👋, is that you who replies to people "claiming gifts 🎁", or someone else pretending to be you?
@AZisk · 2 years ago
scammers
@Criteria12 · 2 years ago
Too many comments in the code, if I were doing a code review for you.
@KoenKooi · 2 years ago
Kudos for saying sea-lang instead of klang :)
@AZisk · 2 years ago
I've been guilty of saying it both ways before 😅
@andrewgrant788 · 2 years ago
I always pronounce it klang, I didn’t know there were people who pronounced it sea-lang. I am shocked.
@wokecults · 2 years ago
I have a question. Did developers already figure out what the Neural Engine cores are for, and whether they could be used to train models? Thanks.
@harrytsang1501 · 2 years ago
Yes and no. The Neural Engine cores are only exposed via Apple's API, which constrains the structure of the model to be static. They are good at low-power inference, but for training, the GPU is still more flexible and powerful, with a higher TDP.
@wokecults · 2 years ago
@harrytsang1501 Thanks Harry. When you say the APIs constrain the structure to be static, do you mean the cores can only be used to run models, not to train them? And even if the Neural Engine cores are not as fast as the GPU, wouldn't it be good to add them for training anyway? Currently I have TensorFlow installed, but it only works with the CPU and GPU. If I were able to add the Neural Engine cores, training should go faster. Did anyone figure out if that's possible? Does Create ML use these cores?
@harrytsang1501 · 2 years ago
Even if you managed to do so, it is safe to say it would likely be slower using the Neural Engine cores than not. First, it takes a specific CoreML format, similar to TorchScript in that it has to be quantized, optimized, and built for the architecture. This format is foreign to TensorFlow, ONNX, and PyTorch; converting the models is a separate step before being able to use the Neural Engine. Second, the inputs and outputs have to be very strictly defined, whereas image-processing models can sometimes take a scaling-factor parameter with varied frame sizes that can be adjusted during convolution. This is simply not possible with the Neural Engine. It's similar to the memory-copying overhead of using the GPU, but the gains are minuscule at best, and you lose the flexibility of TensorFlow and the ease of using Python. The GPU has a notable speedup over the CPU for training because you can train the model with large batches of data using the many stream-processing units. The same cannot be said about the Neural Engine, since there's so little transparency. Guessing from the documentation, the CoreML format is optimized for the specific hardware architecture in areas like memory locality and caching, which makes it possible to be faster at inferring a single input per iteration, but that's not exactly what you need for training.
@wokecults · 2 years ago
@@harrytsang1501 Oh, now I get it. I didn't consider model conversion. Still, it looks like not even Apple uses these cores for training. I just checked Create ML specifications and they say "Train models blazingly fast right on your Mac while taking advantage of CPU and GPU.". It looks like only the CPU and GPU are used for training, and the Neural Engine cores were not designed for training at all.
@420247paul · 2 years ago
We finally found it, something this piece of junk shines at.
@platin2148 · 2 years ago
The program would actually be significantly faster if it didn't use all of these crappy std containers.
@hasin9669 · 2 years ago
Do a giveaway. I want to learn programming but I can't afford a laptop. So do a giveaway, so maybe I can win it and become a programmer. 🙂