
Is M1 Ultra enough for MACHINE LEARNING? vs RTX 3080ti

  87,809 views

Alex Ziskind

A day ago

Comments: 308
@sundeepsingh4478 2 years ago
*Reasons to go for M1*
- Adobe suite software
- Xcode
- Power consumption is important to you
*Reasons to go for a desktop PC*
- 3D modeling
- GPU-intensive workloads
- Visual Studio (still issues with preview on M1)
- Gaming
- Future upgradability
- Maintenance (if the SSD/RAM etc. fails, you can easily replace it yourself)
@supervan5597 1 year ago
Very unbiased. Love it.
@obrien8228 1 year ago
Mac is more plug and play
@hglenn2k 9 months ago
Yep, I love MacBooks in general, but sadly I'm married to Visual Studio for now
@Os_390 9 months ago
> "- Xcode" Shouldn't it be in the "Reasons to go for desktop PC" list? xD
@lolaa2200 5 months ago
@obrien8228 We're talking here about running specialized libraries like TensorFlow or PyTorch, so I don't see how that comment is relevant in this instance
@karthikeyanparasuraman9337 2 years ago
That WSL reveal animation was so nice!! I wonder what the neural engines inside the chips do... Nice video btw!
@myyoutubename152 2 years ago
Based on the average power usage over time, it would seem the total power used was quite similar: 300 W × 5 mins versus 100 W × 15 mins.
@redocious8741 2 years ago
Your comment is very biased towards the PC. It's more like 400 W × 6 mins vs 110 W × 15.
@TheMinigato 2 years ago
It's not 300 W. It's 400 W with 500 W peaks.
@dustsucker4704 2 years ago
@redocious8741 But time is money, so waiting 10 minutes less for the model makes a huge difference if you do it 10-20 times a day. So it's not the machine I would want for these kinds of workloads; even if it draws less power, working time is more valuable.
@anupamkulkarni7 2 years ago
This is an absolutely stupid comment. ML benchmarking is simply about performance. That's ALL that matters.
@youtube-username-placeholder 2 years ago
I agree we should focus on performance when it comes to ML benchmarking, but how is this comment stupid? OP was just sharing a simple observation.
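The energy arithmetic being debated in this thread is just power × time. A minimal sketch, using the commenters' (disputed, unverified) figures rather than measured values:

```python
def energy_wh(watts, minutes):
    """Total energy in watt-hours for a run at a constant power draw."""
    return watts * minutes / 60

# Figures quoted by different commenters above -- estimates, not measurements.
pc_low  = energy_wh(300, 5)    # original estimate for the PC
pc_high = energy_wh(400, 6)    # reply's higher estimate for the PC
mac     = energy_wh(100, 15)   # estimate for the M1 Ultra

print(f"PC (low estimate):  {pc_low:.0f} Wh")   # 25 Wh
print(f"PC (high estimate): {pc_high:.0f} Wh")  # 40 Wh
print(f"Mac Studio:         {mac:.0f} Wh")      # 25 Wh
```

With the original figures the two machines land on identical energy per run; with the higher PC estimate the Mac comes out ahead on energy even though it loses on time.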
@woolfel 2 years ago
Thanks for doing the benchmarks, but short tests with small datasets don't really tell you how well a system works for ML training. I've been doing benchmarks with the Pascal and COCO datasets to get a more realistic measurement. The downside is it takes 2+ hours to run the benchmarks. If you run with different batch settings, doing a series of benchmarks ends up taking several days. The biggest limitation on training performance is memory. As long as your dataset fits in memory, an RTX card will be much faster than the M1 GPU. Keep in mind cuDNN is highly optimized and uses a lot more tricks than other hardware on the market. Google TPUs and Tesla's D1 are catching up to Nvidia hardware, but they are still behind on the software tooling side.
@randomnobody660 2 years ago
Aww come on man, not going to share results? Does the RTX card still win by a similar margin when running larger datasets or more epochs, or does that change, and in which direction? Also, if the dataset doesn't fit in VRAM, how much of a penalty for swapping are we talking about here? Also, and I'm just throwing this out there, under those circumstances does the 3090 make up a bit of lost time with PCIe 5.0?
@amdenis 1 year ago
Yes, it depends on the test (e.g. AI training types, video post work, 3D modeling, etc.), as well as the software being used. We use RTX through H100s for most of our AI development, at least on the training side. However, for coding, data science work, inference, and UI/UX, we all use our favorite OS, whichever that is. One thing to keep in mind for pro-level large-parameter/dataset AI dev: you will often be using a dedicated server running in the kilowatts with AI-grade GPUs (e.g. V100s, H100s, etc.). Whether owned, hosted, or otherwise, few jobs will be run locally.
@acasualviewer5861 9 months ago
Few jobs CAN run locally, especially when you need 80 GB for your model. That's why the 96 GB M1 (or 128 GB M3) is interesting. But only if they test in a RAM-constrained environment. The Nvidia 4090 only has up to 24 GB.
@sulthoniashiddiiqi9834 2 years ago
Thanks, Alex. I really appreciate what you have done.
@cinetic81 2 years ago
Yes, please compare to other Macs! Also curious if there are other AI/ML tests that would show the improvement vs. the 3080 Ti.
@erickflores785 1 year ago
THIS RIGHT HERE!!!
@thedownwardmachine 2 years ago
I wonder what the M1 machine learning cores are for, when they would be used, and if the benchmarks take advantage of them (and if it would even make sense to do so).
@NicolaMastrandrea 2 years ago
The NPU is used only in Core ML. TensorFlow right now is GPU-only, using Metal.
@ben.scanlan 2 years ago
Just to spy on you
@rohithmekala2608 2 years ago
@ben.scanlan No, not for spying; Core ML and Resolve 18 take advantage of them.
@mqsv 2 years ago
The ML cores are for running models, not training. I read that they're limited in floating-point precision (I forget the number, but it could be that they only support 16-bit floats, whereas training requires 32- or 64-bit floats).
@josephchaaban8840 2 years ago
As far as I'm aware, if I'm recalling what my professor mentioned once correctly, machine learning cores are simply called that because they're very specialized pieces of hardware that can perform a small matrix multiplication, or something of the sort, in one instruction. That's about as much as I know, but I'd take a guess and say maybe the machine looks for such operations/cases when running on a dataset and, if it finds them, uses said cores.
@NicozStrat 2 years ago
Very cool comparison. Thanks Alex
@barmalini 2 years ago
Interestingly, each system eventually spent roughly the same amount of energy completing the task. And this gives a lot to contemplate, like where you put your priorities: do you want cooler and slower, or hotter and louder? It all comes down to the cost of a single computational operation.
@randomnobody660 2 years ago
We do know that GPUs get quite a bit more efficient at lower power, at least up to a point (thanks to crypto miners optimizing their margins, mostly). Makes one wonder whether it's possible to undervolt/underclock a 3080 Ti so much that it becomes practically silent... or at least sounds like a propeller plane rather than a jet.
@patrickkhwela8824 2 years ago
As a Zulu man, I am really impressed with how you pronounced "Ubuntu".
@shaunpugh3287 1 year ago
Really interesting videos and a totally different take on performance testing. Well done Alex.
@truthmatters7573 2 years ago
I would like to see the M1 Ultra compared to the M1 Max and Pro in a machine learning benchmark. I'm curious if the M1 Ultra scales linearly for machine learning workloads.
@SimplicityForGood 2 years ago
What does "scale linearly" mean exactly? And what is the opposite in practical use? Computer science student here; I'm interested to learn, so I hope you can teach me something. Thanks
@ravenclawgamer6367 2 years ago
@SimplicityForGood Scaling linearly means 2x the chip gives 2x the performance. And that's not CS, just basic maths :)
@truthmatters7573 2 years ago
@SimplicityForGood So if the M1 chip has 8 GPU cores, the M1 Pro doubles that to 16, the M1 Max doubles that to 32, and the M1 Ultra doubles that to 64, do you then also get double the performance at each step? That would be linear scaling. If you double the cores but only get an 80% increase in performance, then it is below linear scaling. Practically speaking, it means you get diminishing returns if you invest in extra resources but don't get extra performance commensurate with your increased investment (whatever the investment is: money or power, for example). That may be a reason to stick with lower specs, because they are better value for your investment.
@SimplicityForGood 2 years ago
@ravenclawgamer6367 Alright then!
@SimplicityForGood 2 years ago
@truthmatters7573 Got it. And so far, what have you seen? Would it be worth purchasing 64 GB of RAM, 32 GB of GPU VRAM, and 4 TB right now in a MacBook Pro or a Mac Studio, or do you think one should go with a different or lower configuration given the linear performance you've seen in such tests? I have a chance right now to get one Apple computer if I develop for a company, as a gesture of trust instead of a salary, as my first job. I want to future-proof myself by making the right choice. What do you think is a good configuration to order? The above configuration was the one I made for my order, but I have not yet signed it... Or do you think it's way too much power in a too-heavy laptop, and one would be much better off waiting for a more lightweight MacBook Air M2 in May? Thanks for getting back! 😊
@n0stalgia1 2 years ago
Just saw a review of what looks like a very similar PC to yours over at Gamers Nexus (pre-built Alienware Aurora R13, i9-12900KF, RTX 3090); they mention very bad airflow design on that machine. Could be the reason why the temperatures on it are so high and the fans so loud.
@Watchandlearn91 2 years ago
Yeah, so far the M1 Ultra has seemed to be a disappointment in GPU workflows compared to the M1 Max, with relatively low performance gains for the amount of the upcharge. CPU performance looks really good though.
@deilusi 2 years ago
Still not good for everything. If you do a lot of float/matrix operations, it will be slower than Intel. TL;DR: this is a single-task box, only for video edits, where it's unparalleled, and internet consumption, because the M1 is designed for that. I am sure Apple compared against the 3090 only on video cut/compile; anything that fully utilizes the 3090 and doesn't go through the hardware codecs baked into the M1 will be massively slower than the 3090. Size and power do what they should. I guess to show that, we could run Python, PyMOL, and RCSB structure 6WM2, for example, which I hope is big enough to fully load everything.
@jaksvlogs7195 2 years ago
@deilusi What if I do data analysis with pandas/SciPy etc.?
@deilusi 2 years ago
@jaksvlogs7195 It will differ depending on what you do, but in general, if you work mostly with float lists/matrices, then Intel will be better for the money. The same general performance as an $8,000 Apple build costs ~$4,000 for Intel and ~$3,000 for AMD. Right now there is one golden rule: if you care about performance, don't pick old stuff. All 3 parties made big leaps in performance literally NOW; don't even look at last year's stuff, never mind older generations; the differences are 15-20% each year, quite literally. It really is a 3-dimensional problem of budget/expectations/time: pick 2. I deal with laptops and cloud mostly, and there AMD + Ubuntu wins by a mile, as Intel is too power-hungry for working on battery, and I need VMs, which makes Apple throw in the towel. I would start with my budget and make a list of what I CAN get from Apple/Intel-based/AMD-based builds. The second part is what you need: if you have massive datasets to process or something, make a list of what you need and what you want. Apple limits you to 8 TB, while others can easily give you 80 TB if needed. After that, I recommend starting with a PCPartPicker guide for a basic build and adjusting to your needs. If you want to invest time into making a purpose-built box, that gives you the best value for the invested cash. I recommend AMD for the smallest laptops, AMD + Nvidia for stronger ones, Intel for a full desktop, and Apple only if you already have an iPhone, Watch, and the rest, as it synergizes well, and only if you don't have issues with your budget.
@philippeastier7657 2 years ago
That is a matter of software optimization. Apps have to take into account the existence of more cores, and that is not always easy in algorithms. Raw cumulative performance is not that easy to take advantage of.
@davidesquer1312 2 years ago
@jaksvlogs7195 You would need to use cuDF instead of pandas and CuPy instead of SciPy and NumPy; native pandas/SciPy/NumPy don't utilize the GPU, you are running them on the CPU.
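The swap described in the comment above is largely an import change, because cuDF and CuPy mirror the pandas and NumPy APIs. A minimal sketch; the GPU lines are left commented out so it runs on a CPU-only machine without CUDA or the RAPIDS libraries installed:

```python
import numpy as np
import pandas as pd
# import cupy as cp   # GPU drop-in replacement for numpy (requires CUDA)
# import cudf         # GPU drop-in replacement for pandas (requires CUDA)

# Plain CPU pandas/NumPy pipeline.
df = pd.DataFrame({"x": np.arange(5), "y": np.arange(5) ** 2})

# The GPU version would be, e.g.:
# gdf = cudf.from_pandas(df)       # same groupby/merge/sum API, runs on the GPU
# total = gdf["y"].sum()

total = df["y"].sum()
print(total)  # 0 + 1 + 4 + 9 + 16 = 30
```

Note that not every pandas/NumPy operation has a cuDF/CuPy equivalent, so real pipelines usually need some per-case checking.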
@FlorinArjocu 2 years ago
Would love to see how a pure Linux system does in this test.
@julianrozentur3046 2 years ago
I think it mostly demonstrates that TensorFlow is not yet fully optimized for Metal. I am sure Apple used a more low-level benchmark. Nevertheless, it is useful learning from a practical standpoint. Actually, 3x slower with 1/3 of the power requirement is not too bad. Also, it would be interesting to learn whether 128 GB of unified memory makes training more efficient, since it can train in larger batches than a typical GPU card can, whose memory is 8 GB (the 4080 Ti has 20). Unsure what this benchmark does in terms of batch size.
@kostian8354 1 year ago
The 3080 Ti has 12 GB, and has 900+ GB/sec of memory throughput vs. 800 for the M1 Ultra. I bet on large models the M1 Ultra could be quite fast.
@kostian8354 1 year ago
@dotinsideacircle Anywhere it doesn't fit in memory on the video card
@toleoring 2 years ago
Great video! Apple should give us a "high power" mode for the M1 Ultra. Ideally this would allow higher wattage for the GPU, to match the 3x value of the RTX chip and reach higher performance. Currently, the monster copper cooling system is largely wasted.
@vinyl1372 2 years ago
They would need a different form factor
@noeldillabough 2 years ago
I believe the last time I looked, WSL2 was 94% as fast as a native box; well worth using, and so convenient.
@AZisk 2 years ago
Didn't know that, thanks!
@synen 2 years ago
Great as always Alex.
@darcsentor 2 years ago
Keep up the great work; this series is super interesting. Seriously tempted by the M1 Studio.
@metallicafan5447 2 years ago
The high number of CUDA cores on the 3080 Ti helps a lot: almost the same count as the RTX 3090. Although the RTX 3090 will probably be even better, since it can store much bigger models in its massive 24 GB of GDDR6X VRAM.
@benjaminlynch9958 2 years ago
You know that the Mac Studio can be configured with >100 GB of unified memory that the GPU has direct access to, right?
@metallicafan5447 2 years ago
@benjaminlynch9958 You might want to watch the video again. The Mac Studio was literally dragged through the mud here by an RTX 3080 Ti. That's a consumer card. In short, unified memory sounds great on paper, but very few real-world cases actually benefit from it. Granted, for things like web development, iOS development, and Android development, the Mac Studio would be a great tool to have under your belt. But CAD, machine learning, gaming, etc. are where the RTX cards shine. The CPU part of the M1 family is damn impressive; the GPU has been a bit of a disappointment. Also remember one thing: the Alienware that Alex used is probably the worst RTX 3080 Ti build you can get. Custom builds have much better acoustics and thermals.
@RRKS_TF 2 years ago
@metallicafan5447 Unified memory has its purpose, such as enabling rapid communication between the CPU and GPU for tasks that have highly parallel components but also sequential parts. A unified memory buffer would be better though, so the GPU can have its own raw memory rather than managed memory, which would reduce latency, and the unified buffer could make tasks that swap between the GPU and CPU much faster.
@gerryakbar 1 year ago
@metallicafan5447 Also, unified memory is probably good for VRAM cost-cutting: you can get bigger VRAM at a lower price than using a Tesla card or stacking RTXs. Machine learning training is a VRAM hog, so Apple Silicon's approach will probably pay off in the future.
@Khari99 2 years ago
This is very confusing. In your previous test, the M1 Max with 32 cores completed in 11 minutes. I get that the scaling isn't linear, but the higher-end Ultra with 48 cores completing in 14 minutes sounds like something is off. Why would it be slower than the M1 Max?
@AZisk 2 years ago
I don't know, but that's what I've got
@AZisk 2 years ago
It's disappointing
@Khari99 2 years ago
@AZisk Hmm. My intuition wants to point to a flaw in how macOS handles the computation on the other half of the M1 Ultra. It feels like only half of the cores are being used. The 48-core part is split into two 24-core processors. So 14 minutes would make sense if performance was for some reason capped at 24 cores, because this would perform proportionally to the 32-core GPU. If it's not the software, then it could be an even worse issue with the interconnected-die tech in their processors not working at the hardware level. Every time I see people compare the 64-core Ultra to the 32-core Max, I see identical performance. This would also fit the theory that only half of the cores are actually being utilized. What are your thoughts? Would it be possible to show that all CPU/GPU cores are being used but the software can only handle half of the processing?
@Khari99 2 years ago
@AZisk I've had issues with software on Windows in the past on machines with dual-socket CPUs. I needed software updates in order to use both CPUs in parallel. I reached out to the devs about this, and they explained that making an update like that is very complicated. Something like TensorFlow should be accessing the hardware directly, so I have no idea why these results would happen unless there was an issue with how macOS accesses the dual CPU.
@ravenclawgamer6367 2 years ago
@Khari99 I suppose the latency is much higher across the "UltraFusion" interconnect, and as the die is basically split in two, that actually ends up harming performance where latency is a big deal. In the PC, despite much lower bandwidth, the CPU is in one place and the GPU in another, meaning data isn't traveling between (two) GPUs.
@missycalimba 2 years ago
The M1 Ultra has 64 GB of unified memory, while the RTX 3080 Ti only has 12 GB of VRAM. Does this mean training some large models is only possible on the M1 Ultra?
@supervan5597 1 year ago
You are correct
@Pekz00r 2 years ago
Is this utilizing the Neural Engine/ML cores of the M1 Ultra or not? If not, how can you run these kinds of workloads on those cores?
@dysonbros 2 years ago
Thanks, this is what I was looking for!
@tordjarv3802 2 years ago
In this particular case, the performance per watt seems to be the same: you get roughly 3 times the speed for roughly 3 times the power.
@johnsausage 2 years ago
I've been watching some other videos checking out the Studio M1 performance, and I must say it is very impressive given the power consumption, thermals, noise, and size (but not upgradeability). BUT you can also look at the size like it's a bad thing. I've always asked myself why my friend had a big tower case in his music studio, even though he didn't even need all the expansion slots and whatnot. Then it hit me: imagine being a burglar stealing stuff. Would you really grab that big tower, or would you rather grab a notebook or a small Mac Studio? :-P Sure, you can put the computer in a safe(r) storage room, but you could do the same with a bigger case. In these cases, I genuinely think that having a big tower is better and lowers the risk of it being stolen (assuming the burglar doesn't have extensive knowledge of hardware prices). Of course you have insurance for such cases, but imagine the downtime you face when the computer is missing...
@keithmooore 2 years ago
He probably got that big tower to be able to use a good cooling solution with no noise. Usually those coolers are pretty damn large.
@Buqammaz 8 months ago
We need the same tests with the M3 Max 👏
@darrenhankner5282 2 years ago
Meanwhile, the Ultra version of the Mac Studio falls FLAT in performance on the new DaVinci and also the latest Final Cut Pro. Seems like the M1 Max is the best bang for the buck.
@jroemling 2 years ago
The benchmark should probably be optimized to make use of the M1's Neural Engine. When Apple compared the M1 Ultra against the RTX 3090, the message was that at a certain point on the performance/power curve, the M1 Ultra draws 200 W less than the 3090 while delivering the same performance. What they conveniently didn't say is that the 3090 has a lot more headroom and can draw even more power and be a lot faster than the M1 Ultra.
@timurzaynutdinov3445 1 year ago
Power consumption is also 4 times higher, isn't it? PyTorch (Metal) and Neural Engine tests are wanted.
@craigasketch 2 years ago
I really feel the cooling solution is overkill for the Mac Studio. I wonder if it's there to give headroom for the next iteration. In every test you've done, it never seemed to pull the power it could use. I should also mention that a 3090 Ti costs the same as an M1 Max. It's just no contest.
@Winkoo6 2 years ago
You are probably right about the next iteration; the 370 W power supply is also not reasonable here. The M1 Max is limited to less than 100 W, so even if you had two M1 Maxes in here, you still wouldn't need more than a 200 W power supply.
@RRKS_TF 2 years ago
In terms of energy efficiency, they would be comparable assuming performance scales linearly; otherwise the 3080 Ti would likely win, and thus the 3090 would be even better.
@suvalaki 2 years ago
I don't get it. Why not just build a Linux Nvidia box? You can get way more RAM and have as many GPUs as you like. Alternatively, spin up some GPU containers in the cloud. No one serious is making models on one of these things...
@edmondhung6097 2 years ago
Does the possible 128 GB of RAM available to the GPU help with those large models?
@dinoscheidt 2 years ago
Well, per watt that's pretty good. However, the really interesting thing for ML on the M1 Ultra is the huge VRAM. Nvidia charges a hefty premium to be able to hold large ML models in memory. The speed difference (no memory paging, less batching) might cancel out, since I/O is the slow part of ML. The tooling on non-Nvidia GPUs isn't great though, and will likely stay that way.
@davidesquer1312 2 years ago
The difference is that I've run that test on an A6000, and it takes an average of 17 seconds. He's comparing a gaming GPU (RTX 3080 Ti) against a workstation GPU like the Mac Ultra; if you put the Mac against an A6000, it's way behind.
@droweedryan 2 years ago
This just solidifies that the Studio is a powerhouse for certain workflows that take advantage of built-in CPU functionality, or any CPU operations in general. That is undeniable, and the very low power draw is insane! But for any heavy lifting on the GPU side, it just fails to match up; the audacity of comparing it to the 3090 is just hilarious. Maybe next iteration they will catch up and just not make those claims falsely. Still keeping my M1 Air for Mac/iOS-specific build operations until then.
@hasanabs 2 years ago
Nice, this is what I have been looking for since the M1 came out.
@RobertMcGovernTarasis 2 years ago
I don't think you've done a test I haven't appreciated yet.
@TheMarcosVerissimo 2 years ago
Great video, thanks a lot for it. I'd love to know the price of each machine, or at least the full configuration of each of them. Mind sharing that with us?
@y2an 2 years ago
The M1 Ultra has 8192 execution cores versus 10240 CUDA cores in the RTX 3080 Ti (are they equivalent, architecturally?), so should we just expect the 3080 Ti to be faster? The margin is 3x, admittedly, so the gain is not in proportion to the number of cores. The power consumption of the Ultra was a third or less of the RTX system's, which matters financially and in a green world. How much did the RTX system cost? The card alone is $1,200 at Best Buy. The Ultra is $5k in total.
@rapidfan92 2 years ago
If the RTX draws 3 times the power but finishes 3 times faster, then the Mac will not make the earth greener ;) Regarding your question about architecture: they have completely different architectures, so the number of execution units is not representative. Actually, since Apple has a more modern manufacturing process (5nm TSMC vs. 8nm Samsung for Nvidia) and far more transistors, I assumed Apple's GPU performance numbers at the presentation were realistic. But what they delivered is nothing but a disappointment (if you are not a video editor). In other fields like Blender, the performance difference is even higher, far higher (an Nvidia RTX 3090 is about 10x faster rendering in Blender than the M1 Ultra).
@rexx2198 2 years ago
Great test~
@dr.mikeybee 2 years ago
What's surprising is that the Ultra did that well. The 3080 Ti is a beast. I'd be interested in seeing this on an M1 Mac mini.
@daveh6356 2 years ago
Having watched loads of Mac performance comparison videos, I have one key takeaway: we're completely locked into a discrete-architecture mindset. Apple says "Apple Silicon"; the tech community hears "CPU" OR "GPU". Why not CPU AND GPU AND media encoders (AND ANE AND AMX)? Apple Silicon won't shine until we take the specs off (excuse the pun) and start looking at multi-processor/APU benchmarks which engage the silicon the same way real-world apps do. Are there no AI benchmarks which use all the available silicon simultaneously? The only "APU/UMA" benchmark I can find is for 2D image manipulation: the Affinity Photo benchmark, 11021 combined score; this outperforms PC workstations by 3-4x. Alex, you're technical; why not show powermetrics or asitop instead of a wall meter?
@michel8847 2 years ago
Because limitations apply to these special compute blocks. For example, the ANE works really well for inference (using AI models) but not for training, and even then, the insane inference performance gains with the ANE only apply to convolutional and ReLU layers with FP16. This makes the ANE essentially useless for AI training.
@daveh6356 2 years ago
@michel8847 You're still thinking discretely. With UMA it doesn't matter that the ANE isn't best at all tasks, as other silicon (like AMX/SIMD or the GPU) can pick up the compute functions (initialization overhead permitting). Sadly, benchmarks and many applications are wedded to this architectural view; that was my point.
@ilovepickles7427 2 years ago
Nice video. Thanks. It took almost as long as the wait for my new MacBook lol 😢
@boi64pr60 2 years ago
Wow, I thought it would be the other way around. Thanks again
@david_hsu 2 years ago
I'm curious whether this is making use of the Apple Neural Engine, or only the GPU?
@yarnosh 2 years ago
Sorry, I don't know much about machine learning, but isn't this what the Neural Engine is for?
@totalermist 2 years ago
The ANE is for inference (i.e. running pretrained models), while this test measures _training_ performance, for which the ANE is basically useless. Makes a lot of sense, too, since running ML models is *much* more common than training them.
@dakiles 2 years ago
Alex! Such a great video!! Could you please make a video setting up the Alienware for a Linux environment? Thank you so much for reading this message
@AZisk 2 years ago
Thanks. I tried for many hours and gave up. I'm afraid the Alienware machine I have just doesn't let you install Linux.
@Myektaie 2 years ago
Thanks 😊
@Markste-in 2 years ago
If PyTorch would just release GPU support for M1... that would be great. TensorFlow is such a clunky and awkward framework.
@AZisk 2 years ago
Waiting for that
@fakezpred 2 years ago
PyTorch currently only supports CUDA and ROCm (experimentally), since Metal seems to be a pain in the ass to work with.
@lalpremi 1 year ago
Thanks 🙂
@atimney 2 years ago
That graphics card at times was pulling 5x the power! I'd actually go for the Mac for development, purely for the power draw and quietness, minus the cricket noise! Nice vid
@davidesquer1312 2 years ago
If you develop ML often, you would never take that Mac over a dedicated workstation GPU. I work with an Nvidia A6000 and often run trainings for over 10 hours, which would take weeks if not months on an M1 Ultra.
@raramra9267 2 years ago
I expected your experiment's result, and I was right. But I am curious about the inference-only case. Training requires heavy throughput and therefore many GPU cores, but inference does not need GPU power as much as training does.
@hornyj1 1 year ago
There is a reason why training of large NN models is done on clusters of GPUs (from Nvidia) and/or specialized TPUs. You can basically have a 3-times-faster solution for half the price if you go with a PC and a decent Nvidia card. Note that the RTX 3080 Ti is similar to the RTX 4070 in CUDA performance (important for NN training), and you don't need a super-high-end processor for these kinds of loads either... All Macs are crazy overpriced for what they can do, and the limitations on the software you can run on them are deal-breaking for me. And yes, I had a MacBook Pro and an Air, and I enjoyed what Apple does well, but long-term, it was not good enough. :( What would you expect from a company that cannot make a decent mouse, after all?! 😀 (take it easy, it's just a joke 😇)
@dholzric1 2 months ago
I wonder how a cheap P40 GPU would do. I think it would probably also beat the Mac. For inference, mine seems to run at a bit less than half the speed of a 3090.
@davidtindell950 2 years ago
Please do a video on external Nvidia GPUs compatible with M1 Macs [via Thunderbolt connection]. Thank you!...
@kanuckadisk 2 years ago
A question though: if you're running at a scale where power draw and cooling are important, the Mac comes out ahead on that metric. If you have additional GPUs in the system, does that drop the PC's normalized power consumption below that of the Mac?
@yarnosh 2 years ago
117 W × 15 min = 1755 W·min vs. 450 W × 6 min = 2700 W·min. I think people who do any real machine learning are going to go for speed. And consider that the 3080 Ti is only like $1,400. I don't think the M1 Ultra can compete in this space. Better power efficiency doesn't go very far in a desktop.
@oscarsmith3942 2 years ago
If you care about power efficiency, you could underclock and undervolt the 3080 Ti. You could probably get it to roughly 50% power at 75% performance, which would still crush the M1.
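The undervolting claim above can be sanity-checked with the same power × time arithmetic used earlier in the comments. A back-of-envelope sketch; the 50%-power/75%-performance figures are the commenter's estimate, and the baseline 450 W / 6 min and 117 W / 15 min numbers are quoted from this thread, not measured:

```python
def job_energy_wh(watts, minutes):
    """Energy in watt-hours to complete one training job at constant draw."""
    return watts * minutes / 60

stock_3080ti = job_energy_wh(450, 6)              # full power, 6-minute run
undervolted  = job_energy_wh(450 * 0.5, 6 / 0.75)  # 225 W, job now takes 8 min
m1_ultra     = job_energy_wh(117, 15)             # quoted Mac Studio figures

print(f"Stock 3080 Ti:       {stock_3080ti:.1f} Wh")  # 45.0 Wh
print(f"Undervolted 3080 Ti: {undervolted:.1f} Wh")   # 30.0 Wh
print(f"M1 Ultra:            {m1_ultra:.2f} Wh")      # 29.25 Wh
```

Under these assumed numbers, the undervolted card lands almost exactly at the Mac's energy per job while still finishing each run nearly twice as fast.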
@aravindpallippara1577 2 years ago
Yeah, what the other folks said: Nvidia runs their GPUs really, really hot. Also, power-to-performance is usually on the side of AMD's Radeon series, but AMD's software support is worse than Nvidia's for machine learning (ROCm vs. CUDA).
@darthrochon 2 years ago
Great video as always. Can you find some coding benchmark that hits the Neural Engine specifically? Maybe something in Swift?
@yannis54400 2 years ago
Currently you can use the Neural Engine only for doing inference.
@FeyR1908 2 years ago
Please compare it with the 64-core Ultra.
@avocado9227 2 years ago
Thanks for your previous reviews. Why did it take you this long to do this test?
@xt8382 2 years ago
That noise from the Mac Studio is kind of scary; I just placed an order for the Ultra chip.
@mmbln 1 year ago
I think the performance is still pretty good considering the power consumption: Apple's ~120 W vs 430 W. Even if the run takes 3 times longer, I guess you still save some energy, right? Anyhow, both systems are incredible in terms of computing power. Nice test + nice video, thanks for that.
@user-jw3zx1tf6x 11 months ago
You would spend the same amount of energy: the M1 drew 3x less power, but training took 3x longer, so in the end the total energy is the same. Both systems will cost you the same in electricity, but with the RTX you save your own time, since you get results faster.
@dreamphoenix 1 year ago
Thank you.
@kyryloyemets7022 2 years ago
Would be cool to see a comparison with other Macs, and to run ML tests once PyTorch is ported to the M1 GPU.
@platypusfeathers 2 years ago
Machine learning content for Mac ecosystem videos!!! Please :-)
@essamal-mansouri2689 2 years ago
That cricket in your m1 ultra is hurting my brain
@schoolboogycorporation2062 2 years ago
they were comparing at the same watts
@ichallengemydog 2 years ago
In my experience I can't find any performance penalties running deep learning tasks in WSL, especially if you copy all the runtime files to the linux file system. Only when I run very heavy tasks that span several hours, WSL is a few minutes slower than native Ubuntu. I only ever boot into Ubuntu if the task I'm running will eat up ALL of the RAM. Otherwise, Windows is much easier to live with day-to-day.
@ravenclawgamer6367 2 years ago
I waited so long for this video that the universe was destroyed and built 3 times.
@moow950 2 years ago
I am curious what the results will be on the upcoming new Mac Pro
@amirhosseinmousavi7112 2 years ago
Mac Studio's performance was impressive, considering its power draw. Though, it's probably even more impressive on the lower-wattage chip inside the 16" MacBook Pro.
@rapidfan92 2 years ago
It is not impressive at all, if you ask me, at least for ML tasks. Four times the power draw for three times the performance is a clear win for NVIDIA, since performance per watt is not a linear function. I would not be surprised if the RTX at 100 watts still performed better than the Mac on TensorFlow.
@UltraFighter18 2 years ago
@@rapidfan92 Yeah, for general GPU tasks that don't use any special codecs or dedicated hardware to accelerate workloads, RTX and RX GPUs are still faster.
@huangyongiscool 2 years ago
Thank you, now I'll save some money on the M1 Ultra.. and probably go with an all-Linux setup.
@varnamq3 1 year ago
What kit do you use for your audio? Very radio-like, professional sounding.
@rachitbhatt40000 8 months ago
Please compare RTX 4090 Laptop with M3.
@sumukh007 2 years ago
Compare it to the other Macs pls!!!!
@psionski 2 years ago
In the comparison Apple was showing, they crippled the 3090 by power-limiting it. Note in their chart how the power usage doesn’t go up to the full 500 watts. Obviously if you remove the power limit it’s going to blow the M1 out of the water.
@welsh1lad 2 years ago
That was a shot at Apple by NVIDIA, but what was the price-to-performance comparison?
@darrenhankner5282 2 years ago
I'm getting an M1 Max Mac Studio, just for the low power draw. I'm moving to Berlin, Germany to attend university for nursing, and electric rates there are more than twice as high as in the USA. I believe in saving money over performance.
@totalermist 2 years ago
Unless you plan on running your system on full tilt for several hours a day, the difference in power draw is going to be insignificant.
@thecheaperthebetter4477 2 years ago
Btw, if you look at Gamers Nexus' review of those Dell Alienware machines, they throttle (bad airflow), so the Windows machine was handicapped... useful to know.
@AZisk 2 years ago
yep, seen that. pretty bad
@Xankill3r 2 years ago
Have you seen the latest Gamers Nexus video reviewing the new Alienware PC? The one they reviewed has a 3090 in place of the 3080 Ti you have but the review overall is extremely negative. The cooling in that case is really terrible. As such I am not sure you are getting your money's worth out of yours either (same CPU in both - 12900KF, same liquid cooling solution with a measly 120mm fan too probably). I'd recommend selling it off at the first opportunity you get and buying yourself a better assembled machine from anyone but Alienware. From all the videos I've seen reviewing pre-built machines I think custom shops like Maingear are better than the bigger name vendors like Dell, Alienware, Asus, etc.
@AZisk 2 years ago
yes, i haven’t had great results with that machine. and it’s not a cheap one.
@McBobX 1 year ago
Hey, may I know which mic you use? Much appreciated :D
@YiuMingLai 2 years ago
Doing machine learning on a machine for 5 minutes hardly concludes anything; the real test is whether I could retrain a 16GB BERT model to do something useful.
@timurzaynutdinov3445 1 year ago
Power consumption is also 4 times higher, isn't it?
@RealYethal 2 years ago
RPCS3 recently got native macOS version, can you test the performance on M1 Ultra?
@edharris6452 2 years ago
Would there be any benefit in comparing both GPU times to the same routine run natively on the neural engines?
@beefsandwish 10 months ago
you should redo the test with M3 Max or M3 Ultra
@realarteezy 2 years ago
would be interesting to compare the results when normalised by power draw
@chudchadanstud 2 years ago
I don't think it matters lol. The cost to your electric bill isn't being offset by the cost difference between the m1 ultra and an "equivalent" build.
@aravindpallippara1577 2 years ago
If the RTX 3080 Ti is undervolted, I'm pretty sure it will beat the M1 Ultra as well. NVIDIA runs their cards really, really hot (in search of the performance crown, of course).
@SagarHingalAI 1 year ago
I think the issue here is that Python is not optimized for the M1 platform, and given that Python itself is not that good in terms of performance, I am not surprised the M1 performed so poorly. Also, I think the Neural Engine cores that should be used for such tasks are not being used, but that's Apple hardware, so that's not going to happen for the open source community.
@willdrunkenstein5367 5 months ago
How much unified memory did you have on your machine?
@ajcroteau0928 2 years ago
Does that mean if he ran it on a Mac Studio base model… would it have taken 30 minutes? Will most likely rerun this test on my Mac Studio myself…
@christopherprobst-ranly6357 9 months ago
The M3 Max is now even better than the M1 Ultra. And to be clear, yes, CUDA has the edge right now. But a factor of 2x, 3x or 4x slower is totally fine, considering the mobility factor. Sure, a 10-kilo workstation should be better, but we are finally talking 2x or 3x, not 10x or 20x.. ;-)
@prabhavkaula9697 2 years ago
Could you make a PyTorch test for both the systems
@georgioszampoukis1966 2 years ago
M1 Ultra actually performs impressively well here. Also, keep in mind that the 3080ti "only" has 12GB of Vram while the M1 Ultra GPU can theoretically utilize up to 128GB of memory, which is insane. In many deep learning models you can easily fill up 8-10 gigs of memory when running larger batch sizes or higher resolution inputs, so, if you can actually utilize all 128GB of memory, the M1 Ultra offers insane value.
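A back-of-the-envelope sketch of why memory fills up with batch size and resolution, as this comment describes (the layer shape below is purely illustrative, not from the video):

```python
# Estimate the memory of a single NCHW float32 activation tensor,
# to illustrate how batch size and resolution drive VRAM usage.

def feature_map_bytes(batch: int, channels: int, height: int, width: int,
                      bytes_per_elem: int = 4) -> int:
    """Memory for one float32 activation tensor in NCHW layout."""
    return batch * channels * height * width * bytes_per_elem

# e.g. a batch of 32, 256 channels at 128x128 resolution
gib = feature_map_bytes(32, 256, 128, 128) / 2**30
print(f"{gib:.2f} GiB for one activation tensor")  # 0.50 GiB
```

One layer's activations at that size already eat 0.5 GiB, and a deep network keeps dozens of such tensors alive for backprop, which is why a 12 GB card runs out long before 128 GB of unified memory would.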
@teemuvesala9575 2 years ago
The RTX 4000 series on TSMC 4nm will blow a hole in this Apple hype.
@TheAndymuns 11 months ago
Do you know that the Apple silicon Neural Engine cores are designed to do machine learning? You can use these cores as well as the GPU cores.
@Originalimoc 2 years ago
I see no performance difference between WSL2 and native; sometimes WSL2 may even be faster 👀 WSL is neither an emulation nor a virtualization solution like how VMware works.
@metallicafan5447 2 years ago
Can you do wsl2 vs virtualbox vs parallels to check how linux vms perform on windows vs mac?
@BeaglefreilaufKalkar 2 years ago
Why is the GPU rated for machine learning and not the Neural Engine? Isn't that like transcoding video on the CPU instead of the Media Engines?
@aliyuabba4575 2 years ago
Maybe that weird sound is coil whine on the gpu
@phoenyfeifei 2 years ago
TBH, the M1 is only good for video editing, thanks to its built-in H.264, H.265 and ProRes encoders and decoders; anything else sucks. However, M1 MacBooks have other features that differentiate them from Windows PCs, for example the mini-LED screen, great speakers and studio-quality mic.