20 tokens/s is pretty solid. It's faster than I can read the generated text, very power efficient, and a whole machine in one package. I'm so put off by my current PC that I might just buy the base model and later move up to a Mac Studio.
@wealleuropean 28 days ago
Yeah, it also works with my Intel MacBook, so the keyboard/trackpad are all set. Really fascinating: cheap and powerful.
@inwedavid6919 24 days ago
Yes, but that means you have quite a lot of money to spend, while for a PC you can get a new 4000- or 5000-series graphics card for less than the Mini.
@NomNomBasti 24 days ago
@@inwedavid6919 Not sure what you mean? The base Mac Mini is a very cheap overall package, and very energy efficient. Anyway, all of this is an enthusiast market, so you go big or go home. 😎
@peter.dolkens 12 days ago
@@inwedavid6919 You could buy 4 Mac Minis for the price of a single 5090 🤣
@kristoffampong8774 11 days ago
@@inwedavid6919 You also have to buy everything else: motherboard, processor, RAM, drive, PSU, fans, case. So that would easily be 1.5k-2k USD.
Regarding Apple Silicon: this kind of task runs much faster on a Max chip than on a standard or Pro chip, mainly because of the memory bandwidth. You also need a lot of RAM (32 GB minimum) because the GPU uses shared memory on Apple SoCs.
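For a rough sense of why bandwidth dominates, here's a back-of-the-envelope sketch (the bandwidth figures are Apple's published numbers; the formula and overhead factor are my own assumptions, not measurements from the video):

```python
# Rule-of-thumb LLM inference sizing on Apple Silicon (assumptions, not benchmarks):
# weights dominate memory use, and each generated token has to stream roughly all
# of the weights through memory once, so bandwidth sets a speed ceiling.

def model_footprint_gb(params_b: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Approximate RAM needed: params * bytes/param, plus ~20% for KV cache etc."""
    return params_b * (bits / 8) * overhead

def tokens_per_s_ceiling(bandwidth_gb_s: float, params_b: float, bits: int = 4) -> float:
    """Upper bound on generation speed: bandwidth / bytes read per token."""
    return bandwidth_gb_s / (params_b * (bits / 8))

# Apple's published bandwidths: M4 ~120 GB/s, M4 Pro ~273 GB/s, M4 Max ~546 GB/s.
for name, bw in [("M4", 120), ("M4 Pro", 273), ("M4 Max", 546)]:
    print(f"{name}: 8B@4-bit needs ~{model_footprint_gb(8):.1f} GB, "
          f"ceiling ~{tokens_per_s_ceiling(bw, 8):.0f} tok/s")
```

The ~30 tok/s ceiling this predicts for the base M4 on an 8B model lines up reasonably with the ~20 tok/s observed above, since real decoding never hits the bandwidth limit exactly.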
@AtharvaMaskar-k6t 17 days ago
Hey, I'm an ML engineer planning to buy the Mac Mini M4. I usually rent GPUs for fine-tuning LLMs, and my local work is mostly building RAGs, maybe fine-tuning SLMs or BERT models, preparing datasets, etc., along with other casual usage. Should I get one, and if so, is the base model enough, or should I consider the 24 GB and 32 GB variants? I could find little to nothing online, so links to resources or benchmarks would be highly appreciated.
@tech-practice9805 16 days ago
I think more RAM usually lets you fine-tune bigger models, and makes fine-tuning faster.
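As a rough sizing sketch for the question above (rule-of-thumb byte costs I'm assuming, not benchmarks): full fine-tuning with Adam in mixed precision costs on the order of 16 bytes per parameter, while LoRA freezes the base weights and only keeps optimizer state for a small adapter. That is why BERT-sized models are comfortable on the base 16 GB, but 7B-class fine-tuning wants the 24 GB or 32 GB variants:

```python
# Hypothetical memory estimates for fine-tuning (assumed costs, not measured):
# full fine-tune: fp16 weights + fp16 grads + fp32 master copy + 2x fp32 Adam
# states ~= 16 bytes/param; LoRA: frozen fp16 weights (2 bytes/param) plus the
# full training cost only for a small trainable fraction.

def full_finetune_gb(params_b: float) -> float:
    return params_b * 16

def lora_finetune_gb(params_b: float, trainable_frac: float = 0.01) -> float:
    return params_b * 2 + params_b * trainable_frac * 16

for p in (0.3, 1.0, 7.0):  # BERT-large-ish, a 1B SLM, a 7B model
    print(f"{p}B params: full ~{full_finetune_gb(p):.0f} GB, "
          f"LoRA ~{lora_finetune_gb(p):.1f} GB (+ activations/batch)")
```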
@dilip.rajkumar 2 months ago
Great video. Could you also put together a table showing which Llama model, at what size, runs on which M4 or M4 Pro machine and RAM configuration? For example: can we run the 70B Nemotron model on an M4 Pro Mac Mini with 48 GB or 64 GB of RAM?
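Not the video author, but applying the common 4-bit rule of thumb (my assumptions; actual GGUF file sizes vary by quant): weights take roughly half a byte per parameter, and macOS by default only lets the GPU use around 75% of unified memory:

```python
# Rough Q4 footprints vs. the default macOS GPU memory budget (~75% of RAM).
def q4_footprint_gb(params_b: float, overhead: float = 1.2) -> float:
    return params_b * 0.5 * overhead  # ~0.5 bytes/param at 4-bit, plus KV slack

for params_b in (8, 14, 32, 70):
    need = q4_footprint_gb(params_b)
    fits = [ram for ram in (16, 24, 32, 48, 64) if need < ram * 0.75]
    print(f"{params_b}B @ Q4: ~{need:.0f} GB -> smallest workable RAM: "
          f"{fits[0] if fits else '>64'} GB")
```

By that estimate, a 4-bit 70B Nemotron is a poor fit for 48 GB but should load on 64 GB; heavier quantization than Q4 would change the math.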
@jum_py 2 months ago
Very useful video!!! Thanks
@rhadiem 2 months ago
You could also compare against cheap eBay cards like the $100 24 GB M40 and the like. Although a 3090 goes for $600 on eBay now, and that will run laps around the 16 GB M4. The M4 is good from a power-efficiency standpoint, as well as being "useful" if you need a Mac to begin with.
@AJ-rg3qd 7 days ago
Is it possible that you installed Ollama into a Docker container? As far as I understand, Docker can't use the GPU on MacBooks?
@tech-practice9805 6 days ago
I haven't tried Docker yet.
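For what it's worth (my understanding, not tested in the video): Docker Desktop on macOS runs containers inside a Linux VM and can't pass the Apple GPU through, so a Dockerized Ollama falls back to CPU, while the native macOS app uses Metal. A minimal sketch to compare the two, assuming an Ollama server is listening on the default port:

```python
# Measure generation speed against whichever Ollama server is on localhost:11434
# (native install vs. a Docker-published port) using the /api/generate endpoint.
import requests

def eval_rate(model: str = "llama3.2", prompt: str = "Why is the sky blue?") -> float:
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    r.raise_for_status()
    d = r.json()
    # eval_count tokens were generated in eval_duration nanoseconds
    return d["eval_count"] / d["eval_duration"] * 1e9

print(f"{eval_rate():.1f} tokens/s")
```

Running this once against the native app and once against the container should make the CPU fallback obvious in the numbers.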
@danfitzpatrick4112 19 days ago
The Mac Mini M4 is plenty fast for Llama, local AI, and Home Assistant, IMO. Can anyone confirm?
@tech-practice9805 18 days ago
For small and medium-sized models, it's fast enough.
@ywueeee 2 months ago
Wow, that's so cool. Can you do this for image-generation models as well?
@tech-practice9805 2 months ago
Yes, I can also compare them for image generation.
@keremg 2 months ago
Please do!
@maglat 2 months ago
@@tech-practice9805 Yes, please! I'd love to see that kind of video
@MathiasVervaeke A month ago
Solid performance from the AMD RX 6700 XT then, which was positioned as a competitor to NVIDIA's RTX 3060 Ti and RTX 3070.
@jldymy 2 months ago
What do you think about parallel inference on the Mac Mini M4? I haven't managed to try that yet.
@enderbreton8136 24 days ago
Can you tell me how you got Ollama working on the 6700 XT?
@tech-practice9805 22 days ago
What error did you get? I'll try to make a tutorial.
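In case it helps while the tutorial is in the works: ROCm doesn't officially list the RX 6700 XT (gfx1031), and the widely reported community workaround on Linux is to present it to ROCm as the supported gfx1030 before starting the server. A sketch of that workaround, not official guidance:

```python
# Launch Ollama with the commonly reported RDNA2 workaround: override the
# RX 6700 XT's gfx1031 target so ROCm treats it as gfx1030.
import os
import subprocess

env = dict(os.environ, HSA_OVERRIDE_GFX_VERSION="10.3.0")
subprocess.run(["ollama", "serve"], env=env, check=True)
```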
@DaengRosanda 2 months ago
Nice video content. Can't wait for another comparison using low-end GPUs or AI boards.
@Aaronage1 A month ago
Thanks for the test! To add to the data, I ran the test on an M1 Ultra Mac Studio (48-core GPU, 64 GB RAM) and it did 55.8 tokens/s at ~105 W 👍
@tech-practice9805 A month ago
Thanks for the info!
@satyakichatterjee4229 2 months ago
But how are you able to use 4 different graphics cards (2 of them virtual, of course) from the same device at the same time?
@tech-practice9805 2 months ago
The MacBook and Mac Mini were accessed remotely (over SSH).
@AindriuMacGiollaEoin 2 months ago
Very useful indeed
@yagoa 2 months ago
Load duration is the most important thing to me, so for me the M4 is the winner; with faster external storage it could be twice as fast as this.
@nikodunk 2 months ago
Great video, thank you.
@tech-practice9805 2 months ago
Glad you liked it!
@CasualExplains 18 days ago
70B model, omg, please run it
@tech-practice9805 16 days ago
I wish the Mac had more RAM in it.
@kronosthesoulshaker 2 months ago
Can you please do a video on Ollama with Ubuntu 24.04, an RX 5700 XT, and ROCm? A lot of people are looking for a complete guide, myself included. Thank you for your videos and keep them coming!
@AIbutterFlay 2 months ago
Can you do this with a 2015 Intel iMac? A lot of people are looking for that too. Also, can you do it on Android?
@tech-practice9805 2 months ago
I uploaded a video for the 6700 XT: kzbin.info/www/bejne/nmSXp3Rol9CVqs0 For the 5700 XT, last time I checked, Ollama didn't support it. But the 5700 XT may be able to use other libraries such as pytorch-rocm.
@tech-practice9805 2 months ago
I haven't tried them on an Intel Mac. But it's a good idea. I'll give it a try!
@AIbutterFlay 2 months ago
@@tech-practice9805 Hahaha, no my gentle friend, I was trolling. You're doing a good job, and I'm very grateful for your answer. No more trolling the fans XD
@kdta91 24 days ago
I'm running Ollama on an RX 5600 XT. Works pretty well. I'm on Pop!_OS 22.04
@K00LD00D 2 months ago
Thank you
@tomwawer5714 2 months ago
Thanks! What's the biggest Ollama model you can run on the Mac Mini? Qwen:32b?
@tech-practice9805 2 months ago
I tested Qwen:14b; speed is about 10 tokens/s. kzbin.info/www/bejne/d2nFgJ2FYseierc 32B would need more RAM.
@tomwawer5714 2 months ago
Thanks!
@vctrro 2 months ago
M4 Max (32-core GPU): 59.89 tokens/s
@大支爺 2 months ago
Hmmm... I have a 4090 with 192 GB of DDR5.
@DigiDriftZone 2 months ago
So the M4 is still around 4x slower than even a fairly old 3080 Ti? I love my MacBook Pro M3 Max and Mac Mini M1, but I think for LLMs I'll stick with my Nvidia 3090 Intel desktop for now :)
@Chidorin 2 months ago
Do you need that speed if it's faster than you can read? 🤔
@DigiDriftZone 2 months ago
@ For training models to recognize video-surveillance images, it's the difference between 2 weeks on a 4-year-old 3090 and 8 weeks on a brand-new M4 Pro, so yes :)
@Duckstalker1340 2 months ago
If you don't train but only run inference, then the M4 is the best in terms of both initial cost and running cost, thanks to its huge power-efficiency advantage.
@AIbutterFlay 2 months ago
The M4 Ultra is supposed to beat the 4090 :S
@DigiDriftZone 2 months ago
@@AIbutterFlay That will be exciting to see! Currently my M3 Max just isn't fast enough for training models; it's fine for running them, though.
@zx9rmario 2 months ago
How many tokens/s can an RTX 3060 12 GB do compared to the 3080 Ti?
@tech-practice9805 2 months ago
About 60 tokens/s.
@rhadiem 2 months ago
A 4060 Ti 16 GB is good value for a PC if you're buying new and need to do AI.
@MuhammadFahreza A month ago
@@rhadiem How many tokens/s?
@Xiantez A month ago
I tried to watch this video, but the mouth sounds the microphone picks up make it difficult to focus. All I hear now is smacking and spit moving around between breaths.