20 tokens/s is pretty solid. It's faster than I can read the generated text, very power efficient, and a whole machine in one package. I'm so put off by my current PC that I might just buy the base model and later move up to a Mac Studio.
@wealleuropean 28 days ago
Yeah, it also works with my Intel MacBook, so the keyboard/trackpad are all set. Really fascinating: cheap and powerful.
@inwedavid6919 24 days ago
Yes, but that means you have quite a lot of money to spend, while for a PC you can get a new 4000- or 5000-series graphics card for less than the Mini.
@NomNomBasti 24 days ago
@@inwedavid6919 Not sure what you mean? The base Mac Mini is a very cheap overall package, and very energy efficient. Anyway, all of this is an enthusiast market, so you go big or go home. 😎
@peter.dolkens 12 days ago
@@inwedavid6919 You could buy 4 Mac Minis for the price of a single 5090 🤣
@kristoffampong8774 11 days ago
@@inwedavid6919 You also have to buy everything else: motherboard, processor, RAM, drive, PSU, fans, case. So that would easily be 1.5k-2k USD.
Regarding Apple Silicon: this kind of task runs much faster on a Max chip than on a standard or Pro chip, mainly because of the memory bandwidth. You also need a lot of RAM (32 GB minimum) because the GPU uses shared memory on Apple SoCs.
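For a rough sense of why bandwidth dominates, here's a back-of-the-envelope sketch (the bandwidth figures are Apple's published numbers; the formula and overhead factor are my own assumptions, not measurements from the video):

```python
# Rule-of-thumb LLM inference sizing on Apple Silicon (assumptions, not benchmarks):
# weights dominate memory use, and each generated token has to stream roughly all
# of the weights through memory once, so bandwidth sets a speed ceiling.

def model_footprint_gb(params_b: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Approximate RAM needed: params * bytes/param, plus ~20% for KV cache etc."""
    return params_b * (bits / 8) * overhead

def tokens_per_s_ceiling(bandwidth_gb_s: float, params_b: float, bits: int = 4) -> float:
    """Upper bound on generation speed: bandwidth / bytes read per token."""
    return bandwidth_gb_s / (params_b * (bits / 8))

# Apple's published bandwidths: M4 ~120 GB/s, M4 Pro ~273 GB/s, M4 Max ~546 GB/s.
for name, bw in [("M4", 120), ("M4 Pro", 273), ("M4 Max", 546)]:
    print(f"{name}: 8B@4-bit needs ~{model_footprint_gb(8):.1f} GB, "
          f"ceiling ~{tokens_per_s_ceiling(bw, 8):.0f} tok/s")
```

The ~30 tok/s ceiling this predicts for the base M4 on an 8B model lines up reasonably with the ~20 tok/s observed above, since real decoding never hits the bandwidth limit exactly.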
@AtharvaMaskar-k6t 17 days ago
Hey, I'm an ML engineer planning to buy the Mac Mini M4. I usually rent GPUs for fine-tuning LLMs, and my local work is mostly building RAGs, maybe fine-tuning SLMs or BERT models, preparing datasets, etc., along with other casual usage. Should I get one, and if so, is the base model enough, or should I consider the 24 GB and 32 GB variants? I could find little to nothing online, so links to resources or benchmarks would be highly appreciated.
@tech-practice9805 16 days ago
I think more RAM usually lets you fine-tune bigger models, and makes fine-tuning faster.
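As a rough sizing sketch for the question above (rule-of-thumb byte costs I'm assuming, not benchmarks): full fine-tuning with Adam in mixed precision costs on the order of 16 bytes per parameter, while LoRA freezes the base weights and only keeps optimizer state for a small adapter. That is why BERT-sized models are comfortable on the base 16 GB, but 7B-class fine-tuning wants the 24 GB or 32 GB variants:

```python
# Hypothetical memory estimates for fine-tuning (assumed costs, not measured):
# full fine-tune: fp16 weights + fp16 grads + fp32 master copy + 2x fp32 Adam
# states ~= 16 bytes/param; LoRA: frozen fp16 weights (2 bytes/param) plus the
# full training cost only for a small trainable fraction.

def full_finetune_gb(params_b: float) -> float:
    return params_b * 16

def lora_finetune_gb(params_b: float, trainable_frac: float = 0.01) -> float:
    return params_b * 2 + params_b * trainable_frac * 16

for p in (0.3, 1.0, 7.0):  # BERT-large-ish, a 1B SLM, a 7B model
    print(f"{p}B params: full ~{full_finetune_gb(p):.0f} GB, "
          f"LoRA ~{lora_finetune_gb(p):.1f} GB (+ activations/batch)")
```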
@dilip.rajkumar 2 months ago
Great video. Could you also put together a table showing which Llama model, at what size, runs on which M4 or M4 Pro machine and RAM configuration? For example: can we run the 70B Nemotron model on an M4 Pro Mac Mini with 48 GB or 64 GB of RAM?
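Not the video author, but applying the common 4-bit rule of thumb (my assumptions; actual GGUF file sizes vary by quant): weights take roughly half a byte per parameter, and macOS by default only lets the GPU use around 75% of unified memory:

```python
# Rough Q4 footprints vs. the default macOS GPU memory budget (~75% of RAM).
def q4_footprint_gb(params_b: float, overhead: float = 1.2) -> float:
    return params_b * 0.5 * overhead  # ~0.5 bytes/param at 4-bit, plus KV slack

for params_b in (8, 14, 32, 70):
    need = q4_footprint_gb(params_b)
    fits = [ram for ram in (16, 24, 32, 48, 64) if need < ram * 0.75]
    print(f"{params_b}B @ Q4: ~{need:.0f} GB -> smallest workable RAM: "
          f"{fits[0] if fits else '>64'} GB")
```

By that estimate, a 4-bit 70B Nemotron is a poor fit for 48 GB but should load on 64 GB; heavier quantization than Q4 would change the math.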
@jum_py 2 months ago
Very useful video!!! Thanks
@rhadiem 2 months ago
You could also compare against cheap eBay cards like the $100 24 GB M40 and the like. Although a 3090 goes for $600 on eBay now, and that will run laps around the 16 GB M4. The M4 is good from a power-efficiency standpoint, as well as being "useful" if you need a Mac to begin with.
@AJ-rg3qd 7 days ago
Is it possible that you installed Ollama into a Docker container? As far as I understand, Docker can't use the GPU on MacBooks?
@tech-practice9805 6 days ago
I haven't tried Docker yet.
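For what it's worth (my understanding, not tested in the video): Docker Desktop on macOS runs containers inside a Linux VM and can't pass the Apple GPU through, so a Dockerized Ollama falls back to CPU, while the native macOS app uses Metal. A minimal sketch to compare the two, assuming an Ollama server is listening on the default port:

```python
# Measure generation speed against whichever Ollama server is on localhost:11434
# (native install vs. a Docker-published port) using the /api/generate endpoint.
import requests

def eval_rate(model: str = "llama3.2", prompt: str = "Why is the sky blue?") -> float:
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    r.raise_for_status()
    d = r.json()
    # eval_count tokens were generated in eval_duration nanoseconds
    return d["eval_count"] / d["eval_duration"] * 1e9

print(f"{eval_rate():.1f} tokens/s")
```

Running this once against the native app and once against the container should make the CPU fallback obvious in the numbers.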
@danfitzpatrick4112 19 days ago
The Mac Mini M4 is plenty fast for Llama, local AI, and Home Assistant, IMO. Can anyone confirm?
@tech-practice9805 18 days ago
For small and medium-sized models, it's fast enough.
@ywueeee 2 months ago
Wow, that's so cool. Can you do this for image-generation models as well?
@tech-practice9805 2 months ago
Yes, I can also compare them for image generation.
@keremg 2 months ago
Please do!
@maglat 2 months ago
@@tech-practice9805 Yes, please! I'd love to see that kind of video
@MathiasVervaeke A month ago
Solid performance from the AMD RX 6700 XT then, which was positioned as a competitor to NVIDIA's RTX 3060 Ti and RTX 3070.
@jldymy 2 months ago
What do you think about parallel inference on the Mac Mini M4? I haven't managed to try that yet.
@enderbreton8136 24 days ago
Can you tell me how you got Ollama working on the 6700 XT?
@tech-practice9805 22 days ago
What error did you get? I'll try to make a tutorial.
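In case it helps while the tutorial is in the works: ROCm doesn't officially list the RX 6700 XT (gfx1031), and the widely reported community workaround on Linux is to present it to ROCm as the supported gfx1030 before starting the server. A sketch of that workaround, not official guidance:

```python
# Launch Ollama with the commonly reported RDNA2 workaround: override the
# RX 6700 XT's gfx1031 target so ROCm treats it as gfx1030.
import os
import subprocess

env = dict(os.environ, HSA_OVERRIDE_GFX_VERSION="10.3.0")
subprocess.run(["ollama", "serve"], env=env, check=True)
```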
@DaengRosanda 2 months ago
Nice video content. Can't wait for another comparison using low-end GPUs or AI boards.
@Aaronage1 A month ago
Thanks for the test! To add to the data, I ran the test on an M1 Ultra Mac Studio (48-core GPU, 64 GB RAM) and it did 55.8 tokens/s at ~105 W 👍
@tech-practice9805 A month ago
Thanks for the info!
@satyakichatterjee4229 2 months ago
But how are you able to use 4 different graphics cards (2 of them virtual, of course) from the same device at the same time?
@tech-practice9805 2 months ago
The MacBook and Mac Mini were accessed remotely (over SSH).
@AindriuMacGiollaEoin 2 months ago
Very useful indeed
@yagoa 2 months ago
Load duration is the most important thing to me, so for me the M4 is the winner; with faster external storage it could be twice as fast as this.
@nikodunk 2 months ago
Great video, thank you.
@tech-practice9805 2 months ago
Glad you liked it!
@CasualExplains 18 days ago
70B model, omg, please run it
@tech-practice9805 16 days ago
I wish the Mac had more RAM in it.
@kronosthesoulshaker 2 months ago
Can you please do a video on Ollama with Ubuntu 24.04, an RX 5700 XT, and ROCm? A lot of people are looking for a complete guide, myself included. Thank you for your videos and keep them coming!
@AIbutterFlay 2 months ago
Can you do this with a 2015 Intel iMac? A lot of people are looking for that too. Also, can you do it on Android?
@tech-practice9805 2 months ago
I uploaded a video for the 6700 XT: kzbin.info/www/bejne/nmSXp3Rol9CVqs0 For the 5700 XT, last time I checked, Ollama didn't support it. But the 5700 XT may be able to use other libraries such as pytorch-rocm.
@tech-practice9805 2 months ago
I haven't tried them on an Intel Mac. But it's a good idea. I'll give it a try!
@AIbutterFlay 2 months ago
@@tech-practice9805 Hahaha, no my gentle friend, I was trolling. You're doing a good job, and I'm very grateful for your answer. No more trolling the fans XD
@kdta91 24 days ago
I'm running Ollama on an RX 5600 XT. Works pretty well. I'm on Pop!_OS 22.04
@K00LD00D 2 months ago
Thank you
@tomwawer5714 2 months ago
Thanks! What's the biggest Ollama model you can run on the Mac Mini? Qwen:32b?
@tech-practice9805 2 months ago
I tested Qwen:14b; speed is about 10 tokens/s. kzbin.info/www/bejne/d2nFgJ2FYseierc 32B would need more RAM.
@tomwawer5714 2 months ago
Thanks!
@vctrro 2 months ago
M4 Max (32-core GPU): 59.89 tokens/s
@大支爺 2 months ago
Hmmm... I have a 4090 with 192 GB of DDR5.
@DigiDriftZone 2 months ago
So the M4 is still around 4x slower than even a fairly old 3080 Ti? I love my MacBook Pro M3 Max and Mac Mini M1, but I think for LLMs I'll stick with my Nvidia 3090 Intel desktop for now :)
@Chidorin 2 months ago
Do you need that speed if it's faster than you can read? 🤔
@DigiDriftZone 2 months ago
@ For training models to recognize video-surveillance images, it's the difference between 2 weeks on a 4-year-old 3090 and 8 weeks on a brand-new M4 Pro, so yes :)
@Duckstalker1340 2 months ago
If you don't train but only run inference, then the M4 is the best in terms of both initial cost and running cost, thanks to its huge power-efficiency advantage.
@AIbutterFlay 2 months ago
The M4 Ultra is supposed to beat the 4090 :S
@DigiDriftZone 2 months ago
@@AIbutterFlay That will be exciting to see! Currently my M3 Max just isn't fast enough for training models; it's fine for running them, though.
@zx9rmario 2 months ago
How many tokens/s can an RTX 3060 12 GB do compared to the 3080 Ti?
@tech-practice9805 2 months ago
About 60 tokens/s.
@rhadiem 2 months ago
A 4060 Ti 16 GB is good value for a PC if you're buying new and need to do AI.
@MuhammadFahreza A month ago
@@rhadiem How many tokens/s?
@Xiantez A month ago
I tried to watch this video, but the mouth sounds the microphone picks up make it difficult to focus. All I hear now is smacking and spit moving around between breaths.