The ONLY Local LLM Tool for Mac (Apple Silicon)!!

19,586 views

1littlecoder

1 day ago

LM Studio 0.3.4 ships with an MLX engine for running on-device LLMs super efficiently on Apple Silicon Macs.
MLX support in LM Studio 0.3.4 includes:
Search & download any supported MLX LLM from Hugging Face (just like you've been doing with GGUF models)
Use MLX models via the Chat UI, or from your code using an OpenAI-compatible local server running on localhost (see the Python sketch after this list)
Enforce LLM responses in specific JSON formats (thanks to Outlines)
Use vision models like LLaVA and more, via the chat UI or the API (thanks to mlx-vlm)
Load and run multiple simultaneous LLMs. You can even mix and match llama.cpp and MLX models!
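
Since the local server speaks an OpenAI-compatible wire format, any OpenAI client library can talk to it. A minimal Python sketch, assuming the server is running on LM Studio's default port 1234 with an MLX model already loaded; the model id and JSON schema below are illustrative, not taken from the video:

# Chat with a local MLX model through LM Studio's OpenAI-compatible server.
from openai import OpenAI

# The api_key is required by the client but ignored by the local server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

MODEL = "mlx-community/Llama-3.2-3B-Instruct-4bit"  # illustrative model id

# Plain chat completion.
reply = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)

# Structured output: constrain the response to a JSON schema,
# which is what the Outlines integration enforces under the hood.
structured = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Name a color."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "color_reply",
            "schema": {
                "type": "object",
                "properties": {"color": {"type": "string"}},
                "required": ["color"],
            },
        },
    },
)
print(structured.choices[0].message.content)  # e.g. {"color": "blue"}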
lmstudio.ai/bl...
🔗 Links 🔗
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - ko-fi.com/1lit...
🧭 Follow me on 🧭
Twitter - / 1littlecoder
Linkedin - / amrrs

Comments
@gue2212 3 months ago
Thanks for making me aware of the MLX version. Beware: my installed version only updated to 0.2.31 and I had to download 0.3.4 from the LM Studio site!
@RedShiftedDollar 18 days ago
LLMs are kinda like lossy compression algorithms that store all kinds of training data in an abstract form as a superposition of training weights. The data can then be recovered in a lossy way by prompting the network to retrieve it.
@balubalaji9956 4 days ago
There is a difference: in traditional data retrieval, if you want a piece of data, say '123', you ask for '123' - input and output are the same. In deep learning algorithms, input and output are different; they can be '123' and 6 (1+2+3) respectively.
@RedShiftedDollar 4 days ago
@balubalaji9956 Ask for a picture of Pikachu and you will get a picture of Pikachu only if the model weights encode information about Pikachu, and this will only happen if the model was trained on pictures of Pikachu. The act of training is an abstract method of encoding; the act of prompting triggers an abstract method of decoding. I even ran your example of compression through ChatGPT. You said compression is: ask for 1, 2, 3 and get 1, 2, 3. So I prompted ChatGPT with: Write only "1, 2, 3". And it literally only gave me "1, 2, 3".
@nomaximum 9 days ago
"That's a lot of people watching your mediocre content" - hilarious! :) I did not know that Skynet had humor...
@CarlosValero 27 days ago
Thanks for this!
@maxziebell4013 3 months ago
Thanks, I just installed it... nice M3 here
@1littlecoder 3 months ago
Enjoy the speed!
@phanindraparashar8930 3 months ago
Can you make a video on fine-tuning embeddings and LLMs (also include how to create a dataset to train on custom data)? It would be very interesting.
@1littlecoder 3 months ago
Thanks for the idea, will try to put together something!
@Pregidth 3 months ago
Hey man, this is really great! Thanks. Hopefully Ollama integrates it. They've seemed a bit lame these past weeks.
@1littlecoder 3 months ago
Hope so!
@modoulaminceesay9211 3 months ago
Thanks for the tutorial
@HealthyNutrition-y 3 months ago
🔥🔵"Intelligence is compression of information." This is one of the most useful videos I believe I have ever watched on YouTube.🔵
@yorkan213swd6 8 days ago
Great, when can I use the NPU?
@made-simple 12 days ago
So basically it's better than Ollama?
@modoulaminceesay9211 3 months ago
What is the difference between this and Ollama?
@supercurioTube 3 months ago
How did you conclude that running the same model was faster via MLX than with the llama.cpp backend? Comparing with Llama 3.1 8B 8-bit, I get the same generation speed between LM Studio/MLX and Ollama/llama.cpp (33.6 tok/s on an M1 Max 64GB).
@monkeyfish227 3 months ago
Don't they both use MLX? Wouldn't the speed be the same then?
@CitAllHearItAll 1 month ago
Are you loading the same model in different tools? You have to download the MLX and GGUF versions separately, then load one at a time and test. MLX is decently faster for me, always.
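
One way to run that A/B test yourself: time a generation through the local server and divide completion tokens by elapsed seconds. A rough sketch, assuming LM Studio's server on its default localhost:1234 and illustrative model ids; timings include prompt processing, so treat the numbers as approximate:

# Rough tokens/sec comparison via LM Studio's OpenAI-compatible server.
# Load the MLX build and the GGUF build of the same model one at a time
# (or side by side, since LM Studio can run multiple models) and rerun.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def tokens_per_second(model: str) -> float:
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Write a 200-word story."}],
        max_tokens=256,
    )
    elapsed = time.perf_counter() - start
    # completion_tokens is reported in the usage block of the response.
    return resp.usage.completion_tokens / elapsed

print(tokens_per_second("mlx-community/Meta-Llama-3.1-8B-Instruct-8bit"))      # MLX build
print(tokens_per_second("lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"))  # GGUF build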
@vigneshpadmanabhan 1 month ago
Doesn’t support M4 yet?
@CitAllHearItAll 1 month ago
I’ve been using it all week on M4.
@ET-zw4pk 9 days ago
AI is made for you bro
@thewatcher255 5 days ago
I was looking at Llama 2 with MLX, but having issues installing MLX. Should I just try this?
@JoyCatDev 15 days ago
I have a Mac mini M4, only 40 tokens/s using Llama 3.2
@simongentry 1 month ago
Yes, but how free are you to run a couple of LLMs at the same time? Especially if you're code bouncing.
@esuus 3 months ago
Awesome, thanks! Was looking for this. You could have gotten to the point a bit more, but whatever :D MLX is the way to go!
@1littlecoder 3 months ago
You mean gotten to the point sooner?
@gregsLyrics 3 months ago
WOW! Brilliant vid. M3 Max currently. What is the largest model it can run? I can't wait to try this out. I want to train a model for my legal work. Fingers crossed this can help.
@monkeyfish227 3 months ago
Depends on how much RAM you have. Look at how big the models are. You can only use around 70-75% of your RAM as VRAM, which is needed to load the entire model. (On a 64 GB machine, for example, that's roughly 45-48 GB, enough for a 4-bit 70B model at around 40 GB.)
@adamgibbons4262 3 months ago
Is there a model for Swift-only programming?
@KushwanthK 21 days ago
Why can't I see any model loading on my Mac? Also, it's not searchable for me in LM Studio to select and load the model.
@MrWormBeast 16 days ago
Same question here
@chidorirasenganz 21 days ago
How does this compare to something like Private LLM?
@1littlecoder 21 days ago
Is that from Anything? From what I know, only LM Studio supports MLX, but I'm not sure if more support has come recently.
@chidorirasenganz 21 days ago
Private LLM is a local chat app available for macOS and iOS that uses OmniQuant for its models. I don't know if they use MLX, but they do use Game Mode to further focus performance. I was curious how it compares to LM Studio.
@build.aiagents 3 months ago
Phenomenal 🤖
@usmanyousaf-i2i 3 months ago
Can we use this on an Intel Mac?
@1littlecoder 3 months ago
You can use this, but the MLX bit won't work
@PiratesZombies 1 month ago
Will an M2 8/512 work?
@ProSamiKhan 3 months ago
One model is of Dhanush, and the other is of Tamanna. Can they both be prompted together in a single image? If yes, how? Please explain, or if there's a tutorial link, kindly share.
@balubalaji9956 4 days ago
😂
@alx8439 3 months ago
Please give Jan AI a try. LM Studio is based on llama.cpp but is proprietary closed source, and God only knows what it is doing - mining shitcoins, sending telemetry, collecting your personal data - you'll never know. Jan AI is open source, based on the same llama.cpp, and gets the same benefits llama.cpp gets.
@zriley7995 1 month ago
But we need MLX support 😢😢😢
@alx8439 1 month ago
@zriley7995 original llama.cpp has it. LM Studio added ZERO to the under-the-hood functionality - just slapped its own UI on top of it
@KRIPAMISHRA-rz7hg 2 months ago
What's your PC spec?
@andrewwhite1576 1 month ago
It's a Mac, so the one titled Mac specs 😂
@benarcher372 1 month ago
Anyone know a decent model for generating Go code? Like for solving Advent of Code puzzles.
@1littlecoder 1 month ago
Try the Qwen Coder series of models
@benarcher372 1 month ago
@1littlecoder Thanks for the information! I'll try that on my M4
@benarcher372 1 month ago
@1littlecoder Now tested, very briefly, with lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-8bit. So far good results. Nice to be able to do this offline (on a local machine).
@SirSalter 1 month ago
Let’s go ahead and say “go ahead” every other sentence
@1littlecoder 1 month ago
@SirSalter did I use it too much 😭 sorry
@judgegroovyman 1 month ago
@1littlecoder nah, you're perfect. That guy is just grumpy and that's fine :) you rock!
@1littlecoder 1 month ago
@judgegroovyman thank you sir ✅
@Christophe-d9k 3 months ago
With the presented qwen2-0.5b-instruct model (352.97 MB), it's about twice as fast on your M3 Max (221 tok/sec) as on my RTX 3090 (126 tok/sec), but with the llama-3.2-3B-4bit model (2.02 GB) speeds are similar on both devices. This is probably due to the amount of available VRAM (24 GB on the 3090).
@theycallmexavier 26 days ago
Jan is better
@1littlecoder 26 days ago
I don't think they've got MLX support, have they? I have a Jan video as well.
@tollington9414 3 months ago
Ollama is excellent. Don't diss it
@1littlecoder 3 months ago
@tollington9414 didn't