The ONLY Local LLM Tool for Mac (Apple Silicon)!!

19,586 views

1littlecoder

1 day ago

LM Studio 0.3.4 ships with an MLX engine for running on-device LLMs super efficiently on Apple Silicon Macs.
MLX support in LM Studio 0.3.4 includes:
Search & download any supported MLX LLM from Hugging Face (just like you've been doing with GGUF models)
Use MLX models via the Chat UI, or from your code using an OpenAI-compatible local server running on localhost (see the Python sketch after this list)
Enforce LLM responses in specific JSON formats (thanks to Outlines)
Use vision models like LLaVA and more, via the chat UI or the API (thanks to mlx-vlm)
Load and run multiple simultaneous LLMs. You can even mix and match llama.cpp and MLX models!
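
Since the local server speaks an OpenAI-compatible wire format, any OpenAI client library can talk to it. A minimal Python sketch, assuming the server is running on LM Studio's default port 1234 with an MLX model already loaded; the model id and JSON schema below are illustrative, not taken from the video:

# Chat with a local MLX model through LM Studio's OpenAI-compatible server.
from openai import OpenAI

# The api_key is required by the client but ignored by the local server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

MODEL = "mlx-community/Llama-3.2-3B-Instruct-4bit"  # illustrative model id

# Plain chat completion.
reply = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)

# Structured output: constrain the response to a JSON schema,
# which is what the Outlines integration enforces under the hood.
structured = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Name a color."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "color_reply",
            "schema": {
                "type": "object",
                "properties": {"color": {"type": "string"}},
                "required": ["color"],
            },
        },
    },
)
print(structured.choices[0].message.content)  # e.g. {"color": "blue"}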
lmstudio.ai/bl...
🔗 Links 🔗
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - ko-fi.com/1lit...
🧭 Follow me on 🧭
Twitter - / 1littlecoder
Linkedin - / amrrs

Comments
@gue2212 3 months ago
Thanks for making me aware of the MLX version. Beware: my installed version only updated to 0.2.31 and I had to download 0.3.4 from the LM Studio site!
@RedShiftedDollar 18 days ago
LLMs are kinda like lossy compression algorithms that store all kinds of training data in an abstract form as a superposition of training weights. The data can then be recovered in a lossy way by prompting the network to retrieve it.
@balubalaji9956 4 days ago
There is a difference: in traditional data retrieval, if you want a piece of data, say '123', you ask for '123' - input and output are the same. In deep learning algorithms, input and output are different; they can be '123' and 6 (1+2+3) respectively.
@RedShiftedDollar 4 days ago
@balubalaji9956 Ask for a picture of Pikachu and you will get a picture of Pikachu only if the model weights encode information about Pikachu, and this will only happen if the model was trained on pictures of Pikachu. The act of training is an abstract method of encoding; the act of prompting triggers an abstract method of decoding. I even ran your example of compression through ChatGPT. You said compression is: ask for 1, 2, 3 and get 1, 2, 3. So I prompted ChatGPT with: Write only "1, 2, 3". And it literally only gave me "1, 2, 3".
@nomaximum 9 days ago
"That's a lot of people watching your mediocre content" - hilarious! :) I did not know that Skynet had humor...
@CarlosValero 27 days ago
Thanks for this!
@maxziebell4013 3 months ago
Thanks, I just installed it... nice M3 here
@1littlecoder 3 months ago
Enjoy the speed!
@phanindraparashar8930 3 months ago
Can you make a video on fine-tuning embeddings and LLMs (also include how to create a dataset to train on custom data)? It would be very interesting.
@1littlecoder 3 months ago
Thanks for the idea, will try to put together something!
@Pregidth 3 months ago
Hey man, this is really great! Thanks. Hopefully Ollama integrates it. They've seemed a bit lame these past weeks.
@1littlecoder 3 months ago
Hope so!
@modoulaminceesay9211 3 months ago
Thanks for the tutorial
@HealthyNutrition-y 3 months ago
🔥🔵"Intelligence is compression of information." This is one of the most useful videos I believe I have ever watched on YouTube.🔵
@yorkan213swd6 8 days ago
Great, when can I use the NPU?
@made-simple 12 days ago
So basically it's better than Ollama?
@modoulaminceesay9211 3 months ago
What is the difference between this and Ollama?
@supercurioTube 3 months ago
How did you conclude that running the same model was faster via MLX than with the llama.cpp backend? Comparing with Llama 3.1 8B 8-bit, I get the same generation speed between LM Studio/MLX and Ollama/llama.cpp (33.6 tok/s on an M1 Max 64GB).
@monkeyfish227 3 months ago
Don't they both use MLX? Wouldn't the speed be the same then?
@CitAllHearItAll 1 month ago
Are you loading the same model in different tools? You have to download the MLX and GGUF versions separately, then load one at a time and test. MLX is decently faster for me, always.
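
One way to run that A/B test yourself: time a generation through the local server and divide completion tokens by elapsed seconds. A rough sketch, assuming LM Studio's server on its default localhost:1234 and illustrative model ids; timings include prompt processing, so treat the numbers as approximate:

# Rough tokens/sec comparison via LM Studio's OpenAI-compatible server.
# Load the MLX build and the GGUF build of the same model one at a time
# (or side by side, since LM Studio can run multiple models) and rerun.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def tokens_per_second(model: str) -> float:
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Write a 200-word story."}],
        max_tokens=256,
    )
    elapsed = time.perf_counter() - start
    # completion_tokens is reported in the usage block of the response.
    return resp.usage.completion_tokens / elapsed

print(tokens_per_second("mlx-community/Meta-Llama-3.1-8B-Instruct-8bit"))      # MLX build
print(tokens_per_second("lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"))  # GGUF build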
@vigneshpadmanabhan 1 month ago
Doesn’t support M4 yet?
@CitAllHearItAll 1 month ago
I’ve been using it all week on M4.
@ET-zw4pk 9 days ago
AI is made for you bro
@thewatcher255 5 days ago
I was looking at Llama 2 with MLX, but having issues installing MLX. Should I just try this?
@JoyCatDev 15 days ago
I have a Mac mini M4, only 40 tokens/s using Llama 3.2
@simongentry 1 month ago
Yes, but how free are you to run a couple of LLMs at the same time? Especially if you're code bouncing.
@esuus 3 months ago
Awesome, thanks! Was looking for this. You could have gotten to the point a bit more, but whatever :D MLX is the way to go!
@1littlecoder 3 months ago
You mean gotten to the point sooner?
@gregsLyrics 3 months ago
WOW! Brilliant vid. M3 Max currently. What is the largest model it can run? I can't wait to try this out. I want to train a model for my legal work. Fingers crossed this can help.
@monkeyfish227 3 months ago
Depends on how much RAM you have. Look at how big the models are. You can only use around 70-75% of your RAM as VRAM, which is needed to load the entire model. (On a 64 GB machine, for example, that's roughly 45-48 GB, enough for a 4-bit 70B model at around 40 GB.)
@adamgibbons4262 3 months ago
Is there a model for Swift-only programming?
@KushwanthK 21 days ago
Why can't I see any model loading on my Mac? Also, it's not searchable for me in LM Studio to select and load the model.
@MrWormBeast 16 days ago
Same question here
@chidorirasenganz 21 days ago
How does this compare to something like Private LLM?
@1littlecoder 21 days ago
Is that from Anything? From what I know, only LM Studio supports MLX, but I'm not sure if more support has come recently.
@chidorirasenganz 21 days ago
Private LLM is a local chat app available for macOS and iOS that uses OmniQuant for its models. I don't know if they use MLX, but they do use Game Mode to further focus performance. I was curious how it compares to LM Studio.
@build.aiagents 3 months ago
Phenomenal 🤖
@usmanyousaf-i2i 3 months ago
Can we use this on an Intel Mac?
@1littlecoder 3 months ago
You can use this, but the MLX bit won't work
@PiratesZombies 1 month ago
Will an M2 8/512 work?
@ProSamiKhan 3 months ago
One model is of Dhanush, and the other is of Tamanna. Can they both be prompted together in a single image? If yes, how? Please explain, or if there's a tutorial link, kindly share.
@balubalaji9956 4 days ago
😂
@alx8439 3 months ago
Please give Jan AI a try. LM Studio is based on llama.cpp but is proprietary closed source, and God only knows what it is doing - mining shitcoins, sending telemetry, collecting your personal data - you'll never know. Jan AI is open source, based on the same llama.cpp, and gets the same benefits llama.cpp gets.
@zriley7995 1 month ago
But we need MLX support 😢😢😢
@alx8439 1 month ago
@zriley7995 original llama.cpp has it. LM Studio added ZERO to the under-the-hood functionality - just slapped its own UI on top of it
@KRIPAMISHRA-rz7hg 2 months ago
What's your PC spec?
@andrewwhite1576 1 month ago
It's a Mac, so the one titled Mac specs 😂
@benarcher372 1 month ago
Anyone know a decent model for generating Go code? Like for solving Advent of Code puzzles.
@1littlecoder 1 month ago
Try the Qwen Coder series of models
@benarcher372 1 month ago
@1littlecoder Thanks for the information! I'll try that on my M4
@benarcher372 1 month ago
@1littlecoder Now tested, very briefly, with lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-8bit. So far good results. Nice to be able to do this offline (on a local machine).
@SirSalter 1 month ago
Let’s go ahead and say “go ahead” every other sentence
@1littlecoder 1 month ago
@SirSalter did I use it too much 😭 sorry
@judgegroovyman 1 month ago
@1littlecoder nah, you're perfect. That guy is just grumpy and that's fine :) you rock!
@1littlecoder 1 month ago
@judgegroovyman thank you sir ✅
@Christophe-d9k 3 months ago
With the presented qwen2-0.5b-instruct model (352.97 MB), it's about twice as fast on your M3 Max (221 tok/sec) as on my RTX 3090 (126 tok/sec), but with the llama-3.2-3B-4bit model (2.02 GB) speeds are similar on both devices. This is probably due to the amount of available VRAM (24 GB on the 3090).
@theycallmexavier 26 days ago
Jan is better
@1littlecoder 26 days ago
I don't think they've got MLX support, have they? I have a Jan video as well.
@tollington9414 3 months ago
Ollama is excellent. Don't diss it
@1littlecoder 3 months ago
@tollington9414 didn't