Either one should be fine, as long as you have 48GB of RAM. This demo was run on an M2 Max Mac Studio, but an M4 Mac Mini should work as well.
@christiancrow 21 days ago
@PrivateLLM 2 grand for a 48 GB system. I would be interested in the base model; I wonder if it could run faster on the newest Llama.
@PrivateLLM 21 days ago
@@christiancrow A base model with 16GB of RAM can easily run Gemma 2 9B, Llama 3.1 8B, and Qwen 2.5 14B. Check out the model list on our website for the full list of models along with their RAM requirements: privatellm.app/en#models
@Lp-ze1tg a month ago
Is there a tutorial for Private LLM?
@PrivateLLM 21 days ago
You can check this out: privatellm.app/blog/run-local-gpt-on-ios-complete-guide. The article is slightly dated, and we need to revise it with new model recommendations; we will do that soon.
@tak4272 a month ago
Does Private LLM have an OpenAI-compatible API? If it doesn't, then being somewhat faster at inference than Ollama isn't a significant advantage. Many software applications can talk to OpenAI's API, so using Ollama brings a lot of integrations for free. Without an API, I think it would just be a chatbot.
@PrivateLLM 21 days ago
@@tak4272 We’re working on adding an HTTP API. We’ve always supported extension through macOS Shortcuts, which llama.cpp wrappers lack. Also, Ollama has some features we’ll never be able to match: slow inference and low-quality RTN quantized models.
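For readers wondering what "OpenAI-compatible" buys in practice, here is a minimal sketch of how existing tooling talks to a local server that exposes OpenAI's chat-completions schema. It points the standard openai Python client at Ollama's documented /v1 endpoint on localhost; Private LLM's HTTP API is still in progress, so its eventual URL and schema are not shown, and the model name below is only an example that assumes it has already been pulled.

    # Sketch: using the OpenAI Python client against a local OpenAI-compatible
    # server (Ollama's /v1 endpoint). Not Private LLM's forthcoming API.
    from openai import OpenAI

    # Ollama ignores the API key, but the client library requires a value.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    response = client.chat.completions.create(
        model="llama3.1:8b",  # example model, assumed pulled via `ollama pull`
        messages=[{"role": "user", "content": "Explain RTN quantization in one sentence."}],
    )
    print(response.choices[0].message.content)

Because the request and response shapes match OpenAI's, any app that lets you override the base URL can reuse the same code path for a local model.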