This is amazing, seriously! The standard model in Ollama is censored, but with HuggingFace, I can pull an uncensored version directly without having to do it manually.
@VibudhSingh · a month ago
Super useful update! Thanks for sharing it, Sam. Really looking forward to seeing how to serve an Ollama model from a local PC to the cloud!
@jayhu6075 · a month ago
This is incredibly useful information for anyone experimenting with different quantization methods for model loading. Many thanks for sharing.
@letis7766 · a month ago
Amazing info, waiting for the cloud video. Got it running locally right now! Keep it coming.
@AnthonyGarland · a month ago
Thanks for sharing. I tried it, and it works great.
@ibrahimhalouane8130 · a month ago
Previously I used to pull Open WebUI with a custom script to load either a GPTQ or a GGUF model. A lot more complex, but with better control.
@formigarafa · a month ago
Where can I find more details about those quantization labels? I think I got the bit part, but I want to know more about the K, L, M thing...
@miklefeldman · a month ago
Good stuff. But we've been able to do this for ages 😅 Just wget any GGUF file from HuggingFace and use ollama create with a Modelfile, as below. Or am I missing something?
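For reference, the manual route looks roughly like this (a sketch; the repo and file names are placeholders, not from the thread):

```sh
# Download a GGUF file from the Hugging Face Hub (example repo/file are illustrative)
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf

# Point a Modelfile at the local weights and register the model with Ollama
echo 'FROM ./mistral-7b-instruct-v0.2.Q4_K_M.gguf' > Modelfile
ollama create my-mistral -f Modelfile
ollama run my-mistral
```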
@samwitteveenai · a month ago
Yes, that's true; I made a video on how to do it a long time ago. This one makes it much easier to do and also brings in the right metadata for the chat template etc., if the model is set up right on HF.
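For anyone who wants the new one-liner, it's roughly this (the repo below is just an example; any GGUF repo on the Hub with a proper chat template should work the same way):

```sh
# Pull and run a GGUF model straight from the Hugging Face Hub
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF

# A specific quantization can be selected with a :TAG suffix
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q8_0
```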
@dennisestenson7820 · a month ago
0:40 I've been using this for weeks. Very, very handy.
@gaozhe · a month ago
I'm using a MacBook daily, and I wonder when Ollama will be able to use MLX models?
@sammcj2000 · a month ago
Q4_K_M is a mid-to-low quality quant; they're only really acceptable for large parameter-size models (30B+). For small models (~7-14B) you're better off sticking to at least Q6_K. Q8_0 is essentially not worth it for any size.
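Following that advice, picking a specific quant from a Hub repo is just a matter of the tag (a sketch; the repo name is illustrative):

```sh
# Prefer a Q6_K quant for a ~7B model, per the advice above
ollama run hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q6_K
```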
@toadlguy · a month ago
This is great (I think) 😊. I’m sure I can figure it out, but if you know, does this mean that the HF Transformers library (which is PyTorch based) is now working with llama.cpp, or is Ollama now working with PyTorch (Python)? Is there any difference between Ollama native models and huggingface models as far as size or speed? How about vision models?
@samwitteveenai · a month ago
No, it is not the Transformers lib that is doing the work; it is just the Hub that stores the models. They have over 1 million models, but not all of those will work with Ollama, only the GGUF ones. Do a search for GGUF and you'll find that many of the popular models have been converted to GGUF. There are apparently around 45k GGUF models on the Hub.
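If you want to browse those programmatically, the Hub's public model API can be filtered by the GGUF tag (a sketch; parameter names as I understand them from the HF API docs):

```sh
# List the five most-downloaded GGUF repos on the Hub (requires curl and jq)
curl -s "https://huggingface.co/api/models?filter=gguf&sort=downloads&limit=5" | jq -r '.[].id'
```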
@novantha1 · a month ago
HF Transformers is now working with llama.cpp and Ollama formally; I don’t know the exact relationship, but Hugging Face kicks back some of the revenue from GGUF models being run to the llama.cpp project upstream.
@toadlguy · a month ago
Hug-o-Llama (Face)??
@Michael-b7z8y · 22 days ago
Wow!!!
@denijane89 · a month ago
Considering Ollama doesn't always run on the GPU, that's questionable fun. llama.cpp works well on the GPU, so for now, for me at least, it's the better option. Otherwise, a pretty cool development. (I'm using NVIDIA under Linux, and that's always a pain, so maybe it doesn't apply to every OS.)
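One way to sanity-check where a loaded model actually landed (assuming a reasonably recent Ollama build):

```sh
# Show loaded models; the PROCESSOR column reports GPU vs. CPU placement
ollama ps

# On an NVIDIA box, confirm VRAM is actually allocated to the runner
nvidia-smi
```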
@billykotsos4642 · a month ago
YEEEESSSSSS
@actellimQT · a month ago
This vid based af
@BreeAiSolutions · a month ago
Hey Sam, I am a huge fan of your work. Can I please chat to you about an AI service website that I just launched? I just need your feedback on it, if you don't mind...