This is amazing, seriously! The standard model in Ollama is censored, but with HuggingFace, I can pull an uncensored version directly without having to do it manually.
@VibudhSingh · a month ago
Super useful update! Thanks for sharing it, Sam. Really looking forward to seeing how to serve an Ollama model from a local PC to the cloud!
@jayhu6075 · a month ago
This is incredibly useful information for anyone experimenting with different quantization methods for model loading. Many thanks for sharing.
@letis7766 · a month ago
Amazing info, waiting for the cloud video. Got it running locally right now! Keep it coming.
@AnthonyGarland · a month ago
Thanks for sharing. I tried it, and it works great.
@ibrahimhalouane8130 · a month ago
Previously I used to pull Open WebUI with a custom script to load either a GPTQ or a GGUF model. A lot more complex, but with better control.
@formigarafa · a month ago
Where can I find more details about those quantization labels? I think I got the bit part, but I want to know more about the K, L, M thing...
@miklefeldman · a month ago
Good stuff. But we've been able to do this for ages 😅 Just wget any GGUF file from HuggingFace and use ollama create with a Modelfile, as below. Or am I missing something?
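For reference, the manual route looks roughly like this (a sketch; the repo and file names are placeholders, not from the thread):

```sh
# Download a GGUF file from the Hugging Face Hub (example repo/file are illustrative)
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf

# Point a Modelfile at the local weights and register the model with Ollama
echo 'FROM ./mistral-7b-instruct-v0.2.Q4_K_M.gguf' > Modelfile
ollama create my-mistral -f Modelfile
ollama run my-mistral
```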
@samwitteveenai · a month ago
Yes, that's true; I made a video on how to do it a long time ago. This one makes it much easier to do and also brings in the right metadata for the chat template etc., if the model is set up right on HF.
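For anyone who wants the new one-liner, it's roughly this (the repo below is just an example; any GGUF repo on the Hub with a proper chat template should work the same way):

```sh
# Pull and run a GGUF model straight from the Hugging Face Hub
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF

# A specific quantization can be selected with a :TAG suffix
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q8_0
```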
@dennisestenson7820 · a month ago
0:40 I've been using this for weeks. Very, very handy.
@gaozhe · a month ago
I'm using a MacBook daily, and I wonder when Ollama will be able to use MLX models?
@sammcj2000 · a month ago
Q4_K_M is a mid-to-low quality quant; they're only really acceptable for large parameter-size models (30B+). For small models (~7-14B) you're better off sticking to at least Q6_K. Q8_0 is essentially not worth it for any size.
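Following that advice, picking a specific quant from a Hub repo is just a matter of the tag (a sketch; the repo name is illustrative):

```sh
# Prefer a Q6_K quant for a ~7B model, per the advice above
ollama run hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q6_K
```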
@toadlguy · a month ago
This is great (I think) 😊. I’m sure I can figure it out, but if you know, does this mean that the HF Transformers library (which is PyTorch based) is now working with llama.cpp, or is Ollama now working with PyTorch (Python)? Is there any difference between Ollama native models and huggingface models as far as size or speed? How about vision models?
@samwitteveenai · a month ago
No, it is not the Transformers lib that is doing the work; it is just the Hub that stores the models. They have over 1 million models, but not all of those will work with Ollama, only the GGUF ones. Do a search for GGUF and you'll find that many of the popular models have been converted to GGUF. There are apparently around 45k GGUF models on the Hub.
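If you want to browse those programmatically, the Hub's public model API can be filtered by the GGUF tag (a sketch; parameter names as I understand them from the HF API docs):

```sh
# List the five most-downloaded GGUF repos on the Hub (requires curl and jq)
curl -s "https://huggingface.co/api/models?filter=gguf&sort=downloads&limit=5" | jq -r '.[].id'
```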
@novantha1 · a month ago
HF Transformers is now working with llama.cpp and Ollama formally; I don’t know the exact relationship, but Hugging Face kicks back some of the revenue from GGUF models being run to the llama.cpp project upstream.
@toadlguy · a month ago
Hug-o-Llama (Face)??
@Michael-b7z8y · 22 days ago
Wow!!!
@denijane89 · a month ago
Considering Ollama doesn't always run on the GPU, that's questionable fun. llama.cpp works well on the GPU, so for now, for me at least, it's the better option. Otherwise, a pretty cool development. (I'm using NVIDIA under Linux, and that's always a pain, so maybe it doesn't apply to every OS.)
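One way to sanity-check where a loaded model actually landed (assuming a reasonably recent Ollama build):

```sh
# Show loaded models; the PROCESSOR column reports GPU vs. CPU placement
ollama ps

# On an NVIDIA box, confirm VRAM is actually allocated to the runner
nvidia-smi
```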
@billykotsos4642 · a month ago
YEEEESSSSSS
@actellimQT · a month ago
This vid based af
@BreeAiSolutions · a month ago
Hey Sam, I am a huge fan of your work. Can I please chat to you about an AI service website that I just launched? I just need your feedback on it, if you don't mind...