Ollama - Local Models on your machine

70,382 views

Sam Witteveen

A day ago

Site: www.ollama.ai/
My Links:
Twitter - / sam_witteveen
Linkedin - / samwitteveen
Github:
github.com/samwit/langchain-t... (updated)
github.com/samwit/llm-tutorials
Timestamps
00:00 Intro
01:07 What is Ollama
01:30 Ollama Models
02:19 Installing Ollama
03:13 Running Ollama
06:17 Customizing Ollama

Comments: 114
@chautauquatrail 8 months ago
I just did the ollama install yesterday, you are awesome for being able to produce these so quickly.
@kenchang3456 8 months ago
Thanks Sam, very interesting. It's amazing how fast the whole LLM ecosystem is moving.
@VincentVonDudler 8 months ago
Thanks, Ollama.
@DanielSpringer 8 months ago
Definitely has a docker vibe. I like it!
@paulmiller591 5 months ago
Thanks Sam, great video. These are some of the best videos on AI tools. I need to master this for my work, and your approach to communication really works for me. Cheers. Keep up all things LangChain, please.
@user-ut8ts5gv2g 3 months ago
Nice video. I managed to create a customized model by watching this.
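A minimal sketch of that customization flow, for reference; the mario name and system prompt are illustrative, in the style of Ollama's README example:

  # Write a Modelfile that layers a system prompt and a parameter onto a base model
  cat > Modelfile <<'EOF'
  FROM llama2
  PARAMETER temperature 0.8
  SYSTEM You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
  EOF

  # Build the custom model, then chat with it
  ollama create mario -f Modelfile
  ollama run mario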
@nexuslux 8 months ago
Nice and to the point video. Appreciate it!
@kp_kovilakam 7 months ago
Thank you for the introduction!
@sonurocks341 1 month ago
Great demo! Thank you!!
@sandrocavali9810 3 months ago
Excellent intro
@riflebird4842 8 months ago
Thanks for the video, keep it up
@richardchiodo2200 3 months ago
I picked up a 12GB 3060 from my local Micro Center for a pretty good price, and am now running Ollama with Open WebUI for the frontend; they have a community repository for prompts and Modelfiles. The biggest hurdle was passing the GPU through my Proxmox host to my VM.
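For anyone replicating that setup, a rough sketch of the Open WebUI side, assuming Ollama is already serving on its default port and following Open WebUI's documented Docker flags:

  docker run -d -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -v open-webui:/app/backend/data \
    --name open-webui \
    ghcr.io/open-webui/open-webui:main
  # then browse to http://localhost:3000 and point it at the local Ollama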
@sitrakaforler8696 7 months ago
Llama2 uncensored was quite surprising for me x) By the way, THANK YOU FOR YOUR VIDEO. Every time I need to use Ollama, I go back to your video to be sure of the "ollama run" command hahah
@willi1978 6 months ago
It looks like the uncensored versions are a lot better; then it's not always giving a paragraph on why it can't do what you ask it to do.
@jeffsteyn7174 4 months ago
It's not just about being technical. It's also about being productive. Do you want to spend your time building something useful, or trying to figure out how a badly maintained and documented piece of software works?
@theh1ve 8 months ago
Another great flag, Sam. I would be interested to see this running so you can make API calls... currently using Text Gen Web UI as a server, and this looks like it would be a good alternative.
@jidun9478 8 months ago
Nice, thank you. It runs so much faster than Text Gen Web UI! I wish they'd make it easier to add a custom choice of models though (that is a real drawback).
@samwitteveenai 8 months ago
I have a video coming that shows how to do exactly that. It's actually pretty easy.
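Until that video lands, a rough sketch of the API side: Ollama exposes a local HTTP server (port 11434 by default), so a generation request is a plain POST:

  curl http://localhost:11434/api/generate -d '{
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
  # returns JSON with the completion in the "response" field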
@nembalage 6 months ago
Super helpful.
@twobob 8 months ago
Nice one, thanks.
@tusharbokade8378 8 months ago
Interesting!
@photorealm 2 months ago
Ollama for Windows is out and available for download. I am testing it, and it works fabulously, but it's very slow for me on Windows. I don't think it's using my NVIDIA GPU, and I can't seem to find a way to hook the GPU in under Windows. But I just got started, and I love the fact that it is serving to a local HTTP port as well as the command line.
@FreakyStyleytobby 8 months ago
Fantastic video Sam, thank you! Ollama looks great, but the big 70B models still remain beyond the reach of typical RAM. Do you know of any way (be it API or other) to get access to Llama 70B and be able to run arbitrary tokens on the model? There are some APIs like TogetherAI, but they only let you run endpoints like /prediction, not much more.
@Leonid.Shamis 8 months ago
Great video, as usual :) I have been using Ollama on Linux and it has been working great. I know that Ollama can be used via an API, but I was wondering whether its API is compatible with the OpenAI API and can be used as a replacement for the OpenAI API inside LangChain. Looking forward to more videos about Ollama. Thank you.
@IanScrivener 8 months ago
There are dedicated LangChain and LlamaIndex connectors for Ollama. Ollama's API is different to OpenAI's... better, IMO.
@VaibhavPatil-rx7pc 8 months ago
Thanks
@xdasdaasdasd4787 7 months ago
Awesome! I was hoping to use a custom model but didn't fully understand :(
@mohamed_elmardi 8 months ago
Windows users can use WSL.
@fontende 8 months ago
Win 10 was the last Windows for me; I found the perfect Linux in PikaOS, an Ubuntu without Ubuntu snaps or other crap.
@franciscojlobaton 8 months ago
Please, more. More, pleaseee!
@mbottambotta 8 months ago
Thank you Sam for posting this video. Very accessible, clearly explained. Question: what I could not see is whether Ollama enables you to choose the model size, i.e. whether you want llama2 7b, 13b, or 70b, for example.
@brando2818 8 months ago
3:37 You can specify it with "ollama run llama2-uncensored". Just go to the models page, then click one; it'll tell you the command if you're using the CLI.
@samwitteveenai 8 months ago
Yes, you can pick this; take a look at the models page.
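Concretely, sizes are selected with tags on the model name; the default tag maps to the 7b variant, and each model's page lists the tags it offers:

  ollama run llama2        # default (7b)
  ollama run llama2:13b    # 13b variant
  ollama run llama2:70b    # 70b variant (needs far more RAM)
  ollama run llama2-uncensored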
@iainattwater1747 8 months ago
I used the Docker container that was just released, and it works on Windows.
@VijayChintapandu 2 months ago
Can you provide the Docker container link? Where did you download it from?
@AlphaSynapse 3 months ago
Ollama is now available for Windows (Windows 10 or above).
@zacharymacaroni7649 1 month ago
Good video :)
@MarcellodeSales 7 months ago
It seems like it's Docker :D Same feeling... Ollama will capitalize on Cloud Native Software Engineers.
@Knowledge_Nuggies 3 months ago
I'd be interested to learn how to build a RAG system or local LLM agent with tools like Ollama, LM Studio, LangChain etc.
@AndyAinsworth 8 months ago
LM Studio for Windows and Mac is a great way to achieve the same with a lot less setup! Also has a great internal model browser which suggests what models might run on your machine.
@AndyAinsworth 8 months ago
It can also run as an API with a click in the UI. Definitely been the easiest way for me to test out a load of different LLMs locally, nice user interface with history and markdown support.
@alx8439 8 months ago
LM Studio is proprietary software. God only knows what else it's doing on your PC: gathering and sending out your data while you sleep, mining bitcoins, using your PC as an exit node for Tor, keylogging everything you type. You can only guess.
@IanScrivener 8 months ago
Agree, LM Studio is great. It can be run in OpenAI API mode, which replicates OpenAI's API format, and so can easily be used with LangChain, LlamaIndex, etc.
@AndyAinsworth 8 months ago
@@IanScrivener Yeah, I'm hoping to get it set up to use the API via LM Studio with Microsoft AutoGen, which provides a multi-agent workflow with a code interpreter.
@scitechtalktv9742 8 months ago
@@AndyAinsworth This is what I want to do also! Have you had any progress and success with this?
@attilavass6935 8 months ago
What are the pros and cons of using such "local" Ollama models on Colab Pro with 2 TB of Drive?
@morespinach9832 3 months ago
If we run these locally in our own cloud, is there a best practice for keeping them updated?
@liji8672 8 months ago
Hi Sam, good video. My little question: did your llama2 model run on your CPU?
@samwitteveenai 8 months ago
Pretty sure it was running on Metal and using the Apple Silicon GPUs. It is certainly a quantized model though, which helps.
@IanScrivener 8 months ago
You CAN run any llama.cpp tool on CPU... though it is MUCH slower than GPU. The macOS Metal GPU is surprisingly fast...
@user-di5tl3iv8l 3 months ago
Thanks for the video! How can I make Ollama run the 13GB tar file I downloaded locally?
@Shawn-lk2ze 6 months ago
I'm new to this topic and I just binged your videos. How does this compare to vLLM from your previous video? I get that Ollama is more user-friendly, but I'm more curious about the performance.
@samwitteveenai 6 months ago
vLLM is more for serving full-resolution models in the cloud, and Ollama is more for local use. vLLM shines when you have some strong GPUs to use etc.
@Shawn-lk2ze 6 months ago
@@samwitteveenai Got it! Thanks!
@BionicAnimations 1 month ago
Hi. How do I uninstall one of the models from my MacBook Pro? I am using it in the terminal.
@samwitteveenai 1 month ago
ollama rm llama3. If you just type ollama in the command line, you should be able to see all the commands.
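For anyone else tidying up, the housekeeping commands look roughly like this (llama3 stands in for whichever model you want gone):

  ollama            # no arguments: prints the available commands
  ollama list       # show the models currently on disk
  ollama rm llama3  # delete a model and reclaim the disk space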
@ghrasko 6 months ago
In fact, it was quite easy to install Ollama on Windows 10 using Windows Subsystem for Linux (WSL). In a Windows command prompt:
wsl --install -d Ubuntu (downloads and runs the Ubuntu distribution, giving a Linux prompt)
ollama pull llama2:13b (downloads the selected model)
ollama run llama2:13b (runs the selected model)
At this point you can write user text that will be sent to the model. This did not work for me; the keyboard input was not correctly directed to the application, possibly a compatibility issue with the Linux emulation. But I could fully use the downloaded models from simple Python programs, directly or through LangChain.
@nelavallisivasai8740 4 months ago
It's taking longer for me to get responses from the local model; it's not as fast as yours. Can you please tell me what processor you are using? What are the minimum hardware requirements to run LLM models and get faster responses?
@abhijitkadalli6435 8 months ago
Feels like Docker.
@alx8439 8 months ago
There are a bunch of similar tools (simple to use for non-technical people). The most prominent is GPT4All; yeah, from the guys who fine-tuned the first LLaMA back in March/April on their own handcrafted datasets. These guys from Ollama were definitely inspired by Docker, based on the syntax and architecture :)
@technovangelist 8 months ago
A few of the maintainers were early Docker employees.
@guanjwcn 8 months ago
Thanks, Sam. Do you know what tricks Ollama uses to make it run so smoothly locally?
@samwitteveenai 8 months ago
They are using quantized models, and on macOS they are using Metal, etc.
@IanScrivener 8 months ago
Ollama uses llama.cpp under the hood... the fastest LLM inference engine, period. Many other apps also use llama.cpp: Kobold, Oobabooga, etc. Many other apps use Python inside... easier to build, but much, much slower performance.
@samwitteveenai 8 months ago
@@IanScrivener They have llama.cpp running on Metal on Macs, right? It feels like it is more than just CPU, etc. Honestly, I haven't looked under the hood much.
@ronelgem23 4 months ago
Does it require VRAM or just regular RAM?
@user-ex4mf6ky6x 6 months ago
Hi, I have been using Ollama for the past 2 months, and yes, it's giving good results. But what I need to know is whether it is possible to set a configuration file for Ollama, i.e. setting parameters for Ollama to get the most accurate results. Can you make a video about how to set custom parameters?
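As far as I know there is no single global config file; the usual route is baking parameters into a Modelfile. A sketch, with illustrative values and a hypothetical model name:

  cat > Modelfile <<'EOF'
  FROM llama2
  PARAMETER temperature 0.2
  PARAMETER num_ctx 4096
  PARAMETER top_p 0.9
  EOF
  ollama create llama2-tuned -f Modelfile   # llama2-tuned is a made-up name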
@XiOh 8 months ago
When is the Windows version coming out? O.o
@GrecoFPV 2 months ago
Can we give this power to n8n? Connect our local Ollama with our self-hosted n8n?
@pensiveintrovert4318 6 months ago
Any idea how to load a model that is already on my disk?
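One approach that should work when the weights are a GGUF file: point a Modelfile's FROM line at the local path (the path and name below are placeholders):

  cat > Modelfile <<'EOF'
  FROM ./my-model.Q4_K_M.gguf
  EOF
  ollama create my-model -f Modelfile
  ollama run my-model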
@samyio4256 4 months ago
Another question: do you really run this on a Mac mini? If so, how much RAM does your machine have?
@samwitteveenai 4 months ago
32GB of RAM
@wendten2 8 months ago
"It's Llama for those who don't have technical skills"... and yet the PC version is currently only available on Linux... xD
@Thelgren00 15 days ago
Can I use this to install AI Town? The default method was too complex for me.
@rookandpawn 2 days ago
I'm coming from text-generation-webui; how can I use that model folder with Ollama?
@NoidoDev 8 months ago
New software, in the past: "It only runs on Windows, but maybe in a few years it will be available on macOS, and one day, but probably never, on Linux." Today: "At the moment it supports macOS and Linux, but apparently Windows support is coming soon as well."
@samyio4256 4 months ago
If the model talks to an API, how is this local usage? I'd like to know where the prompt data goes. Does it go to a database that the model loads from afterwards? Or is the model hosted separately in a monitored environment? My basic question is: who gets the data from the input prompt?
@samwitteveenai 4 months ago
The data is only on your machine; it is all running locally. It can run an API on your machine, and you can then expose that if you want to use it from somewhere else. If you are just using it on your machine, all data stays on your machine.
@samyio4256 4 months ago
@@samwitteveenai Wow! That's a complete game changer! Thanks! I'll sub, insane content!
@Ryan-yj4sd 8 months ago
Can you run fine-tuned models?
@IanScrivener 8 months ago
Yes, and your own LoRAs... 😊
@Gerald-iz7mv 2 months ago
What port does the webserver run on? Can I set that port?
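The server listens on localhost:11434 by default; the OLLAMA_HOST environment variable should let you change the bind address and port. A sketch, with 8080 as an arbitrary example:

  OLLAMA_HOST=127.0.0.1:8080 ollama serve
  curl http://localhost:8080/api/tags   # quick check: lists the installed models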
@merselfares8965 3 months ago
Would an 11th-gen i3 with 8GB of RAM and UHD 630 graphics be enough?
@samwitteveenai 3 months ago
Honestly not sure. It will probably run, but you may get very slow tokens per second.
@abobunus 7 months ago
How do you make your own language model? For example, I want to take some texts and force the AI to use only that text to answer my questions.
@volkanazer9997 3 months ago
Let me know when you've got it figured out. I'm curious about this as well.
@kevinehsani3358 8 months ago
I am sure Windows users can probably install it under WSL.
@samwitteveenai 8 months ago
I was wondering about this. I asked one of my staff to give it a quick try, but he couldn't get it working.
@VijayChintapandu 2 months ago
My system is very slow when I am running Ollama. My system is a Mac M2. Is this an issue?
@samwitteveenai 2 months ago
Depends which model you are trying to run. The video was done on an M2 Mac Mini.
@SharePointMaster 2 months ago
@@samwitteveenai Ohh, thanks for the reply. Mine is also an M2 Mac (an Air), but it was slow. I will check.
@foolcj9999 3 months ago
Can you make a video of Ollama interaction using voice input (e.g. with Whisper), where it replies back?
@samwitteveenai 2 months ago
Interesting idea!
@HitopFaded 3 months ago
I'm trying to run it in a Python environment, if possible, to build on top of it.
@samwitteveenai 3 months ago
I have another vid on Ollama's Python SDK etc.
@HitopFaded 3 months ago
@@samwitteveenai Thanks, I'll check it out.
@stanTrX 2 months ago
Can I upload and work with documents with Ollama?
@samwitteveenai 2 months ago
Yes, but you will need to code a custom RAG for it.
@stanTrX 2 months ago
@@samwitteveenai Thanks, good man, but what's a custom RAG?
@kunalr_ai 8 months ago
Why this new model?
@LITTLEFREDOX2 3 months ago
The Windows version is here.
@DaeOh 8 months ago
Would you consider not referring to models like Llama and Mistral as "open-source"? It sets a precedent. "Freeware," maybe?
@alx8439 8 months ago
It's a good question how we should refer to such models. It's not 100% FOSS-compliant because of the restrictions that kick in if you have something like 700 million users, if my memory serves me well. But that is more of a restriction on a couple of companies like MS, Google, and TikTok. Who cares about them? Or am I missing something bigger?
@spirobel2.0 8 months ago
Mistral is completely open
@clray123 8 months ago
Do not mix Llama and Mistral together. Mistral has a truly open license; Llama is the Facebook/Meta poison.
@DaeOh 8 months ago
It's not open-source because you can't reproduce it without the source (training data)... Just making the equivalent of binaries available for commercial use doesn't make something "open-source..."
@fontende 8 months ago
I run only locally, and cloud services are blocked in our region anyway (quite a lot of people don't have access to them, more than 2 billion: China plus a dozen other countries, mostly for political, non-scientific reasons). And the hardware allows it, thanks to China, which recycles servers and brings to market some fairly secret Intel chips, like 22-core Xeons that were never released outside the enterprise market; one costs only 150 bucks. My motherboard, an ASRock X99 Extreme4, has become the de facto standard in China for that socket, also 150 bucks, and can be filled with 256GB of RAM. I bought mine in 2020, in the GPT-2 era, when it was impossible to run the max 1558M-parameter size locally; none of the current tools existed, and I was only able to run the 774M version on GPU from the terminal, and it was a mess of text.
@anispinner 8 months ago
Obama
@antonpictures 4 months ago
~ % ollama pull 01-ai/Yi-VL-6B
pulling manifest
Error: pull model manifest: file does not exist
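That error is most likely because ollama pull reads from the Ollama model library rather than Hugging Face-style org/repo paths; something along these lines should work instead (the yi tag is a guess based on the library listing):

  ollama pull yi   # pulls from Ollama's registry, not 01-ai/...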
@astronosmage3722 8 months ago
Would say Oobabooga is still the way to go.
@alx8439 8 months ago
Yeah. Or h2oGPT / LLM Studio.