Check out the QUAD 3090 AI SERVER RIG BUILD kzbin.info/www/bejne/gH-XdpuXgpypr9k and hit SUBSCRIBE so you don't miss the next video we release on this!
@xtcee · 2 months ago
These videos are great. Keep doing different ones and benchmarks; they are so useful. Especially showing different-size models, so one can gauge what is required to run each one.
@DigitalSpaceport · 2 months ago
Thanks! I am developing my own logic testing so it doesn't overlap heavily with others', but I already have some very interesting findings around logic tests in the Llama 3.1 branch. I do aim to test many models and incorporate them into a kind of ongoing comparison. Thinking of Mistral Large next.
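For the curious, here's a minimal sketch of the kind of timing harness I mean. This is a sketch, not my actual test suite; it assumes a stock local Ollama install on its default port, and the eval_count/eval_duration fields come straight from Ollama's /api/generate response:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def tokens_per_second(model: str, prompt: str) -> float:
    """Run one prompt and compute decode speed from Ollama's timing stats."""
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object, including eval stats
    })
    resp.raise_for_status()
    data = resp.json()
    # eval_count = tokens generated, eval_duration = nanoseconds spent decoding
    return data["eval_count"] / data["eval_duration"] * 1e9

logic_prompt = "A farmer has 17 sheep. All but 9 run away. How many are left?"
for model in ["llama3.1:8b", "llama3.1:70b"]:
    print(model, round(tokens_per_second(model, logic_prompt), 1), "tok/s")
```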
@TheInternalNet · 3 months ago
Wow, this is really, really exciting. As someone who is diving in headfirst and about to purchase my rig this month, I'm interested in everything to do with this. I am excited to see what other applications this is good for and what hardware would be best for what I am trying to do. Thank you again.
@DigitalSpaceport · 2 months ago
Thanks, I am super excited also. The past two months have brought some advancements that really changed how I view local AI capabilities. I'm playing catch-up as well on some of this, but happy to share along the way. There is another insane-class video and a much lower-spec one on deck. I'm working on the benchmarks now and will be sharing those as well. The use-case approach for these systems already has me blown away. So much document-processing power will be in that first video.
@TheInternalNet · 2 months ago
@DigitalSpaceport I would love to see how that plays out, as well as what it would take to do some data generation, fine-tuning, and training. Maybe even see that 405B model just run, period. Finding a way to split it up between machines. So many possibilities!
@nufh · 2 months ago
Hey, thank you for this tutorial. I managed to do it on my machine.
@DigitalSpaceport · 2 months ago
Glad it helped!
@jorgeguzman8083 · 2 months ago
Just want to mention that I have a much less powerful home server with a single GPU, but I'm loving using NixOS instead of Ubuntu. I started with Ubuntu, but I really love how I can add persistent Docker containers as a module, or a development environment as a module, or mount an external hard drive as a module, and just plug and play everything without worrying about breaking the whole thing by installing something the wrong way. Add CUDA versions and remove CUDA versions, etc. Just a suggestion. Claude AI is pretty good at writing the modules I want and fixing the errors I run into, but I really recommend Nix for a home server, because it's easy to upgrade, downgrade, and add or remove features just by commenting out lines of the NixOS configuration file.
@DigitalSpaceport · 2 months ago
I've thought about it. Big fan of Chris and Alex's Self-Hosted podcast, and he does love it. I usually use Proxmox on all the things, but I demo Ubuntu for end users to provide a common base that is easy to follow instructions-wise.
@Br4ne · 2 months ago
Yes, finally some AI content! I love it. I really would appreciate it if you would do some content on a solution that can make decisions based on a set of PDF files (through either RAG or training).
@DigitalSpaceport · 2 months ago
Working on said content now.
@nicosilva4750 · 1 month ago
Wow! Really appreciate the detail you go into setting up this kind of a rig. Was wondering why you didn't use NVLink on the 3090s.
@DigitalSpaceport · 1 month ago
For inference I don't think it will help, but I need to actually test that theory out.
@bustabob08 · 3 months ago
Great content, and I'm really looking forward to you testing some low-budget rigs.
@DigitalSpaceport · 3 months ago
Awesome, thanks! I've got some cheaper graphics cards whose performance in a desktop system I'm excited to share.
@slowesttimelord · 3 months ago
Great video! Amazing how far we've come in so little time. Will you also be covering how to rent out compute resources when they're idling?
@DigitalSpaceport · 3 months ago
Yes I am. I hope a quad-GPU setup is attractive. My cable modem speeds might not be, I fear, but that's good info also.
@Prometheus-Cybersecurity · 2 months ago
Can you do a vid on training your LLM? RAG?
@DigitalSpaceport · 2 months ago
RAG I have done, and there is a video on it. I will have a few much more in-depth videos on training. It's all on a storyboard being shot, and RAG is very high on it (and easy to achieve).
@sondrax · 2 months ago
I know it'd require a different motherboard… but what do you guys think of a server build with 6 P40s to run Llama 3.1 70B at 16-bit? Trying to get enough VRAM on a budget so I do NOT have to go down to Q8 or Q4… I wonder, would it work but be too slow to use? And hey, thanks man for posting these… there aren't too many peeps trying this, so it's hard for a novice to plan out a system build. So thanks for the work involved!
@DigitalSpaceport · 2 months ago
I'd be checking into Pascal-generation FP capabilities to see if they align with 3.1, and also CUDA version support. I think Pascal is getting too old for some more modern CUDA workloads. I know FP8 vs. FP16 support is a major difference between the 3090 and 4090, for instance, and it has impacts on some quants and models. I'm honestly learning like crazy right now, so I could also be wrong on that. So much info overload!
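If anyone wants to check a specific card before buying, here's a quick sketch, assuming PyTorch is installed. The capability-to-feature mapping in the comments is the general rule of thumb (Pascal is 6.x, the 3090 is 8.6, the 4090 is 8.9), not a guarantee for every kernel:

```python
import torch

# Compute capability per GPU: Pascal = 6.x, Ampere (3090) = 8.6, Ada (4090) = 8.9.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    print(f"GPU {i}: {name} -> compute capability {major}.{minor}")
    if (major, minor) >= (8, 9):
        print("  native FP8 support (Ada/Hopper and newer)")
    elif major >= 7:
        print("  FP16 tensor cores, but no FP8")
    else:
        print("  Pascal era: FP16 runs, but no tensor cores; check CUDA support")
```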
@rhadiem · 2 months ago
I keep looking for conclusion charts on your tests but don't see any in your chapter markers :( Please consider this. There's lots of content to watch, and I definitely focus on creators who save me time. Cheers.
@DigitalSpaceport · 2 months ago
Yes, this is actually a very hard topic to get right and very easy to get wrong. I'm working with a psychologist now to help create assessments that matter to real humans, and there will be a non-video, web-based component as well that ships with those videos. It's a surprisingly hard task to do this right as the edge of SOTA progresses in capabilities very fast.
@mikemikez9490 · 2 months ago
I love the 8B model, but I was disappointed with its ability to review documents uploaded to Open WebUI and reference them accurately. It's 90% hallucinations. Is there a no-code-friendly RAG system that I could implement locally? I'd like to use 3.1 8B or Gemma 2 9B to help me edit my writing for a book I'm working on.
@DigitalSpaceport · 2 months ago
Yes. I've experienced this also, and so much yes. I'm searching for this also; it's gotta be easier for normies like us.
@maxmustermann194 · 5 days ago
@DigitalSpaceport Qwen 2.5 at 14B scanned a 90-page manual just fine and summarized the info bits I was looking for pretty much perfectly. Running it on a single 12 GB RTX 3080 Ti. Really impressive for the model size.
@GrandpasPlace · 1 month ago
I've got the 8B model running on a system with two 8 GB 1060 Ti cards, and it runs fine.
@DigitalSpaceport · 1 month ago
Yep, I have a video out tomorrow where I got to play around with my 1070 Ti and bench it, and it's actually pretty fast on an 8B. Cards can also mix and match seamlessly, regardless of VRAM size, generation, or width.
@GrandpasPlace · 1 month ago
@DigitalSpaceport Mine feels a little slow, but it is very usable. I have it paired with AnythingLLM and n8n, and I use n8n to create flows. So, for example, I ask how it is feeling, and it makes an API call to pull the system health (a Python script), then uses API calls to AnythingLLM to formulate a response based on the health information of the machine it is running on. I've written a bunch of little scripts like that so n8n can pull the real-time data and feed it to the LLM to create the response. ;) (Rough sketch below.)
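Roughly like this, for anyone who wants the idea. This is a simplified sketch, not my actual script: psutil stands in for the metrics gathering, and a direct Ollama /api/generate call stands in for the n8n-to-AnythingLLM hop:

```python
import json
import psutil
import requests

def system_health() -> dict:
    """Collect a few real-time host metrics (the part n8n would call)."""
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
    }

def describe_health(model: str = "llama3.1:8b") -> str:
    """Feed the metrics to a local LLM so it can answer 'how are you feeling?'"""
    health = system_health()
    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": model,
        "prompt": "You are this machine. Describe how you feel, given these "
                  "stats: " + json.dumps(health),
        "stream": False,
    })
    resp.raise_for_status()
    return resp.json()["response"]

print(describe_health())
```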
@DigitalSpaceport · 1 month ago
That's very cool! I just added n8n to my eval list. Sounds pretty awesome if paired with, say, Glances or Netdata as well.
@GrandpasPlace · 1 month ago
@DigitalSpaceport I've not seen Glances. I'll add it to my list to check out, thank you.
@BigFourHead · 1 month ago
Very interesting, but can I ask what someone would use this for? I'm after some real-world examples. Is it just to host your own ChatGPT?
@Drkayb · 3 months ago
Excellent work. Could you cover how the Mistral Large 2 model works on there?
@DigitalSpaceport · 3 months ago
Okay, I'll put that in the benchmarks video. Any other models?
@Drkayb · 3 months ago
@DigitalSpaceport Probably CodeLlama, Codestral, and DeepSeek Coder V2 on the shortlist. Maybe also Mixtral 8x7B, and Mixtral 8x22B if that has a quantized version. I'm mostly looking into HumanEval and logic-type models at the moment, and mainly interested in Llama 3.1 and Mistral Large 2 for TTS-type stuff, chat interaction, audiobook generation, web-search integration, etc.
@jonathansadventures8487 · 2 months ago
I am building a single 4080 Super system to run Llama. My focus is going to be coding, and I am looking forward to seeing more of your videos.
@DigitalSpaceport · 2 months ago
Awesome, the 40-series cards are great for rigs as they idle so very low. The first full in-depth model review video drops this week.
@pewpaws · 9 days ago
Where are all of the links? I looked high and low for your written part with links but couldn't find it. I was able just to type in a bunch of stuff from your screen and make it work, lol. Thanks, I had been trying and failing to piece incomplete tutorials together for two weeks.
@DigitalSpaceport · 8 days ago
Written piece and GPU rack modification instructions: digitalspaceport.com/ollama-gpu-3090-home-server-quad-gpu-ai-training-rig/ I've added this link to the description, and you can also check the description for the links to the AI Server Build video and all of the parts for it.
@pewpaws · 8 days ago
@DigitalSpaceport TY! And for anyone reading: your tutorial + AMD Ryzen 5 2600 + GTX 1660 Super runs Mistral 7B flawlessly. A 22B will run, but it's slow and crashes (not worth it).
@szebike · 1 month ago
Awesome!
@martin777xyz · 1 month ago
Any update on running the 405B? Were you able to run it via CLI only, or also via the web UI? 🙏
@Mathingon · 3 months ago
I think it's an Ollama issue. I had this before when Llama 3.1 came out and I was running the 8B version. I deleted the model, waited a day to reinstall, and it worked fine after that.
@DigitalSpaceport · 2 months ago
After reading this I went even lazier and just rebooted the machine. It then worked, lol. Thanks for the inspiration 🙏
@rdsii64 · 3 months ago
I think I'm going to build an uncensored open-source model to run on my own hardware. I'm balling on a budget, so I'm going with old-school EPYC 7551s and AMD 16 GB graphics cards.
@DigitalSpaceport · 3 months ago
This is the person I am doing this for: a self-hosting bad@$$. Ollama has training support, I'm seeing, but I have yet to attempt training outside the cloud. So much learning going on right now, it's crazy.
@MikaDo0 · 2 months ago
Hello, nice video series. Why are you using 22.04 instead of the new one?
@DigitalSpaceport · 2 months ago
Had issues with CUDA versions, but from what I'm reading those may have been fixed. It looks like it was on NVIDIA's side.
@ivanczarapanau9417 · 2 months ago
Cool rig! Running 70B at that speed is impressive, and running 405B too. I know a tool that can simplify the things you mentioned about Docker, Ollama, and the web UI; ping me if that's relevant.
@DigitalSpaceport · 2 months ago
ping
@Prometheus-Cybersecurity · 2 months ago
Can you talk about how to train it with RAG?
@DigitalSpaceport · 2 months ago
High on the next whiteboard.
@ElementX32 · 1 month ago
Sir, do you lend your skills for a fee? I want a home lab and just don't know where to start. I know you mentioned that you're in Austin, Texas, and I'm in the North Texas (DFW) area, Grand Prairie to be exact.
@DigitalSpaceport · 25 days ago
No, I don't consult, but I have considered it. Not going to yet, but maybe in the future.
@janezkricej9129 · 1 month ago
Is it possible to run Llama 3.1 70B on two 3090s? I am aware that the performance will be slower. I am building a similar rig but starting with only two 3090s. I really appreciate the time and effort you put into your videos. Cheers!
@DigitalSpaceport · 1 month ago
Sir, your answer is my latest video. I show exactly what you can expect on two 3090s: kzbin.info/www/bejne/Y5nId4N-gN5moLs
@janezkricej9129 · 1 month ago
@DigitalSpaceport You are the best! This is extremely helpful! Thank you once again for providing quality content.
@MrI8igmac · 6 days ago
I spent all day trying to get Docker and Open WebUI running with Ollama. The web UI has no models in the dropdown bars. I have a few gaming machines I want to cluster, but my test run was an epic fail.
@DigitalSpaceport · 6 days ago
You need to download the model tag that you want from the Ollama website. Here is a timestamp that shows this: kzbin.info/www/bejne/ip6xhHehn6mHhdU
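If you'd rather script it, here is a rough sketch of the same thing. It assumes a recent Ollama on its default port (running `ollama pull llama3.1:8b` in a shell is the simpler route):

```python
import json
import requests

OLLAMA = "http://localhost:11434"

# Pull a model tag from the Ollama library (same as `ollama pull llama3.1:8b`).
with requests.post(f"{OLLAMA}/api/pull",
                   json={"model": "llama3.1:8b"}, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status", ""))  # download progress

# List installed models; these are what Open WebUI shows in its dropdown.
for m in requests.get(f"{OLLAMA}/api/tags").json()["models"]:
    print(m["name"])
```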
@AP-ib7rf · 2 months ago
Tell me about the case for that beast!
@jonm7547 · 10 days ago
I am trying to upload PDF documents and analyze them with some of these models, same Ollama-Docker-WebUI setup, on Linux. So far all they see are snippets of the uploaded documents, which makes these LLMs useless. The only one that can do this properly appears to be ChatGPT-4o; however, I am not comfortable uploading my PDFs online as there are some confidentiality issues. If anyone can give me an idea of how this should be done on locally installed LLMs, and which LLM is best for this, it would be greatly appreciated.
@DigitalSpaceport · 9 days ago
Oh, you likely need to have your document ingest set up and running. I recommend watching this video I have over here: kzbin.info/www/bejne/f3TCfXqjps-lr8k which does cover the document settings I am using, but I didn't cram it into the title. I will do a dedicated full-featured RAG video on document ingesting and vector databases very soon to cover this in detail. I agree 100% about uploading documents. You are their next sellable data-point product!
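In the meantime, here is a toy sketch of the bare-bones idea behind document ingest. Assumptions: Ollama with the nomic-embed-text embedding model and a chat model pulled; a real setup swaps the string list for a PDF parser and the brute-force loop for a vector database:

```python
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    """Embed text with a local embedding model via Ollama."""
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) ** 0.5 * sum(y * y for y in b) ** 0.5)

# Chunks would normally come from a PDF parser; strings stand in here.
chunks = ["Chapter 1: the pump runs at 40 PSI.",
          "Chapter 2: replace the filter every 90 days."]
index = [(c, embed(c)) for c in chunks]

question = "How often do I replace the filter?"
qv = embed(question)
best = max(index, key=lambda pair: cosine(qv, pair[1]))[0]  # nearest chunk

r = requests.post(f"{OLLAMA}/api/generate", json={
    "model": "llama3.1:8b",
    "prompt": f"Answer using only this context:\n{best}\n\nQuestion: {question}",
    "stream": False,
})
print(r.json()["response"])
```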
@bluesquadron593 · 3 months ago
Which model can I use to work with documents?
@DigitalSpaceport · 2 months ago
If you are using the document upload directly in Open WebUI, "I've had issues" is putting it lightly. Major hallucinations. RAG is really the answer, but it's non-trivial as it stands now.
@bluesquadron593 · 2 months ago
@DigitalSpaceport Exactly the same experience.
@egokhanturk · 3 months ago
Dude, you have so many systems; why do you use Rufus instead of using Ventoy?
@DigitalSpaceport · 3 months ago
Former Ventoy user here. I've got concerns I outlined in this issue on GitHub: github.com/ventoy/Ventoy/issues/2795#issuecomment-2041631870
@KJ-xt3yu · 3 months ago
Me wondering if it's capable of cabling out to another board setup 🍿🍿🍿🍿 and then another... and another... compute for your compute, for your compute...?
@KJ-xt3yu · 3 months ago
Wasn't there a PCIe generation that was meant to connect directly from a motherboard header to another motherboard via cable, with onboard support?
@KJ-xt3yu · 3 months ago
Potentially opening the door to localized, expandable compute hardware systems/labs, and maaaaybe bringing back the "LAN party" 🍿🍿🍿😂
@DigitalSpaceport · 3 months ago
I'm not unfamiliar with high-performance parallel compute, which is in essence what your use case would be at an enterprise level. However, that pathway was abandoned in favor of RDMA a good while back. If you can access the RAM of another machine directly and at near line speed, it's the same end result. RDMA is what is used for that exact purpose in all the state-of-the-art GPU clusters today, to interconnect and train massive models.
@KJ-xt3yu · 3 months ago
@DigitalSpaceport Me soo not wanting 4 boards, dual CPUs, all the PCIe lanes, and the bridge between all of them 🍿... why hello VR and AI 😎🍿
@LucasAlves-bs7pf · 3 months ago
Why must this machine be a server and not your main computer?
@kiracrossings · 3 months ago
AI needs raw power, and putting 4x RTX 3090 24 GB and 512 GB of DDR4 RAM in a normal PC format costs more money than getting a server motherboard. But you can run the 7B instead of the 70B model on a high-end gaming PC.
@DigitalSpaceport · 3 months ago
If you are just doing inference on a 7B, it can absolutely be a desktop; these same steps would work on consumer-class hardware as well. If you aim for training, then you need full PCIe lane bandwidth at max generational speed. Also, if you want to run a 70B vs. a 7B, you need a lot of GPUs. Ex-server gear provides substantially more horsepower per dollar and can scale better as a result, if you are doing both.
@mohammdmodan5038 · 2 months ago
try vLLM
@DigitalSpaceport · 2 months ago
Yes indeed, this is on deck. I want to test ALL models more easily. I'm redoing the software setup in a more agnostic manner so it can run multiple model servers, which has been somewhat more challenging than I anticipated at first.
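One thing that helps with staying model-server agnostic: vLLM exposes an OpenAI-compatible HTTP API, so the same client code can target vLLM, or anything else that speaks that API, just by changing the base URL. A sketch, assuming a server started with something like `vllm serve meta-llama/Llama-3.1-8B-Instruct` on the default port 8000:

```python
import requests

BASE_URL = "http://localhost:8000/v1"  # vLLM's OpenAI-compatible endpoint

resp = requests.post(f"{BASE_URL}/chat/completions", json={
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "max_tokens": 32,
})
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```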
@hooni-mining · 2 months ago
hi
@DigitalSpaceport · 2 months ago
👋
@NetrunnerAT · 3 months ago
Sorry... your rigs are far away from home server builds.
@mrlost117 · 3 months ago
@NetrunnerAT To you.
@DigitalSpaceport · 3 months ago
I know. I've only got four 24 GB cards running right now in this system. I'm working on it.
@celestinakamura6076 · 2 months ago
@NetrunnerAT Maybe for you, my friend, but if you compare it to what a large business can have, it's nowhere close.
@DigitalSpaceport · 2 months ago
I'm not trying to be a big business running anything here... if the video didn't make that apparent (it should have).
@ElementX32 · 1 month ago
@NetrunnerAT Agreed! Not to mention this guy is highly intelligent. However, I'm going to try my hand at building the same rig. @DigitalSpaceport, you're clearly one of my favorite content creators by far. Thank you, sir, for sharing your vast knowledge.