Check out the QUAD 3090 AI SERVER RIG BUILD kzbin.info/www/bejne/gH-XdpuXgpypr9k and hit SUBSCRIBE so you don't miss the next video we release on this!
@xtcee · 2 months ago
These videos are great. Keep doing different ones and benchmarks; they are so useful. Especially showing different-size models, so one can gauge what is required to run each one.
@DigitalSpaceport · 2 months ago
Thanks! I am developing my own logic testing so it doesn't overlap heavily with others', but I already have some very interesting findings around logic tests in the Llama 3.1 branch. I do aim to test many models and incorporate them into a kind of ongoing comparison. Thinking of Mistral Large next.
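For the curious, here's a minimal sketch of the kind of timing harness I mean. This is a sketch, not my actual test suite; it assumes a stock local Ollama install on its default port, and the eval_count/eval_duration fields come straight from Ollama's /api/generate response:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def tokens_per_second(model: str, prompt: str) -> float:
    """Run one prompt and compute decode speed from Ollama's timing stats."""
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object, including eval stats
    })
    resp.raise_for_status()
    data = resp.json()
    # eval_count = tokens generated, eval_duration = nanoseconds spent decoding
    return data["eval_count"] / data["eval_duration"] * 1e9

logic_prompt = "A farmer has 17 sheep. All but 9 run away. How many are left?"
for model in ["llama3.1:8b", "llama3.1:70b"]:
    print(model, round(tokens_per_second(model, logic_prompt), 1), "tok/s")
```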
@TheInternalNet · 3 months ago
Wow, this is really, really exciting. As someone who is diving in headfirst and about to purchase my rig this month, I'm interested in everything to do with this. I am excited to see what other applications this is good for and what hardware would be best for what I am trying to do. Thank you again.
@DigitalSpaceport · 2 months ago
Thanks, I am super excited also. The past two months have brought some advancements that really changed how I view local AI capabilities. I'm playing catch-up as well on some of this, but happy to share along the way. There is another insane-class video and a much lower-spec one on deck. I'm working on the benchmarks now and will be sharing those as well. The use-case approach for these systems already has me blown away. So much document-processing power will be in that first video.
@TheInternalNet · 2 months ago
@DigitalSpaceport I would love to see how that plays out, as well as what it would take to do some data generation, fine-tuning, and training. Maybe even see that 405B model just run, period. Finding a way to split it up between machines. So many possibilities!
@nufh · 2 months ago
Hey, thank you for this tutorial. I managed to do it on my machine.
@DigitalSpaceport · 2 months ago
Glad it helped!
@jorgeguzman8083 · 2 months ago
Just want to mention that I have a much less powerful home server with a single GPU, but I'm loving using NixOS instead of Ubuntu. I started with Ubuntu, but I really love how I can add persistent Docker containers as a module, or a development environment as a module, or mount an external hard drive as a module, and just plug and play everything without worrying about breaking the whole thing by installing something the wrong way. Add CUDA versions and remove CUDA versions, etc. Just a suggestion. Claude AI is pretty good at writing the modules I want and fixing the errors I run into, but I really recommend Nix for a home server, because it's easy to upgrade, downgrade, and add or remove features just by commenting out lines of the NixOS configuration file.
@DigitalSpaceport · 2 months ago
I've thought about it. Big fan of Chris and Alex's Self-Hosted podcast, and he does love it. I usually use Proxmox on all the things, but I demo Ubuntu for end users to provide a common base that is easy to follow instructions-wise.
@Br4ne · 2 months ago
Yes, finally some AI content! I love it. I really would appreciate it if you would do some content on a solution that can make decisions based on a set of PDF files (through either RAG or training).
@DigitalSpaceport · 2 months ago
Working on said content now.
@nicosilva4750 · 1 month ago
Wow! Really appreciate the detail you go into setting up this kind of a rig. Was wondering why you didn't use NVLink on the 3090s.
@DigitalSpaceport · 1 month ago
For inference I don't think it will help, but I need to actually test that theory out.
@bustabob08 · 3 months ago
Great content, and I'm really looking forward to you testing some low-budget rigs.
@DigitalSpaceport · 3 months ago
Awesome, thanks! I've got some cheaper graphics cards whose performance in a desktop system I'm excited to share.
@slowesttimelord · 3 months ago
Great video! Amazing how far we've come in so little time. Will you also be covering how to rent out compute resources when they're idling?
@DigitalSpaceport · 3 months ago
Yes I am. I hope a quad-GPU setup is attractive. My cable modem speeds might not be, I fear, but that's good info also.
@Prometheus-Cybersecurity · 2 months ago
Can you do a vid on training your LLM? RAG?
@DigitalSpaceport · 2 months ago
RAG I have done, and there is a video on it. I will have a few much more in-depth videos on training. It's all on a storyboard being shot, and RAG is very high on it (and easy to achieve).
@sondrax · 2 months ago
I know it'd require a different motherboard… but what do you guys think of a server build with 6 P40s to run Llama 3.1 70B at 16-bit? Trying to get enough VRAM on a budget so I do NOT have to go down to Q8 or Q4… I wonder, would it work but be too slow to use? And hey, thanks man for posting these… there aren't too many peeps trying this, so it's hard for a novice to plan out a system build. So thanks for the work involved!
@DigitalSpaceport · 2 months ago
I'd be checking into Pascal-generation FP capabilities to see if they align with 3.1, and also CUDA version support. I think Pascal is getting too old for some more modern CUDA workloads. I know FP8 vs. FP16 support is a major difference between the 3090 and 4090, for instance, and it has impacts on some quants and models. I'm honestly learning like crazy right now, so I could also be wrong on that. So much info overload!
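If anyone wants to check a specific card before buying, here's a quick sketch, assuming PyTorch is installed. The capability-to-feature mapping in the comments is the general rule of thumb (Pascal is 6.x, the 3090 is 8.6, the 4090 is 8.9), not a guarantee for every kernel:

```python
import torch

# Compute capability per GPU: Pascal = 6.x, Ampere (3090) = 8.6, Ada (4090) = 8.9.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    print(f"GPU {i}: {name} -> compute capability {major}.{minor}")
    if (major, minor) >= (8, 9):
        print("  native FP8 support (Ada/Hopper and newer)")
    elif major >= 7:
        print("  FP16 tensor cores, but no FP8")
    else:
        print("  Pascal era: FP16 runs, but no tensor cores; check CUDA support")
```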
@rhadiem · 2 months ago
I keep looking for conclusion charts on your tests but don't see any in your chapter markers :( Please consider this. There's lots of content to watch, and I definitely focus on creators who save me time. Cheers.
@DigitalSpaceport · 2 months ago
Yes, this is actually a very hard topic to get right and very easy to get wrong. I'm working with a psychologist now to help create assessments that matter to real humans, and there will be a non-video, web-based component as well that ships with those videos. It's a surprisingly hard task to do this right as the edge of SOTA progresses in capabilities very fast.
@mikemikez9490 · 2 months ago
I love the 8B model, but I was disappointed with its ability to review documents uploaded to Open WebUI and reference them accurately. It's 90% hallucinations. Is there a no-code-friendly RAG system that I could implement locally? I'd like to use 3.1 8B or Gemma 2 9B to help me edit my writing for a book I'm working on.
@DigitalSpaceport · 2 months ago
Yes. I've experienced this also, and so much yes. I'm searching for this also; it's gotta be easier for normies like us.
@maxmustermann194 · 5 days ago
@DigitalSpaceport Qwen 2.5 at 14B scanned a 90-page manual just fine and summarized the info bits I was looking for pretty much perfectly. Running it on a single 12 GB RTX 3080 Ti. Really impressive for the model size.
@GrandpasPlace · 1 month ago
I've got the 8B model running on a system with two 8 GB 1060 Ti cards, and it runs fine.
@DigitalSpaceport · 1 month ago
Yep, I have a video out tomorrow where I got to play around with my 1070 Ti and bench it, and it's actually pretty fast on an 8B. Cards can also mix and match seamlessly, regardless of VRAM size, generation, or width.
@GrandpasPlace · 1 month ago
@DigitalSpaceport Mine feels a little slow, but it is very usable. I have it paired with AnythingLLM and n8n, and I use n8n to create flows. So, for example, I ask how it is feeling, and it makes an API call to pull the system health (a Python script), then uses API calls to AnythingLLM to formulate a response based on the health information of the machine it is running on. I've written a bunch of little scripts like that so n8n can pull the real-time data and feed it to the LLM to create the response. ;) (Rough sketch below.)
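Roughly like this, for anyone who wants the idea. This is a simplified sketch, not my actual script: psutil stands in for the metrics gathering, and a direct Ollama /api/generate call stands in for the n8n-to-AnythingLLM hop:

```python
import json
import psutil
import requests

def system_health() -> dict:
    """Collect a few real-time host metrics (the part n8n would call)."""
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
    }

def describe_health(model: str = "llama3.1:8b") -> str:
    """Feed the metrics to a local LLM so it can answer 'how are you feeling?'"""
    health = system_health()
    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": model,
        "prompt": "You are this machine. Describe how you feel, given these "
                  "stats: " + json.dumps(health),
        "stream": False,
    })
    resp.raise_for_status()
    return resp.json()["response"]

print(describe_health())
```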
@DigitalSpaceport · 1 month ago
That's very cool! I just added n8n to my eval list. Sounds pretty awesome if paired with, say, Glances or Netdata as well.
@GrandpasPlace · 1 month ago
@DigitalSpaceport I've not seen Glances. I'll add it to my list to check out, thank you.
@BigFourHead · 1 month ago
Very interesting, but can I ask what someone would use this for? I'm after some real-world examples. Is it just to host your own ChatGPT?
@Drkayb · 3 months ago
Excellent work. Could you cover how the Mistral Large 2 model works on there?
@DigitalSpaceport · 3 months ago
Okay, I'll put that in the benchmarks video. Any other models?
@Drkayb · 3 months ago
@DigitalSpaceport Probably CodeLlama, Codestral, and DeepSeek Coder V2 on the shortlist. Maybe also Mixtral 8x7B, and Mixtral 8x22B if that has a quantized version. I'm mostly looking into HumanEval and logic-type models at the moment, and mainly interested in Llama 3.1 and Mistral Large 2 for TTS-type stuff, chat interaction, audiobook generation, web-search integration, etc.
@jonathansadventures8487 · 2 months ago
I am building a single 4080 Super system to run Llama. My focus is going to be coding, and I am looking forward to seeing more of your videos.
@DigitalSpaceport · 2 months ago
Awesome, the 40-series cards are great for rigs as they idle so very low. The first full in-depth model review video drops this week.
@pewpaws · 9 days ago
Where are all of the links? I looked high and low for your written part with links but couldn't find it. I was able just to type in a bunch of stuff from your screen and make it work, lol. Thanks, I had been trying and failing to piece incomplete tutorials together for two weeks.
@DigitalSpaceport · 8 days ago
Written piece and GPU rack modification instructions: digitalspaceport.com/ollama-gpu-3090-home-server-quad-gpu-ai-training-rig/ I've added this link to the description, and you can also check the description for the links to the AI Server Build video and all of the parts for it.
@pewpaws · 8 days ago
@DigitalSpaceport TY! And for anyone reading: your tutorial + AMD Ryzen 5 2600 + GTX 1660 Super runs Mistral 7B flawlessly. A 22B will run, but it's slow and crashes (not worth it).
@szebike · 1 month ago
Awesome!
@martin777xyz · 1 month ago
Any update on running the 405B? Were you able to run it via CLI only, or also via the web UI? 🙏
@Mathingon · 3 months ago
I think it's an Ollama issue. I had this before when Llama 3.1 came out and I was running the 8B version. I deleted the model, waited a day to reinstall, and it worked fine after that.
@DigitalSpaceport · 2 months ago
After reading this I went even lazier and just rebooted the machine. It then worked, lol. Thanks for the inspiration 🙏
@rdsii64 · 3 months ago
I think I'm going to build an uncensored open-source model to run on my own hardware. I'm balling on a budget, so I'm going with old-school EPYC 7551s and AMD 16 GB graphics cards.
@DigitalSpaceport · 3 months ago
This is the person I am doing this for: a self-hosting bad@$$. Ollama has training support, I'm seeing, but I have yet to attempt training outside the cloud. So much learning going on right now, it's crazy.
@MikaDo0 · 2 months ago
Hello, nice video series. Why are you using 22.04 instead of the new one?
@DigitalSpaceport · 2 months ago
Had issues with CUDA versions, but from what I'm reading those may have been fixed. It looks like it was on NVIDIA's side.
@ivanczarapanau9417 · 2 months ago
Cool rig! Running 70B at that speed is impressive, and running 405B too. I know a tool that can simplify the things you mentioned about Docker, Ollama, and the web UI; ping me if that's relevant.
@DigitalSpaceport · 2 months ago
ping
@Prometheus-Cybersecurity · 2 months ago
Can you talk about how to train it with RAG?
@DigitalSpaceport · 2 months ago
High on the next whiteboard.
@ElementX32 · 1 month ago
Sir, do you lend your skills for a fee? I want a home lab and just don't know where to start. I know you mentioned that you're in Austin, Texas, and I'm in the North Texas (DFW) area, Grand Prairie to be exact.
@DigitalSpaceport · 25 days ago
No, I don't consult, but I have considered it. Not going to yet, but maybe in the future.
@janezkricej9129 · 1 month ago
Is it possible to run Llama 3.1 70B on two 3090s? I am aware that the performance will be slower. I am building a similar rig but starting with only two 3090s. I really appreciate the time and effort you put into your videos. Cheers!
@DigitalSpaceport · 1 month ago
Sir, your answer is my latest video. I show exactly what you can expect on two 3090s: kzbin.info/www/bejne/Y5nId4N-gN5moLs
@janezkricej9129 · 1 month ago
@DigitalSpaceport You are the best! This is extremely helpful! Thank you once again for providing quality content.
@MrI8igmac · 6 days ago
I spent all day trying to get Docker and Open WebUI running with Ollama. The web UI has no models in the dropdown bars. I have a few gaming machines I want to cluster, but my test run was an epic fail.
@DigitalSpaceport · 6 days ago
You need to download the model tag that you want from the Ollama website. Here is a timestamp that shows this: kzbin.info/www/bejne/ip6xhHehn6mHhdU
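If you'd rather script it, here is a rough sketch of the same thing. It assumes a recent Ollama on its default port (running `ollama pull llama3.1:8b` in a shell is the simpler route):

```python
import json
import requests

OLLAMA = "http://localhost:11434"

# Pull a model tag from the Ollama library (same as `ollama pull llama3.1:8b`).
with requests.post(f"{OLLAMA}/api/pull",
                   json={"model": "llama3.1:8b"}, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status", ""))  # download progress

# List installed models; these are what Open WebUI shows in its dropdown.
for m in requests.get(f"{OLLAMA}/api/tags").json()["models"]:
    print(m["name"])
```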
@AP-ib7rf · 2 months ago
Tell me about the case for that beast!
@jonm7547 · 10 days ago
I am trying to upload PDF documents and analyze them with some of these models, same Ollama-Docker-WebUI setup, on Linux. So far all they see are snippets of the uploaded documents, which makes these LLMs useless. The only one that can do this properly appears to be ChatGPT-4o; however, I am not comfortable uploading my PDFs online as there are some confidentiality issues. If anyone can give me an idea of how this should be done on locally installed LLMs, and which LLM is best for this, it would be greatly appreciated.
@DigitalSpaceport · 9 days ago
Oh, you likely need to have your document ingest set up and running. I recommend watching this video I have over here: kzbin.info/www/bejne/f3TCfXqjps-lr8k which does cover the document settings I am using, but I didn't cram it into the title. I will do a dedicated full-featured RAG video on document ingesting and vector databases very soon to cover this in detail. I agree 100% about uploading documents. You are their next sellable data-point product!
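In the meantime, here is a toy sketch of the bare-bones idea behind document ingest. Assumptions: Ollama with the nomic-embed-text embedding model and a chat model pulled; a real setup swaps the string list for a PDF parser and the brute-force loop for a vector database:

```python
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    """Embed text with a local embedding model via Ollama."""
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) ** 0.5 * sum(y * y for y in b) ** 0.5)

# Chunks would normally come from a PDF parser; strings stand in here.
chunks = ["Chapter 1: the pump runs at 40 PSI.",
          "Chapter 2: replace the filter every 90 days."]
index = [(c, embed(c)) for c in chunks]

question = "How often do I replace the filter?"
qv = embed(question)
best = max(index, key=lambda pair: cosine(qv, pair[1]))[0]  # nearest chunk

r = requests.post(f"{OLLAMA}/api/generate", json={
    "model": "llama3.1:8b",
    "prompt": f"Answer using only this context:\n{best}\n\nQuestion: {question}",
    "stream": False,
})
print(r.json()["response"])
```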
@bluesquadron593 · 3 months ago
Which model can I use to work with documents?
@DigitalSpaceport · 2 months ago
If you are using the document upload directly in Open WebUI, "I've had issues" is putting it lightly. Major hallucinations. RAG is really the answer, but it's non-trivial as it stands now.
@bluesquadron593 · 2 months ago
@DigitalSpaceport Exactly the same experience.
@egokhanturk · 3 months ago
Dude, you have so many systems; why do you use Rufus instead of using Ventoy?
@DigitalSpaceport · 3 months ago
Former Ventoy user here. I've got concerns I outlined in this issue on GitHub: github.com/ventoy/Ventoy/issues/2795#issuecomment-2041631870
@KJ-xt3yu · 3 months ago
Me wondering if it's capable of cabling out to another board setup 🍿🍿🍿🍿 and then another... and another... compute for your compute, for your compute...?
@KJ-xt3yu · 3 months ago
Wasn't there a PCIe generation that was meant to connect directly from a motherboard header to another motherboard via cable, with onboard support?
@KJ-xt3yu · 3 months ago
Potentially opening the door to localized, expandable compute hardware systems/labs, and maaaaybe bringing back the "LAN party" 🍿🍿🍿😂
@DigitalSpaceport · 3 months ago
I'm not unfamiliar with high-performance parallel compute, which is in essence what your use case would be at an enterprise level. However, that pathway was abandoned in favor of RDMA a good while back. If you can access the RAM of another machine directly and at near line speed, it's the same end result. RDMA is what is used for that exact purpose in all the state-of-the-art GPU clusters today, to interconnect and train massive models.
@KJ-xt3yu · 3 months ago
@DigitalSpaceport Me soo not wanting 4 boards, dual CPUs, all the PCIe lanes, and the bridge between all of them 🍿... why hello VR and AI 😎🍿
@LucasAlves-bs7pf · 3 months ago
Why must this machine be a server and not your main computer?
@kiracrossings · 3 months ago
AI needs raw power, and putting 4x RTX 3090 24 GB and 512 GB of DDR4 RAM in a normal PC format costs more money than getting a server motherboard. But you can run the 7B instead of the 70B model on a high-end gaming PC.
@DigitalSpaceport · 3 months ago
If you are just doing inference on a 7B, it can absolutely be a desktop; these same steps would work on consumer-class hardware as well. If you aim for training, then you need full PCIe lane bandwidth at max generational speed. Also, if you want to run a 70B vs. a 7B, you need a lot of GPUs. Ex-server gear provides substantially more horsepower per dollar and can scale better as a result, if you are doing both.
@mohammdmodan5038 · 2 months ago
try vLLM
@DigitalSpaceport · 2 months ago
Yes indeed, this is on deck. I want to test ALL models more easily. I'm redoing the software setup in a more agnostic manner so it can run multiple model servers, which has been somewhat more challenging than I anticipated at first.
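One thing that helps with staying model-server agnostic: vLLM exposes an OpenAI-compatible HTTP API, so the same client code can target vLLM, or anything else that speaks that API, just by changing the base URL. A sketch, assuming a server started with something like `vllm serve meta-llama/Llama-3.1-8B-Instruct` on the default port 8000:

```python
import requests

BASE_URL = "http://localhost:8000/v1"  # vLLM's OpenAI-compatible endpoint

resp = requests.post(f"{BASE_URL}/chat/completions", json={
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "max_tokens": 32,
})
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```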
@hooni-mining · 2 months ago
hi
@DigitalSpaceport · 2 months ago
👋
@NetrunnerAT · 3 months ago
Sorry... your rigs are far away from home server builds.
@mrlost117 · 3 months ago
@NetrunnerAT To you.
@DigitalSpaceport · 3 months ago
I know. I've only got four 24 GB cards running right now in this system. I'm working on it.
@celestinakamura6076 · 2 months ago
@NetrunnerAT Maybe for you, my friend, but if you compare it to what a large business can have, it's nowhere close.
@DigitalSpaceport · 2 months ago
I'm not trying to be a big business running anything here... if the video didn't make that apparent (it should have).
@ElementX32 · 1 month ago
@NetrunnerAT Agreed! Not to mention this guy is highly intelligent. However, I'm going to try my hand at building the same rig. @DigitalSpaceport, you're clearly one of my favorite content creators by far. Thank you, sir, for sharing your vast knowledge.