DIY 4x Nvidia P40 Homeserver for AI with 96GB VRAM!

60,822 views

AI HOMELAB

Days ago

Comments: 345
@AI-HOMELAB
@AI-HOMELAB 3 months ago
Important note: first enable "Above 4G Decoding" before you insert a GPU with more than 16GB of VRAM. Otherwise your computer won't POST.
@thomaslindell5448
@thomaslindell5448 2 ай бұрын
Thanks for that
@CoreyPL
@CoreyPL 2 months ago
That option should be enabled even with a single card with a large amount of VRAM, so the device can request a PCI BAR space greater than 4GB without colliding with any other device - the BARs get mapped into the whole system's address space, not just the GPU's. It is generally fine to enable this on any modern 64-bit OS.
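If you want to see which devices actually request large BARs (and therefore need that setting), here is a minimal sketch for a Linux host that reads the sizes straight out of sysfs; the 1 GiB cutoff is just an arbitrary threshold for display:
```python
from pathlib import Path

def largest_bar_gib(dev: Path) -> float:
    """Return the largest BAR size (in GiB) listed in a device's 'resource' file."""
    largest = 0
    for line in (dev / "resource").read_text().splitlines():
        start, end, _flags = (int(x, 16) for x in line.split())
        if end > start:
            largest = max(largest, end - start + 1)
    return largest / 2**30

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    size = largest_bar_gib(dev)
    if size >= 1:  # only show devices with BARs of 1 GiB or more
        print(f"{dev.name}: largest BAR ~ {size:.1f} GiB")
```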
@billkillernic
@billkillernic 1 month ago
Doesn't RAM have a lifetime warranty?
@CoreyPL
@CoreyPL 1 month ago
@@billkillernic 1. Not all RAM is sold with a "lifetime warranty", especially server RAM. 2. "Lifetime warranty" means "for the lifetime of the product", so until the product stops being actively made and sold (EoL), not your lifetime or the lifetime of your server. 3. Warranty usually applies only to the original purchaser and is not transferred on the second-hand market; most times it is tied to the original server purchase. 4. Since the RAM was bought used, it had at most a seller warranty (if any) or one of the platform's "secure buy" options.
@billkillernic
@billkillernic 1 month ago
@@CoreyPL Nah, that's too much red tape - I would still try to RMA it. Companies with high profit margins usually aren't assholes to the little fish, especially on products that don't have a high failure rate. I have RMAed mobos well outside their 2-year warranty, 5.1 sound systems (Logitech), and also RAM that was at least 6 years old, without having a receipt. Sure, it may depend on the individual handling your case and on the company, but I would still try.
@brian2590
@brian2590 2 months ago
The price is just right to offset power usage with lithium batteries and solar. I am just getting set up for solar with minor offsets. It is good to see more interest in self-hosting AI. The industry is nauseating right now. I firmly believe it is WE who will properly apply this technology in practical ways and put it to real-world use. I built a similar setup last year, so far no major issues. I eventually needed to add a node with a 4090, as my applications now support up to 8 simultaneous users. The P40s run batch jobs... the 4090 is user-facing and more interactive. Looking forward to more videos and exciting applications from the homelab community.
@AI-HOMELAB
@AI-HOMELAB 2 months ago
Sounds like you have an awesome setup! Do you use it for your business? I really believe in the same principles: everyone needs to be able to host his or her own AI models. This is how we will all profit from ML, not just big companies. However, Nvidia really needs to implement fair pricing on VRAM. What they do right now is insanely overinflate the price of high-VRAM cards for the people who can afford them...
@TheInternalNet
@TheInternalNet 2 ай бұрын
These are the game GPUs I was looking at for my build. Into a supermicro chassis. Thank you so much
@LabiaLicker
@LabiaLicker 2 ай бұрын
Just looked at the ebay cost and was blown away..... You've earned yourself another sub
@AINMEisONE
@AINMEisONE 2 ай бұрын
Super Good job!!! Thanks for your contribution to helping others!
@CossuttaDario
@CossuttaDario 2 months ago
I have so many questions you could make videos about!
- How to properly parallelize agentic workflows over multiple GPUs
- Inference LLM performance comparisons between different Nvidia GPU generations with the same VRAM size
- How to work with the NPU in the new AMD laptop CPUs
- How laptop GPUs compare with each other for these kinds of workloads (of course it isn't optimal, but if needed...)
Great video, insta sub!
@TheLibertyfarmer
@TheLibertyfarmer 2 months ago
I've done some of those experiments. My 3x P40 server can inference up to a single ~120B Q4 model with CPU offloading, or ~12 8B Q4 models simultaneously. I've done laptop GPU experiments with a range of laptop GPUs, from Nvidia Maxwell to Turing. A high-end Maxwell mobile GPU (like the 8GB 980M/Tesla M6/Quadro M5000M) is still enough for a conversational-speed local chatbot, as well as Stable Diffusion. The 6GB Pascal and Turing mobile GPUs are each incrementally faster than the previous generation and still capable of 8B Q4 models. The 8GB Pascal cards are a little better. Turing-gen mobile GPUs are strong performers; the 8GB Turing cards pump out ~2x the tok/s of Pascal. IMO the 16GB Quadro RTX 5000 mobile is the best budget option for experimentation. You can find used Dell Precision 7740 mobile workstations with the 16GB RTX 5000 (faster inference than a P40) for under a grand on eBay, or upgrade a cheaper 7730 with one for a little less. I run both a 7740 and an upgraded 7730 with the Quadro RTX 5000 mobile and am very happy with both.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@CossuttaDario Thank you very much for the video ideas and your nice Feedback! I am sure I can do most of the suggested contents. I played around with many of these components. ;-)
@k1tajfar714
@k1tajfar714 Ай бұрын
@@TheLibertyfarmer Why dont you have any VIDEOSSSSSSS 😭😭😭😭😭😭😭 i need help on this topic!
@TheLibertyfarmer
@TheLibertyfarmer Ай бұрын
@@k1tajfar714 I'm always too busy doing 'the thing' to make videos about it. I will say that this channel is solid. I'm sure the owner has plans for some of those videos.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
I definitely will & thank you 😉✌️.
@TheLibertyfarmer
@TheLibertyfarmer 2 ай бұрын
Very nice. The total system price really jumps up at 4+ P40s. A high value system with 3 P40s for those that can't afford that price tier can be built out of a surplus HP Z840 workstation. I have my eye on some v4 Xeon dual processor mining boards (with full bandwidth PCIe slots) for my next rig. I'll be making an open air build with 5+ P40s. Figure I've come this far might as well get weird with it.
@AI-HOMELAB
@AI-HOMELAB 2 months ago
Sounds awesome! Let me know how it goes! I'm on my way to building a closed server with 6x P40s, 3x M40s and an RTX 3090. I hope this might be enough to run Llama 3.1 405B in Q4 or Q3. I guess Mistral Large 2 in Q8 would also do the job. 🙈 I guess we all go a bit too far with it. But hey, it's fun, isn't it? 😂
@v0idbyt3
@v0idbyt3 Ай бұрын
haven't watched the video yet but i love this idea and the specs in the description seem great
@AI-HOMELAB
@AI-HOMELAB 1 month ago
It's quite capable. Everything up to Mistral Large or Wizard 8x22B is more than usable. You are only out of luck with Llama 405B - it works, but in snail mode 😅
@MrMoonsilver
@MrMoonsilver 1 month ago
You are a connoisseur, I can tell from that sick EVGA case.
@AI-HOMELAB
@AI-HOMELAB 1 month ago
I love my EVGA case! This has been one of their masterpieces. Unfortunately they don't do 16-GPU server cases, so the next one will be built by a friend and myself. 😬 And I promise it's going to be nice! (It will just probably take quite some time.) Anyway: thank you! 😊
@daedalusdenton9144
@daedalusdenton9144 16 күн бұрын
I remember seeing that case in Microcenter way back. I don't think they make it anymore. Thing is a monster.
@gregorengelter1165
@gregorengelter1165 2 ай бұрын
Very cool video! I do have one tip regarding the cooling of graphics cards, which you should definitely take a look at. I converted my M40 GPU to water cooling, as you can use the water block of a Titan X for this. As far as I know, you can use the water block of a Titan Xp for the P40. This means you can use normal fans and don't have to use 3D printed parts and loud fans. Of course it's more expensive, but you can make the device super quiet.
@ProjectPhysX
@ProjectPhysX 2 ай бұрын
I've been considering a 4x P40 setup for computational fluid dynamics simulations. VRAM capacity is the main limitation here. 96GB is enough for 1.8 billion cells, truly gigantic. With your RAM issue: try reseating the CPU, and check there is no bent pins. Also try swapping the undetected RAM stick with another RAM stick. It's probably just a bad connection, not a dead RAM stick.
@AI-HOMELAB
@AI-HOMELAB 1 month ago
@erroneousbosch That is not correct. It is not suitable for training models. But for inference it works perfectly well.
@WINTERMUTE_AI
@WINTERMUTE_AI 2 ай бұрын
2400USD actually seems really cheap for this... But of course, in America we are getting used to paying $10 for a big mac...
@giovannidurante4134
@giovannidurante4134 29 days ago
Nonsense comment - go have a look at the cost of a single video card in the video
@AI-HOMELAB
@AI-HOMELAB 29 күн бұрын
@WINTERMUTE_AI is referring to the cost of the whole build. =)
@AI-HOMELAB
@AI-HOMELAB 29 days ago
And I guess he mixed the 1 in 2100 up with a 4. ✌️
@kevinclark1466
@kevinclark1466 2 ай бұрын
Great video…. Looking forward to more in the future!
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@kevinclark1466 Thank you very much! New content is coming in the middle of August. Am in vacation in Canada at the moment. =) RTX 3060, 3090, combinations and the Tesla M40 are projects I already started. I'll also try to create a machine which can run Llama 405b on the GPU - trying it on the GPU would also be fun potentially. 😇
@dholzric1
@dholzric1 2 ай бұрын
I have one i'm finishing up with 2 p40's and an amd 5700H. Thanks for sharing the shroud. I was about to make my own in Fusion, but i would rather not.
@AI-HOMELAB
@AI-HOMELAB 2 months ago
@@dholzric1 You're welcome, glad I could help with that! Two P40s should do the trick for quite a lot of LLMs. Even Llama 3.1 70B should run.
@TheLibertyfarmer
@TheLibertyfarmer 2 months ago
@@AI-HOMELAB 4-bit quants of 70B models need a little offloading because of the added VRAM requirements of the context window, but it is just a few layers and inference speed is around ~10 T/s depending on your inference software.
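For anyone wanting to try that, a minimal sketch of partial offloading with llama-cpp-python (the model path and layer count are placeholders - tune n_gpu_layers until the model plus context just fits into VRAM):
```python
from llama_cpp import Llama

# Hypothetical GGUF path; n_gpu_layers controls how many transformer layers
# land on the GPUs, the rest stay in system RAM and run on the CPU.
llm = Llama(
    model_path="models/llama-3.1-70b-instruct.Q4_K_M.gguf",
    n_gpu_layers=76,   # e.g. offload most of the ~80 layers, keep a few on CPU
    n_ctx=8192,        # the context window also consumes VRAM
    n_threads=16,      # CPU threads for the non-offloaded layers
)

out = llm("Explain what partial GPU offloading does.", max_tokens=128)
print(out["choices"][0]["text"])
```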
@motomason-fv6ux
@motomason-fv6ux 2 ай бұрын
Awesome build thanks for sharing and benchmarking! Sadly the P40 has gone to $350+ so probably worth looking at 30/40-90's now, still a great setup though!
@AI-HOMELAB
@AI-HOMELAB 2 months ago
True... and quite unfortunate - it's a nice card, but it's not worth more than 200 USD at maximum. I am hoping to see prices of used 3090s fall even more as the 5000 series gets announced & released. But we'll have to wait and see. Thank you for your positive comment! 😃
@Gastell0
@Gastell0 2 months ago
6:00 - That's not a cooling mount, that's for chassis mounting so the card doesn't move during transportation =) Servers generally have 80mm fans forcing air through these cards; you can do the same with 2x 80mm 5k RPM fans to cool 4 passive cards
@jaycewilson4808
@jaycewilson4808 2 months ago
I created a script that controls the fan speed based on the hottest GPU. The script uses nvidia-smi to get the GPU temperatures and ipmitool to change the fan speed according to the GPU temperature. So the script depends on the motherboard supporting IPMI.
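A minimal sketch of such a loop is below. It assumes nvidia-smi and ipmitool are on the PATH; the raw IPMI bytes for setting the fan duty are board-specific - the ones shown are the Dell iDRAC-style example that is commonly cited online and must be swapped for whatever your BMC documents. The temperature/duty curve is an arbitrary placeholder.
```python
import subprocess
import time

# Map GPU temperature (°C) to a fan duty cycle (%). Placeholder thresholds.
CURVE = [(40, 20), (60, 40), (75, 70), (85, 100)]

def hottest_gpu_temp() -> int:
    """Read all GPU temperatures via nvidia-smi and return the hottest one."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        text=True,
    )
    return max(int(line) for line in out.splitlines() if line.strip())

def set_fan_duty(percent: int) -> None:
    """Set fan duty via raw IPMI commands.

    NOTE: these raw bytes are the Dell iDRAC-style example often cited online;
    they are NOT universal. Check your BMC documentation and replace them.
    """
    subprocess.run(["ipmitool", "raw", "0x30", "0x30", "0x01", "0x00"], check=True)  # manual fan mode
    subprocess.run(["ipmitool", "raw", "0x30", "0x30", "0x02", "0xff", f"0x{percent:02x}"], check=True)

while True:
    temp = hottest_gpu_temp()
    duty = next((d for t, d in CURVE if temp <= t), 100)
    set_fan_duty(duty)
    time.sleep(10)
```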
@BaldyMacbeard
@BaldyMacbeard Ай бұрын
Bro that's pretty affordable. My 2x3090 system cost me 1200 for the GPUs alone...
@AI-HOMELAB
@AI-HOMELAB 1 month ago
But then you also got them for a great price. 😃 And 3090s are quite a lot faster at inference. But pay attention if you want to mix them with P40s - not the easiest thing to do and to get working. I'll experiment on that some more myself. 🙈
@zachzimmermann5209
@zachzimmermann5209 2 months ago
Very interesting, but due to all of the downsides - the noise issues, the relatively slow inference speeds, the lack of support for newer tech such as bf16, as well as the limited support for non-datacenter multi-GPU setups - I feel that a used RTX 3090 or a used M2 Mac with 32GB+ RAM is a better option for the price. Particularly if you plan to be within earshot of the server while using it (these are homelabs after all). I'm still unsure about the compatibility here, but assuming it's similar and you are still okay with the multi-GPU tradeoffs, the 3090 route even offers an upgrade path.
@AI-HOMELAB
@AI-HOMELAB 2 months ago
The 3090 obviously is the better choice if you plan on spending a little more. But if you only want to run LLMs below a certain size (monolithic designs up to around 120B max, or mixture-of-experts up to 200B max), this works quite well. If you plan on spending more and running bigger LLMs, I'd go with 3090s or 4060 Tis. But there are many alternative paths. ✌️
@wtfeatapples
@wtfeatapples 8 сағат бұрын
beautiful case
@IllD.
@IllD. Ай бұрын
Thanks for the video. Didn't know this was even an option. Was gonna fork up the cash for a 4090 ti, but this seems like a much better option.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
First of all: Thanks for watching! Now my advice/ question: Do you want to use it for text to image or for LLMs? If you want to use it for LLMs: Do you plan on running huge LLMs like Mistral Large? Depending on what you want to do, this kind of build can make a lot of sense. But multiple rtx 4060ti's could also be a good bet, if you want to spend a little more. In Switzerland we get those for around 450 USD. If you want you can write down what you want to use it for and I will brainstorm a bit. ✌️ The 4090 ti has the advantage of the tensor cores and potential NV Link in the future. I want to make sure, you go for the right card. ✌️
@IllD.
@IllD. 1 month ago
@@AI-HOMELAB I guess I always wanted to generate images, with the long-term plan of developing these images into videos. I've always wanted to create short films / movies, but I don't have the budget or the network to make them. I'm currently using Midjourney & Runway, but I do not like how I have to subscribe to them. Plus, Midjourney can't do LoRAs as well as Stable Diffusion / Flux. Aside from that, I guess I'd also like to create code and assets for video games. Also, I appreciate your interest in helping me. Thanks a lot.
@AI-HOMELAB
@AI-HOMELAB 1 month ago
@IllD. Okay, I see. If you can get a P40 for a fair price (below 200 USD), it's a good option. You only need one card, as 24GB of VRAM is enough for Flux or Stable Video. One caveat: an RTX 3090, for instance, will be much faster than a P40 (around 7 times faster), and used 3090s can sometimes be had for around 800 USD or cheaper - the RTX lineup is just much faster for CNNs. There is also the chance that upcoming text-to-video tools won't support the P40. With all that said: if you like to tinker around, the P40 will do its job just fine. By the way: Flux dev even runs on an RTX 3060 (12GB), I did test that. But for video generation you might want more VRAM. Just also know: video generation is not quite there yet in open source. Black Forest Labs will publish something and it seems to be awesome, but it's not here yet. I think you should either go with at least an RTX 4060 Ti (16GB) or higher if you want an RTX card, or look out for a good deal on eBay for the P40. Prices fluctuate quite a bit and they are sometimes horribly overpriced - don't pay more than 200 USD, the card is not worth more than that. Does that help? If you need more advice, just write a comment and I'll answer.
@IllD.
@IllD. Ай бұрын
​@@AI-HOMELAB Thanks for the help. Guess a used rtx is fine for my standards.
@cnmoro55
@cnmoro55 2 months ago
Nice setup. Would be nice to see some benchmarks using vLLM with batched inference - you might hit way more tokens/s overall
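For reference, a minimal sketch of what batched inference with vLLM looks like (the model name is a placeholder; note that stock vLLM targets newer GPUs, so on P40s you would need a Pascal-enabled fork like the one mentioned further down in the comments):
```python
from vllm import LLM, SamplingParams

# Placeholder model; tensor_parallel_size splits the model across 4 GPUs.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=4)

prompts = [
    "Summarize the benefits of batched inference.",
    "Write a haiku about GPUs.",
    "Explain tensor parallelism in one sentence.",
]
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches these prompts internally (continuous batching), which is where
# the throughput gain over one-at-a-time generation comes from.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```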
@Rewe4life
@Rewe4life 29 күн бұрын
I am struggling with utilizing multiple GPUs for my AI tasks. Especially for stable diffusion training. Are u able to help me with some tips on that? I am running 2 P40s in my rig but I am only utilizing one…frustrating 😢
@AI-HOMELAB
@AI-HOMELAB 29 days ago
Yes, I think I can help you: github.com/czg1225/AsyncDiff This is the only repo that does this as far as I know. I haven't tested it yet, but it should do exactly what you want, and it can be integrated into Automatic1111. This should nearly double your inference speeds. ✌️ There are also apps that can use multiple GPUs for batch jobs (each GPU generates an image at the same time); I forgot the name, but I can have a look at my apps if you wish. Does that help? =) Greets Simon
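If the batch-job route is enough (each GPU rendering its own image rather than both working on one image), a minimal sketch with diffusers could look like this - the model ID is only an example, and one pipeline gets pinned to each card:
```python
import torch
from multiprocessing import Process, set_start_method
from diffusers import StableDiffusionPipeline

MODEL_ID = "runwayml/stable-diffusion-v1-5"  # example model, swap for your own

def render(device: str, prompt: str, out_path: str) -> None:
    # Each worker loads its own pipeline and pins it to one GPU.
    # fp16 halves VRAM use, but the P40's FP16 throughput is poor,
    # so float32 may actually be faster on these cards.
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    image = pipe(prompt).images[0]
    image.save(out_path)

if __name__ == "__main__":
    set_start_method("spawn")  # safer than fork once CUDA is involved
    jobs = [
        ("cuda:0", "a watercolor fox in a forest", "fox_gpu0.png"),
        ("cuda:1", "a cyberpunk city at night", "city_gpu1.png"),
    ]
    procs = [Process(target=render, args=job) for job in jobs]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```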
@TrueHelpTV
@TrueHelpTV 2 ай бұрын
thank fuck Im not alone on this journey at literally the same time
@MilkyToasty
@MilkyToasty 11 days ago
The water cooler for the 1000-series Titan bolts onto the P40 with very few modifications
@The-Weekend-Warrior
@The-Weekend-Warrior 29 күн бұрын
Hi! I'm planning to stick P40s or P100s in a DELL server with 2 Xeon E5-2650 V2s. I'm not sure they support AVX2, they surely support AVX. How important is AVX2? What happens if they don't and I still install these GPUs in there?
@AI-HOMELAB
@AI-HOMELAB 29 күн бұрын
It only supports avx: www.techpowerup.com/cpu-specs/xeon-e5-2650-v2.c1661. It should be fine as long as you don't offload any layers to the cpu. The cpu would only be a bottleneck if you'd use it to offload some layers as avx2 would speed up vector operations on the cpu. If you want I can simulate it on an older machine. But it will maybe take a week until I get there. ✌️
@AI-HOMELAB
@AI-HOMELAB 29 күн бұрын
Some apps might not work or you'll need to search for older or beta versions: lmstudio.ai (See hardware requirements). I had an older version running once. But it barely supported any models anymore...
@AI-HOMELAB
@AI-HOMELAB 29 күн бұрын
It might be worth waiting for me to retest it. (If you didn't already buy it).
@jonathangranados3645
@jonathangranados3645 2 months ago
Nice explanation of everything. The most interesting part was when you mentioned not to buy Kepler for learning compute; Pascal like the P40 is recommended because it is still compatible with the latest CUDA updates. My mistake, as I am new to this: I recently bought a Tesla K80 without knowing anything. What is your advice? I have a Dell T5600 workstation with 2 Xeon 2687W processors, 32 cores total, and 64GB RAM.
@AI-HOMELAB
@AI-HOMELAB 2 months ago
@@jonathangranados3645 Hey, thanks for your positive feedback first of all! As for your question about the Nvidia K80: it's a bit of a difficult card, as it splits its memory into two 12GB VRAM pools across its two onboard GPUs. It shouldn't be a problem for LM Studio as far as I know. Try out Pinokio and see how far you get. For CNNs like Stable Diffusion, asynchronous usage could perhaps work: github.com/czg1225/AsyncDiff I am going to create a guide for the card in the coming two or three weeks, with a suitable fan adapter for it. Can you still wait a moment?
@peterxyz3541
@peterxyz3541 Ай бұрын
Thanks. I was researching this path.
@AI-HOMELAB
@AI-HOMELAB 1 month ago
You're welcome, happy I could help. =) Some general advice: check that your CPU supports enough PCIe lanes (if you go with a server CPU this should not be a problem). Also: enable Above 4G Decoding before inserting the GPUs - sometimes motherboards don't POST if you don't follow this order. 😅 Greets Simon
@peterxyz3541
@peterxyz3541 22 күн бұрын
@@AI-HOMELABthanks. I believe I may follow your path with one exception. No 40mm fans, I was thinking about 80mm push and pull and a custom channel like what you have. It’s going to look ugly
@peterxyz3541
@peterxyz3541 22 days ago
@@AI-HOMELAB have you looked into ROCm's newfound stability and compatibility? Their MI60 is 32GB per card at 160W
@AI-HOMELAB
@AI-HOMELAB 22 күн бұрын
Nice pick! Haven't trated AMD yet, but it's on my plan. I found these on ebay for around 320 USD and performance seems to be awesome. There are a few caveats as it seems: (A description of one seller): This unit does NOT work with windows. It only works with Linux. This unit does not work with any Radeon drivers and only works with ROCm. For multiple cards to work in one system, this card typically needs a server motherboard. In our testing with a server motherboard, we are able to connect up to 4 of these MI60 cards at once, but not more than 4. www.ebay.ch/itm/125006475381?_skw=Amd+MI60+32GB&itmmeta=01J8Y87DVF6TYCTFDG327GYZCE&hash=item1d1af77075:g:KJwAAOSwlelhlGtQ&itmprp=enc%3AAQAJAAAA8HoV3kP08IDx%2BKZ9MfhVJKliWz5OS2bdBItK5EPzczITGcYY1x9%2BkiXE%2BjI9m6eupR60taoSoY9RASZhqxFsq3ikCHZooXNC5g3bKNEVk7wILTFsTEWfV9hQyvbwE4DRkHXeY%2B4Eandt7F6jizw%2F7l%2FTVc8aF8hRs1yzZKwF5bKlntmqqZqCf6UOE7XXmhhIfF2IbNOYS8iSpMmTEPxC764DKCgnagC7F4XLDSZQ6L72MWQTKA%2BwXfgjtSHosmIsmlQo0g5C2lcfYY9h2xd1Z1UK0UJPaTnYbUUGmXf0FDeIGttM%2FLlM1roMYofyCd%2FT9Q%3D%3D%7Ctkp%3ABk9SR-bdncjHZA I might get one of these to test it. This seems interesting! )
@AI-HOMELAB
@AI-HOMELAB 22 күн бұрын
@peterxyz3541 Doesn't matter as long as it works! =D
@rhadiem
@rhadiem 2 ай бұрын
Since you have the modeling skills, you could model a custom 4x P40 fan mount and use a large high-pressure fan instead of those little ones, and see how that goes.
@virtuous_pixel
@virtuous_pixel 2 ай бұрын
Bump
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
Static pressure should be better on 40mm server fans than on 96mm pc fans as far as I know. A 96mm fan can move more air but with less static pressure. But I'll experiment with it in a coming video. This isn't a huge effort to try. ✌️
@alignedfibers
@alignedfibers 14 days ago
I think the Accelerate package allows this parallelization - it was working for using my CPU and GPU at the same time, so in theory it should allow using multiple GPUs. However, I believe some configuration is needed for PyTorch to see the GPUs properly.
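A minimal sketch of what that looks like with transformers + accelerate (the model name is a placeholder; device_map="auto" shards the weights across every visible GPU and spills the remainder to CPU RAM):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    device_map="auto",          # requires `accelerate`; splits layers across GPUs
    torch_dtype=torch.float16,  # halves VRAM use per layer
)

inputs = tokenizer("Multi-GPU sharding test:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))

# Inspect which device each block landed on:
print(model.hf_device_map)
```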
@shkim2011
@shkim2011 Ай бұрын
I have a question re the total VRAM: since the 4 cards are not connected via NVLink, what is the max model size you can train (for example fine tuning)? Isn’t the max size still 24 GB? And I mean loading into VRAM, not due to llama.cpp or quantization. The reason for my question is that I weigh your configuration versus 2x 3090 connected via NVLink to run LLM models up 48 GB
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
I am honestly not sure about this. For inference you can obviously pool your Vram - even without NVLink. But if you want to train models, I'd go for a solution that supports NVLink. Training without NVLink should be possible but slower. If you are serious about training, I'd go with two 3090 cards.
@jackpuntawong5866
@jackpuntawong5866 16 күн бұрын
Max model size will be 96GB but it’ll be bandwidth limited. You’ll see GPU at 90-100% for a while then ramp down to 0% during the back propagation phase.
@AI-HOMELAB
@AI-HOMELAB 16 күн бұрын
@jackpuntawong5866 Not really as far as I know. GGUFs work in a serial manner, meaning each GPU calculates its layers and passes the result to the next. The “bumping up and back to zero” happens because each GPU works independently, using its own VRAM, and the bandwidth shouldn’t be an issue here since the data transfers between GPUs are limited and only occur at specific points in the process.
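In llama-cpp-python terms, that serial split is what the tensor_split knob controls. A small sketch (paths and ratios are placeholders) showing how the layers of one GGUF get spread over four cards:
```python
from llama_cpp import Llama

# Placeholder model path; tensor_split gives the relative share of layers per
# GPU, so four equal entries spread one big model evenly across four P40s.
llm = Llama(
    model_path="models/qwen2.5-72b-instruct.Q8_0.gguf",
    n_gpu_layers=-1,                  # put every layer on the GPUs
    tensor_split=[1.0, 1.0, 1.0, 1.0],
    n_ctx=8192,
)

# Each token still flows through the GPUs one after another (GPU0's layers,
# then GPU1's, ...), which is why utilization "sparks" per card instead of
# all cards loading up at once.
print(llm("Why does layer splitting not speed up single-stream inference?",
          max_tokens=96)["choices"][0]["text"])
```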
@jackpuntawong5866
@jackpuntawong5866 16 күн бұрын
@@AI-HOMELAB oh, very interesting 🤔
@profengel
@profengel 2 ай бұрын
Can you test the 16-Bit version of the models (Llama3.1 or Gemma)? Because the older GPUs don‘t support quantization like you tested. Check out the latest video from trelis research, where he explained this inference problem with older GPUs
@profengel
@profengel 2 ай бұрын
OK. I found a comparison. kzbin.info/www/bejne/mp6WeY2Kh8mfmJosi=6JFXfzdNz3RsfPTW Amazing what the P40 can do. Thanks for sharing!!!
@AI-HOMELAB
@AI-HOMELAB 2 months ago
I surely can. But the P40 does support quantization up to a certain degree; it can just get inefficient (slow). I don't observe that, though: if I run the same workloads without GPU support they run a lot slower (I tested that for myself), especially the Wizard models. But I can add the FP16 versions of the models to my tests. Do you want a test of the safetensors? Earlier I even ran Tesla M40s with that workflow, and there really should be no issues with it. I have read about problems with quantization on the K80, as it does not support the newest CUDA version. The P40 does natively support INT8 (www.tomshardware.com/news/nvidia-tesla-p40-p4-inference,32680.html), so Q8 should be representative. I think the reason I can still run Q4 quants and below with a speedup is software/driver emulation (because it has support for the newest CUDA). I am also really not the only one using these cards for those quants; you can find posts of people using them since 2021, though support was limited back then. For instance: www.reddit.com/r/LocalLLaMA/comments/13n8bqh/my_results_using_a_tesla_p40/ (the results from that discussion are legacy - you can easily get better speeds right now thanks to newer drivers and better optimization). I'll surely watch the video - is it recent? You will find lots of posts from older tests that didn't have the awesome support the current boom has generated. By the way: there are even faster quants than GGUF. I use GGUF because it is the most widely adopted and the LM Studio interface is really nice, but if you use the text-generation web UI you can get better speeds with other formats than GGUF - they just sometimes require a bit more knowledge. Anyway: I'll add FP16 to my tests with the M40 and K80! =)
@MichelStumpf
@MichelStumpf 2 ай бұрын
Great stuff, I do have a silly question: is LM Studio or Msty able to split a large model to multiple GPUs or tools like Stable Diffusion? Typically if you use Mixtral 8x22 it will require more than the 24G of a single GPU in your rig.
@AI-HOMELAB
@AI-HOMELAB 2 months ago
LM Studio does split the LLMs (between your available GPUs). There are ways to do it with CNNs (Stable Diffusion), but they are a bit more involved.
@paelnever
@paelnever 2 ай бұрын
What do you think about tesla k80 GPUs? You can find them second hand even cheaper.
@AI-HOMELAB
@AI-HOMELAB 2 months ago
@@paelnever There are a couple of caveats with the K80. It houses two GPU chips with 2x 12GB VRAM. Without tricks you can't run CNNs which use a huge amount of memory (text-to-picture). They also do not support the newest CUDA drivers (plug-and-play solutions could be hacky/difficult). I'll cover the K80 in an upcoming clip. But to keep it simple: M40s with 24GB VRAM are the better choice - you get unified memory and they support the newest CUDA drivers. =) They can also be found for around 100 USD (even now). But if you wish to use K80s, I'll show how to use them in a couple of weeks. Perhaps another alternative are modded 2080 Tis; they come with 22GB of VRAM, but they are more expensive (550 USD). They can be found on eBay. ✌️
@Gideonblade
@Gideonblade 2 ай бұрын
Nice setup
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
Thanks! 🙏
@sondrax
@sondrax 27 күн бұрын
Very exciting man! Mucho Gracias for sharing your explorations! QUESTIONS: anyone know how the scaling works? For example… the memory of the P40s combines to allow you to run Llama 3.1 70B FP8! Which is incredible feat for the money… But… how is the actual processing of the LLM performed? Do all 4 P40s actually work on the processing of data? Or is their memory used to load the model, but the processing is performed mainly by one card? Specifically… wondering what would be the result of removing ONE P40… and inserting a 3090…? Would this drastically speed up the tokens per second? Or would the power of the 3090 just be deluged by the P40s? Hope my question makes sense…. Anyone have any thoughts on this!?
@AI-HOMELAB
@AI-HOMELAB 27 days ago
First of all: thank you! =) Now to your question: if you load the weights as GGUF, the layers are processed serially. They get distributed over all the GPUs: GPU 1 calculates the first couple of layers, hands the results to GPU 2, which then calculates the next couple of layers, and so on. I can see this in the GPU load graph - they "spark" for a little bit and then the next one does. This means they all contribute, but unfortunately not in parallel. For that one would use vLLM. In theory this repo would do that (a special branch for the P40): github.com/cduk/vllm-pascal Though I have yet to test this one. If you add a 3090 to, let's say, four P40s and use GGUF, this will do you little good. It works, but the speedup on only some layers will barely be noticeable.
@sondrax
@sondrax 26 күн бұрын
@@AI-HOMELAB thanks man! Though not the answer I was hoping for with the naivety of a ‘newbie’ … I do so much appreciate the explanation (and maybe even understand some of it!) 🙃 I’ll check out the 🔗 you provided and am super excited to see what you do next on your channel! Current, ever shifting strategy: based on your work and my budget, looking to try building a machine with four P40s & two 3090s…. Do you have ANY rough idea what typical TOKS (tokens per second) such a system would yeald with something along the lines of Llama 3.1 70B 16FP)? I’m shooting for 8. (I know… but what can I expect with my 💰 limitations!?) Just from messing around at a local college, I seem to get far more nuanced and helpful responses from the FP16 verse FP8…. The FP8 seems to just stop paying attention to the relative prompt / RAG data along the way…) [and if this isn’t the right place for this question… np… just don’t respond and I’ll hit you up with a more informed version of the question down the line!] Much appreciation for the effort & time you’ve put into sharing your results! It’s information that’s hard to find and very valuable… with ♥️ from Texas! Bart Clark
@AI-HOMELAB
@AI-HOMELAB 26 days ago
Hey Bart! Nice to exchange some thoughts with you! This is a domain where we are all learning something daily. I started doing my first experiments in 2021 - and I know how hard it sometimes can be to find reliable sources. So I am glad if I can help. ✌️ Now for your question: the P40 kind of sucks at FP16 - no other way to put it. You lose an insane amount of computational power at FP16 compared to, let's say, Q8 (around 40% inference speed loss in GGUF). With my four P40s I get 3.61 T/s in Q8 (Llama 3.1 70B). In FP16 I'd be around 2-2.2 T/s. You'd get some speedup from the 3090s - perhaps 3 T/s as a reachable goal. Also: with vLLM you should get some speedup. But if you are aiming for 8 T/s I'd probably go with 4060 Tis (8x). Just remember that you'll need a server CPU and motherboard for that (available PCIe lanes from the CPU). 4060 Tis would also make it easy to use vLLM and you get tensor cores. I'll test those in the coming week. Maybe go for modded 2080 Tis (I haven't tested those, but they have 22GB of VRAM at 500 USD - Bay Area, USA). And yes, there is some more capability to be found in FP16. Do you use the models for code or for logic, physics, math and so on? Qwen 2.5 72B is amazing in those domains (better than Llama 70B). This is what my next video is about. But for code you want to use the FP16 version... Does that help? By the way: just keep asking! I am glad if I can help. Also keep me updated on how your build goes - I find it fascinating to see what everyone is building! =D Warm greetings from Switzerland (although it's starting to get colder 🙈), Simon By the way: do you study? (Just because you mentioned your college.) =)
@sondrax
@sondrax 25 күн бұрын
@@AI-HOMELAB Brother of The Digital Realms! Thanks so much for the awesome reply…. Though it be mostly bad news for me and my budget: looks like the only way to get an even almost acceptable - for me - Ts count would be to go with all 3090s…. And now with Llama 3.2 jumping to 90B… I’d STILL come up 40 or so gigs of VRAM short! (Though I don’t need ‘vision’ for my use.) What I’m doing is… similar to coding, in a sense…. Working on a very intricate writing project wherein the text of the novel is coded to changing but specific formulas (think rhymed iambic pentameter on steroids … where you can reverse the process and the text will yield playable music notation. I know… rather an odd use! But I’ve been doing it manually on and off for 50 years and AI REALLY speeds up this process. Allot.) It very well may be that I’m gonna be forced to use FP8. And it could well be that I just haven’t figured out the magic way to prompt the AI. But I’ve found messing around with help from a college student on their system… that FP16 sure does stick to the ‘rules’ and doesn’t tend to go off the deep end / hallucinate like I’ve found with FP8. Sorry my use and explanation is so esoteric it likely seems ridiculous… maybe I could explain it better but will do all a favor and not try. 🙃 I will keep lurking and absorbing your findings, looking for a solution I can afford (around $5k). Thanks again for sharing your adventures in AI… it’s truly fascinating to me and I - and others! - really appreciate the time / energy you e put into answering many of our questions! Keep up the work and don’t get too cold! (We could use some of that weather here during Summer months… 🥵) Oh yes… though I was in college far too many decades… I’m 60 yesterday and have been out long enough to pay off my student loans! Best regards! ♨️
@uropig
@uropig 14 күн бұрын
Great Video!
@AI-HOMELAB
@AI-HOMELAB 14 күн бұрын
Thank you! Appreciated! =))
@Fordtruck4sale
@Fordtruck4sale 4 күн бұрын
I'd still recommend the used 3090 route for most people. Just a better bang for buck. I love workstation/server cards but that's just the way it is.
@AI-HOMELAB
@AI-HOMELAB 4 күн бұрын
Based on? ✌️
@Fordtruck4sale
@Fordtruck4sale 4 күн бұрын
@@AI-HOMELAB more recent CUDA support, better performance,less trouble to set up, easier to sell if you realize it's not for you. 3090s can handle all the modern arch and can be had for cheap on FB marketplace. Just set power limit lower on them for inference. Only thing against them is they're always more than 2 slot. Smallest one I could find is the 3090 that Dell built and they're hard to come by and are just over 2 slot
@AI-HOMELAB
@AI-HOMELAB 4 days ago
Fair arguments. If you can get a 3090 cheap-ish, it'll obviously be the better card. But if you can still get a P40 at the price I had to pay, I'd say it's still worth it. I never saw a 3090 below 700 USD (at least in my region). P40s are still easily sold, as prices went up. But right now, with where the prices are, there are better options - we'll show some of them shortly. The 3090 is an amazing card; if you can get it cheap, do it. But I still think the P40 holds its value if you can get it for below 200 USD.
@НеОбычныйПользователь
@НеОбычныйПользователь 3 days ago
@@AI-HOMELAB And I'm still wondering what happens if you add an RTX 3090 as the CUDA0 device (on which the context is mostly processed) to 3 Tesla P40s. The 4 Tesla P40s take a very long time to process large contexts. I wonder if this method could be used to significantly reduce that time (and even increase the generation speed, although that's not the main priority).
@fredbompard7097
@fredbompard7097 2 ай бұрын
Nice video !! Thank you 🎉
@Strawberry_ZA
@Strawberry_ZA 1 month ago
Fascinating rig, really awesome silicon! The airflow configuration seems sub-optimal though - nothing against you as the builder, these do seem like excellent-value components.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
First of all: Thanks! =) Airflow-wise: What would you change? I am always open to suggestions! I want to build optimal rigs. I know some suggested 96mm fans - which I will test out. I was a bit hesitant using those because I thought they lack air pressure - air flow would be better though.
@Strawberry_ZA
@Strawberry_ZA 1 month ago
@@AI-HOMELAB You're welcome - nothing you can really change, I think. The GPUs blow front to back but the CPU cooler blows bottom to top.
@squirrel6687
@squirrel6687 2 ай бұрын
How does this setup compare to two NVLinked 3090s? I'm currently running two 3090 XC3s with an 11900KF on a Z590 Dark, 32GB RAM (though I'm planning to upgrade to a Z690 Dark Kingpin with a 13900K and 64GB DDR5 6000 which I have on hand), powered by an EVGA P2 1200W PSU and cooled with an EVGA CLC 280. The build is housed in a Cooler Master HAF XB EVO with a modded motherboard tray to support EATX and a Noctua 320mm top fan. For gaming, I use Windows 11 Pro, but for workstation tasks, I prefer Debian-everything, including the AIO control, is supported. At the end of the day, NVLink still offers faster VRAM pooling than PCIe 4.0 or 5.0. The Z690 Dark is EVGA's last motherboard supporting 8x, 8x direct to the CPU, which most PCIe 5.0 boards don't. It seems like the industry is phasing out NVLink and SLI at the consumer level, likely in preparation for the AI boom, IMHO. The XC3 3090s are a perfect fit on the Dark (Kingpin) motherboards due to the spacing. The Cooler Master case supports hot-swapping HDDs and other features, though I did have to deal with a warped motherboard tray. Luckily, as a former metal fabricator with a BS in Mathematics, it was easy to fix with some wood and a hammer. The only downside of the Z590 Dark is the lack of Thunderbolt and 10GB Ethernet, which I might be able to address with a ribbon cable to the third PCH-connected 4x slot. Unfortunately, the Z690 Kingpin also lacks these features, and it can't access any iGPU, which is a bit of a bummer
@rhadiem
@rhadiem 2 ай бұрын
Less oversharing may give you more helpful responses. 4x P40s have a LOT more VRAM than dual 3090's, which limits what you can do. The 3090's are obviously faster, but if you can't load the model you can't run it at any speed.
@AI.Musixia
@AI.Musixia 10 days ago
I wonder only one thing: why no SoC AI M.2 accelerator?
@reinerheiner1148
@reinerheiner1148 2 ай бұрын
Wow, I thought yea sure but who can afford it before watching the video... But those p40 are really dirt cheap, in comparison to what else we could buy with 24gb of ram. Nice!!! A few questions: If the model is small and fits within one GPU, are you sure that still all 4 are used for inference? How would the performance be for fine tuning? Also, I'd love a video benchmarking the CPU for inference, especially for models bigger than 96gb (Mistral Large, LLama 3.1 405b), but I suspect its going to be quite slow. But how slow? Thanks again for the video!
@AI-HOMELAB
@AI-HOMELAB 2 months ago
Hi! Yes, if you use LM Studio it will by default (when using the CUDA backend) distribute the model weights across the GPUs. You can spot that if you look at your GPU memory utilization. It won't use the computing power of all GPUs simultaneously, but rather calculate the first layers on GPU 1, then the following ones on GPU 2, and so on. There are some formats which allow for asynchronous performance. I chose GGUF for this presentation because it's the easiest to get into, it supports older GPUs unlike other formats (ExLlamaV2 for instance), and it's also the most widely used as far as I know. For fine-tuning you'd probably be better off using P100 GPUs (better memory bandwidth), but that is a part of this huge ML field in which I don't have much experience yet. I'll also test the CPU for inference, but I can already tell you that for models like Llama 3.1 405B Instruct it would be painfully slow (like waiting over a minute for a word). With partial GPU offloading it gets better. Mistral Large 2 runs fine on the GPUs - on a 7551P alone this will probably not be the case, though. I'll create a video about that tomorrow. I hope I'll get it done in the next 24 hours. 🤞🏻
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
Okey, sry - this just created so many ideas in my head while starting it. It will be there on Monday. But I promise it will be worth the waiting time.
@bellisarius6968
@bellisarius6968 5 күн бұрын
yooo where do i get that EVGA case?!?!?!
@JonVB-t8l
@JonVB-t8l 2 ай бұрын
Using HBCC on a Vega 56 with a 1st Gen Epyc 7551 server with 128GB of ram, I was able to have 128GB of VRAM, and still have 8 left over for the OS. I considered using Optane DDR4 ECC dims for a cheap solution to high ram quantities in a cheaper motherboard, but it's not commonly supported on AMD. Now that I've confirmed it works, Vega 20 workstation GPUs are the target because they support PCIE 4.0 so double the bandwidth to the ram.
@JonVB-t8l
@JonVB-t8l 2 ай бұрын
My goal is quality, not token quantity
@derstreit
@derstreit 2 months ago
This is an interesting idea. I have a Radeon VII on my shelf, maybe I should try this.
@JonVB-t8l
@JonVB-t8l 2 ай бұрын
@@derstreit Funny you say that, because the only Vega 20 GPU that doesn't support PCIE 4.0 is the Radeon 7 lol. Sry to break the bad news. I just bought an MI60 and I have a Cascade lake Xeon Scalable on the way so I can make use of those $50 128GB DDR4 Optane Pmem sticks. 128GB of cheap-ish ECC and 256GB of Optane.
@derstreit
@derstreit 2 ай бұрын
@@JonVB-t8l OK, sucks. I also have a Vega 56, so I can try that. Test Bench would be a Threadripper with 64GB Ram.... Time will tell
@JonVB-t8l
@JonVB-t8l 2 ай бұрын
@@derstreit If you want to try LLAMA 3.1 405b thats not enough ram, your gonna need to dedicate a bunch of NVME space as swap space and it's gonna hurt performance way more unless you can saturate that PCIE 3.0 x16. Just a heads up. But I confirmed it works and I'll post results on Level1 as soon as I get this single prompt setup ready.
@thedanyesful
@thedanyesful 2 ай бұрын
Are 40mm fans really the only option? You've got a 3D printer you ought to be able to build a shroud that gets you just as many CFM from big fans with a much lower noise level.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
I guess I will try this out later and create a video about this. When I first built the server I started with one card and planned to add to it. This just seemed like the easiest most flexible path. And I thought/ still am unsure if 96mm fans deliver enough air pressure. But we'll see. =)
@i1pro
@i1pro 1 month ago
Pascal graphics cards were slow. Is the low teraflops processing power a problem?
@AI-HOMELAB
@AI-HOMELAB 1 month ago
I'm not sure how you arrived at the conclusion that Pascal cards are slow. Cards like the 1080 Ti, which the P40 is based on, were beloved in their time. The Pascal architecture, particularly in cards like the P40, still offers strong performance for inference tasks with CNNs and LLMs. While the FP16 performance is lower compared to newer architectures, for many inference workloads that rely on FP32 or INT8 precision the Pascal cards still hold up well. If you're interested, you can check out our benchmarks where we compare the P40 against newer cards like the RTX 3090 and the RTX 3060 to give you a better perspective on its performance in these specific tasks.
@brachisaurous
@brachisaurous 1 month ago
I think the P40s are the reference 1080 design. Could you liquid cool them with 1080/1080 Ti GPU waterblocks?
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
Should work - if you are interested in that, there has been a comment from another person, who if I remember correctly did this. A quick search from myself: www.reddit.com/r/watercooling/comments/135z19l/tesla_p40_waterblock/ Should be compatible. =) I might have to get into this myself at a later point. ✌️
@brachisaurous
@brachisaurous 1 month ago
@AI-HOMELAB I have 2 P40s and 2 1080 waterblocks, a CPU pump/block combo, some rads, an X99 board and a 2697 v3... just need some DDR4 RAM and it looks like we can get a very capable AI rig
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
This sounds nice! 😃 DDR4 ECC Ram can be found "cheapish" on ebay. But yeah, 2.4ghz is the frequency. But as long as you don't rely on offloading, it works great.
@dfcastro
@dfcastro 8 days ago
Why not a Noctua 40mm fan that is really quiet, like 14.9 dB(A)?
@AI-HOMELAB
@AI-HOMELAB 7 күн бұрын
Not enough air flow. ✌️
@theuniversityofthemind6347
@theuniversityofthemind6347 11 күн бұрын
Hi Simon, love your content! I have an Alienware m18 R2 with an Intel i9-14900HX, NVIDIA RTX 4090 (24GB), 64GB RAM, and 8TB storage, but I struggle to run LLaMA 70B models. Could you create a video for users like me on optimizing setups (8-bit quantization, mixed precision, etc.) to run large models efficiently? Your help would be greatly appreciated. Many Thanks!
@AI-HOMELAB
@AI-HOMELAB 8 days ago
This seems like an interesting idea! I'll make sure to test it, but you may need some patience. I'm still constructing my 12-16 GPU rig, and progress is a bit like Tesla's FSD: it's definitely going to be done by tomorrow (oh damn - this doesn't work...). 😅🙈 Anyway: normally for your setup I'd go with LM Studio and Qwen 2.5 72B or Llama 3 70B (in Q4), then load as many layers as possible onto the GPU and use as many CPU threads as I can. That can help quite a bit - LM Studio normally only uses 4 CPU threads, even if you have 64 or whatever. ✌️ Hope this somewhat helps for the moment! Thank you for your nice comment! =) Greets Simon
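Since the question mentioned 8-bit quantization and mixed precision: a minimal sketch of loading a model in 8-bit with transformers + bitsandbytes (the model name is a placeholder; on a single 24GB card a 70B model still won't fit entirely, so device_map="auto" spills the remainder to system RAM):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; swap for your model

quant = BitsAndBytesConfig(
    load_in_8bit=True,                      # weights stored in int8
    llm_int8_enable_fp32_cpu_offload=True,  # allow layers that don't fit to stay on CPU
)

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    quantization_config=quant,
    device_map="auto",           # shard across GPU(s), spill the rest to RAM
    torch_dtype=torch.float16,   # activations in half precision (mixed precision)
)

inputs = tokenizer("Test prompt:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```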
@theuniversityofthemind6347
@theuniversityofthemind6347 8 күн бұрын
@@AI-HOMELAB Hey!! Thanks sooo much for your reply! Are you saying that you may be able to record a step by step video on this but I will need to wait a while before you can find the time to do it? If so that would be amazing!!
@delahaije25
@delahaije25 3 ай бұрын
but can it run crysis
@AI-HOMELAB
@AI-HOMELAB 3 ай бұрын
@@delahaije25 Certainly if you run the geforce drivers. 😂
@gambiarran419
@gambiarran419 Ай бұрын
I shall have a great allowance shortly from work for a powerhouse 70b AI build. If you wish to build and test the system for channel content, reach out 👍
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
Hi there! I am not completely sure what you are referring to. Do you want me to build the server for you, or do you want me to cooperate with you in choosing the parts and setting up everything? =) Either way: Are you on X/ Twitter? This way I could DM you. It would be interesting to accompany someone when building a bigger ML system like we do it here. ✌️
@jelliott3604
@jelliott3604 2 ай бұрын
@AI-HOMELAB, you mentioned testing V100s . I have 4xV100s (SXM2, 16GB) with heatsinks and most of a Dell C4130 GPU server (mainboard, GPU daughter board and PSU - I am still missing risers and some of the cables) to put them in - plus 2xE5v4 Xeon CPUs and 256+GB DDR4 RAM. It's almost all from either eBay or China and collected over about 16 months as prices and availability have changed. Eg. The 4xV100s had an average cost of £150 each. The dual CPUs (36 cores @ 3ish GHz) I got for £36 for the pair. The RAM cost £150 for the 256GB. The Dell C4130 server components, they are expensive. Some of the cables are either a ridiculous price or else have a huge wait time until they arrive from the USA or China. Am currently waiting for 1 more cable to ship from the USA to the UK when I have everything I need to get it up and running (at least the GPU and daughterboard end of things, though I might still need to connect it to a workstation/desktop motherboard)
@chimpo131
@chimpo131 2 ай бұрын
lol eat 💩😂😂
@AI-HOMELAB
@AI-HOMELAB 2 months ago
Wow, this should deliver a really nice output speed for inference. You also seem to have gotten these cards for an insanely low price! I wish they were still available at this price point... I guess I'll have to be patient. I'm hoping for lower prices on the RTX 3090 when the 5000 series drops. I already own a 3090 and a 3060, but right now I feel it's too expensive to change my whole stack to them. Anyway: once you've built your rig, let me know how it went. 😃
@RimlaAri
@RimlaAri Ай бұрын
And what can you do with that? Is this just for training and studying?
@AI-HOMELAB
@AI-HOMELAB 1 month ago
Exactly the opposite. This setup is mostly suited for inference tasks (using LLMs and CNNs). It lacks tensor cores, so it's not designed for training tasks. Nvidia originally designed the P100 for that, but it also lacks tensor cores. These architectures (Pascal) are nowadays only used for running ML models, not for training them. ✌️
@edengate1
@edengate1 25 күн бұрын
niceeee
@MaverickTangent
@MaverickTangent 25 күн бұрын
I bought my P40 for like $150 US earlier this year 2024 and now Sept 2024 they are $300 US. I don't know if they are selling that high, but maybe they will come down. I have a video mounting a 1080ti cooler on one.
@AI-HOMELAB
@AI-HOMELAB 25 days ago
Yeah, prices went up to a crazy level lately... This card is great, but it's not worth 300 USD. I'll have a look at your modification video! How are the thermals?
@MaverickTangent
@MaverickTangent 24 күн бұрын
@AI-HOMELAB I had to use a pwm controller, since I run the fan on a sata connector with an adapter, runs about 65C under normal inference load with the fan at 65%. 54C under straight 12v. Even at 100% 12v it's quieter than your 40mm fans. But at 65% pwm it sounds just like running a light game. It's not bad.
@kublikus
@kublikus 2 ай бұрын
Hello. Did you try to run llama 405b? What is the speed on this configuration?
@AI-HOMELAB
@AI-HOMELAB 2 months ago
You'll find it in my second video. But basically: it runs - but insanely slow (below 0.5 T/s), as I need to offload too many layers.
@danobot12
@danobot12 Ай бұрын
Is that price guidance of no more than 200usd for a P40 GPU still accurate? I can't find them anywhere for that price just a month after this video was uploaded.
@AI-HOMELAB
@AI-HOMELAB 1 month ago
That price guidance is still accurate, but in another sense: the price of these has risen. Please don't buy one for more than 200 USD - it's not worth more than that. If you can't find one that is cheaper on eBay, make an offer to the seller, write to them or wait a little; the prices fluctuate quite a bit. There's one card you can still find easily for under 200 USD, it just has less but faster VRAM (Nvidia P100 16GB). I bought mine at this price in January and May. The prices will come down again, but right now they are a bit hyped. Hope this helps.
@НеОбычныйПользователь
@НеОбычныйПользователь Ай бұрын
Have you tried replacing the GPU0 with an RTX3090 with this build? I have a 4xP40 build myself and I'm wondering if such an upgrade makes sense.
@AI-HOMELAB
@AI-HOMELAB 1 month ago
I am creating a 16-GPU server where 6 GPUs will be P40s. I'll try to mix them with an RTX 3090 to figure out how well this works. I am quite a bit away from finishing this one as we are creating a custom case, but I can test this configuration in the coming days and create a small video. It should be up by Sunday, alright?
@НеОбычныйПользователь
@НеОбычныйПользователь Ай бұрын
@@AI-HOMELAB Thank you, that would be very interesting. In my experience, the best performance on Tesla P40's systems is achieved using llamacpp and its forks; I personally use koboldcpp with CUDA version 12 support (koboldcpp_cu12) - it improves performance a bit more. The llamacpp developers have made many optimizations specifically for P40 - even flash attention support, which greatly increased the performance of these video cards. But this program does a lot of its work on GPU0 - it works much harder than other GPUs. It is very interesting to know what will happen if this particular GPU is replaced with a much more powerful one.
@RagdollRocket
@RagdollRocket 17 күн бұрын
Hey! I get why you'd go for 4x P40s-at €150 each, they cost around €600, which is budget-friendly compared to a used RTX 3090 at €600-750. But keep in mind, the 3090 is 30% faster, has tensor cores, and with NVLink (using dual 3090s), you get 2-3x faster GPU communication. Each P40 has 24 GB VRAM, but unlike dual 3090s with NVLink, they can't share memory as efficiently. For training, a single 3090 takes 8 hours, dual 3090s take 4-5 hours, while 4x P40s take 12-15 hours. Plus, power-wise, 4x P40s use 1,000W vs. 350W for a 3090, meaning €50 vs. €17.50 per 100 hours in Germany. Curious-what made you choose the P40s?
@AI-HOMELAB
@AI-HOMELAB 17 days ago
Hey! Yes, you have some very valid and well-thought-through arguments here. The reasons I bought P40 cards back then: 1. The cheapest second-hand 3090 I could find was still 950 CHF (I didn't build this in July '24, but from Dec. '23 to Jan. '24). At that moment, the P40 simply got me much more VRAM for the price at still "okay-ish" performance. 2. I do not train any models, but run them for inference. This is why I care about the raw VRAM amount - in inference I can use the total of what I've got. vLLM (one branch) also supports the P40, and it was also easily usable in most formats. 3. I also do not run the server 24/7, only when I need it (which is to say about an hour a day). Now for the drawbacks: yes, the P40 has no NVLink and no tensor cores, its VRAM is also slower bandwidth-wise, and it's less efficient. At the end of the day, what mattered to me was being able to locally run big models. But I am starting to run huge models like Llama 405B, which makes me think I'll need something more powerful for the future. ;-) In your experience: how much does NVLink help for running inference (T/s speedup)? For the next iteration of the server I have some cards in mind and am currently testing a few. I'll also test the 3090 in a dual config once I find one at a good price point (I need the same model as I already have).
@RagdollRocket
@RagdollRocket 17 days ago
@@AI-HOMELAB Unfortunately I don't have dual RTX 3090s myself, I've only researched them. I have a single RTX 3080, which is also quite OK, but I want to upgrade to dual RTX 3090s. Where in Switzerland do you actually live? I'm from Konstanz. Best regards and thanks for the detailed answer :)
@AI-HOMELAB
@AI-HOMELAB 16 days ago
@RagdollRocket Hey, well then you live just around the corner - I live in Frauenfeld, 20 minutes away from you. Thanks for your thoughts; everyone benefits from a good exchange. ✌️ Do you study computer science? You seem to know your stuff very well too, judging by your videos on KZbin. ✌️
@RagdollRocket
@RagdollRocket 14 days ago
@@AI-HOMELAB Hey, thanks for the reply. I did my degree in software engineering in Konstanz, but I've been done with my studies for quite a while now. Are you still studying? Best regards!
@AI-HOMELAB
@AI-HOMELAB 13 days ago
@RagdollRocket Then you already have quite a bit more knowledge in this area than I do. I'm currently doing my master's in Didactics of Media and Computer Science (UZH, HSLU, PHSZ), which covers about 50% of the bachelor's modules in computer science. I'm only in the second year of my master's, but I was able to choose ML as a focus. That said, I feel like I learn almost as much privately through tests, experiments & builds. 🙈 Warm greetings from me as well! =)
@Cocoacookierun443
@Cocoacookierun443 1 month ago
I use the same hardware, plus a Tesla V100. I'm currently trying to get everything running, but everything is very complicated on Windows Server.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
Okey, which problems do you have? There is quite some chance that I can help. 🫡 By the way: Nice! I always wanted a V100. Do you run the 16gb or 32 gb version?
@nerdywork
@nerdywork 3 ай бұрын
Thank you, very informative video! I may grab a P40 in the future and make a video trying to water cool it. I was also thinking about getting an amd 7900xtx and making a video documenting how difficult it is to test AI workloads on it. I think you have the potential to make some really cool videos with just a few updates to your style. Feel free to reach out if you ever want some help or advice. I’m not a KZbin master either, but I’m working on it, and I see several things we could improve on your videos quickly.
@AI-HOMELAB
@AI-HOMELAB 3 ай бұрын
@@nerdywork Thank you for your kind offer! I guess my friend and I who are running the channel will first have to find out our own style. But an exchange could be interesting once we are a bit further on with our channel. =) I think the two ideas would be great, as there aren't a lot of videos about cooling solutions for Tesla cards. It's also always reported that AMD cards are harder to get AI workloads running on. 😅 (For instance Pinokio seems to prefer Nvidia). I really hope we get some more budget friendly cards with more Vram.
@whiiiz9410
@whiiiz9410 2 ай бұрын
@nerdywork if you can afford the 7900xtx, it's a better and more modern buy. You will effortlessly be able to run your workloads. Some come bundled with AI quicksets, which is basically one click run.
@noth606
@noth606 2 ай бұрын
Just started the video, but for future things I'm sure it would be helpful to others if you used some currency that means something, or better yet more than one. Like Euros/Dollars that are something virtually everyone has a rough idea of the value of even if they don't use them. Talking about swiss francs is like talking about crates of coconuts or containers of iron ore. Worth something, but it could be super cheap, super expensive or anywhere in between. I can look it up of course but it means pausing the video to go look up a currency that no one uses except 3 mountain goats, figuratively speaking, and it will negatively affect retention of your video.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
I did (but later in the video): Have a look at the chapter costs, where I converted all the costs to USD (25:12). But yes, I'll definitely keep an eye on always telling not just only one currency. =) I must have forgotten to convert it in the beginning - sorry about that. 😅 I will also search for US prices in future videos and not just convert them.
@Mino12890
@Mino12890 2 ай бұрын
Hi, how do you get video output from the computer? Does the AMD Epyc 7551P have integrated graphics? If yes, does the 7002 series too? Or do you have another GPU for graphics?
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@Mino12890 The motherboard has a small VGA chip integrated. Server motherboards tend to have one - but be sure to check whether your MoBo comes with it. ✌️
@Mino12890
@Mino12890 2 ай бұрын
@@AI-HOMELAB Thanks for your answer. Does the MoBo have to come with a VGA port or a specific VGA chip? (I'm looking at the H12SSL-i from Supermicro)
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@Mino12890 Sure thing 👍 =) It has a small integrated graphics unit: www.supermicro.com/en/products/motherboard/h12ssl-i -> ASPEED AST2500 BMC graphics. Just be sure to also get a VGA-to-HDMI adapter (max resolution is about Full HD).
@Mino12890
@Mino12890 2 ай бұрын
@@AI-HOMELAB About the OS, can I use whatever I want? Win 10/11, Ubuntu? On the MoBo web page there aren't that many compatible OSes listed. Any idea, or should I not care?
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@Mino12890 That depends on whether you want to use plug-and-play apps or code a bit yourself. For plug and play: use Win 10/11 - LM Studio & Pinokio will work out of the box. Linux is the better option if you want to be on the bleeding edge and code yourself. But for most people Windows is the better choice, as bleeding-edge features get implemented in Pinokio insanely fast. ✌️
@aveenof
@aveenof Ай бұрын
What's the software you used for LLM testing?
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
We used LM Studio. ✌️
@aveenof
@aveenof Ай бұрын
@@AI-HOMELAB Cool. Looks like it's Windows and Mac only for now
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
LM Studio works on Linux. I just haven't played with it on Linux myself. =)
@GrulbGL
@GrulbGL Ай бұрын
I actually run AI locally with Llama 2 and Flux for image creation. I'm considering buying a P6000 with 24 GB of VRAM; I currently have a 4070S with 12 GB and I'm wondering if it would be an upgrade. Locally, they're selling P6000s for around $538.44 USD at the cheapest I could find (a 4090 is $2,153.78 for comparison). 12 GB for 4K images simply doesn't cut it and doesn't run 12B-parameter models. Do you think the P6000 would be a wise upgrade?
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
Okay, no matter what you do: don't buy a P6000. These are overpriced as hell - it's basically a P40 with integrated cooling. If you have an RTX 4070, a P6000 will be slower (I have a video where I compare the 3060 with a P40; the 3060 is about twice as fast in inference). The P40 just has more VRAM, and that's where it shines. At the price point you are thinking about, I'd rather save up some more cash and buy a used RTX 3090. That gives you a speedup and more VRAM. Hope this helps! =) And if you do consider a Pascal card, don't pay more than 200 USD for the P40. You can also write to the sellers on eBay. Neither the P40 nor the P6000 is worth more. 😅
@mczaga
@mczaga Ай бұрын
What is it for? If I build the same thing, what could I do with it?
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
I use it mainly for local LLM and CNN inference. But this is something one has to be into - I love being able to test these models locally. You could also train models, although this configuration is not really built for that. You could also serve models online, but so far I haven't done that. Short answer: if you want to run your "own" version of GPT-4/Midjourney (obviously not those weights, as they are proprietary), this is one way to achieve it.
@treniotajuodvarnis5503
@treniotajuodvarnis5503 Ай бұрын
Not sure if this 4x bifurcator will work, as it clearly says max 75W - that is how much it can draw from the MB, plus another 55W from SATA power. But 4 GPUs will draw 150W, as they take 75W from the MB. You'd have to check the GPU with GPU-Z, though: a 3090 FE draws up to 70W from the PCIe slot, while a 4090 draws much less.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
The GPUs are not powered by this board but by dedicated 8-pin PCIe power connectors going from the card to the power supply. I am already using it on a "beta-style" case project. This is only a problem if you connect GPUs without dedicated PCIe connectors. So no need to worry about that. 🙂🫡
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
Each P40 can pull 250 watts. This card would otherwise burn. 🔥😬
@treniotajuodvarnis5503
@treniotajuodvarnis5503 Ай бұрын
@@AI-HOMELAB Just check the GPU-Z readings - it has sensors for PCIe slot power draw and the 8-pin separately. As I said, my 3090s draw above 60W from the PCIe slot and the rest from the 8-pin.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
Hm... this seems like a fair point. I haven't encountered problems so far and have tested 4 GPUs (P40s) on one bifurcation board under full load. But I'll be sure to check the readings from GPU-Z.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
@treniotajuodvarnis5503 I guess it should not be a problem with the P40 cards (250W TDP, which two 8-pin PCIe connections can deliver at 150W each). The 3090 I have has three 8-pin connections (450W), so it should be fine. But I'll double-check - this is an important consideration, if not now then going forward. Thank you for pointing that out.
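To put rough numbers on the power budget (a minimal sketch only - 75 W for the slot and 150 W per 8-pin PCIe connector are the nominal spec limits, and how much a specific card actually pulls from the slot still needs to be verified with GPU-Z, as suggested above):

```python
# Rough per-card power budget check. Spec limits: PCIe slot up to 75 W,
# one 8-pin PCIe connector up to 150 W. Card TDPs are the values quoted above.
PCIE_SLOT_W = 75
EIGHT_PIN_W = 150

def budget_w(n_eight_pin: int) -> int:
    """Maximum power a card could draw within spec from slot + connectors."""
    return PCIE_SLOT_W + n_eight_pin * EIGHT_PIN_W

print(budget_w(2), ">= 250?", budget_w(2) >= 250)  # P40-style 250 W TDP
print(budget_w(3), ">= 450?", budget_w(3) >= 450)  # 3090 with three 8-pin connectors
```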
@markspade3899
@markspade3899 23 күн бұрын
Bro, how many concurrent inference requests can this handle at a time?
@AI-HOMELAB
@AI-HOMELAB 23 күн бұрын
That depends on the model you want to serve. For instance: a 7B model in Q4 will use around 3.5-4GB of VRAM. 96GB of VRAM / 4GB per instance = 24 instances peak. But if you want a Llama 3.2 90B model in Q4, then I can only serve 2 instances. ✌️
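As a quick sketch of that back-of-the-envelope math (the 4GB-per-instance figure is just the example above; real capacity is lower because every request also needs KV-cache memory on top of the weights):

```python
# Rough estimate of how many model instances fit into a shared VRAM pool.
def max_instances(total_vram_gb: float, model_vram_gb: float, kv_cache_gb: float = 0.5) -> int:
    """Weights plus a per-instance KV-cache allowance; ignores runtime overhead."""
    return int(total_vram_gb // (model_vram_gb + kv_cache_gb))

print(max_instances(96, 4.0))   # ~21 small 7B Q4 instances once KV-cache headroom is included
print(max_instances(96, 45.0))  # ~2 instances of a ~90B model in Q4
```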
@stormk-1130
@stormk-1130 2 ай бұрын
For AI? Are you making money with that at least?
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
ML is something I deeply enjoy. That's all. ;-)
@stormk-1130
@stormk-1130 2 ай бұрын
@@AI-HOMELAB I wish I had the money you have to just spend on stuff lolol. Good for you my man
@pietmulbregt3904
@pietmulbregt3904 2 ай бұрын
Do you have a link to the seller of your P40s? I was also looking to buy some, but most sellers don't look very trustworthy.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@pietmulbregt3904 Sure, I bought two of them from: www.ebay.ch/str/preylserverstore?mkcid=16&mkevt=1&mkrid=5222-175126-2357-0&ssspo=4l33mqqmt1s&sssrc=3418065&ssuid=r31-oziosiw&widget_ver=artemis&media=COPY And two from: www.ebay.ch/itm/404453313559?itmmeta=01J48S54714CC569V56AT2XHC5&hash=item5e2b4bcc17:g:73sAAOSwst5k5MaC But prices are at an insane level right now. I'd really wait until they drop below 200 USD again.
@pietmulbregt3904
@pietmulbregt3904 2 ай бұрын
@@AI-HOMELAB Indeed the prices are high. For the GPUs I will wait for them to drop a bit, but the rest of the system I can put together if the price is right. Do you have any links for the other parts you bought? The RAM is way overpriced where I am looking, and the case I can only find for 300, so a bit much. Sorry for all the questions :)
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@pietmulbregt3904 Sure thing 👍 For the RAM for instance: www.ebay.ch/itm/364108339085 The case: it seems that EVGA discontinued production of it - sorry about that, you might need to search a bit. At the moment you're probably better off looking for an alternative. I am going to create a video about an even bigger build with a 6U server case (mining rig), but that needs a few more modifications. It only costs 60 USD, though. Do you need the links for the motherboard and the CPU?
@pietmulbregt3904
@pietmulbregt3904 2 ай бұрын
@@AI-HOMELAB I found the combo on eBay and have ordered it. It's a shame the 10Gb version is not available, or is so expensive. I can only fit a short rack in my server closet, so a rackmount GPU server is not an option, but I hope to find a good case for it all. Worst case I will buy a new one at full price, but till then I am limited to a 3060 12GB for my LLM models, so the pressure is on :)
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
One quick idea I thought of myself: I found a person who mods 2080 TIs to 22GB of VRAM. It's more on the expensive side, but cheaper than an RTX 3090. I haven't ordered from them, though - they also sell on eBay. Depending on your budget, perhaps an option for the GPUs: 2080ti22g.com/collections/all These cost 435 Swiss francs / around 510 USD. It's not the deal of your life, but the best I know off the top of my head 😅. Perhaps M40s could be worth a try - going to test them in the coming weeks. I did have some running earlier, but I want to retest before giving solid recommendations. These only cost around 120 USD. Be careful to go for the 24GB VRAM one though if you consider it.
@fabriziobertoglio7342
@fabriziobertoglio7342 28 күн бұрын
Are they more powerful than 2 RTX 3090s?
@AI-HOMELAB
@AI-HOMELAB 28 күн бұрын
No 😅, not in terms of TFLOPS. But I can load much bigger models. See, if you can afford a 4x 3090 setup, go for it. It will give you better inference speed, the same amount of VRAM at higher bandwidth, more TFLOPS and NVLink. This configuration is for people wanting to load big models on a budget.
@Ela-t9k9d
@Ela-t9k9d 2 ай бұрын
Can you try and show us the Llama 3.1 405B model?
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
This is going to be the "can it run Crysis" of the open-source community. 😂 Sure! I can do that. It will be there in the coming 3-5 days. I think I can perhaps get it to run at 1 T/s with some GPU offload. I will also try out Mistral Large 2 (123B).
@fisherbu
@fisherbu 2 ай бұрын
Can LM Studio work with 4 P40s together?
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@fisherbu Yes, without any problem. It uses all the VRAM you have as one pool.
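Under the hood that pooling is a llama.cpp-style layer split. If you want the same thing from Python instead of LM Studio, a minimal sketch with the llama-cpp-python bindings could look like this (the model path and split ratios are placeholders, not something tested on this exact rig):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.1-70b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,                         # offload every layer to the GPUs
    tensor_split=[0.25, 0.25, 0.25, 0.25],   # spread the weights evenly over 4 cards
    n_ctx=4096,
)
print(llm("Explain in one sentence what a Tesla P40 is.", max_tokens=64)["choices"][0]["text"])
```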
@ageell2004
@ageell2004 Ай бұрын
Do you plan to do guides for setting up the platform, such as installing the LLM and getting all GPUs running concurrently?
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
More or less, yes - with restrictions. Certain older GPUs are just not supported by CUDA anymore, but Vulkan works. But yes, I plan to do that.
@legendofachilles
@legendofachilles 2 ай бұрын
Could you please share the specs for the 12-GPU build that you are planning? Will you be using a different motherboard? I will try to duplicate your build.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
Hey, I am sorry, but I don't yet want to give away my specs for that build. I am also still thinking about which GPUs I want to combine, or whether I'll use only one type of GPU. ✌️ Hope you understand that. =)
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
I am still waiting for my DDR5 Ram to arrive - hope it won't take an eternity. ✌️
@legendofachilles
@legendofachilles 2 ай бұрын
I understand. In the meantime, I'm building out the above in a 3U rack chassis with server PSUs.
@saiarunprakash
@saiarunprakash Ай бұрын
A P40 card for 580 Swiss francs is really cheap - is that for 1 card or 4 cards?
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
I paid 130 CHF per P40, but the cheapest I can find right now is double that price. I'd either wait, make a price offer to the cheapest seller (don't go over 200 CHF, the card is not worth more) or go for another card. The P100 only has 16GB of VRAM, but its VRAM has better bandwidth, and that card can be found for below 200 USD. The RTX 4060 Ti with 16GB of VRAM can be found for 400 CHF new. There are also modded 2080 TIs with 22GB of VRAM for 430 CHF on the web.
@legendofachilles
@legendofachilles 2 ай бұрын
Which motherboard are you using?
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
www.gigabyte.com/ch/Enterprise/Server-Motherboard/MZ01-CE1-rev-1x ✌️
@legendofachilles
@legendofachilles 2 ай бұрын
@@AI-HOMELAB Thanks!
@asocialconsciousness8535
@asocialconsciousness8535 2 ай бұрын
Can you run the new Llama 3.1 405B model on this thing?
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
I mean, kind of - but not at a usable speed (0.3 T/s) at Q2. 😅 If you want to run Llama 3.1 405B at usable inference speed, your budget will need to increase quite a bit. I'd probably go with RTX 3090s or higher at least. vLLM would deliver better results. But either way: you'd need at least double the VRAM to get something usable.
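For anyone wondering where those numbers come from, a back-of-the-envelope estimate of the weight size alone (rough bits-per-weight values; KV-cache and runtime overhead come on top):

```python
# Weight memory ~= parameter count * bits per weight / 8.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8

for label, bpw in [("Q2 (~2.6 bpw)", 2.6), ("Q4_K_M (~4.8 bpw)", 4.8), ("8-bit", 8.0)]:
    print(f"405B at {label}: ~{weight_gb(405, bpw):.0f} GB")
# -> roughly 132 / 243 / 405 GB, which is why 96 GB only works with heavy CPU offload.
```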
@asocialconsciousness8535
@asocialconsciousness8535 Ай бұрын
@@AI-HOMELAB What about one of those boards the crypto miners use? You know, the ones you can plug a dozen GPUs into? Would something like that be a good option for an AI server? I have no idea what type of system RAM they support though.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
@asocialconsciousness8535 You basically already hinted at it. First: you also want a powerful CPU so you don't get bottlenecked there. Second: I've not seen many mining motherboards that support huge amounts of RAM - I only found one with up to 64GB or so. The PCIe slots are mostly 1x instead of 16x, which could make loading models a bit slow, and I don't know if they support SLI if you want to go with something like a 3090. But I've not tested that personally. Could be an interesting idea for a test though. ✌️
@jakobro1794
@jakobro1794 2 ай бұрын
The RAM itself isn't that much though. Nowadays SAP servers like BW run on multiple TBs.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
True - but also really expensive 🙈 & if you want to run inference on models like Llama 405B in 8-bit, this is enough. The problem lies in having 405GB of VRAM 🙈.
@jakobro1794
@jakobro1794 2 ай бұрын
@@AI-HOMELAB Got you, VRAM is different. In SAP these big databases over 1TB run bare metal on the server…
@ianmatejka3533
@ianmatejka3533 2 ай бұрын
What is the case called?
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
EVGA DG 87 ✌️
@sondrax
@sondrax Ай бұрын
Question for those of you who might know, or have an educated guess: I'm interested in running Llama 3.1 70B Q8. 1. Would a 6x P40 config make the 3 tokens per second any better? 2. If you had a capable motherboard and added 2 3090s to the 4 P40s… could all cards 'work together' to improve the 3 tokens per second? Aiming for 8 to 10 per second w/o breaking the bank… Thanks guys!
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
1. No, as the weights are distributed across all GPUs and run serially. You'd have to try the special branch of vLLM that is specialised for P40s - sorry, I didn't get to test that yet. 2. Yes, you may combine P40s with 3090s. This is what I do with my "temporary test rig" right now. You first need to install the Tesla drivers, reboot, then install the Studio drivers and reboot. It will make it faster, but if you want a big speedup the ratio needs to be favorable towards the 3090s. One 3090 and one P40 will result in an inference speed between the two. Does that help?
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
Ah yes, Qwen 2.5 72B seems to be massively more capable than Llama 3.1 70B from my first vibe tests. Other "cheapish" GPU options: an RTX 4060 Ti (16GB) - I found some on sale for 400 USD (not suitable for training, only inference) - or modded RTX 2080 TIs with 22GB of VRAM for around 500 USD from the bay area (but I have never ordered one, so proceed with caution).
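For reference, tensor parallelism in vLLM looks roughly like this - treat it as a sketch only, since mainline vLLM targets newer GPUs and the P40s would depend on the community branch mentioned above (untested here):

```python
from vllm import LLM, SamplingParams

# One weight shard per GPU; the model name is just an example.
llm = LLM(model="Qwen/Qwen2.5-72B-Instruct", tensor_parallel_size=4)
out = llm.generate(["Why does memory bandwidth matter for LLM inference?"],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```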
@careinc5705
@careinc5705 28 күн бұрын
Is there a reason why you use GGUF models instead of exl2 quants? GGUF models get loaded into RAM and inference is done via the graphics cards, while exl2 models run on the graphics card and inference is done there as well.
@AI-HOMELAB
@AI-HOMELAB 28 күн бұрын
@careinc5705 Adoption rate: right now many more people use GGUF because this quantization is much more available, more UIs support it, and it's easier to adopt (people without a GPU can use it). I completely agree with you: if you use the oobabooga text-generation webui for instance, you get better performance with ExLlamaV2 quants. I guess I'll do a "quant formats" series in the future. Right now I am just using what most people use.
@youtubeccia9276
@youtubeccia9276 2 күн бұрын
and what will you train with that? :)
@AI-HOMELAB
@AI-HOMELAB 9 сағат бұрын
Hey there! This setup is really not meant for training models - due to the lack of tensor cores it wouldn't be very efficient for that. We test models in inference; that is what this setup was built to do. The upcoming server build is also tailored towards inference. We plan to do machines for training models in the future. But for now, well 😅✌️
@Keeeeeeeeeeev
@Keeeeeeeeeeev 2 ай бұрын
Try Arctic's 15k RPM server fans instead of the 6k version.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
If the GPUs are under full load, the air throughput is needed. But yes, it would be enough for only running LLMs.
@Keeeeeeeeeeev
@Keeeeeeeeeeev 2 ай бұрын
@@AI-HOMELAB Have you considered the P100? OK, 16 gigs per GPU, but HBM2 should definitely help with performance. I understand it's much more $ for 96 gigs of VRAM though.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@Keeeeeeeeeeev I have not personally compared them, but it would be an idea for the future. I don't think you get a massive benefit in inference speed - yes, it has a bit more FLOPS. But if you're interested in training models, I think the P100 is the way to go. Maybe I will compare them in a future clip. ✌️
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@Keeeeeeeeeeev But if you get one, let me know how it goes! =)
@bazooza
@bazooza 3 ай бұрын
Mate, you need to water cool this... and put an external radiator so you won't hear the noise.
@AI-HOMELAB
@AI-HOMELAB 3 ай бұрын
@@bazooza Perhaps that will be a follow up in the future. ;-)
@bazooza
@bazooza 3 ай бұрын
​@@AI-HOMELAB By "external," I mean actually mounting the radiator outside, similar to a household AC system, and it would be best to have the pump located outdoors as well. This way, there would be zero noise, even at 100% load.
@AI-HOMELAB
@AI-HOMELAB 3 ай бұрын
@@bazooza Would be interesting - but it's not an option right now, as we live in an apartment (which doesn't even have AC itself).
@Keeeeeeeeeeev
@Keeeeeeeeeeev 2 ай бұрын
@@bazooza where do you get water blocks?
@Vermino
@Vermino 2 ай бұрын
@@Keeeeeeeeeeev You want to look for a GTX 1080 waterblock. I actually was able to buy an AIO and mount it on my single P40, but the ID-COOLING ICEFLOW 240 VGA was discontinued earlier this year.
@krizdeep
@krizdeep 2 ай бұрын
How would this fare with 405B Llama?
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
Well, let's say you're fine if you have a lot of time 😂. I am creating a video right now where I benchmark big models on quality and speed. It should be online in about 8 hours, or tomorrow at the latest. While I can run Llama 405B, I'd not advise doing so on such a machine - there's no way around it, you just need much more VRAM (200GB at least...). I will build a computer with that amount soon though.
@krizdeep
@krizdeep 2 ай бұрын
Looking forward to it…
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
It will be there in about 2-3h. ✌️
@이민희-o7v
@이민희-o7v 2 ай бұрын
Add a little more budget - how about changing it to the MZ01-CE0? This motherboard supports dual 10GBase-T ports.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
I only use it for myself. This provides enough speed for me. The network port wasn't my priority for a "home usage system". But depending on what you want to do it would be important.
@edengate1
@edengate1 25 күн бұрын
Why not Quadro GPUs from nvidia?
@AI-HOMELAB
@AI-HOMELAB 25 күн бұрын
More expensive, have a look at the P6000 pricing. ✌️
@edengate1
@edengate1 25 күн бұрын
@@AI-HOMELAB The P6000 is cheaper than a 4090 and has 24GB of VRAM... I don't understand.
@AI-HOMELAB
@AI-HOMELAB 25 күн бұрын
Why do you compare it to the 4090? Obviously it's cheaper than the 4090, but it's more expensive than the P40, which is what we are talking about in this video. The only two advantages the P6000 has over the P40 are dedicated cooling and an HDMI out. At double the price of a P40, that's a bad deal. But if you can get it at a great price, sure, go for it. =)
@edengate1
@edengate1 25 күн бұрын
@@AI-HOMELAB sorry i confused your video with another.
@AI-HOMELAB
@AI-HOMELAB 25 күн бұрын
@edengate1 No worries. =)
@강은희-z8c
@강은희-z8c 2 ай бұрын
what about ram clock speed??
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
2400 MHz ✌️
@이민희-o7v
@이민희-o7v 2 ай бұрын
Thank u ​@@AI-HOMELAB
@Ruhgtfo
@Ruhgtfo 17 күн бұрын
So awesome (好屌) 👍
@ZHuang-ei1eq
@ZHuang-ei1eq Ай бұрын
How are these 4 GPUs connected to each other?
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
They are connected through PCIe (no SLI or NVLink). This is no problem in a use case like LLMs, as different layers of the model are loaded onto different GPUs. One could also use vLLM - there is a branch on GitHub, but I have not tested it so far. For CNNs you can either do batch jobs, which are then handed to each GPU individually, or use AsyncDiff from GitHub (github.com/czg1225/AsyncDiff). Does that help?
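A quick way to confirm that all the cards are actually visible before loading anything (assuming a PyTorch install with CUDA support):

```python
import torch

print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
```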
@ZHuang-ei1eq
@ZHuang-ei1eq Ай бұрын
@@AI-HOMELAB I have 4 Nvidia 3090s connected through PCIe, but it seems 24GB is the threshold. I can't run anything larger than 24GB, even though I have 4x24GB in total. I might have done something wrong then.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
@ZHuang-ei1eq Okay, do you see all four RTX 3090s in your task manager? Otherwise you might not have activated "Above 4G decoding" in your BIOS. Which UI do you use? LM Studio or another?
@ZHuang-ei1eq
@ZHuang-ei1eq Ай бұрын
@@AI-HOMELAB I use Python. Good to know that activating "Above 4G decoding" links my multiple GPUs. Let me try it.
@ZHuang-ei1eq
@ZHuang-ei1eq Ай бұрын
@@AI-HOMELAB The motherboard is a WS X299 SAGE.
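For the Python route, a minimal sketch that shards a model larger than 24GB across every visible GPU, using transformers with accelerate (the model name is only an example, and this isn't tested on that exact board):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-70B-Instruct"   # example model that exceeds a single 24GB card
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # accelerate places layers across all visible GPUs
    torch_dtype=torch.float16,
)
inputs = tok("Hello from four 3090s!", return_tensors="pt").to("cuda:0")  # first shard sits on GPU 0
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```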
@k1tajfar714
@k1tajfar714 Ай бұрын
This is exactly what I want to build! Thank you for the great video! I have basically zero budget, so I hope you can give me recommendations on this one. I've already spent 200 bucks on an X99-WS motherboard, so I'll have 4 PCIe slots at full x16 if I don't hook up any M.2 NVMes, I assume. So that's awesome; it also has a 10C/20T Xeon (low profile), 32GB RAM, and an okay-ish CPU cooler. I have saved up another $200 and I don't know what to do. I was going to buy one or two P40s and later upgrade to 4 of them, but now I can't even afford one - they're at almost 300 bucks, I'm afraid. One option is to go with M40s, but I'm afraid they're trash for LLMs and especially for Stable Diffusion stuff; they're pretty old. I'm lost and I'd love to get help from you. If you have time we can discuss it - I can mail you or whatever you think is appropriate. Special thanks. K1
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
First of all: glad you liked my content! Now for your questions - I have some ideas. One thing you can try is to offer a seller (eBay) 180 dollars for a P40 and hope they agree; the P40 definitely is not worth 300 USD - it's a great card, but not at that price. Another alternative is to go for used RTX 3060s (12GB variant); these sell for below 200 USD used. You could also save up for a used 4060 Ti with 16GB of VRAM. At a higher budget: there are modded 2080 TIs with 22GB of VRAM (I have not tested these), which you can get from eBay or at this page: 2080ti22g.com/?srsltid=AfmBOorOh__PZo1EAQBxBZOrtkJTebxgPel91P70JOhYLQi21A1Unatw (again - I haven't had one of these in my hands). Nvidia P100 GPUs with 16GB of VRAM are still sold for 200 USD, and I think you could get these a bit cheaper if you write to the seller. Pay attention: these also come in 12GB versions. The great thing about the P100 is memory bandwidth (double the P40). I'd not go with the M40 - it works, but it's quite a bit slower, and I don't think it's going to be supported for much longer. But if you can get it at a great price (below 100 USD), it could be a starting point until GPU prices fall again with the 5000 series. Did that somewhat help? You can reach out to me on X (@AI_Homelab). ✌️
@k1tajfar714
@k1tajfar714 Ай бұрын
@@AI-HOMELAB Thank you! Fantastic, you're awesome. Thank you for your valuable time and the great, complete response! Much thanks to you two guys, you're the awesomest. I'll reach out if I need further assistance; your answer addresses everything and is quite complete. Thanks for your time and response once again.
@MarkoThePsycho
@MarkoThePsycho 14 күн бұрын
I don't see it running Crysis...
@AI-HOMELAB
@AI-HOMELAB 13 күн бұрын
😂 still the best quote.
@echbob6301
@echbob6301 2 ай бұрын
How much is a Tesla P40 now?
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
Around 300 USD. Prices have risen insanely. I'd try to make an offer on eBay, or wait until they fall again. The card is not worth more than 200 USD at most.
@echbob6301
@echbob6301 2 ай бұрын
@@AI-HOMELAB its now 400 USD, tf xD
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
Yeah, they fluctuate quite heavily. Please don't buy one at that price. You can get modded 2080 TIs with 22GB of VRAM for around 450 USD; these will outperform a P40 without breaking a sweat. You could also get a used 4070 Ti with 16GB for about the same.
@marywel7615
@marywel7615 2 ай бұрын
PLA filament heats up and deforms in a hot car.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
True, but this is on the intake side and it will never reach the temperatures you get in a hot car. This really is no problem - I've been using PLA parts in settings like this for more than a year and it's fine. The comparison doesn't really hold up here.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
But if you are worried: Print it in PET-G or ABS. I'd still go with PLA from my experience. 🙂
@jcirclev2
@jcirclev2 2 ай бұрын
For the cost and power, a 3060 would have been a better way to go. I have a DL380 with 512GB of RAM and 8TB of disk, plus 2 3060s, all for $1,300 - and I have 40 cores to back it up.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
This is just one possible setup, but I really prefer not to be too dependent on the CPU and to have enough VRAM. Even if you only need to offload some layers to the CPU, it makes the LLMs so much slower. I've got 32 cores here - works fine. ;-) But I hope Nvidia will release a 5060 Ti with 16GB or 24GB of VRAM; that would be an instant buy. ;-) I'd generally stay away from lower-VRAM cards, as asynchronous usage of multiple GPUs for CNNs is not widely adopted (there are some promising early steps though).
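The knob that decides how much ends up on the CPU is n_gpu_layers in the llama.cpp bindings - a sketch only, with a placeholder path and layer count:

```python
from llama_cpp import Llama

# Everything in VRAM: fastest, but the whole quantized model has to fit on the GPUs.
llm = Llama(model_path="models/qwen2.5-32b-instruct.Q4_K_M.gguf", n_gpu_layers=-1)

# If VRAM is short, something like n_gpu_layers=40 keeps the remaining layers on the CPU -
# it still runs, but token generation slows down noticeably.
```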
@og_tokyo
@og_tokyo 2 ай бұрын
a 97mm blower fan working at 60% is less noisy and can cool better (better static pressure) than 2x 40mm fans killing themselves at 100% workload.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
@@og_tokyo You could go with a 97mm fan, but I'd have space problems when running 4 P40s without riser cables. ✌️
@og_tokyo
@og_tokyo 2 ай бұрын
@@AI-HOMELAB Guess that's a fair point - it's considerably longer. Honestly, those 40mm fans just don't have the force to cool those fins well and are a nightmare; the 15k model works better but is even louder. With that quad setup, a single 80mm can cool all 4 if you build a custom quad-linked 80mm fan mount and throw on a thick, high-pressure 80. Just saying, it would be a cool setup.
@AI-HOMELAB
@AI-HOMELAB 2 ай бұрын
Will try it out in some weeks ✌️ Should be doable.
@mz8755
@mz8755 15 күн бұрын
The cost of this setup is not a lot cheaper than quad 3090s - only slightly now. And it's not half but a quarter of the 3090's performance, especially on 70B. Then you have the hassle of cooling and the noise issue. The 3090 can use the whole consumer ecosystem for watercooling etc. It seems to me a bad path now; it was cool once.
@AI-HOMELAB
@AI-HOMELAB 15 күн бұрын
Well, where I live RTX 3090s still go for around 900 USD or more. But I agree that a quad 3090 setup will get you much better inference speed, just at a much higher cost. The P40s have risen in price, which is a big problem. But there are better alternatives - we will show you some of them in the coming weeks. ✌️
@33butterzucker33
@33butterzucker33 Ай бұрын
VRAM is king... 400 bucks... an insane price for Switzerland. Hopefully Apple's M4 puts other manufacturers under pressure so we're no longer dependent on such power-hungry solutions.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
Absolutely agree. We need much more competition in this area. If the M4 is indeed released with up to 512GB of RAM, and if 128GB or 256GB MacBooks become somewhat more reasonably priced, then at least two graphics card manufacturers would face competition even in the enterprise sector. I can't wait for VRAM, which is actually cheap at the chip level, to finally get fair prices in consumer cards. An 80GB VRAM card should in no case cost more than 2000 CHF. Even then, manufacturers would still make a lot of profit… The modding scene proves that it's possible.
@AI-HOMELAB
@AI-HOMELAB Ай бұрын
But I still love to build these monstrosities! =D
@33butterzucker33
@33butterzucker33 Ай бұрын
@@AI-HOMELAB ...yes, for example I've got an HP Z620 sitting around here and a Fujitsu Primergy (made in Germany), both with 2 CPUs...
@VamosViverFora
@VamosViverFora 7 күн бұрын
It's extremely cheap, really. Yesterday I was looking at a really simple gaming PC (but for AI hobbyist home research), and with 64GB RAM, an RTX 4060 Ti (16GB, 128-bit) and a not-so-strong processor it was around 1,700 EUR.
@AI-HOMELAB
@AI-HOMELAB 7 күн бұрын
I got my 4060 Ti for 380 CHF. You can find 64GB of ECC (or non-ECC) DDR4 RAM for around 60 CHF on eBay. If you are not scared of building the computer yourself, you should be able to put something along these specs together for under 1,000 Euro - don't go for prebuilds. If you go for a mix of new and used parts this should be doable, probably even with new parts. =) But yes, P40s used to be a great deal for your money.
@AI-HOMELAB
@AI-HOMELAB 7 күн бұрын
Right now: P100s are still "cheap" at around 180 Euro, the MI60 (AMD) with 32GB can be found for around 250 Euro, and modded RTX 2080 TIs with 22GB of VRAM can be found for around 450 Euro. ✌️