Thanks for the written instructions, but your website seems to replace double dashes "--" with single dashes "-", and it also replaces single quotes with another character that looks similar but that Linux doesn't like. It's a bit of a pain to have to carefully read every line and correct it before pasting it into the Proxmox shell. Am I the only one having this issue?
@DigitalSpaceport 3 days ago
I'll check into that; it's very possible some formatter is getting applied on the website! Thanks for letting me know. I'll make a note for folks until I can fix it up.
@OutsiderDreams 3 days ago
@@DigitalSpaceport One quick search for "dearmor" (in the gpg --dearmor command) will show you what I mean. Unfortunately, there are several other places I can't remember now. FYI, this was also an issue in the written instructions for your previous GPU Passthrough with Proxmox and LXC/Docker video.
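For anyone hitting it, the line in question comes from NVIDIA's container toolkit setup and should paste as plain ASCII, something like:

    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
      sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

If the two dashes in --dearmor show up as a single long dash, the page formatter has mangled it.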
@OutsiderDreams 2 days ago
Another place I've had trouble is copying and pasting the compose files (both ollama and openwebui), since all the dashes are converted to some weird symbol that looks like a dash but is not recognized by the Dockge editor. You have to manually replace every instance of it with a regular "-", otherwise you get errors in Dockge.
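If you'd rather not hand-edit, a cleanup sketch (assumes GNU sed in a UTF-8 locale; extend the character classes if your paste has other lookalikes):

    # normalize Unicode dashes and curly quotes back to ASCII in a pasted compose file
    sed -i 's/[–—]/-/g' docker-compose.yml
    sed -i 's/[“”]/"/g' docker-compose.yml
    sed -i "s/[‘’]/'/g" docker-compose.yml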
@DigitalSpaceport 2 days ago
Yeah, I found this issue also, so it does exist. I'm spending some time solving it now. Thanks for bringing it up; I hope to have it corrected ASAP.
@Uncle-Chew 8 hours ago
Awesome work and guide there to get us started! Moved from my Ubuntu host to Proxmox again after seeing your video, as I couldn't figure out the proper way to get this to work previously. Thanks again buddy!
@ewenchan1239 3 days ago
For the vast majority of my Linux stuff, I've switched over from running Linux VMs to running LXCs with maybe only a tiny handful of notable exceptions. One of the best things about using LXCs is that you can share multiple GPUs across multiple LXCs, which you can't do between VMs. Excellent video!
@DigitalSpaceport 3 days ago
LXC is really growing in favor for me also, as the space savings are significant on my NVMe drives.
@ewenchan1239 3 days ago
@@DigitalSpaceport YUP! If you are using ZFS and you have de-dup enabled, you may be able to realise even further on-disk space savings. This is on top of the fact that the base LXCs themselves are already quite lean, and they really only grow in size when you install the various Linux packages for whatever it is that you're trying to do. Pro tip re: installing the NVIDIA drivers in the LXC: if you downloaded the NVIDIA drivers to a shared location (i.e. NOT in /root of the Proxmox host/node itself), you can pass that shared location as a mount point into your LXC. Therefore, you can skip the "pct push" command to push the driver into your LXC, and by "mounting" the shared folder location (it uses lxcfs), you can keep ONE copy of the NVIDIA driver for even further space savings. But yes, I have also found that it has more efficient resource utilisation (RAM in particular), but also disk space, because you don't have to pre-allocate the much larger disk that you would nominally need for a "full fat VM". It's been fantastic! And also, unlike a VM, if the LXC needs more RAM, it will try to pull that; if a VM runs out of memory, it completely crashes.
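A minimal sketch of that mount-point trick, with the container ID and paths as made-up examples:

    # on the Proxmox host: expose a shared driver folder inside LXC 101
    pct set 101 -mp0 /tank/shared/drivers,mp=/mnt/drivers
    # then inside the LXC, run the installer straight from the mount, e.g.:
    # sh /mnt/drivers/NVIDIA-Linux-x86_64-<version>.run --no-kernel-module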
@thenanook 3 days ago
You are the best, bro... I love your videos... the timeline rocks. I've got my Proxmox with Radarr and Plex, and a Windows VM with an RTX A2000 passthrough... and I want to use it with containers too, and voilà, I found your video.
@DigitalSpaceport 3 days ago
I've read it's possible to run a Windows 11 LXC container, but I haven't tested it yet. I need to look into this more, but that would be a fun video.
@OutsiderDreams 2 days ago
@@DigitalSpaceport I second that. Getting Win11 running in an LXC with GPU passthrough would be awesome!
@DimonEx 2 days ago
Thank you! I spent a couple of days configuring the same setup (except with 1 GPU) until it started to work. Now going to try this setup.
@DimonEx 2 days ago
I would add a link to the NVIDIA drivers site to the guide.
@DimonEx 2 days ago
I like that the LXC container is now created via the UI and not with the script.
@DimonEx 2 days ago
In both the old and new guides I've noticed that for "Install the NVIDIA Container Toolkit" (in the new guide, as an example), the two dashes in --dearmor are replaced with one Unicode symbol, and the single quotes in the sed command with Unicode quotes, so the commands fail when copy-pasted. Some string formatting is happening (Google Docs or Word might do this).
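For reference, the sed step in NVIDIA's container toolkit repo setup needs plain ASCII single quotes; it should read roughly:

    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list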
@DimonEx 2 days ago
Also, I've switched the tty to shell in the LXC container's Options so I don't have to enter a password every time in the Console.
@DimonEx 2 days ago
Though nvtop segfaults in shell mode, so I switched back to tty.
@ronw6808 3 days ago
Excellent video, great explanation. Ran across your channel recently and it's quickly becoming my go-to. Keep up the great work!
@DigitalSpaceport 3 days ago
Thanks! Any topic in particular you've been looking for, content-wise?
@ronw6808 3 days ago
@@DigitalSpaceport Pretty much everything you've been covering in your videos as of late: AI, GPUs, and Proxmox.
@AntonisAsc 3 days ago
Great video! To the point and step by step! Thank you
@Albandito77 2 days ago
Have you tested using SLI with 2x2 3090s to evaluate the benefits, if any, vs parallelism on the 4x? I'm basically copying your setup and this was one of my considerations. I'll also chime in with the other comments about your channel becoming a favorite. Amazing work you are doing 🎉
@Alex-rg1rz 3 days ago
Great video, saving this for later. Thanks.
@beprivatecdblind7831 3 days ago
Have you tried using the add-device passthrough from the GUI? If you do that, you don't need to edit the config file to add the GPU to the LXC container, which makes setting up GPU passthrough much easier. Just add "/dev/nvidia0:gid=195" etc. for caps, uvm, modeset...
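For anyone curious, what that produces in /etc/pve/lxc/<vmid>.conf is a set of dev entries, along these lines (exact device list varies by setup; gid 195 matches the comment above):

    dev0: /dev/nvidia0,gid=195
    dev1: /dev/nvidiactl,gid=195
    dev2: /dev/nvidia-uvm,gid=195
    dev3: /dev/nvidia-uvm-tools,gid=195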
@DigitalSpaceport 2 days ago
Yeah, but you still have to go get that info from the terminal, so you're already there. The GUI fields are a start in the right direction but still need work. Is there any reason they can't have selection from a dropdown like VMs do?
@OutsiderDreams 3 days ago
@DigitalSpacePort, you missed the following command in the written instructions: nvidia-ctk runtime configure --runtime=docker
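For completeness, that step (from the NVIDIA container toolkit docs) plus the Docker restart it needs:

    nvidia-ctk runtime configure --runtime=docker
    systemctl restart docker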
@DigitalSpaceport 2 days ago
TY!
@B_r_u_c_e a day ago
Very good. Thank you.
@aerotheory 3 days ago
Great job.
@MM-vl8ic 3 days ago
Cool... thanks... no container experience, and really fed up with VMs and GPUs (pretty much reverting to bare metal for everything), so this might be total ignorance/misunderstanding of the tutorial, but if I want to run two models simultaneously (voice/text/voice and image detection), is it two containers sharing GPU(s), or one container able to run two models simultaneously? Hopefully I'm using the correct terms. Side note: I've been running pfSense (paid) on bare metal for 2 years, very stable; it wasn't as a VM on Server 2019. Hardware: 4x 1G / 2x 10G / 2x 40-56G NIC ports, Supermicro X10SRL-F, E5-2690 v4, 128GB ECC, 2x 1TB 980 Pro RAID1 boot, Intel 8950-SCCP QuickAssist...
@DigitalSpaceport 3 days ago
You can run it all at the same time following this video I did. You do need to account for total VRAM usage, but yes, multiple containers hooked into OpenWebUI: kzbin.info/www/bejne/f3TCfXqjps-lr8k
@MM-vl8ic 3 days ago
@@DigitalSpaceport Thanks for clarifying... I'd be interested to see how the old Pascals perform... I have a Titan X and a 1080 that are in limbo... I thought that "tensor cores" were needed for AI, so I was thinking my 3060 12G was the lowest starting point and my A4000 an upgrade... Thanks again for helping me wrap my head around this stuff...
@DigitalSpaceport 3 days ago
These are some of the exact questions I had myself, and there are a lot of knowledge gaps. You can rock a Pascal card, no issues! It will eventually get phased out by driver support/CUDA version, but they do work. They also don't idle at great wattage, so keep that in mind.
@DigitalSpaceport 3 days ago
Forgot: here is a video with a P2000 5GB GPU, and I've also tested my old 1070. I'm not likely to buy any more older GPUs for testing, but yeah, it's interesting indeed! kzbin.info/www/bejne/eXuVf3uFeLZlr7s
@MM-vl8ic 3 days ago
@@DigitalSpaceport Incentive to pull out the NVMe with Win 10 LTSC and try your install on a Supermicro X10DAX with the Titan X I have just sitting idle...
@Caramel_poison 2 days ago
Hey man, just came across your channel, and I know you're the guy to ask this question. I have come into possession of 19 GPUs and I'm wondering what I can do with them:
3x EVGA GeForce RTX 3090
4x EVGA GeForce RTX 3080
2x ASUS TUF Gaming RTX 2070 Ti
1x GeForce RTX 3060
1x ASUS RTX 3060 Ti
8x Radeon RX 6700 XT Red Devil
@DigitalSpaceport 2 days ago
Sell all but the 3090s. Buy another 3090. Quad-GPU local AI rig.
@Caramel_poison 2 days ago
@@DigitalSpaceport Would that be sufficient to train my own model on them? Or, better question, what's the biggest model I can run on a rig like that?
@DigitalSpaceport 2 days ago
To train a model, 4090s are really the best. You also need full-bandwidth Gen4 PCIe lanes and risers, and ideally a fast but modest-core-count Threadripper. For running models, four 3090s perform amazingly. You can run something like Nemotron 70B q8 fully on four 24GB GPUs.
@Caramel_poison 2 days ago
@@DigitalSpaceport Nice, thanks for the info. What if I wanted to train some predictive models? LLMs are cool, but I would like to start with maybe card counting for blackjack or weather forecasting models.
@DigitalSpaceport 2 days ago
Yeah, that's best handled by 4090s, I think. They are super fast for training. I've yet to get a functional training run in, but I've had some failed experiments so far. When I figure it out, even on a small one, you can be sure there will be a video. Check out "llama 3.2 vision LoRA training" as some keywords; it's what I'm researching now.
@OutsiderDreams 2 days ago
How do you manage the storage for all the LLM models? Since the models can get quite large (from tens to hundreds of GB), do you just grow the LXC container to larger and larger sizes to hold the ever-growing Docker images? Or do you pass through a storage pool from Proxmox directly into the Docker container? I would hate for the LXC backup to grow to TBs in size just because I have many LLM models. A video exploring one or two ways of managing the storage of large LLM models would be interesting.
@DigitalSpaceport 2 days ago
Agreed, I need to do a video on this!
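Short version until then: bind-mount a host pool in so the models stay out of the LXC rootfs; a sketch with made-up IDs and paths (backup=0 keeps vzdump from sweeping the models into the backup):

    # on the Proxmox host: map a big storage pool into the LXC, excluded from backups
    pct set 105 -mp0 /tank/models,mp=/root/.ollama/models,backup=0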
@AntonisAsc 3 days ago
Can you mix different NVIDIA models to work together, or do they have to be the same? E.g., I have an RTX 3060 12GB and a GTX 1060 3GB; can they work together to split the load?
@DigitalSpaceport 3 days ago
Yes, they will work together and pool VRAM. Tested with a 1070, a P2000, and a 4090.
@AntonisAsc 2 days ago
@DigitalSpaceport Thank you for the reply. I will try it for fun.
@hantuan107 3 days ago
Can you please do AMD GPU and iGPU next?
@cracklingice 3 days ago
What is the Ceph repo?
@aerotheory 3 days ago
Beginner question: can you have multiple GPU cards of different NVIDIA types, e.g. GTX and RTX, with this setup?
@DigitalSpaceport 3 days ago
You can mix and match generations, PCIe gens, and VRAM amounts, and it just works.
@aerotheory 2 days ago
@@DigitalSpaceport ty, 😀
@OutsiderDreams 2 days ago
Yes, you can mix GPUs, but your token generation speed will be limited by the slowest GPU in the pool.
@squoblat 3 days ago
Any idea how to get this working with NVIDIA GRID? I've been going round in circles for a while; I can't seem to find how to get a copy of the NVIDIA virtualisation drivers for generic Linux KVM.
@DigitalSpaceport 3 days ago
No, but you're not the first person I've heard of chasing GRID. It's just too much of a PITA for me to get into when this works acceptably well.
@squoblat 3 days ago
@@DigitalSpaceport aaaaaany chance you could give it a punt, for science? :D
@PeteyCarcass a day ago
So I got this all set up yesterday; passthrough was working and transcoding in Jellyfin as well. I shut down my server overnight, and everything I spent yesterday doing was just... gone?! Could someone explain how this could've happened, 'cause I'm at a total loss for words.
@DigitalSpaceport a day ago
Like the LXC container was gone?
@PeteyCarcass a day ago
@@DigitalSpaceport No, like all my settings and stuff. Some of the files, like the NVIDIA drivers, were left on my root node, but the installation was gone, and lots of stuff I had added to get it all running is also gone; directories that I created are missing. I have absolutely no idea; my only theory is that I've unwittingly updated the Proxmox kernel or something. It kind of feels like my whole system was rolled back. Not being that great at Linux doesn't help, but I guess doing it all over is a learning experience. I just don't want to risk something like this happening again.
@DigitalSpaceport a day ago
A few things to check. Make sure you're not running a privileged LXC. You should set up an LXC for just the AI stuff and then use other LXCs to segregate other services; I'd run those as pure LXCs when you can (tteck helper scripts, RIP) for things like Home Assistant and Nextcloud. Finally, when you installed the NVIDIA drivers on the root (not the LXC), did you use the --dkms flag and make sure you clicked yes on the initramfs rebuild option? Inside the LXC, install with --no-kernel-module, and you will have functional CUDA as long as both driver versions are the same. Also, I'm going to do a video on it soon, but please use bind mounts to store stuff that is not core to the containerized LXC OS, like model files and RAG folders.
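Rough shape of the two installs, assuming matching driver versions (the filename is a placeholder):

    # on the Proxmox host (builds the kernel module via DKMS):
    sh NVIDIA-Linux-x86_64-<version>.run --dkms
    # inside the LXC (userspace only, same version):
    sh NVIDIA-Linux-x86_64-<version>.run --no-kernel-module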
@RafaelXZ5 18 hours ago
I've encountered an issue I haven't quite figured out how to solve. OpenWebUI's search request to SearXNG uses filetype=json on the request, and my SearXNG keeps responding forbidden. Google has not been much help either. Have you ever come across something similar?
@DigitalSpaceport 4 hours ago
Did you follow along with the steps from here? kzbin.info/www/bejne/f3TCfXqjps-lr8ksi=unI1WfPsTuyK4NXw&t=261
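Also worth checking: SearXNG only serves JSON if it's enabled in its settings.yml, so a forbidden response to the JSON request usually means something like this is missing (file location depends on your compose setup):

    # searxng settings.yml -- allow the JSON output OpenWebUI requests
    search:
      formats:
        - html
        - json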
@RafaelXZ5 an hour ago
@@DigitalSpaceport Thanks! I was following this one. The vision one was my next video to follow along with. Great stuff!!
@IvinerLimug 3 days ago
I have Ollama running in an LXC on an NVIDIA card. When I try to use the same video card in another container (Stable Diffusion), Ollama stops seeing the video card and starts working on the processor; only restarting the container helps. How can this be fixed?
@IvinerLimug 3 days ago
I'm sorry, it looks like I didn't use the LXC configuration well. Now it looks like everything is working together!
@IvinerLimug 3 days ago
But no, Ollama again decided to use the CPU, even after adjusting the configs, while the Stable Diffusion container is active.
@DigitalSpaceport 3 days ago
VRAM exhaustion would be my first guess. SD, and especially Flux, can eat a lot of VRAM depending on the added LoRAs. I've had this happen often on my single-GPU rigs.
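A quick way to confirm: keep an eye on the memory column while both containers are busy:

    watch -n 1 nvidia-smi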
@squoblat 2 days ago
My NVIDIA driver installation doesn't look like yours: it warns me about UEFI and signing the kernel module, and then when I run nvtop it tells me there isn't a GPU to monitor, despite lspci | grep NVIDIA showing both of my video cards. This is incredibly frustrating. I'm at the point where I'm going to have to do a fresh Proxmox installation and start again. Edit: update-initramfs -u also tells me there is no /etc/kernel/proxmox-boot-uuids found.
@DigitalSpaceport 2 days ago
You followed the written guide? It's verified from scratch; it does work. You are not seeing it in the Proxmox root, correct? What color is the Proxmox screen when you boot?
@squoblat 2 days ago
@@DigitalSpaceport Yep, this is all as root on the host. If I plug a screen into the server, it's just black with white text (shell). I've done everything else from the web interface, so it's exactly the same. I'm going to start over; I followed your last tutorial from an older video previous to this one, and that got me to the same place. I have been at this for 3 days and have yet to get to a place where I can run an nvtop command and get anything other than an error. I have downloaded the Linux 64-bit driver for the correct card twice, just in case that was it, but the screens I get from the NVIDIA installer throw all kinds of shit at me compared to what you get. I've no idea why.
@DigitalSpaceport 2 days ago
Do you have Above 4G Decoding enabled in the motherboard? The Proxmox boot screen should be black with gray letters in the middle. If for some reason it is blue, then it is booting in BIOS mode. Also, are you sure you have installed the PVE kernel headers and done an initramfs update prior to installing the driver with the --dkms flag?
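In case it helps, the usual order on the host, sketched out (the headers package name assumes the stock PVE kernel):

    apt update
    apt install pve-headers-$(uname -r)
    update-initramfs -u
    # then run the NVIDIA .run installer with the --dkms flag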
@squoblat 2 days ago
@@DigitalSpaceport Yes to both. Not sure; I'll check the BIOS settings. I was getting a blue option screen when I first set Proxmox up, but I haven't seen that screen since I got the web portal running. Will start over and see what happens.
@squoblat 2 days ago
@@DigitalSpaceport Just to be clear: do I want Above 4G Decoding on or off, if it's an option?
@cracklingice 3 days ago
I want to lose my mind.
@ewenchan1239 3 days ago
Stupid question -- what's the benefit of using the fp16 model vs. the default Qwen model?
@DigitalSpaceport 2 days ago
Precision of the stored information in the layers. FP16 is the highest, and the default in Ollama is q4. I can typically tell there is some degradation at q4; q8 is what I run myself. The model size doubles at each quant step listed here, and so does the VRAM load. It's effective to think of it as lossy compression.
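To make the sizing concrete: in Ollama the quant is just part of the tag you pull. The tag names here are examples from the library, and the sizes are ballpark:

    ollama pull qwen2.5:14b-instruct-q4_K_M   # ~9 GB on disk, the q4 tier
    ollama pull qwen2.5:14b-instruct-q8_0     # roughly double that
    ollama pull qwen2.5:14b-instruct-fp16     # roughly double again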
@ewenchan1239 2 days ago
@@DigitalSpaceport For the three-question (plus the warmup) tests that you give -- do the different quantizations give higher-quality/more accurate results at the expense of speed?
@DigitalSpaceport 2 days ago
Yes. It's most observable from q4 to q8; it's less apparent to me from q8 to fp16.
@ewenchan1239 2 days ago
@@DigitalSpaceport Gotcha, thank you. Are the differences such that q4 would just produce garbage or unusable results, or would the q4 results still be OK, just not as good as q8?