NVIDIA Update Solves CUDA Error (but Very Slow) - Train Dreambooth, SDXL LoRA with Low VRAM

  5,633 views

How to

Comments: 31
@cebesten 5 months ago
I must say that after 3 weeks of surfing YT and trying all the advice about lowering settings, Dreambooth training is finally working on my RTX 4060 after applying your settings in the NVIDIA control panel! Thanks!
@AI-HowTo 5 months ago
You are welcome. It is slow, but it might be useful in some cases... hopefully we will soon get more optimized training methods that require less memory and are faster.
@brottop2089 5 months ago
Thanks!
@hatuey6326 9 months ago
Excellent tutorial!!! Thanks!!
@AI-HowTo 9 months ago
You're welcome!
@user-kk2ve1un4u 9 months ago
Could you please make a tutorial on Dreambooth training for us?
@AI-HowTo 9 months ago
Unfortunately a Dreambooth tutorial takes a long time to make, which I may not be able to find at the moment; I will do one in the near future if I can.
@Dante02d12 9 months ago
How is performance with this trick on 6GB VRAM and 16GB RAM? Personally, I get OOM errors trying to load an SDXL model.
@AI-HowTo 9 months ago
I got acceptable speed with Dreambooth training at 1024x1024 image sizes, but not with SDXL; anything related to SDXL seems too slow to be practical. Hopefully things improve in their next driver update, because this is their first update that allows memory offloading, so I expect it to get better and faster in future releases.
@AI-HowTo 9 months ago
I will update the video title and thumbnail to include (Very slow) rather than deleting the video, and add a note about it in the description, because I was expecting performance to be better than this too.
@HestoySeghuro 9 months ago
Question here: what about fine-tuning with 24 GB of VRAM? Does this do the same VRAM offloading?
@AI-HowTo 9 months ago
Yes. If enabled, when VRAM is not enough it will offload the work to RAM regardless of what kind of work you are doing with Stable Diffusion: fine-tuning, training, generation... it is really a great update from NVIDIA.
@AI-HowTo 9 months ago
It only offloads what it needs to RAM, so VRAM will still sit at around 100% utilization, but things get slower since RAM is a lot slower. Still, it is very useful.
@CMak3r 9 months ago
The guide from NVIDIA officially states that this option DISABLES system memory fallback, which means that when your GPU runs out of VRAM it will no longer be able to access your RAM to continue generation and will crash. Why this feature even exists is another question.
@AI-HowTo 9 months ago
Before this update, "disabled" was effectively always the case, so we always got an Out of Memory error. This is the official description of the three options, as given by the driver option's description: "Typical usage scenarios:
• Set to "Driver Default" for the driver recommended behavior.
• Set to "Prefer No Sysmem Fallback" to prefer that allocations do not fall back to system memory. This option sets the preference to returning an Out Of Memory error instead of utilizing system memory to allow allocations to succeed. Choosing this option may cause application crashes or TDRs.
• Set to "Prefer Sysmem Fallback" to prefer fallback to system memory while under memory pressure."
It is very useful for locally training small models only, since it takes a long time... when trying to train something big, you know the time it takes is not worth it.
@luisff7030 9 months ago
It's there to remind us to purchase a new GPU.
@marcozisa6317 9 months ago
Thanks for the update, but unfortunately it didn't work for me. Tested on a PC with 40GB of RAM and an NVIDIA RTX 2060 SUPER with 8GB of VRAM; the NVIDIA driver was up to date. It gets stuck right before training starts, with the on-screen progress bar at 0%, then it raises CUDA: OutOfMemoryError.
@AI-HowTo 9 months ago
I see; possibly the driver is new and still has some issues. I set the CUDA Sysmem Fallback Policy to (Prefer Sysmem Fallback) and the CUDA errors are gone. I only have 16GB of RAM, and I can see in the performance monitor that it now uses 80% of my RAM, and even part of my shared video memory too, but training was slow, very slow compared to RunPod. It was interesting that they worked it out, but the speed makes it impractical for me to use effectively... most likely they will release new updates soon, especially since this is a new feature.
@stefanocesaretti8962 9 months ago
Will this let me use SDXL models as well, not just for training? Will I be able to generate images with SDXL? Thanks in advance :) (I haven't watched the whole video yet, sadly, so please excuse me if you already said this.)
@AI-HowTo 9 months ago
Yes, but speed will most likely be disappointing. If you are unable to run SDXL with the --no-half-vae --medvram-sdxl options already, then this option should in theory allow you to run it... in general, if the speed is not good enough, then using SDXL is not practical at all.
@stefanocesaretti8962 9 months ago
@AI-HowTo I'll give it a shot. Just in case, are there any methods to speed it up?
@AI-HowTo 9 months ago
Not sure, but I think they might make it faster in future driver updates, since this is new.
@generalawareness101 9 months ago
I did this on my 4090 because I was otherwise stuck on 531.79: the moment I came close to the 24GB of my VRAM I fell to a crawl, and in most trainings I am at 23.4-23.6GB, so oof. What I noticed is that it has a global setting. I want access to that setting, and I bet I know where it lives, but getting at it requires another, more master-level program. I am being vague here on purpose because I don't want someone to read it, go in, start messing with stuff, and poof, nothing works. I have not tried training yet, but I am shortly going to, which will be about 23.5GB normally (DB).
@generalawareness101 9 months ago
Just tested, and the new drivers let me run BS16 vs BS14 as my max, with 900MB free. I cannot go higher.
@AI-HowTo 9 months ago
Offloading can be slow, so if things are working with your current GPU, don't offload; disable it... this should only be enabled for lower-end GPUs... even though only a small portion is offloaded, the communication overhead between RAM and VRAM makes it slow. Hopefully things get faster in future driver updates... still, for me it is interesting that I was able to train Dreambooth or SDXL for the first time on my 8GB graphics card, but seeing how long it takes, it is not worth it... only suitable for leaving things running overnight for simple models.
@generalawareness101 9 months ago
@AI-HowTo TBH, with anything less than 16GB just don't try to train SDXL. At 24GB I need 32-40GB to sit as comfortably as I was with 2.1.
@AI-HowTo 9 months ago
100% true.
@alexlux147 9 months ago
Can this work with other AI models? I have a 3060 laptop with 6GB and I'm interested in NeRF and Gaussian Splatting; small VRAM is a big problem for me.
@AI-HowTo 9 months ago
I think NVIDIA has primarily updated the driver with Stable Diffusion in mind, but in general this should work for any other model that uses CUDA and hits CUDA memory errors, based on the feature's name and description, which is just (CUDA System memory fallback policy)... I have not tested it outside Stable Diffusion related models, however.
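A minimal, hypothetical PyTorch pattern for any CUDA model that hits memory errors (NeRF or Gaussian Splatting training included), independent of this driver feature: catch the out-of-memory exception, free the allocator's cache, and retry the step with a smaller batch. The function and argument names here are placeholders, and torch.cuda.OutOfMemoryError assumes a recent PyTorch.

    import torch

    def train_step_with_oom_retry(model, loss_fn, optimizer, inputs, targets, min_batch=1):
        """Run one training step; on CUDA OOM, halve the batch and retry."""
        while True:
            try:
                optimizer.zero_grad()
                loss = loss_fn(model(inputs), targets)
                loss.backward()
                optimizer.step()
                return loss.item()
            except torch.cuda.OutOfMemoryError:
                torch.cuda.empty_cache()            # return cached blocks to the driver
                if inputs.shape[0] <= min_batch:
                    raise                           # cannot shrink further; give up
                half = inputs.shape[0] // 2         # halve the batch and try again
                inputs, targets = inputs[:half], targets[:half]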
@akairas6 9 months ago
I have the latest driver, but I can't find that option.
@AI-HowTo 9 months ago
I did a manual installation until that option showed up as in the video; this driver update was only about three days ago, I think, so possibly in future releases things will get faster for CUDA ops... you could also look into a BIOS update, because an older BIOS might limit which driver versions you can install, not sure though.