That's pretty neat that Exxact sent you a loaner system to test out! I never worked there but drive past it on my commute. Small world!
@weylandsmith5924 2 years ago
The bottom line, from the point of view of an ML practitioner who isn't going to access CUDA directly (or at least not often), is: 1. NVLink won't make a big difference with DATA parallelization (although a slight advantage will still be appreciable). 2. NVLink *will* make a substantial difference with MODEL parallelization, for obvious reasons. @Jeff: you should definitely do a video in which you show this practically.
@Tsardoz 2 years ago
I have been scratching my head over this. I agree. Most modeling I see at surface level allows data parallelization and not model parallelization (unless it is some custom thing), so NVLink will make no difference at all. This video does not explain this, so I gave it a thumbs down. Please correct me if I am wrong.
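The data-parallel case the two comments above describe can be sketched numerically. Below is a minimal NumPy simulation (no real GPUs involved; the two "devices" are just halves of a random batch): each worker computes a gradient on its half, and averaging the halves reproduces the full-batch gradient, which is why data parallelism only needs to exchange the small gradient tensors over NVLink/PCIe, not the activations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # full batch of 8 examples
y = rng.normal(size=(8,))
w = rng.normal(size=(3,))     # linear model weights

def grad(Xb, yb, w):
    # gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

# simulated data parallelism: each "GPU" sees half the batch
g0 = grad(X[:4], y[:4], w)
g1 = grad(X[4:], y[4:], w)
g_avg = (g0 + g1) / 2         # the all-reduce step across devices

g_full = grad(X, y, w)        # single-device reference
assert np.allclose(g_avg, g_full)
```

Only `g0` and `g1` (each 3 floats here; a few hundred MB for a big network) cross the link per step, which is why the link speed matters far less than in model parallelism, where activations flow between devices on every forward pass.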
@dinoscheidt 3 years ago
Would love to see you try things out with CUDA. Not only from the perspective of what those GPUs can do, but also to show how much abstraction there actually is between a Python library and the GPU, and what it actually means to go "low level".
@HeatonResearch 3 years ago
Okay, adding that to my list. I rather enjoy accessing CUDA directly.
@derwolf9668 1 year ago
God sent this video !!! Thanks Jeff!
@ProjectPhysX 1 year ago
It's a pity that NVLink/SLI are entirely inaccessible to OpenCL. That makes them useless for non-proprietary software. At least PCIe bandwidth is rapidly increasing and becoming a good alternative, yet PCIe peer-to-peer transfer for Nvidia GPUs is also not accessible to OpenCL, so everything has to go through CPU memory once. PS: at 8:30, 20 MB is not nearly enough to saturate a PCIe/NVLink transfer. What you're seeing there is mostly the transfer latency.
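The latency point above can be illustrated with a first-order transfer model, time = fixed latency + size / peak bandwidth. The latency and peak-bandwidth constants below are ballpark assumptions for illustration only, not measurements of any particular link; the qualitative point is that small payloads under-report the link, and the observed rate only approaches the peak as the payload grows.

```python
# First-order model of a single copy across a GPU link:
#   time = fixed_latency + size / peak_bandwidth
# Both constants are assumed round numbers, not measured values.
LATENCY_S = 50e-6     # assumed per-transfer overhead (launch + sync)
PEAK_BW = 50e9        # assumed peak link bandwidth, in bytes/s

def effective_bw(size_bytes):
    """Bandwidth you would observe timing one transfer of size_bytes."""
    return size_bytes / (LATENCY_S + size_bytes / PEAK_BW)

# Observed bandwidth climbs toward the peak only as payload grows.
for size in (1e3, 1e6, 20e6, 1e9):
    print(f"{size / 1e6:8.3f} MB -> {effective_bw(size) / 1e9:6.2f} GB/s")
```

With these assumed numbers a 1 KB copy sees only a tiny fraction of the peak, while a 1 GB copy gets close to it; a realistic benchmark therefore sweeps the transfer size upward until the measured rate plateaus.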
@fredrikhansen75 3 years ago
Always inspiring and educational - thank you!
@vtrandal 1 year ago
Excellent video. I also have two RTX 3090 gpu cards connected with NVLink. My goal is to use pycuda as you have done, but I also want to scale up to the cloud (probably using AWS as you have shown). I think I am on a good learning path. I want the experience of prototyping my code using my two RTX 3090s and NVLink and then scale it up to the cloud to see how the speed scales with more 3090s. Like you have done in this video I will not be using TensorFlow or PyTorch.
@richarddow8967 2 years ago
Thanks for explaining a lot of this.
@TheFarmacySeedsNetwork 6 months ago
Thanks for the great explanation... I have one Quadro M5000 in, a second coming, and SLI... planning to switch to NVLink eventually... Mostly do editing and big number-crunching stuff. Installed a game to try it... got bored playing in 5 minutes and went back to building... lol
@YaYa-qg5vb 2 years ago
Thanks for your dual-GPU series. Since the NVLink port was dropped on the RTX 4090, do you think it is still efficient to build a dual-4090 workstation for deep learning?
@amanda.collaud 1 year ago
Good question! Well, the Nvidia devs superseded SLI -> NVLink -> PCIe 5 memory allocation (it starts with the Lovelace Quadro cards and is supposed to come to RTX Blackwell consumer cards). Don't hold me to the name of this tech; I read it in some article but forgot its actual name. PCIe 5 motherboards don't need SLI bridges, they are super fast anyway. I got one for just 180€ and run two RTX 3090s in multi-GPU mode, super fast.
@shiro836_ 1 year ago
@@amanda.collaud what motherboard is it?
@arogov 1 year ago
@@amanda.collaud But the RTX 3090 supports PCIe 4.0 only, so it wouldn't run any faster in a PCIe 5.0 slot.
@Haley2077 1 year ago
I have one question. For deep learning, which build is better: RTX 3090 SLI or a single RTX 4090? Thanks for your advice.
@robertobokarev439 8 months ago
Of course the 4090, if you're doing a new build, right? Pure performance is always better than going through a bridge. If money is no limit, then the RTX 6000 Ada.
@wentworthmiller1890 3 years ago
Some questions (naïve probably): 1 - They look like custom built 3090s - what are the temperatures (GPU, Mem, Hotspot) when both are under full load? 2 - Any impact of the lower GPU blocking the upper one's airflow? 3 - Will 3090 and 3080 on the same system help in sharing the training load?
@AOTanoos22 2 years ago
Can only answer 3: you cannot NVLink two different GPUs, and if you use a 3090 and a 3080 as two independent GPUs via PCIe slots, your 3090 will throttle its speed/power and memory down to those of the 3080, as if you had two 3080s. So it makes no sense to use two different tiers of GPU to train a model, as your model can only train as fast as your slowest GPU allows.
@wentworthmiller1890 2 years ago
@@AOTanoos22 Thanks! :)
@Edward-un2ej 2 years ago
Although the transfer speedup is very large, the transfer time without NVLink is also acceptable compared with the training time.
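The point above can be quantified with a back-of-the-envelope Amdahl-style calculation. All three timings below are assumptions chosen for illustration, not measurements: when compute dominates each training step, even a several-times-faster link only shaves off the small synchronization fraction.

```python
# Hypothetical per-step timings for data-parallel training
# (all three numbers are assumed for illustration only):
compute_ms = 200.0      # forward + backward pass per step
sync_pcie_ms = 15.0     # gradient exchange over PCIe
sync_nvlink_ms = 3.0    # the same exchange over NVLink

step_pcie = compute_ms + sync_pcie_ms
step_nvlink = compute_ms + sync_nvlink_ms
speedup = step_pcie / step_nvlink

# A 5x faster link cuts only the sync slice of the step,
# so the end-to-end gain stays modest while compute dominates.
print(f"end-to-end speedup from NVLink: {speedup:.3f}x")
```

With these assumed numbers the link is five times faster but the whole step improves by only about 6%, which matches the comment: without model parallelism, the non-NVLink transfer time is usually tolerable.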
@StevanNetto-qg7gx 1 month ago
Hi Jeff, I'm very out of date on GPU-based machine learning training. By connecting 2+ cards with NVLink, would we overcome the limit that a single GPU's memory size puts on model complexity?
@jonfe 1 year ago
What is the best way to improve the connection between four 3080 Ti GPUs? Something like NVLink or InfiniBand?
@sergiogalvez4603 2 months ago
You explain it perfectly.
@Miesiu 5 months ago
3:30 - What hardware do I need, even with 1 CPU, for 3 cards, e.g. RTX 3090s?
@НиколайКол-е2и 3 years ago
So 2x GPUs with NVLink could be ~100x faster than 2x GPUs without? (on some tasks)
@xpim3d 2 years ago
Nice explanation! Both GPUs should be on x16 PCI-E slots, right? Also, since the spacing varies according to the MB manufacturer, some models won’t be suited to do this, right? Thank you :)
@FisicoAlexandreBonatto 2 years ago
Thank you for posting this video. I recently assembled a dual-CPU system with three GPUs, two of which are (NVLinked) A6000s plus one A4500, used for academic research, and I found your channel a very accessible source of information. As a beginner, may I ask your advice on the following matter: right now, I have one A6000 installed in a slot handled by CPU0, and the other A6000 in a slot handled by CPU1 (the A4500 is handled by CPU1 as well). Would it be better to have both A6000 GPUs (which are connected through NVLink) handled by the same CPU?
@nullpointerexception1685 2 years ago
Can PyTorch take advantage of NVLink? Use the cards as one 48 GB GPU?
@HeatonResearch 2 years ago
Really, no software solution can combine two GPUs into the same logical unit. NVLink just provides a very fast conduit to keep the local memories of the GPUs synced. Often, though, the way training is being batched, this can give you 2X speedup for that 2nd GPU.
@nullpointerexception1685 2 years ago
@@HeatonResearch but I’ve heard nvidia advertising something about TCC or memory pooling which can effectively combine the VRAMs together?
@HeatonResearch 2 years ago
@@nullpointerexception1685 They are using the same memory address space, but you still must divide the processing across all of the GPUs, which is not automatic.
@nullpointerexception1685 2 years ago
@@HeatonResearch Alright, thanks for your reply. I guess it's better to get an RTX 8000 than two 3090s in that case.
@artemsult 7 months ago
Hi! What do you think: if there are four 3090 cards NVLinked in pairs, is it possible to optimize such a scheme when there are two separate NVLink pairs?
@RebelBreed888 1 year ago
Does it matter what OS you're using? I can't get one of my 3080s to initialize. Would it be better just to run a threadripper pro for computational power versus a dual GPU setup?
@IntenseGrid 10 months ago
Does this still work with the latest Linux drivers? Will it work with 3090 Ti cards?
@ProximusRegent 1 month ago
Prox was here
@ywueeee 1 year ago
Hey Jeff, I want to figure out whether this can be used with Stable Diffusion image generation, such that automatic1111 uses both my GPUs and not just one. Can you make a video please?
@fourteen_ljw 2 years ago
Hi sir, thank you for sharing. Can you also share a link for a motherboard that supports NVLink? (It looks like a normal Z690 ATX board does not support it.)
@skinnyboy996 3 years ago
Can you please make a video on training with tensor cores?
@nealschoeler6463 3 years ago
I'm interested in what you think about the NVIDIA Jetson Xavier NX, or really the Jetson Mate (cluster). While it clearly isn't a direct competitor to modern Ampere GPUs, since the GPU on board is Volta generation, there are other benefits: namely 50 GB/s memory, NVDLA engines on board, and 6 Arm cores, 384 CUDA cores, and 48 tensor cores at just 10/15 W per card. For about $2000, drawing just 90 W, you get 24 Arm cores, 1536 CUDA cores, 192 tensor cores, and 8 NVDLA engines. The SoMs come in two varieties, one with 8 GB and an SD card slot (dev kit, ~$400) and one with 16 GB (~$500). You get four system-on-modules with Arm CPU, Volta GPU, and NVDLA engines on the die, each sharing access to the fast LPDDR4X RAM on board. Onboard gigabit ethernet and a 5-port switch link them together, making it a tidy little cluster. One reason I think this is an interesting option is that at this price point the full Mate provides decent local compute with a low ongoing cost, as an alternative to buying cloud time for low-priority training. And it allows for practice with directing data flow for parallel processing.
@jasb78 3 years ago
How fast can you run Microsoft FSX 2021 with NVLink enabled?
@yaminadjoudi4357 3 years ago
Please sir, how can I combine the outputs of two different deep learning models (LSTM and CNN) to get a new, third model?
@AOTanoos22 3 years ago
I have a hard time understanding how you can connect more than two GPUs via NVLink. A GPU only has one NVLink slot, right? So let's say you have four A6000s: you connect the 1st and 2nd GPU with one NVLink bridge and the 3rd and 4th with another, right? So then the 1st/2nd pair and the 3rd/4th pair are not connected to each other? An explanation would be very appreciated!
@Tsardoz 2 years ago
You cannot. He does not explain this, so he deserves a thumbs down.
@ajey214 2 years ago
NVLink bridges for the RTX 30 series and A series are different from the NVLink bridge for the RTX 2080 Ti. Depending on the NVLink type you buy, they allow up to 4 GPUs to be connected, or even more.
@talha_anwar 3 years ago
I am a bit confused: do both GPUs need to be the same? Or can it be a 3070 and a 3060?
@D12075 3 years ago
Not only do they have to be the same model (3090 to 3090) but they have to be the exact same brand/model as well. So I have two 3090s from EVGA, and they literally stick out from the motherboard at different lengths because one is the ftw3 ultra version. Given that the SLI attachment is a fixed piece of metal and doesn't have any play to it, you have to have two identical cards for it to connect properly.
@pavellelyukh5272 2 years ago
@Daniel Vachalek is one xc3 and the other ftw3?
@D12075 2 years ago
@@pavellelyukh5272 Yes, you have to have two of the exact same make/model. Either two XC's or two FTW's. Which, right now, is almost impossible to source at msrp.
@igorchurakov5585 1 year ago
Does anyone have experience using the 4-slot NVLink bridge from the 3090 series on workstation cards like the A4500/5000/6000? Nvidia support says it won't work. However, those cards are the same generation and have exactly the same number of pins and the same NVLink placement. I know people do it the other way around and it's all good. I was wondering if there is really any difference, or if Nvidia just wants me to pay for their own NVLink bridge, which is 2/3-slot and doesn't fit my motherboard.
@vb433 3 years ago
(Dual-kit 32 x 2 = 64 vs. two single kits 32 + 32 = 64): will using two single kits (32 + 32 = 64) affect performance?
@marcelocoi 3 years ago
Please professor, make a video showing how to improve Enhance AI (Topaz Labs) running on an NVIDIA Quadro card. Thanks.
@thewizardsofthezoo5376 1 year ago
Isn't SLI one GPU with another GPU under it? SLI was 2 GPUs working together; it didn't scale linearly.
@whoseai3397 2 years ago
SLI could not transfer data
@DumbledoreMcCracken 20 days ago
I would love to have a million x million x million point grid, where every point is a 3x3x3x3 matrix. Still not enough RAM.
@returncode0000 1 year ago
Has anyone successfully used NVLink on two 3090s running Ubuntu? Please share your configuration below. I'm currently building my own DL box, originally with the idea of using NVLink on exactly two 3090s, but I'm not sure if it will work out with PyTorch.
@HeatonResearch 1 year ago
I did a series of videos on a dual-3090 Ubuntu workstation from Exxact. PyTorch did fine. kzbin.info/www/bejne/amGaYnRnodplr9E&ab_channel=JeffHeaton
@returncode0000 1 year ago
@@HeatonResearch Thanks man for the video, this helps a lot! I think I'll build a clone of that :-) (with Ubuntu and PyTorch running). Great channel, so much value for all of us 👍
@orthodoxNPC 2 years ago
NVLink: another way of reinventing RDMA, but with extra licensing fees.