That's pretty neat that Exxact sent you a loaner system to test out! I never worked there but drive past it on my commute. Small world!
@weylandsmith5924 2 years ago
The bottom line, from the point of view of an ML practitioner who isn't going to access CUDA directly (or at least not often), is: 1. NVLink won't make a big difference with DATA parallelization (although a slight advantage will still be appreciable). 2. NVLink *will* make a substantial difference with MODEL parallelization, for obvious reasons. @Jeff: you should definitely do a video in which you show this practically.
@Tsardoz 2 years ago
I have been scratching my head over this. I agree. Most modeling I see at surface level allows data parallelization and not model parallelization (unless it is some custom thing), so NVLink will make no difference at all. This video does not explain this, so I gave it a thumbs down. Please correct me if I am wrong.
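The data-parallel case the two comments above describe can be sketched numerically. Below is a minimal NumPy simulation (no real GPUs involved; the two "devices" are just halves of a random batch): each worker computes a gradient on its half, and averaging the halves reproduces the full-batch gradient, which is why data parallelism only needs to exchange the small gradient tensors over NVLink/PCIe, not the activations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # full batch of 8 examples
y = rng.normal(size=(8,))
w = rng.normal(size=(3,))     # linear model weights

def grad(Xb, yb, w):
    # gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

# simulated data parallelism: each "GPU" sees half the batch
g0 = grad(X[:4], y[:4], w)
g1 = grad(X[4:], y[4:], w)
g_avg = (g0 + g1) / 2         # the all-reduce step across devices

g_full = grad(X, y, w)        # single-device reference
assert np.allclose(g_avg, g_full)
```

Only `g0` and `g1` (each 3 floats here; a few hundred MB for a big network) cross the link per step, which is why the link speed matters far less than in model parallelism, where activations flow between devices on every forward pass.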
@dinoscheidt 3 years ago
Would love to see you try things out with CUDA. Not only from the perspective of what those GPUs can do, but also to show how much abstraction there actually is between a Python library and the GPU, and what it actually means to go "low level".
@HeatonResearch 3 years ago
Okay, adding that to my list. I rather enjoy accessing CUDA directly.
@derwolf9668 1 year ago
God sent this video !!! Thanks Jeff!
@ProjectPhysX 1 year ago
It's a pity that NVLink/SLI are entirely inaccessible to OpenCL. That makes them useless for non-proprietary software. At least PCIe bandwidth is rapidly increasing and becoming a good alternative, yet PCIe peer-to-peer transfer for Nvidia GPUs is also not accessible to OpenCL, so everything has to go through CPU memory once. PS: at 8:30, 20 MB is not nearly enough to saturate a PCIe/NVLink transfer. What you're seeing there is mostly the transfer latency.
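The latency point above can be illustrated with a first-order transfer model, time = fixed latency + size / peak bandwidth. The latency and peak-bandwidth constants below are ballpark assumptions for illustration only, not measurements of any particular link; the qualitative point is that small payloads under-report the link, and the observed rate only approaches the peak as the payload grows.

```python
# First-order model of a single copy across a GPU link:
#   time = fixed_latency + size / peak_bandwidth
# Both constants are assumed round numbers, not measured values.
LATENCY_S = 50e-6     # assumed per-transfer overhead (launch + sync)
PEAK_BW = 50e9        # assumed peak link bandwidth, in bytes/s

def effective_bw(size_bytes):
    """Bandwidth you would observe timing one transfer of size_bytes."""
    return size_bytes / (LATENCY_S + size_bytes / PEAK_BW)

# Observed bandwidth climbs toward the peak only as payload grows.
for size in (1e3, 1e6, 20e6, 1e9):
    print(f"{size / 1e6:8.3f} MB -> {effective_bw(size) / 1e9:6.2f} GB/s")
```

With these assumed numbers a 1 KB copy sees only a tiny fraction of the peak, while a 1 GB copy gets close to it; a realistic benchmark therefore sweeps the transfer size upward until the measured rate plateaus.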
@fredrikhansen75 3 years ago
Always inspiring and educational - thank you!
@vtrandal 1 year ago
Excellent video. I also have two RTX 3090 gpu cards connected with NVLink. My goal is to use pycuda as you have done, but I also want to scale up to the cloud (probably using AWS as you have shown). I think I am on a good learning path. I want the experience of prototyping my code using my two RTX 3090s and NVLink and then scale it up to the cloud to see how the speed scales with more 3090s. Like you have done in this video I will not be using TensorFlow or PyTorch.
@richarddow8967 2 years ago
Thanks for explaining a lot of this.
@TheFarmacySeedsNetwork 6 months ago
Thanks for the great explanation... I have one Quadro M5000 in, a second coming, and SLI... planning to switch to NVLink eventually... Mostly do editing and big number-crunching stuff. Installed a game to try it... got bored playing in 5 minutes and went back to building... lol
@YaYa-qg5vb 2 years ago
Thanks for your dual-GPU series. Since the NVLink port was dropped on the RTX 4090, do you think it is still efficient to build a dual-4090 workstation for deep learning?
@amanda.collaud 1 year ago
Good question! Well, the Nvidia devs superseded SLI -> NVLink -> PCIe 5 memory allocation (it starts with the Lovelace Quadro cards and is supposed to come to RTX Blackwell consumer cards). Don't hold me to the name of this tech; I read it in some article but forgot its actual name. PCIe 5 motherboards don't need SLI bridges, they are super fast anyway. I got one for just 180€ and run two RTX 3090s in multi-GPU mode, super fast.
@shiro836_ 1 year ago
@@amanda.collaud what motherboard is it?
@arogov 1 year ago
@@amanda.collaud But the RTX 3090 supports PCIe 4.0 only, so it wouldn't run any faster in a PCIe 5.0 slot.
@Haley2077 1 year ago
I have one question. For deep learning, which build is better: RTX 3090 SLI or a single RTX 4090? Thanks for your advice.
@robertobokarev439 8 months ago
Of course the 4090, if you're doing a new build, right? Pure performance is always better than going through a bridge. If money is no limit, then the RTX 6000 Ada.
@wentworthmiller1890 3 years ago
Some questions (naïve probably): 1 - They look like custom built 3090s - what are the temperatures (GPU, Mem, Hotspot) when both are under full load? 2 - Any impact of the lower GPU blocking the upper one's airflow? 3 - Will 3090 and 3080 on the same system help in sharing the training load?
@AOTanoos22 2 years ago
Can only answer 3: you cannot NVLink two different GPUs, and if you use a 3090 and a 3080 as two independent GPUs via PCIe slots, your 3090 will throttle its speed/power and memory down to those of the 3080, as if you had two 3080s. So it makes no sense to use two different tiers of GPU to train a model, as your model can only train as fast as your slowest GPU allows.
@wentworthmiller1890 2 years ago
@@AOTanoos22 Thanks! :)
@Edward-un2ej 2 years ago
Although the transfer speedup is very large, the transfer time without NVLink is also acceptable compared with the training time.
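The point above can be quantified with a back-of-the-envelope Amdahl-style calculation. All three timings below are assumptions chosen for illustration, not measurements: when compute dominates each training step, even a several-times-faster link only shaves off the small synchronization fraction.

```python
# Hypothetical per-step timings for data-parallel training
# (all three numbers are assumed for illustration only):
compute_ms = 200.0      # forward + backward pass per step
sync_pcie_ms = 15.0     # gradient exchange over PCIe
sync_nvlink_ms = 3.0    # the same exchange over NVLink

step_pcie = compute_ms + sync_pcie_ms
step_nvlink = compute_ms + sync_nvlink_ms
speedup = step_pcie / step_nvlink

# A 5x faster link cuts only the sync slice of the step,
# so the end-to-end gain stays modest while compute dominates.
print(f"end-to-end speedup from NVLink: {speedup:.3f}x")
```

With these assumed numbers the link is five times faster but the whole step improves by only about 6%, which matches the comment: without model parallelism, the non-NVLink transfer time is usually tolerable.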
@StevanNetto-qg7gx 1 month ago
Hi Jeff, I'm very out of date on GPU-based machine learning training. By connecting 2+ cards with NVLink, would we overcome the limit that a single GPU's memory size puts on model complexity?
@jonfe 1 year ago
What is the best way to improve the connection between four 3080 Ti GPUs? Something like NVLink or InfiniBand?
@sergiogalvez4603 2 months ago
You explain it perfectly.
@Miesiu 5 months ago
3:30 - What hardware do I need, even with 1 CPU, for 3 cards, e.g. RTX 3090s?
@НиколайКол-е2и 3 years ago
So 2x GPUs with NVLink could be ~100x faster than 2x GPUs without? (on some tasks)
@xpim3d 2 years ago
Nice explanation! Both GPUs should be on x16 PCI-E slots, right? Also, since the spacing varies according to the MB manufacturer, some models won’t be suited to do this, right? Thank you :)
@FisicoAlexandreBonatto 2 years ago
Thank you for posting this video. I recently assembled a dual-CPU system with three GPUs, two of which are (NVLinked) A6000s plus one A4500, used for academic research, and I found your channel a very accessible source of information. As a beginner, may I ask your advice on the following matter: right now, I have one A6000 installed in a slot handled by CPU0, and the other A6000 in a slot handled by CPU1 (the A4500 is handled by CPU1 as well). Would it be better to have both A6000 GPUs (which are connected through NVLink) handled by the same CPU?
@nullpointerexception1685 2 years ago
Can PyTorch take advantage of NVLink? Use the cards as one 48 GB GPU?
@HeatonResearch 2 years ago
Really, no software solution can combine two GPUs into the same logical unit. NVLink just provides a very fast conduit to keep the local memories of the GPUs synced. Often, though, the way training is being batched, this can give you 2X speedup for that 2nd GPU.
@nullpointerexception1685 2 years ago
@@HeatonResearch but I’ve heard nvidia advertising something about TCC or memory pooling which can effectively combine the VRAMs together?
@HeatonResearch 2 years ago
@@nullpointerexception1685 They are using the same memory address space, but you still must divide the processing across all of the GPUs, which is not automatic.
@nullpointerexception1685 2 years ago
@@HeatonResearch Alright, thanks for your reply. I guess it's better to get an RTX 8000 than two 3090s in that case.
@artemsult 7 months ago
Hi! What do you think: if there are four 3090 cards NVLinked in pairs, is it possible to optimize such a scheme when there are two separate NVLink pairs?
@RebelBreed888 1 year ago
Does it matter what OS you're using? I can't get one of my 3080s to initialize. Would it be better just to run a threadripper pro for computational power versus a dual GPU setup?
@IntenseGrid 10 months ago
Does this still work with the latest Linux drivers? Will it work with 3090 Ti cards?
@ProximusRegent 1 month ago
Prox was here
@ywueeee 1 year ago
Hey Jeff, I want to figure out whether this can be used with Stable Diffusion image generation, such that automatic1111 uses both my GPUs and not just one. Can you make a video please?
@fourteen_ljw 2 years ago
Hi sir, thank you for sharing. Can you also share a link for a motherboard that supports NVLink? (It looks like a normal Z690 ATX board does not support it.)
@skinnyboy996 3 years ago
Can you please make a video on training with tensor cores?
@nealschoeler6463 3 years ago
I'm interested in what you think about the NVIDIA Jetson Xavier NX, or really the Jetson Mate (cluster). While it clearly isn't a direct competitor to modern Ampere GPUs, since the GPU on board is Volta generation, there are other benefits: namely 50 GB/s memory, NVDLA engines on board, and 6 Arm cores, 384 CUDA cores, and 48 tensor cores at just 10/15 W per card. For about $2000, drawing just 90 W, you get 24 Arm cores, 1536 CUDA cores, 192 tensor cores, and 8 NVDLA engines. The SoMs come in two varieties, one with 8 GB and an SD card slot (dev kit, ~$400) and one with 16 GB (~$500). You get four system-on-modules with Arm CPU, Volta GPU, and NVDLA engines on the die, each sharing access to the fast LPDDR4X RAM on board. Onboard gigabit ethernet and a 5-port switch link them together, making it a tidy little cluster. One reason I think this is an interesting option is that at this price point the full Mate provides decent local compute with a low ongoing cost, as an alternative to buying cloud time for low-priority training. And it allows for practice with directing data flow for parallel processing.
@jasb78 3 years ago
How fast can you run Microsoft FSX 2021 with NVLink enabled?
@yaminadjoudi4357 3 years ago
Please sir, how can I combine the outputs of two different deep learning models (LSTM and CNN) to get a new, third model?
@AOTanoos22 3 years ago
I have a hard time understanding how you can connect more than two GPUs via NVLink. A GPU only has one NVLink slot, right? So let's say you have four A6000s: you connect the 1st and 2nd GPU with one NVLink bridge and the 3rd and 4th with another, right? So then the 1st/2nd pair and the 3rd/4th pair are not connected to each other? An explanation would be very appreciated!
@Tsardoz 2 years ago
You cannot. He does not explain this, so he deserves a thumbs down.
@ajey214 2 years ago
NVLink bridges for the RTX 30 series and A series are different from the NVLink bridge for the RTX 2080 Ti. Depending on the NVLink type you buy, they allow up to 4 GPUs to be connected, or even more.
@talha_anwar 3 years ago
I am a bit confused: do both GPUs need to be the same? Or can it be a 3070 and a 3060?
@D12075 3 years ago
Not only do they have to be the same model (3090 to 3090) but they have to be the exact same brand/model as well. So I have two 3090s from EVGA, and they literally stick out from the motherboard at different lengths because one is the ftw3 ultra version. Given that the SLI attachment is a fixed piece of metal and doesn't have any play to it, you have to have two identical cards for it to connect properly.
@pavellelyukh5272 2 years ago
@Daniel Vachalek is one xc3 and the other ftw3?
@D12075 2 years ago
@@pavellelyukh5272 Yes, you have to have two of the exact same make/model. Either two XC's or two FTW's. Which, right now, is almost impossible to source at msrp.
@igorchurakov5585 1 year ago
Does anyone have experience using the 4-slot NVLink bridge from the 3090 series on workstation cards like the A4500/5000/6000? Nvidia support says it won't work. However, those cards are the same generation and have exactly the same number of pins and the same NVLink placement. I know people do it the other way around and it's all good. I was wondering if there is really any difference, or if Nvidia just wants me to pay for their own NVLink bridge, which is 2/3-slot and doesn't fit my motherboard.
@vb433 3 years ago
(Dual-kit 32 x 2 = 64 vs. two single kits 32 + 32 = 64): will using two single kits (32 + 32 = 64) affect performance?
@marcelocoi 3 years ago
Please professor, make a video showing how to improve Enhance AI (Topaz Labs) running on an NVIDIA Quadro card. Thanks.
@thewizardsofthezoo5376 1 year ago
Isn't SLI one GPU with another GPU under it? SLI was 2 GPUs working together; it didn't scale linearly.
@whoseai3397 2 years ago
SLI could not transfer data
@DumbledoreMcCracken 20 days ago
I would love to have a million x million x million point grid, where every point is a 3x3x3x3 matrix. Still not enough RAM.
@returncode0000 1 year ago
Has anyone successfully used NVLink on two 3090s running Ubuntu? Please share your configuration below. I'm currently building my own DL box, originally with the idea of using NVLink on exactly two 3090s, but I'm not sure if it will work out with PyTorch.
@HeatonResearch 1 year ago
I did a series of videos on a dual-3090 Ubuntu workstation from Exxact. PyTorch did fine. kzbin.info/www/bejne/amGaYnRnodplr9E&ab_channel=JeffHeaton
@returncode0000 1 year ago
@@HeatonResearch Thanks man for the video, this helps a lot! I think I'll build a clone of that :-) (with Ubuntu and PyTorch running). Great channel, so much value for all of us 👍
@orthodoxNPC 2 years ago
NVLink: another way of reinventing RDMA, but with extra licensing fees.