Dell PowerEdge R720 GPU Deep Learning Upgrade: Installing Dual Tesla P40s with NVIDIA Drivers

8,915 views

TheDataDaddi


🚀 Join me on an exciting journey into the world of high-performance computing! In this video, I'll guide you through the step-by-step process of upgrading my Dell PowerEdge R720 server for deep learning excellence. Watch closely as we install not one, but two powerful NVIDIA Tesla P40 GPUs, transforming my system into a powerhouse for handling intense AI workloads.
🔧 Together, let's explore the intricacies of the installation process - from physically integrating the GPUs into my Dell PowerEdge R720 to configuring the essential settings for optimal performance. I'll be your expert companion throughout this tutorial, ensuring a smooth experience as we enhance my server's capabilities for deep learning tasks.
🔍 Additionally, I'll walk you through the installation of the latest NVIDIA drivers, ensuring that my dual Tesla P40s are ready to tackle the most demanding machine learning tasks. Stay ahead of the curve in the rapidly evolving field of deep learning by following along with my comprehensive tutorial.
📈 Don't miss out on witnessing the full potential of my Dell PowerEdge R720 as we upgrade with confidence and harness the raw computational power of dual Tesla P40 GPUs. Hit that 'Subscribe' button, give this video a 'Like,' and share your thoughts or questions in the comments below. Let's empower my server for the future of deep learning together! 💻🔥 #GPUUpgrade #DeepLearning #DellPowerEdge #NVIDIADrivers #TeslaP40 #TechTutorial
📚 Additional Resources:
AI/ML/DL GPU Buying Guide 2023: Get the Most AI Power for Your Budget
• AI/ML/DL GPU Buying Gu...
AI/ML/DL with the Dell PowerEdge R720 Server - Energy, Heat, and Noise Considerations
• AI/ML/DL with the Dell...
Throttle No More: My Strategy for GPU Cooling in Dell PowerEdge
• Throttle No More: My S...
Dell PowerEdge R720XD GPU Upgrade: Installing Tesla P40 with NVIDIA Drivers
• Dell PowerEdge R720XD ...
Installing Tesla P100 GPU on Dell PowerEdge R720 Server with Driver Installation
• Installing Tesla P100 ...
Installing DUAL Tesla P100 GPU on Dell PowerEdge R720 Server with Driver Installation
• Installing DUAL Tesla ...
Other YouTube Video That Describes Cabling Issues In More Detail
• Discussing power cavea...
Links to Parts I Used
BETTER CABLING OPTION: a.co/d/hccc8m8
www.amazon.com...
www.amazon.com...
HOW TO GET IN CONTACT WITH ME
🐦 X (Formerly Twitter): @TheDataDaddi
📧 Email: skingutube22@gmail.com
💬 Discord: / discord
Feel free to connect with me on X (Formerly Twitter) or shoot me an email for any inquiries, questions, collaborations, or just to say hello! 👋
HOW TO SUPPORT MY CHANNEL
If you found this content useful, please consider buying me a coffee at the link below. This goes a long way in helping me through grad school and allows me to continue making the best content possible.
Buy Me a Coffee
www.buymeacoff...
Thanks for your support!

Comments: 64
@theoldrook 1 month ago
So glad I didn’t just yoink a couple extra cables from my modular power supply from my desktop pc. Thanks for the updated product link!
@TheDataDaddi 25 days ago
Sure! Just wish I had known about it when I made the video originally. Glad it helped you though!
@dleer_defi 2 months ago
I just had an R730xd arrive at my house today. I want to do a deep learning build. How would you rate the dual Tesla P40s after a few months of use?
@TheDataDaddi 2 months ago
Hi there. Thanks for the question! I am really happy with them. I still think that they are the best GPUs for the price at this current moment.
@Daed2001 8 months ago
I've switched to the newer 3090 after using the P40, since newer versions of PyTorch and TensorFlow no longer support these cards.
@TheDataDaddi 8 months ago
Hi there. Thank you for the feedback. I am currently using the newest version of PyTorch and the P40s I have seem to be working fine. I cannot speak for TensorFlow though. Could you please provide the version of each you have installed so that I can test them?
@Daed2001 8 months ago
@@TheDataDaddi Maybe it's because I'm using it as a vGPU on VMware ESXi; I just don't know why. I was testing with the CUDA 11.8 and 12.1 versions, but neither worked. My CUDA version is 12.0 with driver 525 for my P40, and it said "device-side assert triggered. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect" when I use t = torch.rand(10, 10).cuda() to test. But I checked and CUDA is enabled, everything else looks normal, and torch.cuda.is_available() is True as well.
@TheDataDaddi 8 months ago
Hmmm. That is strange indeed. I agree it is likely an issue from virtualizing the GPU. I had some issues when I was trying to set up Detectron2 in a Docker container. First, I would recommend checking that the versions of CUDA, PyTorch, and the NVIDIA driver are compatible with each other. PyTorch has specific builds for different CUDA versions. Since you're using CUDA 12.0, ensure that your PyTorch version is compatible with it. Second, try running your code in debug mode to get more detailed error messages. This can be done by setting CUDA_LAUNCH_BLOCKING=1 in your environment. This setting makes CUDA operations synchronous and can provide more precise error messages. Let me know what the output is if you have to resort to option 2. @@Daed2001
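(For reference, a minimal version of that check as a sketch, assuming a standard CUDA-enabled PyTorch build; CUDA_LAUNCH_BLOCKING is set before importing torch so it takes effect:)

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # make kernel launches synchronous for clearer errors

import torch

print("PyTorch:", torch.__version__)
print("Built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))

# Same test as in the comment above; with blocking launches the traceback
# should point at the actual failing call rather than a later API call.
t = torch.rand(10, 10).cuda()
print(t.sum().item())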
@Daed2001 8 months ago
@@TheDataDaddi Hi, I've figured it out. The A profiles use vApps, so they don't work with CUDA; after switching to Q profiles it is working now.
@TheDataDaddi 8 months ago
Glad you got it figured out! @@Daed2001
@minime9400 8 months ago
8:31 No components were harmed in the making of this video. Rrrright?
@TheDataDaddi 8 months ago
Lol. Yeah, it looked pretty bad, I will admit, but nothing was harmed, thankfully.
@kal9001 5 months ago
What's with the janky power cable setup when you can get EPS12V male-to-male cables cheap that won't have this bulky 2x 8-pin PCIe thing blocking the airflow...
@TheDataDaddi 5 months ago
Hi there. Thanks for the comment! When I first made this video, I could not find these cables. Not sure if they just didn't exist or I just overlooked them. I have since found exactly what you are talking about and added the link in the video description.
@kal9001 5 months ago
@@TheDataDaddi Sorry if it sounded harsh, but I see these silly Y-splitters in tons of guides on adding GPUs to servers and they just seem ridiculous. Glad you got it sorted, I recently got an R720XD FULL of 2TB drives, got super lucky on it. Will be migrating all the stuff off my R510-12 tomorrow. When running 2x P40s are you able to get one load to run both cards together, or does it just allow you to run simultaneous loads?
@TheDataDaddi 5 months ago
@@kal9001 Hey, no worries. I didn't take it that way. I am always trying to find the best solutions, so if I missed something I always appreciate comments letting me know. It's funny you say that. The whole time I was making this video I was like, "this is super dumb. I can't believe there is not a better way to do this." Turns out there was. Lol.
@jimgore20 5 months ago
What's the info for the guy you talked about that makes and sells custom cables? I didn't see it on here.
@TheDataDaddi 5 months ago
As it turns out you can buy one that will work on Amazon. Here is the link: a.co/d/9IVyJRi
@jimgore20 5 months ago
My build's a little different. I'm putting 4 Tesla K80 24GB GPUs into an Asus ESC400 G3, but the one I got didn't come with any GPU cables, and it uses a 2x 4-pin to 8-pin cable on the G3 that they changed for the G4. Like some others, I want to build my own AI like Jarvis from Iron Man that can stand alone in my lab without internet if need be.
@TheDataDaddi 5 months ago
Gotcha. Well, I hope this cable is what you need. It says it should work for the K80, and if the K80 is like the P40 and P100 it will use this cable; it's basically a male-to-male 8-pin 12V EPS. I think that is awesome! I wish you the best of luck with that goal. Let me know once you get it working. One thing to keep in mind with the K80s, though, is that they are dual-GPU cards at 12GB a piece. The memory pools are separate, so each card really behaves more like 2 separate GPUs from a memory perspective. Just wanted to throw that out there in case you were thinking you could use all 24GB of VRAM contiguously (like I did originally), as this may have an impact on your ability to train and work with large LLMs like Llama 2, for example. @@jimgore20
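(For reference, a quick way to confirm how a framework actually sees a K80 - a small sketch assuming PyTorch and a working driver; each physical board should enumerate as two ~12 GB devices rather than one 24 GB device:)

import torch

# Each K80 board should show up here as two separate ~12 GiB devices;
# memory is not pooled, so a single tensor cannot span both halves.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")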
@cashbyclose 2 days ago
Cool stuff. Do you have any interest in building ML models to analyze NASDAQ futures order flow data? Not to determine where the price will end up long term, but to exploit short-term bursts of volatility based on live stock market data: use ML to analyze aggressive buying and selling and to determine breakouts from short-term consolidation. Would an ML model continue to train itself to get better based on the information you gave it?
@TheDataDaddi 1 day ago
Hi there. Thanks so much for the comment! I love finance and investing almost as much as AI and computing. I am an avid investor and have tried my hand a few times at building automated trading systems. My last attempt was back when LSTMs were still popular. Anyway, to answer your question more directly, absolutely that sounds really interesting. I like the idea. I think that could be a good approach especially for short term trades. By itself, no. It would need to be retrained on new data in order to get better. However, if you had a base model and employed reinforcement learning of some kind the model could improve based on new data.
@cashbyclose 1 day ago
@TheDataDaddi Thanks for the reply. I followed you on another platform, same profile pic. Sierra Chart or NinjaTrader software seems like the best bet for legit order flow data and full automation of orders, stops, and take profits. A few scalp trades on NQ could be $2,000-4,000+ a day once the account gets above $50k.
@sydneylivecamera 8 months ago
Hey, great video, thanks for making it. I've been eyeing these off - there's quite a few Chinese Ebay bulk sellers. I can't really tell if it's standard for these to come with the 2x pcie power to 1 cable. Did yours arrive with cables? I want to put one in my Dell R7610 (rack workstation) which has the same chipset and processors as your R720 - on the fan topic if you ever find fan management software that will let you control this era dell in Windows, please let me know - my R7610 bios fan settings are: melting or freezing. If you make further videos with the P40 and P100 I would love to see a few benchmark tests on some kind of GUI or the normal test stuff in Windows - because I'm a simpleton and don't understand what I'm looking at in a terminal. Cheers!
@TheDataDaddi 8 months ago
Hi there! So glad the video was able to help you. In my case, if I remember correctly, I did not receive any cabling with my P40s. I linked the cables I used in the video description; they should be in the Additional Resources section. However, I also found a better cabling system that should work (I have not tested it yet, to be fair). Here is the link to that one: www.amazon.com/dp/B08N4BJL2J?psc=1&ref=ppx_yo2ov_dt_b_product_details With regard to the fans, I encountered a similar issue and have made a video on how to fix it. However, it is geared towards Linux users. You could take my same script/methodology and use it with Windows Task Scheduler or similar. Here is the link to the video on GPU cooling: kzbin.info/www/bejne/iIa6ZHSvatemebs Absolutely! As soon as I get a second and can figure out a meaningful way to benchmark these GPUs for AI/ML/DL, I will post a video on that. I will also try to make it as OS agnostic as possible so that everyone can benefit. Thanks so much again for the feedback! I hope this helps!
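(The script from the GPU cooling video isn't reproduced here; below is only a rough sketch of the general IPMI-based approach people use on this generation of Dell servers. The ipmitool raw byte sequences, thresholds, and the iDRAC address/credentials are assumptions - verify them against your own iDRAC before relying on this.)

import subprocess

# Placeholder iDRAC connection details - replace with your own.
IPMI = ["ipmitool", "-I", "lanplus", "-H", "192.168.1.120", "-U", "root", "-P", "calvin"]

def max_gpu_temp():
    # Highest temperature reported by any GPU, in degrees C.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return max(int(line) for line in out.split())

if max_gpu_temp() >= 75:
    # Hand fan control back to the BMC when the GPUs run hot.
    subprocess.run(IPMI + ["raw", "0x30", "0x30", "0x01", "0x01"], check=True)
else:
    # Take manual control and hold roughly a 30% fan duty cycle (0x1e).
    subprocess.run(IPMI + ["raw", "0x30", "0x30", "0x01", "0x00"], check=True)
    subprocess.run(IPMI + ["raw", "0x30", "0x30", "0x02", "0xff", "0x1e"], check=True)

(Schedule something like this with cron on Linux, or Windows Task Scheduler as mentioned above.)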
@sydneylivecamera 8 months ago
@@TheDataDaddi Thanks, that's awesome. The rack workstation model that I have has a bit of a different tail-end to your chassis - it's got support for 2 (or maybe 3?) GPUs with 4 (or 6? don't want to open it up right this minute..) normal GPU power connections. E.g. I can pop my GTX 1650 in there with space and cabling for at least one more. It's a good chassis to look out for - though it struggles a bit with modern GPU card size because I think it was designed for older, slimmer Quadros as an offsite CAD or broadcast machine. That long explanation is just to say that I think I need just the last bit that's 2x female PCIe to EPS. Thanks, embarrassingly I came across your video on fan control moments after asking. I need to get familiar with Linux, and thanks for the Windows Task Scheduler tip - I'll try that. Perfect - I feel entirely agnostic about OSes, but I only know how to use some of them lol. Absolutely, a benchmark that speaks to the data-educated that you are targeting would be great. Additionally, for any audience who likes to build/use computers but doesn't know ML (thinking selfishly here), I think a quick run-through of Cinebench R15 and maybe a CPU-bound x264 render vs a GPU-based render in DaVinci Resolve or Premiere would be illustrative of the power of the machine you've built. Cheers and thanks for the tips!
@TheDataDaddi 8 months ago
Gotcha. What chassis are you working with, if you don't mind me asking? I am looking for others that might work as well as or better than the R720. Awesome, those are great recommendations. I also want to train and run inference on some standard models and compare execution times. Hopefully I can get that video out next month when I get a chance. Cheers and thanks so much again for the feedback! @@sydneylivecamera
@sydneylivecamera 8 months ago
@@TheDataDaddi It's a Dell Precision R7610 - basically a Precision T7610 tower turned into a rack-mounted model. Handy because they have 2 or 3 PCIe x16 Gen 3 slots. I think 3 of these cards would be a push, and there's a spline or two that may interfere with a single or double card and might need modification. The later R7910 and particularly the R7920 might be better designed for newer, bigger GPUs, but obviously more expensive. It is noticeably quieter than comparable Supermicro 2U servers I have, too - I don't know if that's a Precision model thing or a Dell thing. Oh, and if you get one and it acts weird - bypass the LSI 9271-8i RAID card by going direct into the motherboard, or swap it - I have two of these and the RAID card seems to cause trouble in both, but maybe it's me or coincidence. Cheers, no trouble and thanks again for the video!
@TheDataDaddi 8 months ago
Awesome! Thanks so much for the information. I'll start looking into these as well. Really appreciate it. Cheers! @@sydneylivecamera
@publicsectordirect982 4 months ago
Hello, I'm new to your channel. Really useful information, thanks! I'm just getting into ML and was wondering if this setup would allow a 40B LLM to be loaded? Or what might be the best solution to run 4x Tesla P40? Thanks for any potential tips.
@TheDataDaddi 4 months ago
Hi there. Thanks so much for the comment, and so glad you are enjoying the content! Take what I say with a grain of salt because I have not actually tried this myself, but a 40 billion parameter model at half precision would roughly require about 80GB just to load the parameters, plus some extra memory overhead for other processes and space to load data for the actual inference. So, to be safe, let's approximate around 100GB of VRAM. With 4 P40 GPUs (4 x 24 = 96GB) you would in theory be able to run inference locally. The caveat here is that you would need to use model sharding to split the overall model across all of the GPUs, which should also be theoretically possible. One other thing to note is that half-precision performance on the P40 is very low, so you will likely see much longer inference times. Some other options you could consider would be going with 4 3090s instead, or with 6 (or more) P100s. These should give you the VRAM you need for your use case while also providing better performance; the 6 P100s seem like the best balance of cost and performance. You could also consider using a smaller model and running it at full precision to better take advantage of the P40's design. So, for your use case you will need a minimum of 4 GPUs, which means a different server than the one discussed in this video, one that can hold at least four 2-slot GPUs. If you would like some recommendations here, please let me know.
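(A rough back-of-the-envelope version of that estimate; the 25% overhead factor is just an assumption, and real usage depends on context length, batch size, and framework:)

# Rough VRAM estimate for a 40B-parameter model across 4x Tesla P40 (24 GB each).
params = 40e9
bytes_per_param = 2                          # fp16/bf16 weights
weights_gb = params * bytes_per_param / 1e9  # ~80 GB just for the weights
total_gb = weights_gb * 1.25                 # crude allowance for activations/KV cache, ~100 GB

gpus, vram_per_gpu = 4, 24
print(f"Estimated need: ~{total_gb:.0f} GB; available: {gpus * vram_per_gpu} GB across {gpus} GPUs")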
@dylanmaniatakes 7 months ago
What kind of performance are you seeing?
@TheDataDaddi 6 months ago
Hi there. Thanks so much for your feedback! I did not have a specific performance benchmark in this video. I simply wanted to get the most compute I could find for the least amount of money.
@theturtle32 7 months ago
You're supposed to take out the riser card, install the GPU into the riser card, and then plug the riser card plus PCIe cards back into the motherboard all at once.
@TheDataDaddi 6 months ago
Hi there. Thanks so much for your feedback! That is how I have seen others do it as well. However, it has actually always been harder for me to do it that way because I can't see the slot that the riser goes in. I appreciate the tip though. I will try it that way again next time and see if it is easier.
@theturtle32 6 months ago
@@TheDataDaddi Upon opening my R730 and looking at this, I think that may actually only apply to the cage of small form factor card slots, so my apologies!
@TheDataDaddi 6 months ago
No worries, man. I appreciate the feedback either way. @@theturtle32
@ICanDoThatToo2 3 months ago
I have a regular R720 myself, and all three risers come out. So yeah, you should be able to pull the riser, snap on the P40, then slide everything back in. FYI, if the card has connectors sticking out, those will hit the back of the server, and there's no good solution for getting those in. I'm also concerned that the power wires interfere with airflow from the little slot in the CPU shroud to the P40, as that's the only cooling the card gets.
@PardnerCLUTCH 8 months ago
Did you have to purchase NVIDIA's "vGPU" license? I have a Dell 7910 (rebadged R730), the drivers fail to install, and nvidia-smi results in "No devices were found." The GPU is being detected... lspci | grep -i nvidia shows a P40 on the PCI bus. Any pointers? Thanks!
@TheDataDaddi 8 months ago
Hi there. Thanks for your question. I did not have to purchase NVIDIA's "vGPU" license. To my knowledge, this is only for GPU virtualization, in the case of allowing multiple VMs to share physical GPUs. Unless this is your goal, I don't think you need it. I use Docker containers for most of my deep learning projects as well and have never had an issue. Are you trying to virtualize the GPU(s)?
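(For the Docker route, a quick hedged check that containers can actually see the GPUs, assuming the NVIDIA Container Toolkit is installed; the CUDA image tag is just an example:)

import subprocess

# Equivalent to: docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
# If the toolkit is wired up correctly, this prints the same P40 table as on the host.
subprocess.run(["docker", "run", "--rm", "--gpus", "all",
                "nvidia/cuda:12.2.0-base-ubuntu22.04", "nvidia-smi"], check=True)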
@PardnerCLUTCH 7 months ago
@@TheDataDaddi Thanks for the reply. No virtualization here. I'm using OpenAI Whisper as a starting point, which uses PyTorch, and it currently doesn't detect the P40. I'll take a look at running Docker and see if I get anywhere. Edit: did you have to enable "Resizable BAR"? I looked in my BIOS and didn't find it (maybe this is a difference between the R730 and the 7910). Edit #2: Okay, found the BAR settings (MMIO) but still no luck on the drivers :(
@TheDataDaddi 7 months ago
Yeah let me know if docker works for you. I think it is a much better solution than VMs these days. Regarding the P40 not being detected by PyTorch, I did not have to enable "Resizable BAR" that I remember. This issue might stem from several factors, including driver compatibility, CUDA version mismatches, or PyTorch itself. Can you paste in or email me the error you are getting when the drivers fail to install? I think this would be the best place to start. Email is skingutube22@gmail.com @@PardnerCLUTCH
@PardnerCLUTCH 7 months ago
@@TheDataDaddi Well, don't laugh... it was the power cables. I originally bought an R730-to-K80 cable (8-pin PCIe to 8-pin EPS). Running "dmesg | grep NVRM" revealed "GPU does not have the necessary power cables connected." So I bought the cables you recommended and presto! Working GPU!
@TheDataDaddi 7 months ago
It's funny because I did the same thing. It took me way longer than it should have the first time to find the right cable, and I almost shorted out my server as well. Lol. Glad you got it working! @@PardnerCLUTCH
@Mark300win 9 months ago
How hot are these GPUs running? Would you still recommend these P40s, or the P100s?
@TheDataDaddi 9 months ago
So they are running about 80C which is toward the top end of their operating range, but that is fairly normal under heavy loads. I also have not adjusted the default fan settings so you could likely get them running cooler. At this point, I am still recommending the p40s. However, I reserve the right to change my mind once I have a chance to fully benchmark both.
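(For anyone who wants to watch the temperatures themselves, a small sketch that polls nvidia-smi's query mode; the field names are standard, and the 5-second interval is arbitrary:)

import subprocess, time

QUERY = ["nvidia-smi",
         "--query-gpu=index,name,temperature.gpu,power.draw,utilization.gpu",
         "--format=csv,noheader"]

# Print one line per GPU every 5 seconds; stop with Ctrl+C.
while True:
    print(subprocess.run(QUERY, capture_output=True, text=True).stdout.strip())
    time.sleep(5)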
@Mark300win 9 months ago
@@TheDataDaddi Throttling starts at 80C; not sure if there is room to fit a snail blower fan for each of these cards!
@TheDataDaddi 9 months ago
You could probably fit a small snail blower for the P40s; I doubt you could fit them for the P100s, as they are a bit longer. I would try adjusting the integrated fan speeds first. I will look into this and see if it is possible. @@Mark300win
@HemanthSatya-eo4rq 6 months ago
Hi sir, I'm new to this field. Is it possible to use both a P40 and a P100 simultaneously with the R720? I'm thinking of buying one P40 and one P100 and using them both at the same time, so that I don't miss out on computing power for both training and inference. What do you suggest?
@TheDataDaddi 5 months ago
Hi there. I think this is a great approach. Certainly will give you access to the best qualities of both GPUs. I support it.
@HemanthSatya-eo4rq 5 months ago
@@TheDataDaddi Thank you. Are the steps the same as the ones you showed in this video, or are there any changes?
@TheDataDaddi 5 months ago
Should be exactly the same as in the video above. You will just be using one P100 and one P40. @@HemanthSatya-eo4rq
@HemanthSatya-eo4rq 5 months ago
@@TheDataDaddi thank you sir🫡
@jjolleta 4 months ago
I see that cable management and I kinda wonder about the thermals... You should find a way to make those cables more custom so they fit better, especially for the second card below the air ducting. This is just a suggestion; I hope you can manage to give those cards a better lifespan. Greetings!
@TheDataDaddi 3 months ago
Hi there! So I have actually found a better cabling strategy. The updated cabling should be included in the video description. Please have a look there if you are interested in going that route. Cheers!
@Anthony-c7o 3 months ago
I can't believe you blocked the airflow with the cables :(
@TheDataDaddi 2 months ago
Hi there! Thanks for the comment. Yeah, I definitely did not love how cumbersome the cabling was. However, I have since found a better, more compact option that allows for more airflow. Check out the link below: a.co/d/00JvYA0B Also, to provide some more data here: I tested both the cabling setup in this video and the more compact option listed above, and there was really no noticeable difference between the two in terms of GPU temperature when running operations. This seems to imply that the first option, while bulkier, does not restrict airflow enough to prevent the GPUs from cooling effectively.
@emiribrahimbegovic813 4 months ago
how much electricity do you consume per month to run this setup?
@TheDataDaddi 4 months ago
Hi there! Thanks for the comment. I am averaging about 206.4 kWh/month with this setup. It cost me on average about $27.08 to run this per month with electricity costs in my area. I am assuming 30 days per month here, and I would estimate I run both GPUs solid about half of the month for context.
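(A quick sanity check on those numbers; the 30-day month and constant draw are simplifying assumptions:)

# Implied electricity rate and average draw for ~206.4 kWh and ~$27.08 per month.
kwh_per_month = 206.4
cost_per_month = 27.08
rate = cost_per_month / kwh_per_month          # ~$0.13 per kWh
avg_watts = kwh_per_month * 1000 / (30 * 24)   # ~287 W average draw
print(f"Implied rate: ${rate:.3f}/kWh, average draw: {avg_watts:.0f} W")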
@fusion__gaming 10 months ago
Hi, we talked before about GPUs. Did you find a GPU for the Dell R720 server that I have? I tried other GPUs, but when I plug one in for display output the screen is always blank.
@千里影 9 months ago
Try the original output and install the GPU properly in the server, then check it in the iDRAC or Lifecycle Controller.
@fusion__gaming 9 months ago
@@千里影 What should I check in iDRAC or the Lifecycle Controller? Also, if I use the original output, how will I know that the GPU is installed and working properly?
@千里影 9 months ago
I don't have a server around me, but as far as I know, all of the server's running devices will show up in the Lifecycle Controller, like the hard drive array cards or anything else installed via PCIe. @@fusion__gaming