Isn’t the call at 16:27 “call saxpykernel<<<grid,block>>>…”, not “call saxpy”?
@AmirHosseinSojoodi2 ай бұрын
A very important issue that MIG introduces to multi-GPU systems is that all types of peer-to-peer access are disabled, so communication between GPUs within a node slows down dramatically. This results in an environment in which CUDA-aware MPI and UCX cannot utilize NVLinks. How will you handle this issue? Would users be able to enable MIG on demand? (It usually requires sudo access.) On the other hand, enabling MIG statically on even a small subset of the GPUs would sacrifice more of the cluster's potential peak, because all of the inter-GPU NVLinks would be useless until MIG is disabled again. How is this going to work?
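For reference, a quick way to see this on a given allocation is to query the runtime for pairwise peer-to-peer capability between the GPUs visible to the process. A minimal sketch, assuming CuPy is installed (the CUDA runtime bindings it wraps are standard; on MIG-enabled hardware P2P is unavailable):

from cupy.cuda import runtime  # thin bindings over the CUDA runtime API

n = runtime.getDeviceCount()
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        ok = runtime.deviceCanAccessPeer(i, j)  # 1 if P2P (e.g. over NVLink) is possible
        print(f"GPU {i} -> GPU {j}: {'P2P available' if ok else 'no P2P'}")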
@markhahn02 ай бұрын
Can you describe the scenario in which it would make sense for a job to use multiple MIGs like that? Our main motive with MIGs is to address the extremely common problem of jobs that underutilize a whole GPU. Maybe the short answer is: we expect to have whole GPUs available as well as MIGs. Whole GPUs make sense for jobs that can use them effectively, of course - including across nodes. (I am Sharcnet/DRAC staff.)
@SHARCNET_HPC2 ай бұрын
I suspect MPS is the answer, but this needs to be tested.
@AmirHosseinSojoodi2 ай бұрын
@markhahn0 Well, that's the point. There are virtually no reasonable scenarios in which it makes sense for an MPI job to use multiple MIG instances, unless the code is embarrassingly parallel with little to no communication across the GPUs. That's why most users who know what they are doing won't use the MIG-enabled GPUs for their MPI/multi-GPU jobs, making the race to acquire non-MIG GPUs even more competitive than before. This results in longer waiting times than you already expect. I get your motivation and it's absolutely important to take action. However, I doubt that using MIG for a big chunk of nodes in an HPC cluster is an efficient solution to this problem, and it may backfire. Some considerations/suggestions that come to mind:
1. Enable MIG only on single-GPU nodes (a portion of them, if there are any).
2. Enable dynamic MIG configuration - preferably driven by Slurm itself, and ambitiously, depending on the jobs and their requirements.
3. With number two, you could then enable MIG on multi-GPU nodes as well, if there are enough requests for them.
4. Do what you are doing: instruct users when and why to use MIG, and help them write better code.
@SHARCNET_HPC Yeah, MPS is a better choice I guess, and users should be encouraged to use it. I wrote a couple of personal posts about MIG and MPS-on-top-of-MIG that might be helpful:
amirsojoodi.github.io/posts/Enabling-MPS/
amirsojoodi.github.io/posts/MIG/
amirsojoodi.github.io/posts/MPS+MIG/
P.S. Regarding MIG and its related issues, NVIDIA said they might relax the constraints in the future, but who knows when. Read here: docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#application-considerations and here: docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-device-enumeration
@AmirHosseinSojoodi2 ай бұрын
15:21 MPS and Hyper-Q are separate but related concepts. Hyper-Q is basically a hardware feature available on GPUs since compute capability 3.5, while MPS is a software solution that uses this hardware feature to let multiple processes share the GPU more efficiently.
@SHARCNET_HPC2 ай бұрын
You are right: "MPS enables cooperative multi-process CUDA applications, typically MPI jobs, to utilize Hyper-Q capabilities on the NVIDIA GPUs."
@zappandy5 ай бұрын
Thank you so much! I know this is pretty straightforward stuff, but 54:33 really helped me fix an issue with containerized experiments.
@karlkrasnowsky13935 ай бұрын
DBInterface is not an interface... I was wondering how you might create an interface in JavaScript, since it's not natively supported. TypeScript would be a better choice if you really wanted this, but as it stands, this "interface" contains implementation, something an interface is not, by definition, supposed to do.
@karlkrasnowsky13935 ай бұрын
Where's the GitHub repo you mentioned would be available?
@yusufkemahl59436 ай бұрын
How can I install these GPU utilization packages (GPU dashboard)?
@FerPerez-mc3wr6 ай бұрын
Thank you for the amazing video! 🇧🇷
@idreeskhan-zp5ey7 ай бұрын
Nice explanation. I converted my 2D parallelized Fortran code to 3D and now it gives me NaNs - any suggestions, please?
@jurgendevlieghere90817 ай бұрын
Thanks. One of the better videos on cooperative groups. 👌
@richardbennett43658 ай бұрын
Be like hay.
@richardbennett43658 ай бұрын
Oh, my goodness.
@richardbennett43658 ай бұрын
😮 It must not be the right way to do it, because Cython should be at least 40 times faster. He should probably get out of the Jupyter notebook and get better, more reasonable measurements.
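One way to get a cleaner measurement is to time the compiled function from a plain script with timeit rather than in a notebook cell. A rough sketch; the module and function names here are hypothetical stand-ins for whatever was built in the video:

import timeit

# my_module / my_cython_module and compute() are made-up names for illustration
py_t = timeit.timeit("compute(10_000)", setup="from my_module import compute", number=100)
cy_t = timeit.timeit("compute(10_000)", setup="from my_cython_module import compute", number=100)
print(f"Python: {py_t:.4f} s  Cython: {cy_t:.4f} s  speedup: {py_t / cy_t:.1f}x")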
@enriqueaguilarolivares956010 ай бұрын
Is there CUDA Fortran for Windows?
@muhammadfaridkhandaq43798 ай бұрын
what do you mean? The compiler? yes
@enriqueaguilarolivares95608 ай бұрын
@@muhammadfaridkhandaq4379 thanks
@Sandeep_Sulakhe10 ай бұрын
This was nice. Thank you.
@tihihoang10 ай бұрын
Hi guys, at 57:11, after "ssh -A -J narval nc10201", do you know why I still get asked for a password and then get this error: "Received disconnect from UNKNOWN port 65535:2: Too many authentication failures. Disconnected from UNKNOWN port 65535"? Thank you in advance.
@mindlessgreen11 ай бұрын
Is it possible to automatically get the compute node ID and connect to it?
@captaindunsell8568 Жыл бұрын
Not necessarily new … we did this with the mainframe and the Cray Vector processors … GPUs are great at vector math and solving simultaneous equations
@Leonid.Shamis Жыл бұрын
Excellent talk - great content, very informative, very useful! Thank you.
@camaycama7479 Жыл бұрын
Fantastic presentation, thank you so much! What about xperf in a Windows environment? Is it still relevant?
@AdrianGonzalez-ii7jb Жыл бұрын
Amazing job!
@akbarravan5604 Жыл бұрын
Thanks for sharing useful content. It would have been better if you had named it "faculty member edition", because it is specifically for supervisors.
@zgrunschlag Жыл бұрын
This deserves 10K thumbs up, not just 39!!!
@Hut-il6oz Жыл бұрын
Nice presentation and useful instructions. Thanks a lot!
@pietraderdetective8953 Жыл бұрын
Your Cython explanation and tutorial is the clearest I've heard... and I have listened to more than 10 videos! Excellent content! Thank you for this, it makes things less painful ❤
@karthikmr416 Жыл бұрын
In the code shown at 22:50, the for-loop initialization condition should be g.size()/2, right? To accommodate any thread group size.
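For what it's worth, here is the same halving pattern sketched in Python with Numba rather than the video's cooperative-groups C++ code (assuming a power-of-two block size; names are mine): the stride starts at half the group size so every element gets folded in.

import numpy as np
from numba import cuda, float32

TPB = 128  # threads per block, assumed to be a power of two

@cuda.jit
def block_sum(data, partial):
    # classic shared-memory tree reduction within one thread block
    tid = cuda.threadIdx.x
    i = cuda.grid(1)
    shared = cuda.shared.array(TPB, dtype=float32)
    shared[tid] = data[i] if i < data.shape[0] else 0.0
    cuda.syncthreads()
    s = cuda.blockDim.x // 2          # start at half the group size
    while s > 0:
        if tid < s:
            shared[tid] += shared[tid + s]
        cuda.syncthreads()
        s //= 2
    if tid == 0:
        partial[cuda.blockIdx.x] = shared[0]

x = np.random.rand(1_000_000).astype(np.float32)
nblocks = (x.size + TPB - 1) // TPB
partial = np.zeros(nblocks, dtype=np.float32)
block_sum[nblocks, TPB](x, partial)
print(partial.sum(), x.sum())  # the two sums should agree (up to float32 rounding)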
@Eduardo-Quantum Жыл бұрын
Thank you for the presentation 😊
@AmanSingh-fl3wp Жыл бұрын
Thanks for the video, it's really helpful. I had a question: say you want to train an XGBoost model on a 200 GB dataset, and you have a VM with some GPUs having a total combined memory of ~100 GB (GPU memory + VM memory). Will you be able to train the model successfully on that using LocalCUDACluster?
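In case it helps, a rough sketch of the usual dask-cuda + xgboost.dask pattern is below. The file path and the "label" column are made up, and whether 200 GB actually trains on ~100 GB of memory depends on partition sizes and how much spilling to host memory/disk your dask-cuda setup allows; newer XGBoost versions use device="cuda" with tree_method="hist" instead of "gpu_hist".

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask.dataframe as dd
import xgboost as xgb

cluster = LocalCUDACluster()   # one worker per visible GPU
client = Client(cluster)

# the dataset is read lazily, partition by partition, instead of all at once
df = dd.read_parquet("/path/to/dataset/*.parquet")   # hypothetical path
X, y = df.drop(columns=["label"]), df["label"]       # "label" is a made-up column name

dtrain = xgb.dask.DaskDMatrix(client, X, y)
out = xgb.dask.train(client, {"tree_method": "gpu_hist"}, dtrain, num_boost_round=100)
booster = out["booster"]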
@oguzhanakkurt39477 ай бұрын
did you try?
@popov654 Жыл бұрын
Great tutorial. May I ask you a question? If I have to use an old version of Node for my project that does not support async/await but supports promises, how can I rewrite the "for" loop at 19:16? I need to preserve the ordering of the promises, which means that just starting them all in a loop with "then()" attached to each one is not a solution. I also want to ask: in one of my old web apps I used an inline <script> block into which I uploaded a file or a web form for new item creation; that script was something like "if (window.parent) { window.parent.doStuff(data) }". It worked fine even in IE6+, but is it a clean approach?
@GeorgeLi-j5r Жыл бұрын
Great video, helped connect to UMIACS HPC
@dhimanbhowmick9558 Жыл бұрын
Thanks a lot. It is a very well structured nice video. Without examples it is very hard to learn programming. Thanks a lot. 😄
@farzaddizaji7002 Жыл бұрын
It was really a nice demonstration for CUDA. Thanks
@SHARCNET_HPC Жыл бұрын
Glad you liked it!
@sebastiangomez-wu9gh Жыл бұрын
Very useful!! Especially the explanation of the color coding.
@SHARCNET_HPC Жыл бұрын
Glad it was helpful!
@KuoZhang-v9b Жыл бұрын
is the source code available for the debugger part?
@dakezhang2845 Жыл бұрын
Any idea of when H100 will be available on Graham?
@pawelpomorski4025 Жыл бұрын
We plan to replace graham with a new cluster in 2024 (if funding comes through), and this new cluster may contain H100 GPUs. Before then we may get a small number of H100 GPUs for testing purposes.
@dakezhang2845 Жыл бұрын
@@pawelpomorski4025 I'm interested in testing H100 GPUs from a researcher's perspective when they are available. 😁
@pawelpomorski4025 Жыл бұрын
@@dakezhang2845 There is a lot of interest. When we get some H100 cards for testing, we will advertise it, probably in our monthly newsletter which all users receive.
@ameerracle Жыл бұрын
@@pawelpomorski4025 What happens to the old cluster? Can research groups purchase the old cores or GPUs?
@pawelpomorski4025 Жыл бұрын
@@ameerracle The cluster hardware is owned by the university, which decides how to dispose of it if we (SHARCNET) retire it. Anyway, it looks like graham will be running with the current hardware at least until early 2025.
@favourkachi2178 Жыл бұрын
How can I profile in production using Scalene, please?
@favourkachi2178 Жыл бұрын
I need the steps to take to get this done.
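Not sure of your exact setup, but one hedged option for long-running services is Scalene's programmatic on/off API, so only the region you care about is sampled; check the docs of the Scalene version you have installed, and launch the script under Scalene as usual (e.g. scalene myscript.py). The function below is just a stand-in:

from scalene import scalene_profiler

def expensive_work(n):
    # stand-in for the hot path you actually want to profile
    return sum(i * i for i in range(n))

scalene_profiler.start()   # begin sampling this region
expensive_work(10_000_000)
scalene_profiler.stop()    # stop so the rest of the run is unprofiled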
@ranchuhead4547 Жыл бұрын
Do you think the AI programmer community will support/adopt ROCm and, to an extent, AMD data center GPUs?
@pawelpomorski4025 Жыл бұрын
ROCm uses HIP, which is very similar to CUDA, so any program written in CUDA can be ported to AMD (assuming it does not use any advanced features exclusive to NVIDIA GPUs). As for data centers, given the high demand for and expense of NVIDIA GPUs, more AMD data center GPUs may be adopted.
@rupjyoti2304 Жыл бұрын
Can I use this code?
@fatemehn.nokabadi6232 Жыл бұрын
Thank you for the great explanation 🙏
@SHARCNET_HPC Жыл бұрын
My pleasure!
@pauljohn32 Жыл бұрын
Well done. I hope you'll make a follow-up about a larger Python program, with classes, various files, etc. The challenge I find is that a profiler is not so helpful once the work spreads out across components. I need something more specific that can tell me "how can this function call be more efficient?"
@sigururbaldursson8118 Жыл бұрын
24:38 How are you configuring JupyterHub like this in the background to connect to these cluster resources? Is there documentation available, or an open-source GitHub repo I can take a look at?
@krgrief Жыл бұрын
awesome. thanks man
@SHARCNET_HPC Жыл бұрын
Glad it helped!
@paulosergioschlogl95502 жыл бұрын
'/^seq2$/'
@paulosergioschlogl95502 жыл бұрын
awk '{print $1, $3, $5}' filename
@abdelra7man872 жыл бұрын
Thanks a lot for sharing that. It helped me connect to UGent's HPC.
@SHARCNET_HPC Жыл бұрын
Glad it helped
@GarimaKumari-sl2gq2 жыл бұрын
How do I overcome this issue: RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1670525552843/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1269, internal error, NCCL version 2.14.3. ncclInternalError: Internal check failed. Last error: Duplicate GPU detected: rank 0 and rank 1 both on CUDA device 1a000
@mnk_navin Жыл бұрын
Same error bro, did you rectify it?
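That error usually means every rank ended up on the same GPU. A hedged sketch of the common fix when launching with torchrun is to pin each rank to its own device before wrapping the model in DDP (the model here is just a placeholder):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun for each process
torch.cuda.set_device(local_rank)            # one distinct GPU per rank
model = torch.nn.Linear(1024, 1024).cuda(local_rank)
model = DDP(model, device_ids=[local_rank])  # NCCL now sees a different device per rank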
@AmirHosseinSojoodi2 жыл бұрын
Thanks for the presentation. Please make MIG available for everyone
@pawelpomorski40252 жыл бұрын
If you are interested in trying our experimental MIG setup on narval, please submit a ticket with a request. We might be able to give you some access in January.
@sinedeiras2 жыл бұрын
Dear Ge, thanks for the video. Do you know if it's possible to debug across many nodes?
@userjjb2 жыл бұрын
The first example shown will not use multiple GPUs, only one. You can see at 27:36 in the code (it would have been nice to have line numbers) [device = 'cuda:0'...], meaning it only uses the first GPU. I believe this should be [device = 'cuda:0,1'...].
@lordnaive2 жыл бұрын
u are correct
@kavibharathi154711 ай бұрын
Can you tell us exactly how to use it, or provide syntax to utilize multiple GPUs?
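A small caveat: as far as I know, 'cuda:0,1' is not a valid device string in PyTorch; a device always names a single GPU. A rough sketch of the simplest single-process way to use both GPUs is DataParallel (DistributedDataParallel is the recommended route for real jobs):

import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1])  # replicate across GPUs 0 and 1
model = model.to("cuda:0")                             # parameters live on the first GPU

x = torch.randn(64, 1024, device="cuda:0")
y = model(x)                                           # the batch is split across both GPUs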