The Thunder Sessions | Session 14
47:33
The Thunder Sessions | Session 13
50:35
The Thunder Sessions | Session 12
46:55
Thunder Sessions | Session 11
39:38
The Thunder Sessions | Session 8
43:17
The Thunder Sessions | Session 2
34:13
The Thunder Sessions | Session 1
48:38
Comments
@deependu__ · 2 months ago
It was challenging for me to follow the video, but my profiling skills have improved a lot. Thanks a lot for such valuable insights!🔥
@PyTorchLightning · 2 months ago
Way to go on following along with the sessions! So awesome to hear you have improved a lot!
@deependu__ · 2 months ago
36:11 feels good 😌 Been there, done that, and my bug was: I was going from (i: 0->n), if only I'd done (i: n->0) 🥴
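A classic instance of the 0->n vs n->0 bug mentioned above (a hypothetical illustration, not the commenter's actual code) is an in-place dynamic-programming update such as the 1D 0/1 knapsack: iterating the capacity forward reads values already updated in the same round, silently letting one item be counted twice, while iterating backward fixes it.

```python
# Illustration of a bug where looping 0->n is wrong and n->0 is the fix:
# a 1D DP table updated in place (0/1 knapsack, each item usable once).

def knapsack_forward(weights, values, capacity):
    # BUGGY: low->high capacities read entries already updated this round,
    # so a single item can be "taken" multiple times.
    dp = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        for c in range(w, capacity + 1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

def knapsack_backward(weights, values, capacity):
    # CORRECT: high->low capacities only read values from the previous round.
    dp = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]
```

With a single item of weight 2 and value 3 and capacity 4, the forward loop reports 6 (the item counted twice), while the backward loop correctly reports 3.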
@deependu__ · 2 months ago
fantastic video. Luckily I'd explored Triton before. It's awesome that using Thunder we can inspect traces and replace ops with our own Triton kernel implementations. 🔥🚀
@deependu__ · 2 months ago
great video 🔥 So when we use the quantization transform, while saving weights to CUDA global memory we convert each 16-bit float to a 4-bit value, and the 4-bit value is stored; when it's loaded back, it's converted back to a 16-bit float (the precision discarded at save time can't be recovered), and loaded into the model. Save less, load less. Hope I got it correct.
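The round trip described above can be sketched in a few lines. This is a toy illustration, not the actual Thunder/bitsandbytes implementation: real 4-bit schemes typically use block-wise scales and non-uniform codebooks (e.g. NF4) rather than the naive symmetric scheme below, and the function names here are made up for the example.

```python
# Toy sketch of "save small, dequantize on load": symmetric 4-bit quantization.

def quantize_int4(xs):
    # One scale for the whole block; real schemes use per-block scales
    # and non-uniform codebooks such as NF4.
    scale = max(abs(x) for x in xs) / 7 or 1.0
    # Each code fits in 4 bits (range -8..7); this is what gets stored.
    q = [max(-8, min(7, round(x / scale))) for x in xs]
    return q, scale

def dequantize_int4(q, scale):
    # On load, expand back to full precision; the reconstruction is coarse
    # because information was discarded at quantization time.
    return [v * scale for v in q]

weights = [0.9, -0.31, 0.02, 0.77]
q, s = quantize_int4(weights)
restored = dequantize_int4(q, s)
```

The reconstruction error is bounded by half the scale for in-range values, which is the price paid for storing a quarter of the bits.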
@deependu__ · 2 months ago
31:42-32:55 sounds exciting. I don't have a GPU either, so I'll be able to do symbolic manipulations on my Mac, and when it's time to run, I can run it on a T4 GPU on Lightning Studio.
@deependu__ · 2 months ago
Thanks for the great session, Luca and Thomas! Most interesting part to me was: DAG creation and kernel fusion challenges in earlier frameworks. Inspired by this, I'm considering creating a small library, DeepTorch, with a couple of kernels implemented for forward and backward passes, aiming to train on a small MNIST dataset (thinking of using pybind to connect C++ with Python).
@YamenHawit · 2 months ago
Great Tools!
@rafsanjaniLab · 2 months ago
Why is 1 x A100 GPU not offered?
@rashadbartholomew8427 · 2 months ago
A100 GPU is offered, I onboarded last night and it was an option!
@PyTorchLightning · 2 months ago
Coming soon!
@AvaisAziz · 2 months ago
Excellent news, now I don't need to look for cheaper options
@FabriceNormandin95 · 3 months ago
Are there any parallels to be made between Jax/XLA and this new "Thunder" thing? Or is it some kind of wrapper around torch.compile? Or something else entirely?
@thomas8014 · 3 months ago
Great question. We're contrasting Thunder, JAX, and torch.compile in Episode 3 ( kzbin.info/www/bejne/fqWvXodno9xjiNk ). The gist is that Thunder is a source-to-source compiler: it takes in PyTorch models and compiles them to Python code that calls optimized functions provided as kernels (e.g. from Apex or cuDNN, written in Triton, etc.), generated on the fly (e.g. by NVFuser or torch.compile), or supplied by the user. We've learned a great deal from JAX and torch.compile, and we're using torch.compile as a backend (and some people are working on making it a bit of a front-end, too), but it's a somewhat different approach. Our three priorities are usability, understandability, and extensibility, and we believe we have a good proposition for them, aside from enabling great perf, of course.
@FabriceNormandin95 · 3 months ago
Thanks @thomas8014 for the explanation. Sounds like a very interesting project!
@SaitoHiraga-e5s · 3 months ago
Hello, I'm planning on getting the Pro version of Lightning AI for NLP model training. Is there a storage path in Lightning AI for saving/downloading the trained model? Please reply.
@PyTorchLightning · 3 months ago
Glad you are enjoying using Lightning AI. The short answer is Yes! Get more info on storage here: lightning.ai/docs/overview/studios/drive And join our Discord to connect and get answers to future questions!
@SaitoHiraga-e5s · 3 months ago
@@PyTorchLightning is the drive only available for teamspace ?
@PyTorchLightning · 3 months ago
@@SaitoHiraga-e5s The drive is accessible from any studio within a teamspace!
@SaitoHiraga-e5s · 3 months ago
@@PyTorchLightning hello, I was going to save my model but it shows this error: PermissionError: [Errno 13] Permission denied: '/teamspace/config.json' Please help me resolve this.
@PyTorchLightning · 3 months ago
​@@SaitoHiraga-e5s Yes, let us help you resolve this! Follow the link above to join our Discord and connect with us directly!
@ammararazzaq132 · 3 months ago
What is Studio? The LitServe pricing details mention that it is free for four hours and incurs charges after that, but those charges are not listed. Also, can I just use LitServe and not Studio?
@PyTorchLightning · 3 months ago
Thank you for the questions! Lightning AI Studio is an all-in-one platform for AI development. It allows users to code together, prototype, train, scale, and serve from the browser, with zero setup. Visit Lightning.ai to start using Studio for free and learn more. LitServe allows you to easily serve AI models. LitServe can be used with Lightning AI Studios for fully managed hosting. It can also be self-hosted on your own machines, completely independent of Studio.
@osamansr5281 · 4 months ago
[2:08] The program freezes when I set sync_dist=True, and I don't know why. Also, can I stop the warning from being logged every time? Maybe warn once in the beginning (first log) and then suppress it.
@user-wr4yl7tx3w · 4 months ago
I think some narration would have been preferable to music. I'd just like to get up to speed as easily as possible, so I can start using it regularly.
@tturbo3981 · 4 months ago
**2024 Update:** With the new `import lightning.pytorch as pl` API, it looks something like this:

```python
tuner = pl.tuner.Tuner(trainer)
lr_finder = tuner.lr_find(model, train_dataloader)
print(lr_finder.suggestion())
lr_finder.plot(suggest=True)
```
@pubg-hf2np · 4 months ago
I'd like to know why, when I install Docker images and containers in the Studio, or pip install any libraries, everything is gone once the Studio goes to sleep and is reactivated again. I really need to know where the issue is.
@PyTorchLightning · 4 months ago
Join our Discord and post in the "Studio-Help" channel and we will be happy to assist you! discord.com/invite/MWAEvnC5fU
@pubg-hf2np · 4 months ago
@@PyTorchLightning okay thanks
@I_hate_charlatans_like_mkbhd · 4 months ago
Remove the thumbnails in the last few seconds of the video, as you are still explaining something with a slide. The thumbnails are a nuisance.
@I_hate_charlatans_like_mkbhd · 4 months ago
would have been a nice video. the colored tophead is annoying with his irritating haircolor.
@carneica · 4 months ago
for those who use Windows, the equivalence between macOS and the Windows Command Prompt:

mkdir <dir>       →  mkdir <dir> (or md <dir>)
cd <dir>          →  cd <dir>
ls                →  dir
open .            →  start .
clear             →  cls
touch <file>      →  type nul > <file>
mv <src> <dest>   →  move <src> <dest>
cp <src> <dest>   →  copy <src> <dest>
rm <file>         →  del <file>

in addition these useful ones:

delete directory:    rm -r <dir>  →  rmdir /s <dir>
view file contents:  cat <file>   →  type <file>
@carneica · 4 months ago
quite simple but educational and cool videos... I knew most of these things, but it's always clever to "learn the basics" from "the experts", as I just code Python as a tool for my main job. I always learn something new with these... ;) The GitHub videos were especially cool... as that was a black box to me until now! I'm not sure I need to use it, as I just create small projects (I call them programs lol) and I keep my versions the old-fashioned way (v1, v1.1, v2, etc...), but one of these days I'll create an account and test the version control over there. The "sharing code" concept is quite appealing, even for a non-pro like me! ;)
@PyTorchLightning · 4 months ago
Thanks for watching!
@akshay_pachaar · 5 months ago
What Luca did at kzbin.info/www/bejne/ZpzYqmlpqpaekJo was so much fun!
@vmguerra · 5 months ago
Nice intro to Thunder and DL compilers in general
@King_Muktar · 5 months ago
Thank you For This 🤗🤗
@andrei_aksionau · 5 months ago
Thanks for the video. I like the combination of drawings (nice ones, btw) and live coding. But I'm slightly confused about what exactly Thunder does. It transforms PyTorch code into its own intermediate representation (IR); then, if NVFuser is selected, it transforms that into a "fusion specification", which NVFuser takes as input so it can do its part of the job. I hope I got that right. But what does Thunder do when torch.compile is selected? Does it just apply it to Thunder's IR, or does it do something more specific? And what happens when both torch.compile and NVFuser are selected? It would be nice if the next episode explained this with drawings; otherwise it's a bit difficult (at least for me) to figure out what goes where. I really want to know how the "sausages are made" :)
@lucaantiga3941 · 5 months ago
Great questions, Andrei. We'll make sure we cover them in the next episode. Spoiler: while in the case of NVFuser we take our IR (which is a subset of Python) and build an NVFuser input graph using its API, in the case of torch.compile we can take the IR, which is a Python callable, and pass it (all or in part) to torch.compile directly. See you next week!
@andrei_aksionau · 5 months ago
Great introductory video for such a complex topic. Looking forward to one about distributed training.
@YokoSakh · 6 months ago
Cool. I hope you'll continue doing these live sessions.
@lucaantiga3941 · 6 months ago
Thank you! Yes we will, see you next Friday!
@Lily-wr1nw · 6 months ago
Is there a template for comfyui?
@PyTorchLightning · 6 months ago
Yes! We do have templates using ComfyUI, and more templates are being added regularly.
@Lily-wr1nw · 6 months ago
@@PyTorchLightning can you please link one in this post. It will be really helpful
@PyTorchLightning · 6 months ago
@@Lily-wr1nw Visit Lightning.ai to browse the studio templates available! Here's a link to one to get you started: lightning.ai/mpilosov/studios/stable-diffusion-with-comfyui
@pedrogorilla483 · 6 months ago
I’ve been trying to understand the stable diffusion unet in detail for a while. This video added a few pieces of information I was missing from other material. Thanks!
@kimomoh5439 · 6 months ago
I hope you solve this problem in PyTorch Lightning:

RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
    self.pid = os.fork()
@PyTorchLightning · 6 months ago
We are always working to alleviate problems people have while training. Join our Discord to join the discussion and connect with a wide variety of experts in all things ML: discord.com/invite/MWAEvnC5fU
@clippadas · 6 months ago
My dream would be to have the 8-GPU mode unlocked so I could use a script to recover my password, but I'm poor.
@samarbhosale8310 · 6 months ago
I want to train a model to detect text similarity between 2 questions, scored between 0 and 1. My dataset is unlabelled; how should I proceed? Can you guide me?
@PyTorchLightning · 6 months ago
Good question! Join our discord and get advice from a wide variety of experts in all things ML, including a special channel dedicated to this course. lnkd.in/g63PCKBN
@osamansr5281 · 6 months ago
If overfit_batches uses the same batches for training and validation, shouldn't the validation loss equal the training loss? I see the training loss decreasing but the validation loss increasing! 😳
@osamansr5281 · 6 months ago
I have a guess, but I'd appreciate some confirmation: overfit_batches doesn't use the same batches for training and validation, BUT the same batch count! So if the DataModule provides val_dataloader and train_dataloader, both are going to be called, and the same number of batches is going to be sampled from each.
@PyTorchLightning · 6 months ago
@@osamansr5281 The answer you arrived at is correct. :) Join the Lightning AI Discord for continued discussion with the ML community: discord.gg/zYcT6Yk9kw
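The behavior described in the question above can be modeled in a few lines of pure Python (a sketch with made-up stand-in loaders, not Lightning's actual implementation): limiting both dataloaders to the same batch count amounts to taking the first N batches from each.

```python
# Sketch: overfit_batches=N effectively draws the first N batches from each
# dataloader separately, so train and val see the same *number* of batches,
# not the same batches.
from itertools import islice

def limit_batches(loader, n):
    # Take only the first n batches from an iterable of batches.
    return list(islice(loader, n))

train_loader = iter([["t1"], ["t2"], ["t3"], ["t4"]])
val_loader = iter([["v1"], ["v2"], ["v3"]])

train_batches = limit_batches(train_loader, 2)
val_batches = limit_batches(val_loader, 2)
```

Both lists have length 2, but their contents differ, which is why the training and validation losses can diverge even with overfit_batches set.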
@not_a_human_being · 6 months ago
omg this is horrible
@osamansr5281 · 7 months ago
did I misunderstand something, or is the graph presented in the over-fitting section of the video from [0:22] to [1:00] mislabeled? 🧐 over-fitting occurs when the train accuracy *RED* increases while the test accuracy *BLUE* decreases, correct? 🤔 aren't the colors swapped? btw, thanks for the amazing tutorials and special thanks for updating them <3
@SebastianRaschka · 7 months ago
Good question. I think your question arises because this shows the training and test accuracy in a slightly different context: here, we are looking at performance for different portions of the dataset. The overall idea still holds: the larger the gap, the bigger the degree of overfitting. The reason you see the training accuracy go down is that with more data, it becomes harder to memorize (there's simply more data to memorize). And if there is more data (and it's harder to memorize), it becomes easier to generalize (hence the test accuracy goes up).
@saikatnextd · 7 months ago
Love it thanks a lot Linus //
@MrPaPaYa86 · 7 months ago
This was very clear and informative
@benc7910 · 7 months ago
my plot_loss_and_acc():

```python
def plot_loss_and_acc(log_dir) -> None:
    import pandas as pd
    import matplotlib.pyplot as plt

    metrics = pd.read_csv(f"{log_dir}/metrics.csv")

    # Group metrics by epoch and calculate mean for each metric
    df_metrics = metrics.groupby("epoch").mean()

    # Add epoch as a column
    df_metrics["epoch"] = df_metrics.index  # Index is the grouping key (epoch)
    print(df_metrics.head(10))

    df_metrics[["train_loss", "val_loss"]].plot(
        grid=True, legend=True, xlabel="Epoch", ylabel="Loss", title="Loss Curve"
    )
    df_metrics[["train_acc_epoch", "val_acc_epoch"]].plot(
        grid=True, legend=True, xlabel="Epoch", ylabel="ACC", title="Accuracy"
    )
    plt.show()

plot_loss_and_acc(trainer.logger.log_dir)
```
@JikeWimblik · 7 months ago
Couldn't you use 8-bit precision during training by keeping double weights, enabling more error tolerance and hence more speed-up options?
@River-xd8sk · 8 months ago
You just lost $510, because I'm not waiting 2 to 3 days to have my email "verified".
@NeoZondix · 8 months ago
Thanks
@kevinsasso1405 · 8 months ago
why are you blinking like that are you ok
@AbhishekBade1310 · 8 months ago
```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=1, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=1, stratify=y_train)
```

for this code I'm getting this error:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

is there anything i can do to fix this?
@PyTorchLightning · 8 months ago
Perhaps there is an issue with the data not being loaded correctly, leaving a truncated dataset, which could cause that error. If you open an issue on our course GitHub, we can help you debug and get to the bottom of it: github.com/Lightning-AI/dl-fundamentals/issues
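As a hedged aside (not from the course material): if the dataset really does contain classes with a single sample, one common workaround for this ValueError is to drop classes with fewer than 2 members before attempting a stratified split. The helper name below is made up for illustration.

```python
# Filter out classes too rare to stratify on, so stratify=y can succeed.
from collections import Counter

def drop_rare_classes(X, y, min_count=2):
    # Keep only samples whose class appears at least min_count times.
    counts = Counter(y)
    keep = [i for i, label in enumerate(y) if counts[label] >= min_count]
    return [X[i] for i in keep], [y[i] for i in keep]

X = [[0], [1], [2], [3], [4]]
y = ["a", "a", "b", "b", "c"]  # class "c" has only one sample
X_f, y_f = drop_rare_classes(X, y)
```

After filtering, every remaining class has at least 2 members, which satisfies the requirement stratified splitting enforces.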
@user-wr4yl7tx3w · 9 months ago
i think we need more tutorial videos on lightning studio
@PyTorchLightning · 9 months ago
Thanks for the feedback! Check out Lightning's founder William Falcon's youtube channel for more videos featuring Lightning Studio: www.youtube.com/@WilliamAFalcon
@prathameshdinkar2966 · 9 months ago
That is quite a unique and nice functionality! I've faced OOM issues at higher batch sizes, and I think this is a good solution to them! Keep the good work going 😁
@pal999 · 9 months ago
Isn't MLE used in logistic regression rather than gradient descent?
@SebastianRaschka · 9 months ago
Hi there, so in this example, we perform maximum likelihood estimation (MLE) using gradient descent
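To make the distinction concrete, here is a minimal sketch (not the course's actual code): the objective is the negative log-likelihood, so minimizing it *is* maximum likelihood estimation, while gradient descent is merely the optimizer used to do the minimizing.

```python
# MLE for 1D logistic regression, optimized with plain gradient descent.
import math

def nll(w, b, xs, ys):
    # Negative log-likelihood of Bernoulli labels under a sigmoid model.
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(w * x + b)))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 0, 1, 1]
w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    # Gradient of the NLL with respect to w and b.
    gw = sum((1 / (1 + math.exp(-(w * x + b))) - y) * x for x, y in zip(xs, ys))
    gb = sum((1 / (1 + math.exp(-(w * x + b))) - y) for x, y in zip(xs, ys))
    w, b = w - lr * gw, b - lr * gb
```

Any other optimizer (L-BFGS, Newton's method) would still be performing MLE here; only the search procedure changes.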
@IvarGarnes-o3o · 9 months ago
Sebastian, I have recently started to watch your videos on AI. I find the material relatively easy to follow and very interesting. I do have a question related to section 3.6. In the code we are looping over the minibatches with 'for batch_idx, (features, class_labels) in enumerate(train_loader):'. At first I thought I understood this, but when I inserted a line in the code to print out the class_labels, I expected the output of every second minibatch to be the same. However, they are not. Does this mean that every time we run the line 'for batch_idx, (features, class_labels) in enumerate(train_loader):' the data is being shuffled? Ivar
@SebastianRaschka · 9 months ago
Hi there. Yes, the data is being shuffled via the data loader. This is usually recommended: I ran experiments many years ago with and without shuffling, and neural networks learn better if they see the data in a different order in each epoch. You can turn off the shuffling via `shuffle=False` in the data loader if you want (in the code here it's set to shuffle=True).
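Conceptually, shuffle=True amounts to the sketch below (pure Python, not PyTorch's actual sampler code): the sampler draws a fresh permutation of the dataset indices at the start of every epoch, which is why the printed class_labels differ between epochs.

```python
# Sketch: each epoch visits all samples, but in a freshly shuffled order.
import random

def epoch_order(n_samples, rng):
    order = list(range(n_samples))
    rng.shuffle(order)  # new permutation each call, like a new epoch
    return order

rng = random.Random(0)
epoch1 = epoch_order(10, rng)
epoch2 = epoch_order(10, rng)
```

Both epochs cover the same indices exactly once, just in different orders; setting shuffle=False would correspond to returning `order` unshuffled.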
@Prithviization · 9 months ago
HATE PYTORCH LIGHTNING