It was challenging for me to follow the video, but my profiling skills have improved a lot. Thanks a lot for such valuable insights!🔥
@PyTorchLightning · 2 months ago
Way to go on following along with the sessions! So awesome to hear you have improved a lot!
@deependu__ · 2 months ago
36:11 feels good 😌 Been there, done that. My bug was that I was looping (i: 0 -> n), when only (i: n -> 0) would have worked 🥴
@deependu__ · 2 months ago
Fantastic video. Luckily I'd explored Triton before. It's awesome that using Thunder we can inspect traces and swap in my own kernel implementation written in Triton. 🔥🚀
@deependu__ · 2 months ago
Great video 🔥 So when we use the quantization transform, while saving weights to CUDA global memory we convert each 16-bit float to a 4-bit value, and the 4-bit value is what gets stored; when it's loaded back, it's converted back to a 16-bit float and loaded into the model. Save less, load less. Hope I got it correct.
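That's close; one refinement, though: the 4-bit value is usually mapped back to 16 bits with a stored scale factor (and, in schemes like NF4, a lookup table) rather than by padding the bits with zeros. Here is a minimal absmax-style sketch of the round trip in PyTorch, for illustration only; real 4-bit formats also pack two values per byte and use per-block rather than per-tensor scales:

    import torch

    def quantize_4bit(w: torch.Tensor):
        # Map each value to a 4-bit signed integer in [-8, 7] using a per-tensor scale.
        scale = w.abs().max() / 7.0
        q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)  # 4-bit range, stored in int8 here
        return q, scale

    def dequantize_4bit(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        # Recover an approximate 16-bit tensor: multiply by the scale, not "add zeros".
        return (q.float() * scale).half()

    w = torch.randn(4, 4)
    q, scale = quantize_4bit(w)
    w_hat = dequantize_4bit(q, scale)
    print((w - w_hat.float()).abs().max())  # small quantization error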
@deependu__ · 2 months ago
31:42-32:55 sounds exciting. I don't have a GPU either, so I'll be able to do the symbolic manipulations on my Mac, and when it's time to run, I can run on a T4 GPU on Lightning Studio.
@deependu__ · 2 months ago
Thanks for the great session, Luca and Thomas! The most interesting part to me was the DAG creation and the kernel-fusion challenges in earlier frameworks. Inspired by this, I'm considering creating a small library, DeepTorch, with a couple of kernels implemented for the forward and backward passes, aiming to train on a small MNIST dataset (thinking of using pybind to connect C++ with Python, as in the sketch below).
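On the pybind idea: PyTorch ships a JIT-compiling helper, torch.utils.cpp_extension.load_inline, that compiles C++ source and generates the pybind11 bindings for you. A minimal sketch; the module and function names are made up for illustration, and a C++ toolchain must be installed:

    import torch
    from torch.utils.cpp_extension import load_inline

    # A toy "kernel": C++ source compiled and bound via pybind11 under the hood.
    cpp_source = """
    torch::Tensor toy_forward(torch::Tensor x) {
        return x.sigmoid();  // stand-in for a custom forward pass
    }
    """

    ext = load_inline(
        name="deeptorch_toy",       # illustrative extension name
        cpp_sources=cpp_source,
        functions=["toy_forward"],  # auto-generates the pybind11 bindings
    )

    x = torch.randn(3)
    print(ext.toy_forward(x))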
@YamenHawit · 2 months ago
Great tools!
@rafsanjaniLab · 2 months ago
Why isn't a 1 x A100 GPU offered?
@rashadbartholomew8427 · 2 months ago
The A100 GPU is offered; I onboarded last night and it was an option!
@PyTorchLightning · 2 months ago
Coming soon!
@AvaisAziz · 2 months ago
Excellent news, now I don't need to look for cheaper options.
@FabriceNormandin95 · 3 months ago
Are there any parallels to be made between Jax/XLA and this new "Thunder" thing? Or is it some kind of wrapper around torch.compile? Or something else entirely?
@thomas8014 · 3 months ago
Great question. We contrast Thunder, JAX, and torch.compile in Episode 3 ( kzbin.info/www/bejne/fqWvXodno9xjiNk ). The gist is that Thunder is a source-to-source compiler that takes in PyTorch models and compiles them to Python code that calls optimized functions, which can be provided as kernels (e.g. from Apex or cuDNN, written in Triton, etc.), generated on the fly (e.g. by NVFuser or torch.compile), or supplied by the user. We've learned a whole lot from JAX and torch.compile, and we're using torch.compile as a backend (and some people are working on making it a bit of a front-end, too), but it's a somewhat different approach. Our three priorities are usability, understandability, and extensibility, and we believe we have a good proposition for them, aside from enabling great perf, of course.
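For a feel of what that looks like in practice, here is a minimal sketch of the public entry point, thunder.jit, based on the lightning-thunder README; exact behavior may differ across versions:

    import torch
    import thunder

    def foo(x, y):
        return torch.nn.functional.relu(x + y)

    # Compile the PyTorch code source-to-source; the result is a Python callable.
    jfoo = thunder.jit(foo)

    x, y = torch.randn(4), torch.randn(4)
    print(jfoo(x, y))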
@FabriceNormandin95 · 3 months ago
Thanks @thomas8014 for the explanation. Sounds like a very interesting project!
@SaitoHiraga-e5s · 3 months ago
Hello, I'm planning on getting the Pro version of Lightning AI for NLP model training. Is there a storage path in Lightning AI for saving/downloading the trained model? Please reply.
@PyTorchLightning · 3 months ago
Glad you are enjoying using Lightning AI. The short answer is yes! Get more info on storage here: lightning.ai/docs/overview/studios/drive. And join our Discord to connect and get answers to future questions!
@SaitoHiraga-e5s · 3 months ago
@PyTorchLightning Is the Drive only available for a teamspace?
@PyTorchLightning · 3 months ago
@SaitoHiraga-e5s The Drive is accessible from any Studio within a teamspace!
@SaitoHiraga-e5s · 3 months ago
@PyTorchLightning Hello, I was going to save my model but it shows this error: PermissionError: [Errno 13] Permission denied: '/teamspace/config.json' Please help me resolve this.
@PyTorchLightning · 3 months ago
@SaitoHiraga-e5s Yes, let us help you resolve this! Follow the link above to join our Discord and connect with us directly!
@ammararazzaq132 · 3 months ago
What is Studio? The LitServe pricing details mention that it is free for four hours and incurs charges after that, but those charges are not listed. Also, can I just use LitServe and not Studio?
@PyTorchLightning · 3 months ago
Thank you for the questions! Lightning AI Studio is an all-in-one platform for AI development. It allows users to code together, prototype, train, scale, and serve from the browser, with zero setup. Visit Lightning.ai to start using Studio for free and learn more. LitServe allows you to easily serve AI models. LitServe can be used with Lightning AI Studios for fully managed hosting, or it can be self-hosted on your own machines, completely independent of Studio.
@osamansr5281 · 4 months ago
[2:08] The program freezes when I set sync_dist=True; I don't know why. But can I stop the warning from being logged every time somehow? Maybe warn once in the beginning (first log) and then suppress it?
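The freeze would need more context to diagnose, but for deduplicating the warning: if it is emitted through Python's warnings module (as Lightning's rank-zero warnings are), the standard library can show it only once per process. A sketch; the message pattern below is a guess and should be adjusted to the actual warning text:

    import warnings

    # Show matching warnings once per process instead of on every .log() call.
    # The regex is an assumption about the warning's wording; tweak as needed.
    warnings.filterwarnings("once", message=".*sync_dist.*")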
@user-wr4yl7tx3w · 4 months ago
I think some narration would have been preferable to music. I'd just like to get up to speed as easily as possible, so I can start using it regularly.
@tturbo3981 · 4 months ago
**2024 Update:** With the new `import lightning.pytorch as pl` API, it looks something like this:

    tuner = pl.tuner.Tuner(trainer)
    lr_finder = tuner.lr_find(model, train_dataloader)
    print(lr_finder.suggestion())
    lr_finder.plot(suggest=True)
@pubg-hf2np · 4 months ago
I'd like to know why, when I install Docker images and containers in the Studio, or pip install any libraries, everything is gone when the Studio goes to sleep and is reactivated again. I really need to know where the issue is.
@PyTorchLightning · 4 months ago
Join our Discord and post in the "Studio-Help" channel and we will be happy to assist you! discord.com/invite/MWAEvnC5fU
@pubg-hf2np · 4 months ago
@PyTorchLightning Okay, thanks!
@I_hate_charlatans_like_mkbhd · 4 months ago
Remove the thumbnails in the last few seconds of the video, as you are still explaining something with a slide. The thumbnails are a nuisance.
@I_hate_charlatans_like_mkbhd · 4 months ago
It would have been a nice video; the colored tophead is annoying with his irritating hair color.
@carneica · 4 months ago
For those who use Windows, the equivalence between macOS and the Windows Command Prompt:

    macOS               Windows
    mkdir <dir>         mkdir <dir>  (or md <dir>)
    cd <dir>            cd <dir>
    ls                  dir
    open .              start .
    clear               cls
    touch <file>        type nul > <file>
    mv <src> <dest>     move <src> <dest>
    cp <src> <dest>     copy <src> <dest>
    rm <file>           del <file>

In addition, these useful ones:

    rm -r <dir>         rmdir /s <dir>     (delete directory)
    cat <file>          type <file>        (view file contents)
@carneica · 4 months ago
Quite simple but educational and cool videos... I knew most of these things, but it's always clever to "learn the basics" from "the experts", as I just code Python as a tool for my main job. I always learn something new with these... ;) The GitHub videos were especially cool, as that was a black box to me until now! I'm not sure I need it, since I just create small projects (I call them programs lol) and I keep my versions the old-fashioned way (v1, v1.1, v2, etc...), but one of these days I'll create an account and test the version control over there. The "sharing code" concept is quite appealing, even for a non-pro like me! ;)
@PyTorchLightning · 4 months ago
Thanks for watching!
@akshay_pachaar · 5 months ago
What Luca did at kzbin.info/www/bejne/ZpzYqmlpqpaekJo was so much fun!
@vmguerra · 5 months ago
Nice intro to Thunder and DL compilers in general
@King_Muktar · 5 months ago
Thank you for this 🤗🤗
@andrei_aksionau · 5 months ago
Thanks for the video. I like the combination of drawings (nice ones, btw) and live coding. But I'm slightly confused about what exactly Thunder does. It transforms PyTorch code into its own intermediate representation (IR); then, if NVFuser is selected, it transforms that into a "fusion specification", which NVFuser takes as input so it can do its part of the job. I hope I got that right. But what does Thunder do when torch.compile is selected? Does it just apply it to Thunder's IR, or does it also do something specific? And what happens when both torch.compile and NVFuser are selected? It would be nice if the next episode explained this with drawings; otherwise it's a bit difficult (at least for me) to figure out what goes where. I really want to know how the "sausages are made" :)
@lucaantiga3941 · 5 months ago
Great questions Andrei. We'll make sure we cover them in the next episode. Spoiler: in the case of NVFuser we take our IR (which is a subset of Python) and build an NVFuser input graph using its API, while in the case of torch.compile we can take the IR, which is a Python callable, and pass it (in whole or in part) to torch.compile directly. See you next week!
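For the "how the sausages are made" question, the generated program can also be inspected directly. A small sketch using thunder.last_traces, following the pattern in the lightning-thunder README; the printed trace is where NVFuser- or torch.compile-handled regions show up as calls to generated functions:

    import torch
    import thunder

    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.GELU())
    jmodel = thunder.jit(model)
    out = jmodel(torch.randn(2, 8))

    # The last trace is the final Python program Thunder hands to its executors.
    print(thunder.last_traces(jmodel)[-1])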
@andrei_aksionau · 5 months ago
Great introductory video for such a complex topic. Looking forward to the one about distributed training.
@YokoSakh · 6 months ago
Cool. I hope you'll keep doing these live sessions.
@lucaantiga3941 · 6 months ago
Thank you! Yes we will, see you next Friday!
@Lily-wr1nw · 6 months ago
Is there a template for ComfyUI?
@PyTorchLightning · 6 months ago
Yes! We do have templates using ComfyUI, and more templates are being added regularly.
@Lily-wr1nw · 6 months ago
@PyTorchLightning Can you please link one in this post? It would be really helpful.
@PyTorchLightning · 6 months ago
@Lily-wr1nw Visit Lightning.ai to browse the Studio templates available! Here's one to get you started: lightning.ai/mpilosov/studios/stable-diffusion-with-comfyui
@pedrogorilla483 · 6 months ago
I've been trying to understand the Stable Diffusion UNet in detail for a while. This video added a few pieces of information I was missing from other material. Thanks!
@kimomoh5439 · 6 months ago
I hope you solve this problem in PyTorch Lightning:

    RuntimeWarning: os.fork() was called. os.fork() is incompatible with
    multithreaded code, and JAX is multithreaded, so this will likely lead
    to a deadlock.
      self.pid = os.fork()
@PyTorchLightning · 6 months ago
We are always working to alleviate problems people have while training. Join our Discord to join the discussion and connect with a wide variety of experts in all things ML: discord.com/invite/MWAEvnC5fU
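For readers hitting the same warning: it typically appears when DataLoader workers are forked from a process that has already imported JAX. One common workaround (assuming that is indeed the setup here) is to start workers with "spawn" instead of fork:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(100, 3))  # placeholder dataset for the sketch

    # "spawn" starts fresh worker processes instead of fork(), which avoids
    # forking a process that already holds multithreaded (e.g. JAX) state.
    loader = DataLoader(dataset, batch_size=32, num_workers=4,
                        multiprocessing_context="spawn")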
@clippadas · 6 months ago
My dream: I wish I had the 8-GPU mode unlocked so I could use a script to recover my password, but I'm poor.
@samarbhosale8310 · 6 months ago
I want to train a model to score text similarity between two questions, between 0 and 1. My dataset is unlabelled; how should I proceed? Can you guide me?
@PyTorchLightning · 6 months ago
Good question! Join our Discord and get advice from a wide variety of experts in all things ML, including a special channel dedicated to this course. lnkd.in/g63PCKBN
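One common starting point with unlabelled data is zero-shot scoring with a pretrained sentence-embedding model, and fine-tuning later if labels become available. A sketch using the sentence-transformers library; the model name is just a popular example:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example pretrained model

    q1 = "How do I reset my password?"
    q2 = "What is the procedure to change my password?"
    emb = model.encode([q1, q2], convert_to_tensor=True)

    # Cosine similarity lies in [-1, 1]; for sentence embeddings of related
    # text it is typically positive, so it can be clipped/rescaled to [0, 1].
    score = util.cos_sim(emb[0], emb[1]).item()
    print(score)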
@osamansr5281 · 6 months ago
If overfit_batches uses the same batches for training and validation, shouldn't the validation loss equal the training loss? I see the training loss decreasing but the validation loss increasing! 😳
@osamansr5281 · 6 months ago
I have a guess, but I'd appreciate some confirmation: overfit_batches doesn't use the same batches for training and validation, BUT the same batch count! So if the DataModule provides val_dataloader and train_dataloader, both are called, and the same number of batches is sampled from each.
@PyTorchLightning · 6 months ago
@osamansr5281 The answer you arrived at is correct. :) Join the Lightning AI Discord for continued discussion with the ML community: discord.gg/zYcT6Yk9kw
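For anyone reading along, a minimal sketch of the flag being discussed; per the thread above, it truncates the train and val dataloaders to the same number of batches:

    import lightning.pytorch as pl

    # Repeatedly fit a handful of batches as a sanity check: a healthy model
    # should be able to drive training loss toward zero on such a tiny subset.
    trainer = pl.Trainer(overfit_batches=4, max_epochs=100)
    # trainer.fit(model, datamodule=dm)  # model / dm as defined elsewhere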
@not_a_human_being · 6 months ago
omg this is horrible
@osamansr5281 · 7 months ago
Did I misunderstand something, or is the graph presented in the overfitting section of the video from [0:22] to [1:00] mislabeled? 🧐 Overfitting occurs when the train accuracy *RED* increases while the test accuracy *BLUE* decreases, correct? 🤔 Aren't the colors swapped? btw, thanks for the amazing tutorials and special thanks for updating them <3
@SebastianRaschka · 7 months ago
Good question. I think your question arises because this shows the training and test accuracy in a slightly different context. Here, we are looking at the performance for different portions of the dataset. The overall idea is still true: the larger the gap, the bigger the degree of overfitting. But the reason you see the training accuracy go down is that with more data, it becomes harder to memorize (because there's simply more data to memorize). And if there is more data (and it's harder to memorize), it becomes easier to generalize (hence the test accuracy goes up).
@saikatnextd · 7 months ago
Love it, thanks a lot Linus!
@MrPaPaYa86 · 7 months ago
This was very clear and informative
@benc7910 · 7 months ago
My plot_loss_and_acc():

    def plot_loss_and_acc(log_dir) -> None:
        import pandas as pd
        import matplotlib.pyplot as plt

        metrics = pd.read_csv(f"{log_dir}/metrics.csv")

        # Group metrics by epoch and calculate the mean for each metric
        df_metrics = metrics.groupby("epoch").mean()
        # Add epoch as a column (the index is the grouping key, i.e. epoch)
        df_metrics["epoch"] = df_metrics.index
        print(df_metrics.head(10))

        df_metrics[["train_loss", "val_loss"]].plot(
            grid=True, legend=True, xlabel="Epoch", ylabel="Loss", title="Loss Curve"
        )
        df_metrics[["train_acc_epoch", "val_acc_epoch"]].plot(
            grid=True, legend=True, xlabel="Epoch", ylabel="ACC", title="Accuracy"
        )
        plt.show()

    plot_loss_and_acc(trainer.logger.log_dir)
@JikeWimblik · 7 months ago
Couldn't you use 8-bit precision during training by keeping a second, higher-precision copy of the weights, enabling more error tolerance and hence more speed-up options?
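That is essentially the idea behind mixed-precision training: compute in low precision while the optimizer keeps full-precision "master" weights, with gradient scaling for error tolerance. A minimal PyTorch AMP sketch of that pattern (16-bit compute rather than 8-bit, which needs specialized kernels; assumes a CUDA device is available):

    import torch

    model = torch.nn.Linear(16, 1).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)  # params stay FP32 ("master" weights)
    scaler = torch.cuda.amp.GradScaler()               # rescales grads to tolerate FP16 error

    for _ in range(10):
        x, y = torch.randn(8, 16).cuda(), torch.randn(8, 1).cuda()
        opt.zero_grad()
        with torch.autocast("cuda", dtype=torch.float16):  # low-precision forward
            loss = torch.nn.functional.mse_loss(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(opt)   # optimizer updates the FP32 master weights
        scaler.update()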
@River-xd8sk · 8 months ago
You just lost $510, because I'm not waiting 2 to 3 days to have my email "verified".
@NeoZondix · 8 months ago
Thanks
@kevinsasso1405 · 8 months ago
Why are you blinking like that? Are you OK?
@AbhishekBade1310 · 8 months ago
For this code:

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.15, random_state=1, stratify=y)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.1, random_state=1, stratify=y_train)

I'm getting this error:

    ValueError: The least populated class in y has only 1 member, which is too few.
    The minimum number of groups for any class cannot be less than 2.

Is there anything I can do to fix this?
@PyTorchLightning · 8 months ago
Perhaps there is an issue in the data not getting loaded correctly and so there's a truncated dataset, which could cause that issue. If you open an issue on our course GitHub, we could help you debug and get to the bottom of it: github.com/Lightning-AI/dl-fundamentals/issues
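In case it helps while debugging: that ValueError comes from stratify when some class has only one sample. One common workaround is to drop classes that appear fewer than twice before splitting. A sketch, assuming X and y are the arrays from the snippet above:

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Keep only classes that appear at least twice, so stratify can place
    # at least one sample on each side of the split.
    classes, counts = np.unique(y, return_counts=True)
    keep = np.isin(y, classes[counts >= 2])
    X_f, y_f = X[keep], y[keep]

    X_train, X_test, y_train, y_test = train_test_split(
        X_f, y_f, test_size=0.15, random_state=1, stratify=y_f)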
@user-wr4yl7tx3w · 9 months ago
I think we need more tutorial videos on Lightning Studio.
@PyTorchLightning · 9 months ago
Thanks for the feedback! Check out Lightning's founder William Falcon's YouTube channel for more videos featuring Lightning Studio: www.youtube.com/@WilliamAFalcon
@prathameshdinkar2966 · 9 months ago
That is quite a unique and nice functionality! I've faced OOM issues at higher batch sizes, and I think this is a good solution. Keep the good work going 😁
@pal999 · 9 months ago
Isn't MLE used in logistic regression, rather than gradient descent?
@SebastianRaschka · 9 months ago
Hi there. So in this example, we perform maximum likelihood estimation (MLE) using gradient descent.
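To spell out how the two fit together: MLE is the objective, and gradient descent is the optimizer used to reach it. For logistic regression with predictions \hat{y}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b), maximizing the likelihood is equivalent to minimizing the negative log-likelihood (binary cross-entropy), which gradient descent then minimizes:

    \mathcal{L}(\mathbf{w}, b)
      = -\sum_{i=1}^{n} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \Big],
    \qquad
    \mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla_{\mathbf{w}} \mathcal{L}(\mathbf{w}, b)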
@IvarGarnes-o3o · 9 months ago
Sebastian, I have recently started to watch your videos on AI. I find the material relatively easy to follow and very interesting. I do have a question related to section 3.6. In the code we are looping over the minibatches with 'for batch_idx, (features, class_labels) in enumerate(train_loader):'. At first I thought I understood this, but when I inserted a line in the code to print out the class_labels, I expected the output on every second minibatch to be the same. However, they are not. Does this mean that every time we run the line 'for batch_idx, (features, class_labels) in enumerate(train_loader):', the data is being shuffled? Ivar
@SebastianRaschka · 9 months ago
Hi there. Yes, the data is being shuffled via the data loader. This is usually recommended; I did experiments many years ago with and without shuffling, and neural networks learn better if they see the data in a different order in each epoch. You can turn off the shuffling via `shuffle=False` in the data loader if you want (in the code here it's set to shuffle=True).
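A tiny self-contained demo of that behavior; with shuffle=True, each epoch iterates the data in a different order:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.arange(10))

    # shuffle=True reshuffles at the start of every epoch;
    # shuffle=False would keep the order fixed across epochs.
    loader = DataLoader(dataset, batch_size=5, shuffle=True)

    for epoch in range(2):
        for (batch,) in loader:
            print(epoch, batch.tolist())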