Lesson 18: Deep Learning Foundations to Stable Diffusion

Jeremy Howard

(All lesson resources are available at course.fast.ai.) In this lesson, we dive into various stochastic gradient descent (SGD) accelerated approaches, such as momentum, RMSProp, and Adam. We start by experimenting with these techniques in Microsoft Excel, creating a simple linear regression problem and applying the different approaches to solve it. We also introduce learning rate annealing and show how to implement it in Excel. Next, we explore learning rate schedulers in PyTorch, focusing on Cosine Annealing and how to work with PyTorch optimizers. We create a learner with a single batch callback and fit the model to obtain an optimizer. We then explore the attributes of the optimizer and explain the concept of parameter groups.
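As a hedged illustration of the PyTorch pieces described above (an optimizer, its parameter groups, and cosine annealing), here is a minimal plain-PyTorch sketch; it is not the lesson's miniai Learner code, and the model and hyperparameters are made up:

```python
import torch
from torch import nn, optim

# A made-up model standing in for the lesson's; only the optimizer/scheduler
# mechanics are the point here.
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))
opt = optim.Adam(model.parameters(), lr=1e-2, betas=(0.9, 0.99))

# An optimizer holds a list of parameter groups; hyperparameters such as `lr`
# live in each group's dict, which is what schedulers mutate.
print(len(opt.param_groups), opt.param_groups[0]['lr'])

# Cosine annealing decays the learning rate from its initial value towards
# eta_min (default 0) over T_max calls to sched.step().
sched = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

for step in range(100):
    # loss.backward() would go here in a real training loop
    opt.step()        # Adam update from the accumulated gradient statistics
    opt.zero_grad()   # clear gradients for the next batch
    sched.step()      # move the learning rate along the cosine curve
```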
We continue by implementing the OneCycleLR scheduler from PyTorch, which adjusts the learning rate and momentum during training. We also discuss how to improve the architecture of a neural network by making it deeper and wider, introducing ResNets and the concept of residual connections. Finally, we explore various ResNet architectures from the PyTorch Image Models (timm) library and experiment with data augmentation techniques, such as random erasing and test time augmentation.
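As a rough sketch of the residual-connection idea mentioned above, here is one common ResBlock formulation in PyTorch; it is not necessarily the exact block built in the lesson (which may handle the shortcut differently), and the layer sizes are illustrative:

```python
import torch
from torch import nn

def conv_block(ni, nf, stride=1):
    # Two 3x3 convs with batchnorm; the first may change resolution/channels.
    return nn.Sequential(
        nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(nf), nn.ReLU(),
        nn.Conv2d(nf, nf, kernel_size=3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(nf),
    )

class ResBlock(nn.Module):
    """Residual connection: output = act(conv_path(x) + shortcut(x))."""
    def __init__(self, ni, nf, stride=1):
        super().__init__()
        self.convs = conv_block(ni, nf, stride)
        # When the shape changes, project the input with a strided 1x1 conv
        # so the two branches can be added.
        self.idconv = (nn.Identity() if ni == nf and stride == 1
                       else nn.Conv2d(ni, nf, kernel_size=1, stride=stride, bias=False))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.convs(x) + self.idconv(x))

x = torch.randn(8, 16, 32, 32)
print(ResBlock(16, 32, stride=2)(x).shape)  # torch.Size([8, 32, 16, 16])
```

On the scheduling side, PyTorch's `torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=..., total_steps=...)` is stepped once per batch: it warms the learning rate up to `max_lr` and back down while moving momentum in the opposite direction.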
0:00:00 - Accelerated SGD done in Excel
0:01:35 - Basic SGD
0:10:56 - Momentum
0:15:37 - RMSProp
0:16:35 - Adam
0:20:11 - Adam with annealing tab
0:23:02 - Learning Rate Annealing in PyTorch
0:26:34 - How PyTorch’s optimizers work
0:32:44 - How schedulers work
0:34:32 - Plotting learning rates from a scheduler
0:36:36 - Creating a scheduler callback
0:40:03 - Training with Cosine Annealing
0:42:18 - 1-Cycle learning rate
0:48:26 - HasLearnCB - passing learn as parameter
0:51:01 - Changes from last week, /compare in GitHub
0:52:40 - fastcore’s patch to the Learner with lr_find
0:55:11 - New fit() parameters
0:56:38 - ResNets
1:17:44 - Training the ResNet
1:21:17 - ResNets from timm
1:23:48 - Going wider
1:26:02 - Pooling
1:31:15 - Reducing the number of parameters and megaFLOPS
1:35:34 - Training for longer
1:38:06 - Data Augmentation
1:45:56 - Test Time Augmentation
1:49:22 - Random Erasing
1:55:55 - Random Copying
1:58:52 - Ensembling
2:00:54 - Wrap-up and homework
Many thanks to Francisco Mussari for timestamps and transcription.

Comments: 12
@mkamp 8 months ago
Bam. This lesson is dynamite. So much depth in just one lesson. ❤
@howardjeremyp 8 months ago
Glad you think so!
@mkamp 8 months ago
Around 1:58:00 (random copy): to truly preserve the existing distribution, we could copy the patch from a to b, but also copy what was previously at b over to a, i.e. swap the two patches.
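As a hedged illustration of that suggestion (a hypothetical helper, not from the lesson notebook), swapping two patches instead of overwriting one keeps the image's pixel distribution exactly unchanged:

```python
import torch

def rand_swap_(x, pct=0.2):
    """In-place: swap two randomly chosen patches in a batch of images (N,C,H,W).
    Overlapping patches are not handled specially in this sketch."""
    _, _, h, w = x.shape
    ph, pw = int(h * pct), int(w * pct)
    ya, xa = torch.randint(0, h - ph, (1,)).item(), torch.randint(0, w - pw, (1,)).item()
    yb, xb = torch.randint(0, h - ph, (1,)).item(), torch.randint(0, w - pw, (1,)).item()
    a = x[:, :, ya:ya+ph, xa:xa+pw].clone()
    b = x[:, :, yb:yb+ph, xb:xb+pw].clone()
    x[:, :, ya:ya+ph, xa:xa+pw] = b
    x[:, :, yb:yb+ph, xb:xb+pw] = a
    return x
```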
@Lily-wp5do 5 months ago
This is absolutely amazing!
@seanriley3121 3 months ago
The random replace doesn't need to use slices/patches; it could swap individual pixels, which is even easier to implement.
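A hedged sketch of that per-pixel variant (again a hypothetical helper, not course code): shuffle a random subset of pixel locations within each image, which preserves each image's own pixel distribution:

```python
import torch

def rand_pixel_shuffle_(x, pct=0.1):
    """In-place: for each image in (N,C,H,W), randomly permute pct of its pixel positions."""
    n, c, h, w = x.shape
    k = int(h * w * pct)
    for i in range(n):
        idx = torch.randperm(h * w)[:k]     # which positions to shuffle
        flat = x[i].view(c, h * w)          # view sharing storage with x
        flat[:, idx] = flat[:, idx[torch.randperm(k)]]
    return x
```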
@mkamp 8 months ago
Around 1:36:00: batchnorm scales the activations, but the activations are also scaled by the weights and by batchnorm's gamma. Does regularizing the weights of the linear modules become ineffective if the model learns to increase gamma instead? And it would, because there is only one gamma parameter per module but many weight parameters, so the penalty on gamma has little impact on the loss? Is that what Jeremy explains? Would the same be true for LayerNorm?
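A rough illustration of the parameter-count asymmetry behind this question (made-up layer sizes): a conv layer contributes far more weights to the weight-decay penalty than batchnorm contributes gamma parameters, so shrinking the conv weights while growing gamma costs relatively little:

```python
import torch
from torch import nn

conv = nn.Conv2d(64, 64, kernel_size=3)
bn = nn.BatchNorm2d(64)     # in PyTorch, bn.weight is gamma and bn.bias is beta

print(conv.weight.numel())  # 64 * 64 * 3 * 3 = 36864 weights under weight decay
print(bn.weight.numel())    # only 64 gamma parameters
```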
@coolarun3150 a year ago
coool!
@JensNyborg a year ago
Just before you went into copying I was sitting here thinking you could do a random shuffle to maintain the distribution. It may not matter, but the distribution still changes when you delete pixels. After all, now there are more of the ones you copied. (And I should write this on the forums, but for now I'll write it here lest I forget.)
@carnap355 6 months ago
The distribution of individual data points changes, but the average distribution stays the same, because the copied patch and the replaced patch will, on average, have the same distribution.
@alexkelly757 11 months ago
Jeremy's comment about Twitter not existing is quite apt. It's now X.
@satirthapaulshyam7769 9 months ago
59:00 Summary 1:17
@satirthapaulshyam7769 6 months ago
1:07:00 - ResBlock