Lesson 18: Deep Learning Foundations to Stable Diffusion

7,914 views

Jeremy Howard


(All lesson resources are available at course.fast.ai.) In this lesson, we dive into accelerated stochastic gradient descent (SGD) approaches such as momentum, RMSProp, and Adam. We start by experimenting with these techniques in Microsoft Excel, creating a simple linear regression problem and applying the different approaches to solve it. We also introduce learning rate annealing and show how to implement it in Excel. Next, we explore learning rate schedulers in PyTorch, focusing on Cosine Annealing and how to work with PyTorch optimizers. We create a learner with a single-batch callback and fit the model to obtain an optimizer. We then explore the attributes of the optimizer and explain the concept of parameter groups.
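
For reference, here is a rough Python sketch of the per-parameter update rules explored in the spreadsheet section. It is a sketch, not the lesson's spreadsheet formulas: hyperparameter names such as beta and eps follow common convention, and the bias correction shown for Adam may be handled differently in the Excel demo.

```python
# Sketch of the accelerated-SGD update rules covered in the Excel section.
# Each function updates a single parameter given its gradient; the moving
# averages are passed in and returned explicitly.

def sgd_step(p, grad, lr):
    return p - lr * grad

def momentum_step(p, grad, lr, avg, beta=0.9):
    # Exponentially weighted moving average of the gradient.
    avg = beta * avg + (1 - beta) * grad
    return p - lr * avg, avg

def rmsprop_step(p, grad, lr, sqr_avg, beta=0.99, eps=1e-8):
    # Moving average of the squared gradient; divide to normalise the step size.
    sqr_avg = beta * sqr_avg + (1 - beta) * grad**2
    return p - lr * grad / (sqr_avg**0.5 + eps), sqr_avg

def adam_step(p, grad, lr, avg, sqr_avg, t, beta1=0.9, beta2=0.99, eps=1e-8):
    # Adam combines momentum and RMSProp, with bias correction since both
    # averages start at zero (t is the 1-based step count).
    avg     = beta1 * avg     + (1 - beta1) * grad
    sqr_avg = beta2 * sqr_avg + (1 - beta2) * grad**2
    avg_hat     = avg     / (1 - beta1**t)
    sqr_avg_hat = sqr_avg / (1 - beta2**t)
    return p - lr * avg_hat / (sqr_avg_hat**0.5 + eps), avg, sqr_avg
```

Adam is essentially momentum's moving average divided by RMSProp's root-mean-square of the gradients, which is why both running averages appear together in its step.
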
We continue by implementing the OneCycleLR scheduler from PyTorch, which adjusts the learning rate and momentum during training. We also discuss how to improve the architecture of a neural network by making it deeper and wider, introducing ResNets and the concept of residual connections. Finally, we explore various ResNet architectures from the PyTorch Image Models (timm) library and experiment with data augmentation techniques, such as random erasing and test time augmentation.
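
As a companion to the scheduler discussion, here is a minimal sketch of the stock PyTorch pieces the lesson wraps in its own callbacks: an optimizer's param_groups, plus the CosineAnnealingLR and OneCycleLR schedulers. The model, data, and hyperparameter values below are placeholders, not the lesson's.

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 1)                       # stand-in model
opt = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# Each parameter group is a dict holding its own hyperparameters.
print(opt.param_groups[0].keys())              # dict_keys(['params', 'lr', 'momentum', ...])

# Pick one scheduler. CosineAnnealingLR decays the lr along a half cosine;
# OneCycleLR warms the lr up then anneals it, cycling momentum inversely.
total_steps = 100
sched = optim.lr_scheduler.OneCycleLR(opt, max_lr=1e-1, total_steps=total_steps)
# sched = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_steps)

for step in range(total_steps):
    x, y = torch.randn(32, 10), torch.randn(32, 1)   # dummy batch
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
    opt.zero_grad()
    sched.step()                               # advance the schedule once per batch
```

The lesson builds a small scheduler callback inside its Learner to do this per-batch stepping; the sketch above only shows the underlying PyTorch API it relies on.
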
0:00:00 - Accelerated SGD done in Excel
0:01:35 - Basic SGD
0:10:56 - Momentum
0:15:37 - RMSProp
0:16:35 - Adam
0:20:11 - Adam with annealing tab
0:23:02 - Learning Rate Annealing in PyTorch
0:26:34 - How PyTorch's optimizers work
0:32:44 - How schedulers work
0:34:32 - Plotting learning rates from a scheduler
0:36:36 - Creating a scheduler callback
0:40:03 - Training with Cosine Annealing
0:42:18 - 1-Cycle learning rate
0:48:26 - HasLearnCB - passing learn as parameter
0:51:01 - Changes from last week, /compare in GitHub
0:52:40 - fastcore’s patch to the Learner with lr_find
0:55:11 - New fit() parameters
0:56:38 - ResNets
1:17:44 - Training the ResNet
1:21:17 - ResNets from timm
1:23:48 - Going wider
1:26:02 - Pooling
1:31:15 - Reducing the number of parameters and megaFLOPS
1:35:34 - Training for longer
1:38:06 - Data Augmentation
1:45:56 - Test Time Augmentation
1:49:22 - Random Erasing
1:55:55 - Random Copying
1:58:52 - Ensembling
2:00:54 - Wrap-up and homework
Many thanks to Francisco Mussari for timestamps and transcription.
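
For reference, a minimal sketch of the residual-block idea from the ResNets section above: the block's output is its convolution path plus an identity path, so each block only needs to learn a change to its input. This is an illustrative approximation, not the lesson's exact ResBlock (details such as how the identity path is downsampled differ).

```python
import torch
from torch import nn

def conv(ni, nf, stride=1, act=True):
    layers = [nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False),
              nn.BatchNorm2d(nf)]
    if act:
        layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class ResBlock(nn.Module):
    def __init__(self, ni, nf, stride=1):
        super().__init__()
        # Two convs; the second has no activation because the ReLU is
        # applied after the residual addition.
        self.convs = nn.Sequential(conv(ni, nf, stride=stride), conv(nf, nf, act=False))
        # When the shape changes, a 1x1 strided conv makes the identity
        # path line up with the conv path so the two can be added.
        self.idconv = (nn.Identity() if ni == nf and stride == 1
                       else nn.Conv2d(ni, nf, kernel_size=1, stride=stride))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.convs(x) + self.idconv(x))

x = torch.randn(8, 16, 32, 32)
print(ResBlock(16, 32, stride=2)(x).shape)     # torch.Size([8, 32, 16, 16])
```
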

Comments: 12
@mkamp 8 months ago
Bam. This lesson is dynamite. So much depth in just one lesson. ❤
@howardjeremyp 8 months ago
Glad you think so!
@mkamp 8 months ago
Around 1:58:00 (random copy): to truly preserve the existing distribution we could copy the patch from a to b, but also copy what was at b before the copy back to a.
@Lily-wp5do 5 months ago
This is absolutely amazing!
@seanriley3121 3 months ago
The random replace doesn't need to use slices/patches; it could "swap" individual pixels. Even easier to implement.
@mkamp 8 months ago
Around 1:36:00: batchnorm scales the activations, so the activations are scaled both by the weights and by batchnorm's gamma. Does regularizing the weights of the linear modules become ineffective if the model learns to increase gamma instead? And it would, because there is only one gamma parameter per module but many weight parameters, so the penalty on gamma doesn't have much impact on the loss? Is that what Jeremy explains? Would this also be true for LayerNorm?
@coolarun3150 a year ago
coool!
@alexkelly757 11 months ago
Jeremy's comment about Twitter not existing is quite apt. It's now X.
@JensNyborg a year ago
Just before you went into copying I was sitting here thinking you could do a random shuffle to maintain the distribution. It may not matter, but the distribution still changes when you delete pixels. After all, now there are more of the ones you copied. (And I should write this on the forums, but for now I'll write it here lest I forget.)
@carnap355 7 months ago
The distribution of individual data points changes, but the average distribution stays the same, because the copied patch and the replaced patch will on average have the same distribution.
@satirthapaulshyam7769 7 months ago
1:07:00 - ResBlock
@satirthapaulshyam7769 9 months ago
59:00 Summary 1:17