Lesson 25: Deep Learning Foundations to Stable Diffusion

6,792 views

(All lesson resources are available at course.fast.ai.) In this final lesson of the series, Johno begins by showing us how we can convert sounds into pictures, and then take advantage of what we've learned in this course to generate audio! He builds and demonstrates a very effective bird-song generator using this approach.
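The "sounds into pictures" step usually means a mel spectrogram: frequency bands spaced evenly on the mel scale, with energies stored in decibels. The following is a minimal sketch of those two mappings only, not the lesson's code. It assumes the common HTK-style mel formula, and the function names, the band count, and the 0 to 8000 Hz range are illustrative choices that do not come from the lesson.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """HTK-style mel scale: close to linear at low frequencies, logarithmic higher up."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands: int, f_min: float, f_max: float) -> list[float]:
    """Centre frequencies in Hz of n_bands bands spaced evenly on the mel axis."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


def power_to_db(power: float, floor: float = 1e-10) -> float:
    """Log-compress a power value; the floor avoids log10(0) on silent bins."""
    return 10.0 * math.log10(max(power, floor))


# Hypothetical settings, chosen only for illustration.
centers = mel_band_centers(n_bands=8, f_min=0.0, f_max=8000.0)
print([round(c) for c in centers])
print(power_to_db(1.0), power_to_db(0.01))
```

The printed centre frequencies bunch together at the low end and spread out at the high end, which is why a mel spectrogram spends most of its vertical resolution on the range where hearing is most sensitive.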
Then Jeremy wraps up "Stable diffusion from scratch" by showing how to use the latents from a variational autoencoder (VAE) as the "pixels" in a regular diffusion model. He also describes an intriguing new idea for students to follow up: what if you use latents for other purposes, such as a classification model? Perhaps this would open up a whole world of possibilities, such as latents-FID, latents-perceptual-loss, and new approaches to diffusion guidance!
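To make "latents as pixels" concrete, here is a minimal sketch in which a standard DDPM-style forward noising step is applied to a latent tensor instead of an image. It is not the lesson's code. The latents are random stand-ins rather than real VAE outputs, the 4x32x32 shape assumes a Stable Diffusion style VAE that downsamples a 256 by 256 image by 8, and the names `fake_latents` and `add_noise` are hypothetical. Plain Python lists are used only to keep the example dependency-free.

```python
import math
import random

# Assumed latent shape: 4 channels, 256 / 8 = 32 per spatial side.
LATENT_SHAPE = (4, 32, 32)


def fake_latents(rng: random.Random) -> list[float]:
    """Stand-in for pre-computed VAE latents, flattened to one list."""
    return [rng.gauss(0.0, 1.0) for _ in range(math.prod(LATENT_SHAPE))]


def add_noise(x0: list[float], noise: list[float], alpha_bar: float) -> list[float]:
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]


rng = random.Random(0)
latents = fake_latents(rng)
noise = [rng.gauss(0.0, 1.0) for _ in latents]

# The diffusion model would be trained to predict `noise` from `noisy`,
# exactly as it would for pixels; only the tensor being noised has changed.
noisy = add_noise(latents, noise, alpha_bar=0.5)
print(len(noisy))
```

Under these assumptions the model sees 4,096 values per image instead of the 196,608 in a 256 by 256 RGB image, which is where the training speed-up comes from.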

Comments: 12

@ADHDOCD 1 year ago

Amazing lecture as always! Cannot wait for the LLM lectures. MIT's lectures pale in comparison to what Jeremy and his team produce.

@abhishekmann 1 year ago

There are going to be LLM lectures?

@ADHDOCD 1 year ago

@abhishekmann Yes

@NewMateo 1 year ago

@ADHDOCD Any link to where I can read more about this to confirm? That would be wonderful! I'll be taking Hugging Face's course after, but I would love to see Jeremy and his team really dig into it as their own series.

@briansmithphotos 1 year ago

@NewMateo If you watch just the first few minutes and the last few minutes, they mention the next LLM course ;) but there's plenty of value in watching all the parts in between too!

@NewMateo 1 year ago

@lunchwithalens I'm on Lesson 10 lol. I just skipped ahead to see how far the course went. Nonetheless, thanks for the heads up!

@hacklife8363 1 year ago

How do I join the live classes for LLM?

@kashifsiddhu 6 months ago

00:02 Introduction to NLP and delay in completing stable diffusion
02:19 Creating a subset dataset from longer call recordings for deep learning analysis
06:16 Mel spectrogram focuses on human audible frequencies and transforms them into a log space.
08:26 Converting audio to spectrogram and back to audio
12:46 The model uses Transformer blocks for stable diffusion
14:58 Generating fake bird calls with spectrogram diffusion
19:23 Creating a simple autoencoder with a single hidden layer MLP
21:35 The simple autoencoder compresses and regenerates images.
26:00 Log variance affects standard deviation in deep learning
28:10 Utilizing BCE loss for deeper learning stability
32:21 Minimize log variance in deep learning foundations
34:47 Mapping inputs to a restricted range for better decoding
38:39 Introducing new metrics for model evaluation
40:45 VAE benefits from pre-trained models for efficient generation
44:51 Creating a data set and pre-processing images for deep learning
47:02 Using parallel processing to speed up image reading in deep learning
51:14 Discussion on spatial resolution and training objectives
53:21 Deep learning foundations include perceptual loss and adversarial loss
57:54 Pre-training generator and discriminator for GANs
59:49 Using memory mapped numpy files to save latents efficiently
1:04:01 Creating memory-mapped numpy array of latents
1:06:02 Training and validation set creation
1:10:06 Creating high-quality 256 by 256 pixel images in a few hours with stable diffusion VAE
1:11:56 Experimenting with diffusers and stable diffusion models for better results.
1:16:03 Data set acquisition process explained
1:18:01 Creating a cache for quicker access to files
1:22:11 Preparing and transforming training data for deep learning
1:23:58 Implementing data augmentation techniques in deep learning training process
1:27:47 Achieved 66% accuracy after 40 epochs of training a new model
1:30:07 Pre-training with perceptual loss yields promising results
1:33:40 Congratulations on completing the course, consider experimenting and collaborating further
1:35:36 Deep Learning Foundations to Stable Diffusion
Crafted by Merlin AI.
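The 26:00 and 32:21 entries refer to the log variance that a VAE encoder predicts. As a minimal sketch of the standard relationship (not code from the lesson), the standard deviation is exp(0.5 * logvar), a latent is sampled as mean plus standard deviation times unit Gaussian noise, and the KL term pulls each latent dimension toward a standard normal. All function names here are hypothetical, and the values are per-dimension scalars for readability.

```python
import math
import random


def logvar_to_std(logvar: float) -> float:
    """std = sqrt(var) = exp(0.5 * log(var))."""
    return math.exp(0.5 * logvar)


def sample_latent(mu: float, logvar: float, rng: random.Random) -> float:
    """Reparameterisation: z = mu + std * eps, with eps drawn from N(0, 1)."""
    return mu + logvar_to_std(logvar) * rng.gauss(0.0, 1.0)


def kl_to_standard_normal(mu: float, logvar: float) -> float:
    """KL( N(mu, var) || N(0, 1) ) for a single latent dimension."""
    return -0.5 * (1.0 + logvar - mu * mu - math.exp(logvar))


rng = random.Random(0)
print(logvar_to_std(0.0))               # a log variance of 0 means unit std
print(sample_latent(0.0, -2.0, rng))    # a small logvar keeps samples near mu
print(kl_to_standard_normal(0.0, 0.0))  # no penalty when already N(0, 1)
```

Predicting the log variance rather than the variance lets the encoder output any real number while the implied standard deviation stays positive.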