Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models

Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models | ML Coding Series

Views: 53,081

Aleksa Gordić - The AI Epiphany

A day ago

Comments: 66

@deepblender 2 years ago

Very cool video, thanks a lot for that! Whenever I train neural networks, it is still surprising to me that this whole process actually works and is even quite flexible. I just got started with diffusion models and this video is invaluable to solidify my understanding. To me, one of the critical milestones in learning something new is to confidently know when something is not important for my current toy project and which snippets are worth trying out.

@myksmith 1 year ago

This is an amazing video - you start with the paper, not pulling punches on the tech, then dive into practical applications. Wonderful.

@TheAIEpiphany 1 year ago

Thanks Michael!

@Kutlutr 1 month ago

Wonderful video, to the point, starting with the paper, understanding the necessary background and moving forward with practical exercise, very nice, keep up the good work bro.

@TheAIEpiphany 2 years ago

Check out my high-level video on how to get started with diffusion here: kzbin.info/www/bejne/m6HOpX6qgbyafrM

If you want to understand how Stable Diffusion works behind the scenes, in this video I walk you through the codebase step by step, explaining:
1. First-stage autoencoder training (autoencoder with KL regularization)
2. Latent Diffusion Model training (UNet + conditioning model)
3. Sampling using the PLMS scheduler, and how a link to differential equations enables us to sample much faster
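For readers wondering what the "KL regularization" in step 1 amounts to, here is a minimal sketch. It assumes the encoder outputs a mean and a log-variance per latent dimension (a diagonal Gaussian); the function name is made up for illustration and is not taken from the repository. It computes the closed-form KL divergence to a standard normal, which is the kind of penalty such an autoencoder adds, with a small weight, to its reconstruction loss.

```python
import math

def kl_to_standard_normal(mean, logvar):
    """KL( N(mean, exp(logvar)) || N(0, 1) ), summed over latent dimensions.

    mean, logvar: lists of floats of equal length (hypothetical encoder outputs).
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv
        for m, lv in zip(mean, logvar)
    )

# A latent that already matches the prior costs nothing;
# shifting the mean away from zero is penalized.
no_penalty = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
shifted = kl_to_standard_normal([1.0], [0.0])
```

The penalty only nudges the latent space toward a unit Gaussian; how strongly depends on the weight chosen in the training config.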

@arcosin3861 2 months ago

If I were a king or other monarch of authority, I would knight you. Great tutorial. Peace brother.

@junpyohong2132 1 year ago

Amazing work. It helps me a lot to understand the code! Trying to figure PLMS out. Not an EZ thing...

@lukadjokovitch7040 2 years ago

Your channel is gold

@TheAIEpiphany 2 years ago

Thanks man

@Ayayron1998 2 years ago

More videos on stable/latent diffusion please

@manikantabandla3923 9 months ago

What is the difference between the Stable Diffusion and Latent Diffusion repos?

@marcusleong7604 1 year ago

This video is pure gold!

@user-uc1ky6ce5n 1 year ago

This video really helped me fix my latent diffusion model, thanks!

@Vikram-wx4hg 2 years ago

Super thanks for this video, Aleksa. And just to let you know I sat through the whole of it - in two sittings (with a lunch break in between). 🙂 And it was so well done. 👏

@TheAIEpiphany 2 years ago

Haha niceee!! Thanks, glad to hear that!

@nekoeko500 2 years ago

A few minutes into the vid and I see this guy using Irfanview. I need to subscribe

@TheAIEpiphany 2 years ago

Lol 🤣

@abhishekdhiman5719 1 year ago

I went through the LDM paper, but I didn't understand what ground truth the authors are using. The output should be conditioned on the input text, and to compare against it we need a ground truth. How is that ground truth obtained for training the model?

@masoudkarimi8211 4 months ago

Great job, way to go :)

@chyldstudios 2 years ago

Amazing, I was just looking for a video on this specific topic. Thanks for sharing!

@TheAIEpiphany 2 years ago

Nice, you're welcome!

@sairampenjarla 6 months ago

bro's common knowledge is my entire knowledge.

@giovanith 1 year ago

Hello Aleksa, what a great & legit lecture about this issue. Thank you

@abhijeetnarharshettiwar6175 2 years ago

Thank you Aleksa.

@TheAIEpiphany 2 years ago

You are welcome!

@sacramentofwilderness6656 2 years ago

Thanks, Aleksa, for the video, a very insightful and thorough series. By the way, you have encouraged me to use a debugger instead of print statements :)

@mananshah2140 7 months ago

This is the best part! Use the debugger!

@tresuvesdobles 1 year ago

Many thanks for the video! It is helping quite a bit to understand their codebase.

@tresuvesdobles 1 year ago

Help! At around 53:40 I am getting an Exception: FileNotFoundError: [Errno 2] No such file or directory: 'configs/first_stage_models/vq-f8/model.yaml'

@maramrashad3403 1 year ago

Hello, I'm getting the same error. Could you please share how you fixed that, if you did? Many thanks.

@trideeprath1991 1 year ago

Great video. But the conditioning part got kind of confusing, especially the CLIP-based conditioning: why were there two networks, one with empty text and another with text, whose outputs are then added up? It would be good if you could point to something to understand that part better.
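On the question above: what looks like two networks is, in classifier-free guidance, the same network evaluated twice, once with the empty prompt and once with the real prompt. The two noise predictions are then blended, pushing the result away from the unconditional prediction and toward the conditional one. A minimal sketch with plain Python lists standing in for tensors (the function and parameter names are illustrative, not the repository's):

```python
def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Blend two noise predictions produced by the same network.

    eps_uncond: prediction for the empty prompt
    eps_cond:   prediction for the real prompt
    scale:      guidance scale (hypothetical name); 1.0 means "no extra guidance"
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# With scale > 1 the difference between the two predictions is amplified.
guided = classifier_free_guidance([0.0, 1.0], [1.0, 1.0], 7.5)
```

With `scale=1.0` the result is exactly the conditional prediction, and with `scale=0.0` it is the unconditional one, which is why the empty-prompt pass is needed at all.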

@objectobjectobject4707 2 years ago

This is killer!

@vincentaxm8322 2 years ago

thank you so much!!! your videos really help me a lot.

@user-lb1bs8iv3f 2 years ago

Would you please explain where to put the downloaded Imagenette dataset and how to configure it to work?

@leshanwang4090 1 year ago

May I ask how many 3090 GPUs we need to train this latent diffusion model for a super-resolution task? Stable Diffusion v1 is very expensive and needs hundreds and thousands of A100 GPUs, it's terrifying.

@akhandatripathi9897 1 year ago

Instead of a text prompt as the condition, how can we use an image as the condition in cross-attention to develop an image-to-image translation model?

@dimitrismit6714 1 year ago

Probably a stupid question, but why do we sample from the distribution since we have the decoder of VQGAN?

@tylersnard 2 years ago

This is amazing, thank you! Would you mind doing a post about Enformer?

@TheAIEpiphany 2 years ago

I unfortunately can't cover DeepMind research due to NDA restrictions.

@user-lb1bs8iv3f 2 years ago

Thank you so much. I finally got the code to work. But I am confused about classifier-free guidance. In the original classifier-free guidance paper, they say that during training they randomly remove the condition. However, I haven't seen this operation in any code in the DDPM series. Got any ideas? Many thanks.
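For reference, the training-time trick the paper describes can be sketched in a few lines: with some probability the conditioning is replaced by a "null" condition, so the same network also learns the unconditional case. This is an illustration of the idea only, not code from any of the repositories discussed; the function name, the `p_uncond` value, and the use of an empty string as the null prompt are all assumptions.

```python
import random

def maybe_drop_condition(prompt, p_uncond=0.1, rng=random):
    """Return the empty prompt with probability p_uncond, else the prompt.

    p_uncond is a hypothetical hyperparameter; rng is any object with a
    random() method returning a float in [0, 1).
    """
    return "" if rng.random() < p_uncond else prompt

# Applied per training example before the text encoder sees the prompt.
rng = random.Random(0)
batch = ["a photo of a cat"] * 8
conditioned = [maybe_drop_condition(p, p_uncond=0.1, rng=rng) for p in batch]
```

Whether and where a given codebase applies this (in the dataset, the collate function, or the training step) varies, which may be why it is easy to miss.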

@kallamamran 1 year ago

I wish I could follow this 😳👍

@user-lb1bs8iv3f 2 years ago

Amazing video

@TheAIEpiphany 2 years ago

Thanks!

@commandobrandolive 1 year ago

I get lost at the beginning of the code part because I cloned the project but there is no .vscode folder, and I have "show hidden files" on. Not sure how to get or make the appropriate launch.json file.

@mananshah2140 7 months ago

Just add this to configuration:

{
    "name": "training",
    "type": "python",
    "request": "launch",
    "program": "${file}",
    "console": "integratedTerminal",
    "justMyCode": true,
    "env": {
        "PYDEVD_WARN_SLOW_RESOLVE_TIMEOUT": "5"
    },
    "args": [
        "--base",
        "configs/autoencoder/autoencoder_kl_64x64x3.yaml",
        "-t",
        "--gpus",
        "0,"
    ]
}

@Jake-om9no 5 months ago

@mananshah2140 Hi, what exactly do you mean by 'add this to configuration'? May I know which specific file I should put those lines in? I'd really appreciate it if you could help me with this.

@dcdcdc5469 1 year ago

Hello, a very interesting video. Do you have any content on how to use this type of model to upscale low-resolution images to high resolution? Or do you know any website with examples of this? Thank you very much for your content.

@SanatBatra 9 months ago

Hey, did you get any solution to this problem? Let me know ASAP!

@catfood7859 1 year ago

Huge thanks for the crystal clear explanation! May I ask what the difference is between latent diffusion and stable diffusion? (I roughly browsed the two GitHub repos and found they are literally the same.)

@elkwang4357 1 year ago

I think stable diffusion is the updated version of LDM. They are almost the same.

@awa8766 1 year ago

From my research, stable diffusion is just a name for the architecture released by Stability AI and the lab they worked with. The entire network falls within the latent diffusion domain. Therefore, stable diffusion is a part of latent diffusion.

@convolutionalnn2582 2 years ago

Berkeley Reinforcement Learning, the David Silver course, or Coursera Reinforcement Learning... Which one is best to get started with Reinforcement Learning?

@TheAIEpiphany 2 years ago

Yes ♥️

@andreyzakharov3651 2 years ago

I'd recommend Hugging Face's course on Reinforcement Learning

@mobi02 2 years ago

Hi, would you explain this paper in one of your videos? "First Order Motion Model for Image Animation"

@raroca23 2 years ago

Hi, thanks for your video. I'm learning a lot, but I still don't have enough programming skills in DL and PyTorch. I have a question whose answer could be very useful for me and, I think, for other people. Do you think there is a way of adding local labeled images (for example, your own photos) to the model without training on the whole dataset? I see a lot of commercial applications in this area if it can be done with a local GPU.

@cl1mbat1ze 2 years ago

It's not exactly what you want, but you may find Textual Inversion interesting. You can create an embedding that biases the model towards including the concept or object represented in the embedding when it generates pictures. The embedding is created by training the model on 3-5 images of your choosing (representing the concept/object you want to bias the model towards) over 5000+ timesteps, AND it can be run locally, albeit slowly and with a lot of VRAM. It can definitely be run from a Colab, though, if you splurge for the Pro version.

@quecksilber457 1 year ago

How do I change the resolution of my images? Is there an easy way?

@wenfangsun 1 year ago

Hello, I have the same question

@danwood4171 2 years ago

Assuming sd-v1-4.ckpt is the model/weights file for a decent Stable Diffusion program, are the weights/model that stability.ai uses downloadable, given that you said we can do local generation? I assume the quality is better with stability.ai than with something like sd-v1-4.ckpt?

@Vikram-wx4hg 2 years ago

Same quality.

@suntanudipto2228 2 years ago

Little did I know that my dumapp self creating drum softs by making a bunch of different soft rolls was far from the best way of doing

@joseantonioapariciogallego5620 1 year ago

I don't understand, so where is the source code? Can you see it when you become a Patreon supporter?

@mananshah2140 7 months ago

blah blah blah, we don't care about model checkpointing. There are some conv layers, configs 🤣. This guy simply shrugs off complex code with insouciance. Just so much fun to watch and follow along. "Nothing Fundamental!"🤣

@dingran 1 year ago

1:30:53 the `len(old_eps)==2` branch is the Adams-Bashforth three-step method; see en.wikipedia.org/wiki/Linear_multistep_method (search for "The Adams-Bashforth methods with ")
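To make the connection concrete, here is a small sketch of how Adams-Bashforth coefficients combine the current noise prediction with earlier ones. The coefficients are the standard two-, three- and four-step Adams-Bashforth weights; the function name is made up, scalars stand in for tensors, and the first step is simplified to a plain Euler step (an assumption for brevity, a real sampler may use a more careful warm-up).

```python
def adams_bashforth_eps(e_t, old_eps):
    """Combine the current prediction e_t with earlier ones.

    old_eps: earlier predictions, oldest first, so old_eps[-1] is the
             most recent one. Only the last three entries are used.
    """
    if len(old_eps) == 0:
        # No history yet: fall back to the current prediction (plain Euler).
        return e_t
    if len(old_eps) == 1:
        # Two-step method.
        return (3 * e_t - old_eps[-1]) / 2
    if len(old_eps) == 2:
        # Three-step method, the branch mentioned in the comment above.
        return (23 * e_t - 16 * old_eps[-1] + 5 * old_eps[-2]) / 12
    # Four-step method once enough history is available.
    return (55 * e_t - 59 * old_eps[-1] + 37 * old_eps[-2] - 9 * old_eps[-3]) / 24

# If the predictions do not change between steps, every branch returns
# that same value, because each set of coefficients sums to one.
steady = adams_bashforth_eps(2.0, [2.0, 2.0])
```

Reusing stored predictions this way gives a higher-order update without extra network evaluations per step, which is where the speed-up in sampling comes from.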

@navissivan 2 months ago

The loss part is still very confusing to me: why add the same loss twice with different weights, one of which, as you said, ends up going to zero? Does anyone know?
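One way to read that pattern, sketched under assumptions: the same per-sample error is used in two terms, a plain average (the "simple" loss) and a per-timestep reweighted average (the variational-bound style term). If the weight on the second term is zero, it contributes nothing and only the simple loss is trained on. The names and default values below are illustrative and should be checked against the actual training config.

```python
def total_loss(per_sample_loss, vlb_weights,
               simple_weight=1.0, elbo_weight=0.0):
    """Weighted sum of two views of the same per-sample error.

    per_sample_loss: error per training example
    vlb_weights:     hypothetical per-example weights that depend on the
                     sampled timestep
    simple_weight, elbo_weight: assumed defaults, not verified values
    """
    n = len(per_sample_loss)
    loss_simple = sum(per_sample_loss) / n
    loss_vlb = sum(w * l for w, l in zip(vlb_weights, per_sample_loss)) / n
    return simple_weight * loss_simple + elbo_weight * loss_vlb

# With elbo_weight=0.0 the reweighted term drops out entirely.
only_simple = total_loss([1.0, 3.0], [10.0, 20.0])
```

Keeping both terms in the code lets the weighting be switched on from the config without touching the training loop, even if the default leaves the second term unused.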