MIT 6.S191 (2023): Text-to-Image Generation

48,094 views

Alexander Amini


Comments: 30
@MrMMF94 (1 year ago)
I appreciate this series and am grateful for it. So, some minor feedback on this lecture: unlike the others, this one feels less like being taught and more like he's presenting his research/work.
@RedTooNotBlue (10 months ago)
Agreed, it definitely needed some high-level explanation of the terminology he was using. Guess I'll go ask ChatGPT 😂
@tomski2671 (1 year ago)
I'm tired of image generators creating cat images (I have two, don't need to see any more). What I would really love to see are technically accurate diagrams, drawings, and other visualizations: 2D, 3D, and later video, at a user-specified level of detail. It seems nobody is doing that. Imagine asking AI to visualize how a particular ML model processes data, and seeing a video of it in action.
@bohaning (9 months ago)
🎯 Course outline for quick navigation:
[00:09-05:11] 1. Text-to-image generation with the Muse model
- [01:07-02:21] Text enables non-experts to create compelling images, leveraging large-scale data and pre-trained language models.
- [04:52-05:22] Muse is faster, generating a 512x512 image in 1.3 seconds, compared to about 10 seconds per image for prior models and about 4 seconds for Stable Diffusion.
[05:11-17:23] 2. Efficient image generation and token-based super-resolution with Muse
- [05:11-06:02] A 512x512 image is generated in 1.3 s, outperforming Stable Diffusion by 6.7 s, with a high CLIP score and FID indicating fast, high-quality results.
- [06:28-07:01] Muse uses a Transformer-based architecture for text and images, employing CNNs, vector quantization, and a GAN from the modern deep-network toolbox.
- [07:21-07:56] Two models: the base model generates 256x256 images and a super-resolution model upscales to 512x512; a T5-XXL text model with 5B parameters is used.
- [10:47-11:12] A variable masking distribution drops 64% of tokens, strengthening the network for editing applications and allowing masks of different sizes.
- [16:33-16:58] Classifier-free guidance is used to generate scenes without specific objects.
[17:24-22:48] 3. Image generation and model evaluation
- [17:24-18:00] Iterative decoding improves image generation, with up to 18 steps.
- [18:57-19:29] Raters preferred this model 70% of the time over Stable Diffusion (25%).
- [21:21-22:48] Text-to-image model evaluation and the challenges of mapping concepts to pixels.
[22:48-28:40] 4. Evaluating DALL-E 2 and Imagen, text-guided image editing, and style transfer
- [22:48-25:10] The DALL-E 2 model excels in FID and CLIP scores, with a runtime of 1.3 seconds for super-resolution.
- [26:15-27:42] Style transfer and mask-free editing demonstrated with examples, showcasing the model's ability to make varied changes based on text and on attention between text and images.
- [28:18-28:40] A cake and latte morph into a croissant, and the latte art changes from a heart to a flower.
[28:43-36:08] 5. Interactive editing and parallel decoding
- [28:43-30:46] The model enables real-time interactive editing, with a focus on improving resolution and text-image interaction.
- [31:06-31:56] Research focuses on speeding up neural-network processing, aiming for parallel decoding and the use of high-confidence tokens.
- [33:27-34:07] Editing suggests smoothness, but the model fails with more than 6-7 of the same item.
- [35:26-35:55] The model generates random scenes dominated by backgrounds such as mountains and beaches when fed nonsense text prompts.
[36:08-44:32] 6. Image editing and AI generation
- [36:55-37:36] The editing process involves small backprop steps and may need faster progression.
- [38:31-38:57] Realistic pose changes are harder than global style changes because of increased token interaction.
- [39:36-40:10] Using random seeds, 3-4 out of 8-16 generated images are nice; there is no automated way to match an image to a prompt.
- [41:31-42:22] The data is biased towards famous artists; new styles require fine-tuning.
- [44:06-44:32] Training on hundreds of millions of images to learn new combinations of concepts, likely not memorization.
(Outline offered by Coursnap)
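
The outline above mentions the two sampling ideas the talk leans on: iterative parallel decoding over masked image tokens and classifier-free guidance. Below is a minimal, self-contained sketch of how such a decoding loop could look; it is not the Muse implementation, and the names (`toy_token_logits`, `guidance_scale`) and the codebook/grid sizes are illustrative assumptions, with a random stand-in for the transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 8192      # assumed size of the VQ image-token codebook
NUM_TOKENS = 16 * 16   # assumed 16x16 latent grid for the 256x256 base model
MASK_ID = VOCAB_SIZE   # sentinel id for a not-yet-decoded position

def toy_token_logits(tokens, conditioned):
    """Random stand-in for the transformer: per-position logits over the codebook.
    A real model would attend to the already-decoded `tokens` and the text prompt;
    `conditioned` toggles text conditioning (needed for classifier-free guidance)."""
    logits = rng.normal(size=(NUM_TOKENS, VOCAB_SIZE))
    if conditioned:
        logits[:, :10] += 2.0  # pretend the prompt favours a few codebook entries
    return logits

def decode(num_steps=18, guidance_scale=3.0):
    tokens = np.full(NUM_TOKENS, MASK_ID)  # start with every position masked
    for step in range(num_steps):
        masked = np.flatnonzero(tokens == MASK_ID)
        if masked.size == 0:
            break

        # Classifier-free guidance: push conditional logits away from unconditional ones.
        cond = toy_token_logits(tokens, conditioned=True)
        uncond = toy_token_logits(tokens, conditioned=False)
        logits = uncond + guidance_scale * (cond - uncond)

        # Greedy pick per position (a real sampler would draw from the softmax).
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        picks = probs.argmax(axis=-1)
        confidence = probs.max(axis=-1)

        # Cosine schedule: decide how many positions may stay masked after this step,
        # then commit only the highest-confidence masked positions (parallel decoding).
        frac_masked = np.cos(np.pi / 2 * (step + 1) / num_steps)
        target_masked = int(frac_masked * NUM_TOKENS)
        num_to_commit = max(masked.size - target_masked, 1)
        best = masked[np.argsort(-confidence[masked])][:num_to_commit]
        tokens[best] = picks[best]
    return tokens

print(decode()[:20])  # first 20 decoded image-token ids of the toy run
```

In the talk's setting this loop would run over VQ tokens from the base and super-resolution models; the sketch only illustrates the guidance arithmetic and the confidence-based commit schedule that lets many tokens be decoded per step instead of one at a time.
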
@chyldstudios (1 year ago)
I was looking forward to this lecture. It did not disappoint.
@mariuspy (6 months ago)
This is an "intro" class; however, it is attended by experts 😅
@phyrajkumarverma4412 (1 year ago)
Thanks a lot ❤❤ I love watching your lectures.
@nataliameira2283 (1 year ago)
It's very interesting! Thank you for that!
@teron-131 (1 year ago)
Hello! Where are the slides for this lecture? There is no link on the website 🥲
@fitsum8402 (1 year ago)
Thank you Alex
@jennifergo2024 (11 months ago)
Thanks for sharing!
@Diego0wnz (1 year ago)
great research!
@rahulsingh7508 (1 year ago)
I would love to see the MUSE model generate images of X-rays or CT scans for curable and non-curable brain tumours, and have the accuracy of those images vetted by at least 1,000 brain cancer surgeons.
@lmat9624 (1 year ago)
Thanks, amazing lecture
@hussienalsafi1149 (1 year ago)
👌👌👌👌👌☺️☺️☺️😎😎😎😎
@pavalep (1 year ago)
Awesome Lecture :)
@qiaomuli2421 (1 year ago)
Thanks a lot!
@ZeeshanAli-w2f (1 year ago)
awesome as usual.
@AndyKong51 (8 months ago)
Is MUSE related to Diffusion Transformer? thx
@jqnniz (1 year ago)
Where are the slides of this lecture?
@RajabNatshah (1 year ago)
Thank you :)
@sergiogonzalez6597 (1 year ago)
Beautiful Toledo in Spain at 25:30
@_rd_kocaman (6 months ago)
looks like shit
@labjujube (1 year ago)
learn learn learn
@Alex-ph6ln (1 year ago)
sweet
@abhishek-tandon (1 year ago)
Expected a lecture on Diffusion Models too.
@everyzan-m2q (1 year ago)
See lecture 7
@abinaychhetri6103 (1 year ago)
Hello
@dianzhang1015 (1 year ago)
Hello! Where are the slides for this lecture? There is no link on the website