MIT 6.S191 (2023): Text-to-Image Generation

44,865 views

Alexander Amini

1 year ago

MIT Introduction to Deep Learning 6.S191: Lecture 8
Text-to-Image Generation
Lecturer: Dilip Krishnan
2023 Edition
For all lectures, slides, and lab materials: introtodeeplearning.com
Lecture Outline - coming soon!
Subscribe to stay up to date with new deep learning lectures at MIT, or follow us @MITDeepLearning on Twitter and Instagram to stay fully-connected!!

Comments: 30
@MrMMF94 · 10 months ago
I appreciate this series and am grateful for it, so here is some minor feedback on this lecture: unlike the others, this one feels less like being taught and more like watching him present his own research/work.
@RedTooNotBlue · 5 months ago
Agreed. It definitely needed some high-level explanation of the terminology he was using; I guess I'll go ask ChatGPT 😂
@chyldstudios · 1 year ago
I was looking forward to this lecture. It did not disappoint.
@phyrajkumarverma4412 · 1 year ago
Thanks a lot ❤❤ I love watching your lectures.
@fitsum8402 · 1 year ago
Thank you Alex
@Diego0wnz · 1 year ago
great research!
@user-yd1pq9ci5z · 1 year ago
awesome as usual.
@qiaomuli2421 · 1 year ago
Thanks a lot!
@nataliameira2283 · 1 year ago
It's very interesting! Thank you for that!
@jennifergo2024 · 6 months ago
Thanks for sharing!
@pavalep · 1 year ago
Awesome Lecture :)
@RajabNatshah · 11 months ago
Thank you :)
@tomski2671 · 10 months ago
I'm tired of image generators creating cat images (I have two, I don't need to see any more). What I would really love to see are technically accurate diagrams, drawings, and other visualizations: 2D, 3D, and later video, at a user-specified level of detail. It seems nobody is doing that. Imagine asking an AI to visualize how a particular ML model processes data, and seeing a video of it in action.
@lmat9624 · 1 year ago
Thanks, amazing lecture
@hussienalsafi1149 · 1 year ago
👌👌👌👌👌☺️☺️☺️😎😎😎😎
@mariuspy · 28 days ago
This is an "intro" class; however, it is attended by experts 😅
@bohaning · 4 months ago
🎯 Course outline for quick navigation:
[00:09-05:11] 1. Text-to-image generation with the Muse model
- [01:07-02:21] Text enables non-experts to create compelling images, leveraging large-scale data and pre-trained language models.
- [04:52-05:22] Muse is faster, generating a 512x512 image in 1.3 seconds, compared to 10 seconds for prior models and about 4 seconds for Stable Diffusion.
[05:11-17:23] 2. Efficient image generation and token-based super resolution with Muse
- [05:11-06:02] A 512x512 image is generated in 1.3 s, outperforming Stable Diffusion at 6.7 s, with high CLIP score and FID, indicating fast, high-quality generation.
- [06:28-07:01] Muse uses a transformer-based architecture for text and images, employing CNNs, vector quantization, and a GAN from the modern deep network toolbox.
- [07:21-07:56] Two models: the base model generates 256x256 images, and the super-resolution model upscales them to 512x512. A T5-XXL model with 5B parameters is used.
- [10:47-11:12] A variable masking distribution drops 64% of tokens, strengthening the network for editing applications and allowing masks of different sizes.
- [16:33-16:58] Classifier-free guidance is used to generate scenes without specific objects.
[17:24-22:48] 3. Image generation and model evaluation
- [17:24-18:00] Iterative decoding improves image generation with up to 18 steps.
- [18:57-19:29] Raters preferred the model 70% of the time over Stable Diffusion (25%).
- [21:21-22:48] Text-to-image model evaluation and the challenges of mapping concepts to pixels.
[22:48-28:40] 4. Evaluating DALL-E 2 and Imagen models, text-guided image editing, and style transfer
- [22:48-25:10] The model excels in FID and CLIP scores against DALL-E 2, with a runtime of 1.3 seconds including super resolution.
- [26:15-27:42] Style transfer and mask-free editing demonstrated with image examples, showcasing the model's ability to make various changes based on text and on attention between text and images.
- [28:18-28:40] A cake and latte morph into a croissant, and the latte art changes from a heart to a flower.
[28:43-36:08] 5. Interactive editing and parallel decoding
- [28:43-30:46] The model enables real-time interactive editing, with a focus on improving resolution and text-image interaction.
- [31:06-31:56] Research focuses on speeding up neural network processing, aiming for parallel decoding and the use of high-confidence tokens.
- [33:27-34:07] Editing suggests smoothness, but the model fails with more than 6-7 of the same item.
- [35:26-35:55] The model generates random scenes dominated by backgrounds like mountains and beaches when fed nonsense text prompts.
[36:08-44:32] 6. Image editing and AI generation
- [36:55-37:36] The editing process involves small backprop steps and may need faster progression.
- [38:31-38:57] Realistic pose changes are harder than global style changes due to increased token interaction.
- [39:36-40:10] Using random seeds, images are generated; 3-4 out of 8-16 are nice. There is no automated way to match an image to its prompt.
- [41:31-42:22] The data is biased towards famous artists, requiring fine-tuning for new styles.
- [44:06-44:32] Training on hundreds of millions of images to identify new combinations of concepts; likely not memorization.
(offered by Coursnap)
@labjujube · 1 year ago
learn learn learn
@dianzhang1015 · 1 year ago
Hello! Where are the slides for this lecture? There is no link on the website.
@Alex-ph6ln · 1 year ago
sweet
@AndyKong51 · 3 months ago
Is MUSE related to Diffusion Transformer? thx
@rahulsingh7508 · 1 year ago
I would love to see the Muse model generate images of X-rays or CT scans for curable and non-curable brain tumours, and to have the accuracy of those images vetted by at least 1,000 brain cancer surgeons.
@sergiogonzalez6597 · 1 year ago
Beautiful Toledo in Spain at 25:30
@_rd_kocaman · 1 month ago
looks like shit
@jqnniz · 9 months ago
Where are the slides of this lecture?
@abhishek-tandon · 1 year ago
Expected a lecture on Diffusion Models too.
@everyzylrian · 10 months ago
See lecture 7
@abinaychhetri6103 · 1 year ago
Hello
@t3ron131 · 1 year ago
Hello! Where are the slides for this lecture? There is no link on the website 🥲
@t3ron131 · 1 year ago
Hello! Where are the slides for this lecture? There is no link on the website.