MIT 6.S191 (2023): Text-to-Image Generation

48,094 views

Alexander Amini


Comments: 30
@MrMMF94 (1 year ago)
I appreciate this series and am grateful for it. So, some minor feedback on this lecture: unlike the others, this one feels less like being taught and more like he's presenting his research/work.
@RedTooNotBlue (10 months ago)
Agreed, it definitely needed some high-level explanation of the terminology he was using. Guess I'll go ask ChatGPT 😂
@tomski2671 (1 year ago)
I'm tired of image generators creating cat images (I have two, don't need to see any more). What I would really love to see are technically accurate diagrams, drawings, and other visualizations: 2D, 3D, and later video, at a user-specified level of detail. It seems nobody is doing that. Imagine asking AI to visualize how a particular ML model processes data, and seeing a video of it in action.
@bohaning (9 months ago)
🎯 Course outline for quick navigation:
[00:09-05:11] 1. Text-to-image generation with the Muse model
- [01:07-02:21] Text enables non-experts to create compelling images, leveraging large-scale data and pre-trained language models.
- [04:52-05:22] Muse is faster, generating a 512x512 image in 1.3 seconds, compared to about 10 seconds per image for prior models and about 4 seconds for Stable Diffusion.
[05:11-17:23] 2. Efficient image generation and token-based super-resolution with Muse
- [05:11-06:02] A 512x512 image is generated in 1.3 s, outperforming Stable Diffusion by 6.7 s, with a high CLIP score and FID indicating fast, high-quality results.
- [06:28-07:01] Muse uses a Transformer-based architecture for text and images, employing CNNs, vector quantization, and a GAN from the modern deep-network toolbox.
- [07:21-07:56] Two models: the base model generates 256x256 images and a super-resolution model upscales to 512x512; a T5-XXL text model with 5B parameters is used.
- [10:47-11:12] A variable masking distribution drops 64% of tokens, strengthening the network for editing applications and allowing masks of different sizes.
- [16:33-16:58] Classifier-free guidance is used to generate scenes without specific objects.
[17:24-22:48] 3. Image generation and model evaluation
- [17:24-18:00] Iterative decoding improves image generation, with up to 18 steps.
- [18:57-19:29] Raters preferred this model 70% of the time over Stable Diffusion (25%).
- [21:21-22:48] Text-to-image model evaluation and the challenges of mapping concepts to pixels.
[22:48-28:40] 4. Evaluating DALL-E 2 and Imagen, text-guided image editing, and style transfer
- [22:48-25:10] The DALL-E 2 model excels in FID and CLIP scores, with a runtime of 1.3 seconds for super-resolution.
- [26:15-27:42] Style transfer and mask-free editing demonstrated with examples, showcasing the model's ability to make varied changes based on text and on attention between text and images.
- [28:18-28:40] A cake and latte morph into a croissant, and the latte art changes from a heart to a flower.
[28:43-36:08] 5. Interactive editing and parallel decoding
- [28:43-30:46] The model enables real-time interactive editing, with a focus on improving resolution and text-image interaction.
- [31:06-31:56] Research focuses on speeding up neural-network processing, aiming for parallel decoding and the use of high-confidence tokens.
- [33:27-34:07] Editing suggests smoothness, but the model fails with more than 6-7 of the same item.
- [35:26-35:55] The model generates random scenes dominated by backgrounds such as mountains and beaches when fed nonsense text prompts.
[36:08-44:32] 6. Image editing and AI generation
- [36:55-37:36] The editing process involves small backprop steps and may need faster progression.
- [38:31-38:57] Realistic pose changes are harder than global style changes because of increased token interaction.
- [39:36-40:10] Using random seeds, 3-4 out of 8-16 generated images are nice; there is no automated way to match an image to a prompt.
- [41:31-42:22] The data is biased towards famous artists; new styles require fine-tuning.
- [44:06-44:32] Training on hundreds of millions of images to learn new combinations of concepts, likely not memorization.
(Outline offered by Coursnap)
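
The outline above mentions the two sampling ideas the talk leans on: iterative parallel decoding over masked image tokens and classifier-free guidance. Below is a minimal, self-contained sketch of how such a decoding loop could look; it is not the Muse implementation, and the names (`toy_token_logits`, `guidance_scale`) and the codebook/grid sizes are illustrative assumptions, with a random stand-in for the transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 8192      # assumed size of the VQ image-token codebook
NUM_TOKENS = 16 * 16   # assumed 16x16 latent grid for the 256x256 base model
MASK_ID = VOCAB_SIZE   # sentinel id for a not-yet-decoded position

def toy_token_logits(tokens, conditioned):
    """Random stand-in for the transformer: per-position logits over the codebook.
    A real model would attend to the already-decoded `tokens` and the text prompt;
    `conditioned` toggles text conditioning (needed for classifier-free guidance)."""
    logits = rng.normal(size=(NUM_TOKENS, VOCAB_SIZE))
    if conditioned:
        logits[:, :10] += 2.0  # pretend the prompt favours a few codebook entries
    return logits

def decode(num_steps=18, guidance_scale=3.0):
    tokens = np.full(NUM_TOKENS, MASK_ID)  # start with every position masked
    for step in range(num_steps):
        masked = np.flatnonzero(tokens == MASK_ID)
        if masked.size == 0:
            break

        # Classifier-free guidance: push conditional logits away from unconditional ones.
        cond = toy_token_logits(tokens, conditioned=True)
        uncond = toy_token_logits(tokens, conditioned=False)
        logits = uncond + guidance_scale * (cond - uncond)

        # Greedy pick per position (a real sampler would draw from the softmax).
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        picks = probs.argmax(axis=-1)
        confidence = probs.max(axis=-1)

        # Cosine schedule: decide how many positions may stay masked after this step,
        # then commit only the highest-confidence masked positions (parallel decoding).
        frac_masked = np.cos(np.pi / 2 * (step + 1) / num_steps)
        target_masked = int(frac_masked * NUM_TOKENS)
        num_to_commit = max(masked.size - target_masked, 1)
        best = masked[np.argsort(-confidence[masked])][:num_to_commit]
        tokens[best] = picks[best]
    return tokens

print(decode()[:20])  # first 20 decoded image-token ids of the toy run
```

In the talk's setting this loop would run over VQ tokens from the base and super-resolution models; the sketch only illustrates the guidance arithmetic and the confidence-based commit schedule that lets many tokens be decoded per step instead of one at a time.
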
@chyldstudios (1 year ago)
I was looking forward to this lecture. It did not disappoint.
@mariuspy (6 months ago)
This is an "intro" class; however, it is attended by experts 😅
@phyrajkumarverma4412 (1 year ago)
Thanks a lot ❤❤ I love watching your lectures.
@nataliameira2283 (1 year ago)
It's very interesting! Thank you for that!
@teron-131 (1 year ago)
Hello! Where are the slides for this lecture? There is no link on the website 🥲
@fitsum8402 (1 year ago)
Thank you Alex
@jennifergo2024 (11 months ago)
Thanks for sharing!
@Diego0wnz (1 year ago)
great research!
@rahulsingh7508 (1 year ago)
I would love to see the MUSE model generate images of X-rays or CT scans for curable and non-curable brain tumours, and have the accuracy of those images vetted by at least 1,000 brain cancer surgeons.
@lmat9624 (1 year ago)
Thanks, amazing lecture
@hussienalsafi1149 (1 year ago)
👌👌👌👌👌☺️☺️☺️😎😎😎😎
@pavalep (1 year ago)
Awesome Lecture :)
@qiaomuli2421 (1 year ago)
Thanks a lot!
@ZeeshanAli-w2f (1 year ago)
awesome as usual.
@AndyKong51 (8 months ago)
Is MUSE related to Diffusion Transformer? thx
@jqnniz (1 year ago)
Where are the slides of this lecture?
@RajabNatshah (1 year ago)
Thank you :)
@sergiogonzalez6597 (1 year ago)
Beautiful Toledo in Spain at 25:30
@_rd_kocaman (6 months ago)
looks like shit
@labjujube (1 year ago)
learn learn learn
@Alex-ph6ln (1 year ago)
sweet
@abhishek-tandon (1 year ago)
Expected a lecture on Diffusion Models too.
@everyzan-m2q (1 year ago)
See lecture 7
@abinaychhetri6103 (1 year ago)
Hello
@dianzhang1015 (1 year ago)
Hello! Where are the slides for this lecture? There is no link on the website