I appreciate this series and am grateful for it, so some minor feedback on this lecture: unlike the others, this one feels less like being taught and more like he's presenting his own research/work.
@RedTooNotBlue 1 year ago
Agreed, definitely needed some high-level explanation of the terminology he was using, guess I'll go ask ChatGPT 😂
@bohaning 11 months ago
🎯 Course outline for quick navigation:

[00:09-05:11] 1. Text-to-image generation with the Muse model
- [01:07-02:21] Text enables non-experts to create compelling images, leveraging large-scale data and pre-trained language models.
- [04:52-05:22] Muse is faster, generating a 512x512 image in 1.3 seconds, versus about 10 seconds per image for prior models and about 4 seconds for Stable Diffusion.

[05:11-17:23] 2. Efficient image generation and token-based super-resolution with Muse
- [05:11-06:02] A 512x512 image is generated in 1.3 s versus Stable Diffusion's 6.7 s, with high CLIP score and FID, indicating fast, high-quality performance.
- [06:28-07:01] Muse uses a Transformer-based architecture for text and image, employing CNNs, vector quantization, and GANs from the modern deep-network toolbox.
- [07:21-07:56] Two models: the base model generates 256x256 images, and a super-resolution model upscales to 512x512. The T5-XXL text model with 5B parameters is used.
- [10:47-11:12] A variable masking distribution drops 64% of tokens on average, strengthening the network for editing applications and allowing masks of different sizes.
- [16:33-16:58] Classifier-free guidance is used to generate scenes without specific objects.

[17:24-22:48] 3. Image generation and model evaluation
- [17:24-18:00] Iterative decoding improves image generation with up to 18 steps.
- [18:57-19:29] Raters preferred our model 70% of the time over Stable Diffusion (25%).
- [21:21-22:48] Text-to-image model evaluation and the challenges of mapping concepts to pixels.

[22:48-28:40] 4. Evaluating DALL-E 2 and Imagen, text-guided image editing, and style transfer
- [22:48-25:10] The model excels against DALL-E 2 in FID and CLIP scores, with a runtime of 1.3 seconds including super-resolution.
- [26:15-27:42] Style transfer and mask-free editing demonstrated with image examples, showcasing the model's ability to make varied changes based on text and on attention between text and image.
- [28:18-28:40] A cake and latte morph into a croissant; latte art changes from a heart to a flower.

[28:43-36:08] 5. Interactive editing and parallel decoding
- [28:43-30:46] The model enables real-time interactive editing, with a focus on improving resolution and text-image interaction.
- [31:06-31:56] Research focuses on speeding up neural-network inference, aiming for parallel decoding that commits high-confidence tokens.
- [33:27-34:07] Editing suggests smoothness, but the model fails with more than 6-7 copies of the same item.
- [35:26-35:55] Fed nonsense text prompts, the model generates random scenes dominated by backgrounds such as mountains and beaches.

[36:08-44:32] 6. Image editing and AI generation
- [36:55-37:36] The editing process involves small backprop steps and may need faster progression.
- [38:31-38:57] Realistic pose changes are harder than global style changes due to increased token interaction.
- [39:36-40:10] Using random seeds, 3-4 out of every 8-16 generated images are nice; there is no automated way to match image to prompt.
- [41:31-42:22] Training data is biased toward famous artists; new styles require fine-tuning.
- [44:06-44:32] Training on hundreds of millions of images to identify new combinations of concepts; likely not memorization.

offered by Coursnap
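The parallel decoding the outline mentions at 17:24 and 31:06 can be sketched in a few lines. Below is a minimal NumPy sketch of MaskGIT-style iterative decoding: start from a fully masked token grid, and at each of ~18 steps commit only the predictions the model is most confident about. The `predict_logits` interface, the cosine unmasking schedule, and the `mask_id` sentinel are my assumptions for illustration, not the lecture's exact implementation.

```python
import numpy as np

def iterative_decode(predict_logits, seq_len=256, num_steps=18, mask_id=-1):
    """MaskGIT-style parallel decoding: begin with all positions masked,
    then each step commits the highest-confidence predictions, following a
    cosine schedule so few tokens are fixed early and many are fixed late."""
    tokens = np.full(seq_len, mask_id)
    for step in range(num_steps):
        logits = predict_logits(tokens)                  # (seq_len, vocab_size)
        logits = logits - logits.max(-1, keepdims=True)  # numerically stable softmax
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        candidates = probs.argmax(-1)                    # greedy choice per position
        confidence = probs.max(-1)
        # cosine schedule: fraction of positions still masked after this step
        frac_masked = np.cos(np.pi / 2 * (step + 1) / num_steps)
        num_committed = seq_len - int(seq_len * frac_masked)
        # already-committed positions get infinite confidence so they stay fixed
        conf = np.where(tokens == mask_id, confidence, np.inf)
        keep = np.argsort(-conf)[:num_committed]         # top-confidence positions
        tokens[keep] = np.where(tokens[keep] == mask_id,
                                candidates[keep], tokens[keep])
    return tokens
```

With 18 steps this fills all 256 tokens of a 16x16 latent grid in parallel, rather than taking 256 sequential autoregressive steps, which is where the speedup over autoregressive decoders comes from.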
@tomski2671 1 year ago
I'm tired of image generators creating cat images (I have two; I don't need to see any more). What I would really love to see are technically accurate diagrams, drawings, and other visualizations: 2D, 3D, and later video, at a user-specified level of detail. It seems nobody is doing that. Imagine asking an AI to visualize how a particular ML model processes data, and seeing a video of it in action.
@chyldstudios 1 year ago
I was looking forward to this lecture. It did not disappoint.
@phyrajkumarverma4412 1 year ago
Thanks a lot ❤❤ I love watching your lectures.
@nataliameira2283 1 year ago
It's very interesting! Thank you for that!
@fitsum8402 1 year ago
Thank you Alex
@jennifergo2024 1 year ago
Thanks for sharing!
@AndyKong51 9 months ago
Is MUSE related to Diffusion Transformer? thx
@Diego0wnz 1 year ago
great research!
@mariuspy 7 months ago
this is an "intro" class, however it is attended by experts 😅
@qiaomuli2421 1 year ago
Thanks a lot!
@teron-131 1 year ago
Hello! Where is the slide of this lecture? There is no link in the website 🥲
@ZeeshanAli-w2f 1 year ago
awesome as usual.
@rahulsingh7508 1 year ago
I would love to see the MUSE model generating images of X-rays or CT scans for curable and non-curable brain tumours, and the accuracy of those images vetted by at least 1000 brain cancer surgeons.
@pavalep 1 year ago
Awesome Lecture :)
@sergiogonzalez6597 1 year ago
Beautiful Toledo in Spain at 25:30
@_rd_kocaman 7 months ago
looks like shit
@RajabNatshah 1 year ago
Thank you :)
@lmat9624 1 year ago
Thanks, amazing lecture
@hussienalsafi1149 1 year ago
👌👌👌👌👌☺️☺️☺️😎😎😎😎
@jqnniz 1 year ago
Where are the slides of this lecture?
@labjujube 1 year ago
learn learn learn
@Alex-ph6ln 1 year ago
sweet
@abhishek-tandon 1 year ago
Expected a lecture on Diffusion Models too.
@everyzan-m2q 1 year ago
See lecture 7
@abinaychhetri6103 1 year ago
Hlo
@dianzhang1015 1 year ago
Hello! Where is the slide of this lecture? There is no link in the website