AI Art Explained: How AI Generates Images (Stable Diffusion, Midjourney, and DALLE)

36,750 views

Jay Alammar

A day ago

Comments: 47
@TaherART a month ago
Amazing Jalal, best material I've seen so far describing this matter. Keep up the great work old friend😊
@arp_ai a month ago
Glad to hear it, bro! Hope you're doing excellent!
@omidsajedi5 a year ago
You have a very unique way of explaining deep learning concepts. The illustrations are very concise and to the point which really helps focus on the core concepts and not get distracted by technical details. Thanks for making this great video!
@herrbonk3635 10 months ago
Good for you, I understood nothing. Some concrete technical detail would have helped me.
@Tjeminee 10 months ago
As a visual thinker, SD can be quite overwhelming under the hood. I have been using the graphical interface "Comfyui" and it has taken me quite a distance in understanding the dynamics of SD. Your video and page helped me a lot in taking the next step to the more advanced features and expanding my options. Thanks Jay!
@laostalk 4 months ago
Very practical and useful information. Thanks!
@anupamsaha674 3 months ago
Thank you, Sir, for sharing. Your explanations are always different. I have been following you since your transformer architecture material. Great!
@trajesh81 a year ago
Thanks Jay! Just like your NLP Transformer series, which still stands the test of time, this is one more added to my list of go-to references. You are indeed a master in the art of teaching!
@sanyahyde3959 a year ago
Excellent video, thank you!
@maxkhan4485 a year ago
Thank you! I finally understand Stable Diffusion!
@JohnGilbertmoore a year ago
It renders the image from text instead of a 3D model. It's like Maya, but with words: it uses a model pre-trained on 1B+ images with their text descriptions from the Internet, so you don't have to build the models in 3D. You just type what you want to create in plain English, and the AI renders out the image.
@XishanAfzal 11 months ago
More than useful. Thanks
@karthik8972 a year ago
Thanks, Jay, for the video. The concept of converting a noised image to a clear image is understood. But how does it create an image that doesn't exist in its training data? It is understood that the model doesn't understand the concepts in the image and only focuses on the patterns. How, then, are the operations below performed?
1. Creating a cartoon image of a cat based on a caption, e.g. "Place a hat on top of a cat". How does it create a cartoon image of a cat? How does it know the exact location of the cat's head? How does it know to place the hat exactly on the head?
2. "A closeup shot of a dog facing the sun". How does it know to create a closeup shot of a dog? How does it know to place the sun in the background? How does it make the object turn towards the sun?
No videos exist to explain this concept. It would be of great help if you could make a video on this.
@unwind_ai a year ago
Great explanation, loved it!
@paresh1930 a year ago
Thank you for this great explanation!
@andrechoi2553 a year ago
Good video, very inspiring😁
@DrNoureddinSadawi a year ago
Nice explanation, thanks!
@d.p.5874 a year ago
Thanks Jay for all your efforts to share a bit of your knowledge in AI. I am not an expert, by far, but I came to the conclusion that AI is mainly a construction of hundreds of Lego bricks, assembled together into specific architectures and trained with the same gradient backpropagation algorithm. Some of them perform well, some others don't. Therefore, the only genuine piece of AI theory is the mathematical background of the training algorithm. The rest is pure heuristics, more or less well explained, a kind of AI cookbook with ad hoc recipes. The training algorithm itself seems very limited (even if highly powerful), since it is applied in a centralized way to a predefined architecture and does not participate in defining the architecture's topology. In other words, the topology is defined before the training while, intuitively, the training should probably define the topology. Therefore incremental learning remains a big issue in most AI architectures, if not all. This lack of a consistent and unified AI theory (there are not, to my limited knowledge, any AI theorems or demonstrations that some sort of optimum is reached using a given architecture) makes me believe that we are at the very beginning of a new science still to come. Could you react to the above humble considerations and share your thoughts? Kind regards,
@nqnam12345 a year ago
great Jay!
@rachidbensaid6629 a year ago
Great Work, Good luck
@muhammedaneesk.a4848 a year ago
Thanks for the explanation. Can you please make a 1 hr or 2 hr video with a deeper dive into the internals? Maybe you already have it recorded, I guess. Thanks.
@UnderstandingCode a year ago
Love from Saudi Arabia!
@daveonvr2192 a year ago
Thanks Jay - I had been looking for something that does more than describe the denoising process, and the attention bit related to prompts is what I was missing. That said, I still can't quite understand how you get a completely new image. I can understand that you should be able to get back to an original image (say a dog, or a flower) via the noisification and reverse process, but how can it, say, create an image with a flower and the dog such that they are integrated in some way? Where does that data come from? A visual example of the earlier stages which show this would be helpful. The examples you had jumped from basically noise to an image (albeit unrefined) in 3 steps - I'd like to see this broken down so I can "see" what is happening. It still requires a level of acceptance without evidence that I am not happy with....
@justaguy2365 7 months ago
Oppose to the end!
@mostlynotworking4112 a year ago
Simple question: does that mean it can't create a prompt (or specific word) that it hasn't been trained on? Thank you for your video!
@jamiewatts333 a year ago
Is this simplified explanation of the process of noise in Stable Diffusion true? It's like teaching an artist about our visual world -- object definitions, shapes, dimensions, etc., and how they correspond to the person who commissioned the art (text prompts). The artist then watches a mosaic - say of an ice cream - being inserted by hundreds of tesserae (rectangular slabs used to create a mosaic) and then removed to restore the original mosaic. During this, the artist learns how to understand, recreate, and reinterpret the ‘ice cream’ image in other mosaics. The artist goes through this with millions of other depictions in mosaics (objects, locations, etc.) so they can create entirely new mosaics based on the requests (or text prompts) of the person commissioning them. Sampling steps are like commissioning an artist to interpret and construct a mosaic quickly or carefully. The more detail or accuracy you want, the more work and time have to go into it.
@RodrigoRibeiroGomes 6 months ago
Excellent!!!
@itsnotthattough7588 a year ago
Thanks, sir!
@adeelgilll a year ago
excellent
@方小兰 8 months ago
thank you
@anilsharma32g a year ago
Dear Sir, I am your subscriber. I want to create a tool that finds text errors in an image. For example, if I forget to write CONTACT US, BUY NOW, or a contact number, or make a spelling mistake in my social media post, the tool finds the error and suggests what is missing or incorrect in the post. 🙏 Please guide me and suggest what course I need to buy or what I need to learn to create this tool. Thank you!
@treksis a year ago
👆just like the transformers series, excellent
@10FACTSABOUTGAMES a year ago
Would you kindly tell me if it is possible to sell the artwork that I made with Stable Diffusion, whether the administration allows this, how I can communicate with them (I mean the management or support for this program), and where the pictures can be sold as pieces of art? I do not speak English, help me.
@CptBlaueWolke a year ago
*AI Pictures. Art means craftsmanship and personal expression
@nerdfinite a year ago
Not all photographs are art, but photography can be an art. The nonsense I draw in a game of Pictionary has no craftsmanship or personal expression. However, illustration is an important form of art. Not only would the AI never produce art on its own, it would never produce anything. The amount of craftsmanship and personal expression being put into the image is dependent on the person using it. A low effort random prompt to the AI is arguably not art, but that's not really the point.
@youtuberaphaell a year ago
Writing the prompts is personal expression
@avistryfe4534 a year ago
@@youtuberaphaell nope. It aint shit. Even with a shortcut. You will still have zero talent or expression. Anyone can say those words. So you have the same skill and expressive power as a toddler. Enjoy. Pretend with your orgy of robots all you like. But you are not special.
@mingkko1 a year ago
@@youtuberaphaell so is ordering food at a restaurant but that does not make you a chef.😉
@CptBlaueWolke a year ago
@@youtuberaphaell No, it isn't. Writing a full text by yourself is.
@simawpalmer7721 a year ago
Thanks, great video again, but your voice has a lot of sibilants, making the listening experience atrocious. If you make enough money making these videos, I suggest hiring a professional audio producer/mixing guy to clean up the audio. Email me, I'll suggest someone.
@anneallison6402 a year ago
This is not art don't be silly
@mpavankumar6695 a year ago
No, this is revolution
@pierrelebreton7634 a year ago
Thank you, really nicely explained!