I'm sorry, but "unlock your face with your phone" just cracked me up.
@deadfr0g 2 years ago
This is inadvertently an excellent poetic description of someone using the selfie camera to apply makeup.
@zwenkwiel816 2 years ago
Unlock your phace with your fone
@afog 2 years ago
I think he was referring to using the Energizer Power Max P18K whilst in bed... :)
@davidm2.johnston684 2 years ago
Hahahaha didn't even notice!
@absalomdraconis 2 years ago
I am reminded of an odd commercial from a few years ago: "apply directly to the forehead".
@BernardJollans 1 year ago
If anyone is stuck with the code: the "i" should be a "t" in this line in the loop:
```
latents = scheduler.step(noise_pred, i, latents)["prev_sample"]
```
@alenmathew8115 1 year ago
Did you get the code working? For me it's showing "unsupported operand type(s) for /: 'DecoderOutput' and 'int'" on line 59.
@Phobos221B 1 year ago
@@alenmathew8115 In the last few lines, change "image = (image / 2 + 0.5).clamp(0, 1)" to "image = (image.sample / 2 + 0.5).clamp(0, 1)".
@peepdawg8995 1 year ago
man this helped me. thanks bro :)
@mayurpatil9871 1 year ago
Thanks man because of you I solved this error
@romainflorentz5771 1 year ago
Also, in the Image Loop section, this needs to be moved inside the for loop:
```
# Prep Scheduler
scheduler.set_timesteps(num_inference_steps)
```
@DampeS8N 2 years ago
I've been using Stable Diffusion to _deCGI_ images. Take a screenshot from a game, run it through SD with a low noise rate, give it a detailed description of everything in the picture and it produces pretty solid photo recreations of the images. Also, often, it gets possessed by Eldritch gods and spews out monsters.
@zwenkwiel816 2 years ago
So win-win, right?
@MattRose30000 2 years ago
now do it in real time with DLSS and you've got something huge
@DampeS8N 2 years ago
@@MattRose30000 This is a long way off. It isn't just that it currently takes my 3090 Ti about 5 minutes to do one frame at 1024x1024 but also it can't be playing a game at the same time and also-also it would be very disorienting because each frame will be a _different_ photo that isn't consistent from frame to frame but probably the worst part is that _you need to write a text prompt that reflects what is in the scene for each frame somehow._
@FayezButts 2 years ago
@@DampeS8N that’s great. Have you messed around with reusing seeds across different frames? I imagine if you get an output you like you’d want to reuse that seed
@dibbidydoo4318 2 years ago
@@DampeS8N making text to video is the easy part, making video to text is the hard part.
@morphman86 2 years ago
Mike asked himself what the use case for mixing two prompts is. I used this only yesterday, to produce a photorealistic painting of an owlbear from DnD... So it has practical uses!
@MushookieMan 2 years ago
Maybe google is planning to create new, even more impossible captchas. "Select all the cat-dogs in the picture"
@dembro27 2 years ago
Does it hoot or roar??
@IceMetalPunk 2 years ago
@@dembro27 It hoots and growls, in fact, here at Aguefort's Adventuring Academy!
@euchale 2 years ago
It's how I make my fish people too for tabletop. Tons of applications for DnD
@morphman86 2 years ago
@@euchale You get half-decent tieflings if you ask for a quarter human, a half lizard and the last quarter goat.
@Yupppi 2 years ago
I really liked the stable diffusion that came with the webui that you could install on your own computer, to avoid quotas or subscription costs, and it provided easy to use UI as well. With inpaint feature inside the UI as well. Shoutouts to people who make those applications from the rough code for regular people to use.
@byteborg 2 years ago
I love how you simplify and explain the heap of complexity that is in generative models like this. You gave me the impulse to play around with it, in spite of the code being pretty complicated due to the depth of the abstraction. It's a lot of fun to fantasize about something and have the model come up with a visual representation.
@IceMetalPunk 2 years ago
The very concept of embeddings is amazing to me. It's literally "organize concepts themselves into points in space, where similar things are closer together, in many many dimensions; now you can do arithmetic on *the meanings of words, phrases, and sentences.* " Want to add the meaning of "horse" and the meaning of "male"? Well, just add these vectors together and the resulting coordinates will point right at "stallion"! They amaze me so much that, when I watched Everything, Everywhere, All At Once for the first time, I completely geeked out when I realized their description of the organization of the multiverse is effectively a well-embedded latent space 😅
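A toy sketch of the "arithmetic on meanings" idea, assuming made-up 3-dimensional vectors (the axes and every value here are invented for illustration; real embeddings such as CLIP's have hundreds of learned dimensions):

```python
import math

# Toy 3-dimensional "embeddings". The axes (equine, male, young) and all
# values are invented for illustration, not taken from any real model.
EMBEDDINGS = {
    "horse":    [1.0, 0.0, 0.0],
    "male":     [0.0, 1.0, 0.0],
    "stallion": [1.0, 1.0, 0.0],
    "mare":     [1.0, -1.0, 0.0],
    "foal":     [1.0, 0.0, 1.0],
}

def add(a, b):
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(vector, exclude=()):
    """Return the known word whose embedding is most similar to `vector`."""
    candidates = {w: v for w, v in EMBEDDINGS.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(vector, candidates[w]))

query = add(EMBEDDINGS["horse"], EMBEDDINGS["male"])
print(nearest(query, exclude=("horse", "male")))
```

In this hand-built table the sum lands exactly on "stallion"; with learned embeddings the match is only approximate.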
@floydmaseda 2 years ago
@@mrteco4236 It literally is and is done all the time.
@IceMetalPunk 2 years ago
@@mrteco4236 It's... common, in fact. There's a whole video on this channel about embeddings. And it's how CLIP fundamentally works...
@TheColorman 2 years ago
This is super fascinating, especially as someone studying Data Science just learning about vector spaces and their many uses!
@alexanderkirilov7820 2 years ago
@@mrteco4236 lol
@Emperorhirohito19272 2 years ago
@@mrteco4236 that is literally what it does bro
@lucamatteobarbieri2493 2 years ago
I like how your channel has adapted to the advent of the machine learning boom we are experiencing
@jeffwads 2 years ago
SD is just outstanding. It can mimic the other projects and the 1.4/1.5 models will be public domain. You can't beat that.
@zwenkwiel816 2 years ago
Lol just add "dall-e 2" to your prompts XD
@paryska991 2 years ago
1.5 model just went public today i think
@StefanReich 2 years ago
@@paryska991 Ye
@dgo4490 2 years ago
You can beat that with human creativity that doesn't require billions of calculations per second to brute force a synthetic result.
@zwenkwiel816 2 years ago
@@dgo4490 doesn't it though?
@simplesimon4561 2 years ago
I would like to see a version of the code where it shows the result of each step, so you can see the noise getting reduced with each iteration
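A rough sketch of what "showing the result of each step" could look like, assuming a toy stand-in for the sampling loop (`toy_denoise` and `error` are hypothetical helpers; in the notebook you would decode the latents and save an image where the snapshot is taken):

```python
import random

def toy_denoise(target, steps, seed=4):
    """Stand-in for the sampling loop: each step removes part of the remaining noise."""
    rng = random.Random(seed)
    latents = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    snapshots = [list(latents)]
    for step in range(steps):
        remaining = steps - step
        # Move a fraction of the way to the target; a real loop would call the
        # U-Net and scheduler.step here instead.
        latents = [x + (t - x) / remaining for x, t in zip(latents, target)]
        snapshots.append(list(latents))  # in the notebook: decode and save an image
    return snapshots

def error(a, b):
    """Total absolute difference between two lists of values."""
    return sum(abs(x - y) for x, y in zip(a, b))

target = [0.5, -0.25, 1.0, 0.0]
snapshots = toy_denoise(target, steps=5)
for step, snap in enumerate(snapshots):
    print(step, round(error(snap, target), 4))
```

The printed error shrinks at every step, which is the same effect you would see as the noise disappears from the saved images.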
@JalexRosa 2 years ago
me too!!
@gianluca.g 2 years ago
I think I'm going to do it. I'm downloading the source code and will save a PNG for each step
@AlphaNovaOfficial 2 years ago
Not necessarily what you're after, but if you "interrupt" a run, you can see what its current progress was. Depending on your steps and how early you catch it, I've seen some very interesting early "noisy" images that were themselves inspiration for other images!
@ReneArmenta19 2 years ago
There is already a script for that
@m0nkeyb0i666 2 years ago
If you run automatic1111 there’s a setting for that, uses slightly more vram, but it’s great to watch it work
@paultapping9510 2 years ago
"there are questions of ethics, there are questions on how it's trained. Let's leave those for another time" well, if that doesn't just sum up the tech industry.
@monad_tcp 2 years ago
What ethics? It's just a tool, and it's highly dependent on human input.
@paultapping9510 2 years ago
@Luiz remember the AI chatbot that became incurably racist because it was trained on data scraped from 4chan amongst other places? That sort of thing.
@purplewine7362 2 years ago
that sums up every industry. you think people didn't copy art before ai? it's just a tool
@paultapping9510 2 years ago
@@purplewine7362 lol, not even close to the point I was making. Never mind.
@purplewine7362 2 years ago
@@paultapping9510 you weren't trying to make any point, otherwise you would have clarified. You were just trying to sound smart. Also, liking your own comments is pathetic.
@YSPACElabs 2 years ago
I've been playing with Stable Diffusion (specifically the "InvokeAI" fork because I don't have 10 GB of VRAM), and I've found out that spamming the end of the prompt with keywords like "realistic, 4k, trending on artstation, 8k, photorealistic, hyperrealistic" has more effect on how good the output image is than I thought.
@ShankarSivarajan 2 years ago
You should try negative prompts.
@nicoliedolpot7213 2 years ago
to add, try emphasis "((x))" for specific objects. Edit: you can also use x(y), y being the weight value for that tag.
@christopherg2347 2 years ago
"Simple, you just chip away all the stone that doesn't look like David."
@housellama 2 years ago
"I saw the angel in the marble and carved until I set him free" - Michelangelo
@thomasnicolet9561 2 years ago
The current version of the reference notebook is already broken due to Hugging Face's API changes :) You try to operate on "image", which is now a DecoderOutput object: image = (image / 2 + 0.5).clamp(0, 1). It is fixed by unpacking its tensor through the sample attribute: image = (image.sample / 2 + 0.5).clamp(0, 1)
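A minimal sketch of why the fix works, assuming a hypothetical stand-in class (`FakeDecoderOutput` is not the real diffusers class, and a plain list replaces the torch tensor so this runs without any libraries):

```python
from dataclasses import dataclass

@dataclass
class FakeDecoderOutput:
    """Hypothetical stand-in for the wrapper that vae.decode() now returns."""
    sample: list  # the notebook holds a torch tensor here; a list keeps this runnable

def to_unit_range(values):
    """Same arithmetic as (image / 2 + 0.5).clamp(0, 1), element by element."""
    return [min(max(v / 2 + 0.5, 0.0), 1.0) for v in values]

decoded = FakeDecoderOutput(sample=[-1.0, 0.0, 1.0, 1.5])

try:
    decoded / 2  # the wrapper itself does not support arithmetic
except TypeError as exc:
    print("TypeError:", exc)

image = to_unit_range(decoded.sample)  # unwrap first, then rescale
print(image)
```

The rescale maps decoder values from roughly [-1, 1] into [0, 1], and the clamp cuts off anything that overshoots.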
@Dancedfsk8 2 years ago
The rest of the notebook is hard to fix, I tried but in vain. I think I'll wait for Mike's update.
@victorwesterlund4826 2 years ago
The same goes for pil_to_latent(): AutoencoderKL.encode() now returns an AutoencoderKLOutput object, so this line fails: "return 0.18215 * latent.mode()". The desired DiagonalGaussianDistribution is now a property ("latent_dist") of that object: "return 0.18215 * latent.latent_dist.mode()"
@Dancedfsk8 2 years ago
In img2img, I just extracted the code of add_noise and used an int instead of a FloatTensor. Change the add_noise function to the following. Also notice that the for loop now loops 51 times. Not sure if this is correct, but at least it works.
# View a noised version
noise = torch.randn_like(encoded) # Random noise
for i in tqdm(range(51)):
    scheduler.sigmas = scheduler.sigmas.to(device=encoded.device, dtype=encoded.dtype)
    scheduler.timesteps = scheduler.timesteps.to(encoded.device)
    sigma = scheduler.sigmas[i].flatten()
    while len(sigma.shape) < len(encoded.shape):
        sigma = sigma.unsqueeze(-1)
    noisy_samples = encoded + noise * sigma
    img = latents_to_pil(noisy_samples)[0]
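The core of that workaround is the single expression `encoded + noise * sigma`. A library-free sketch of it, assuming plain lists instead of tensors and a made-up sigma schedule (the real values come from `scheduler.sigmas`):

```python
import random

def add_noise(latents, noise, sigma):
    """noisy = latents + noise * sigma, the same expression as in the comment above."""
    return [x + n * sigma for x, n in zip(latents, noise)]

rng = random.Random(0)
encoded = [0.2, -0.4, 0.9]
noise = [rng.gauss(0.0, 1.0) for _ in encoded]

# Hypothetical sigma schedule, largest first; real values come from scheduler.sigmas.
sigmas = [14.0, 5.0, 1.0, 0.0]
for sigma in sigmas:
    noisy = add_noise(encoded, noise, sigma)
    print(sigma, [round(v, 3) for v in noisy])
```

At sigma 0 the latents come back unchanged, and at large sigma the noise term dominates.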
@aaron6807 2 years ago
@@victorwesterlund4826 What is the 0.18215 for? I keep seeing it in the code but I can't find an explanation for what it does or how it's derived
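A likely answer, hedged: 0.18215 appears to be the latent scaling factor from the Stable Diffusion v1 autoencoder configuration. The VAE's raw latents have a spread well above 1, and multiplying by this constant brings them to roughly unit variance before diffusion; decoding divides by it again. The notebook does not show how the number was derived, so the sketch below only illustrates the idea with made-up values:

```python
import statistics

SCALE = 0.18215  # constant used in the notebook

def scale_latents(raw):
    """Applied after encoding: shrink raw VAE latents for the diffusion model."""
    return [SCALE * v for v in raw]

def unscale_latents(scaled):
    """Applied before decoding: undo the scaling."""
    return [v / SCALE for v in scaled]

# Made-up "raw" VAE latents with a spread of very roughly 1 / SCALE.
raw = [-8.1, -2.7, 0.4, 3.3, 7.2, -5.5, 5.4]
scaled = scale_latents(raw)

print(round(statistics.pstdev(raw), 3))     # spread before scaling
print(round(statistics.pstdev(scaled), 3))  # spread after scaling
```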
@jenka1980 2 years ago
Love Mike's explanations, somehow he manages to explain such complicated stuff in such a simple and understandable way. It would be interesting to know Mike's opinion on Midjourney, as it seems like the winner for now among the picture creation AIs.
@angeleeh 1 year ago
Mike is a legend, truly great videos with him
@dakotaknutson 2 years ago
For anyone trying to get the notebook to work and is getting this error: "TypeError: unsupported operand type(s) for /: 'DecoderOutput' and 'int'" change "image = (image / 2 + 0.5).clamp(0, 1)" to "image = (image.sample / 2 + 0.5).clamp(0, 1)". As noted at the top of the notebook it seems the Hugging Face API has changed.
@hipposhark 2 years ago
wow thank you very much can confirm that this indeed solves it👍
@koh8614 2 years ago
In my case it outputs a Hugging Face Tokens page warning? It says that I need a token? Is it free?
@hipposhark 2 years ago
@@koh8614 yes it is free. you need to create an account on the hugging face website and generate a token from your profile.
@JavadZahiri 2 years ago
Thank you
@serta5727 2 years ago
Mikes explanations Aretha best ❤
@JavierSalcedoC 2 years ago
Franklin true
@tacklemcclean 2 years ago
@@JavierSalcedoC *slow clap*
@CyberMuzHR 2 years ago
Great video! Can anyone recommend any other videos that explain the text encoding and the whole clipping process used to guide the image generation based on input prompt?
@aiartbx 2 years ago
Hi Mike. This is by far the most technically clear explanation of SD that I have seen, so thank you for this! Now, as you would be aware by now, the art community is up in arms against this tech, and I would love to hear your opinion based on the factual knowledge you have. The main issue that keeps coming up is that SD tech is art theft because it steals copyrighted artwork and then companies profit using the images. Another point artists are making is that SD is just a mish-mash collage of original art, so nothing generated by AI is brand new. Would you agree or disagree with these points, and why, based strictly on your technical knowledge?
@theemathas 2 years ago
I doubt DALL-E 2 is the “biggest” image generator. Stable Diffusion is probably bigger. In my circle, the biggest one is NovelAI, which is a Stable Diffusion variant specialized in anime-style images. Notably, its training data is probably the best image dataset out there in terms of detailed labels. It’s already been causing a lot of drama in the community. One notable case involved someone feeding a WIP drawing to img2img, posting it, claiming it as their own drawing. When the actual artist posts their finished image, this person then proceeds to accuse the artist of copying “their” art.
@dibbidydoo4318 2 years ago
Imagen by Google and NUWA-infinity by Microsoft are probably superior.
@felixjohnson3874 2 years ago
Would your "circle" happen to fit after rule 33 and before rule 35?
@nicoliedolpot7213 2 years ago
The danbooru property labeling format, to be exact. Training is rather easy as the images in the booru databases are human-labeled.
@aorusaki 1 year ago
This video finally explained the code to me in a simple way! Now I'm less confused!!! Amazing extra documentation from you guys
@heurve 2 years ago
On line 56, the image now comes from the sample property of the DecoderOutput. Change to:
55: with torch.no_grad():
56:     image = vae.decode(latents).sample
@HerleifJarle 1 year ago
Thanks for the explanations of how AIs are being trained. I can see a slight hint of a neural network here. I think the advantage now is that companies like Bluewillow are using Discord to quickly gain testers, free of charge even.
@_inetuser 2 years ago
this is so interesting and has so many unexplored use cases
@serta5727 2 years ago
So amazing ❤ I love Stable Diffusion. I've been playing around with it the last few weeks
@DeKubus 2 years ago
Immediately recognized the book on Dr. Pound's desk - Prof. Paar was one of my teachers when I studied IT sec. Nice to see it outside of Germany too!
@Mutual_Information 2 years ago
Anyone else surprised that diffusion models are the clear winners for image generation? And GANs have almost completely fallen from favor? I haven’t seen them in any recent SOTA work..
@timmyt1293 2 years ago
Mmm, isn't it still kinda a GAN? Stable Diffusion uses a transformer block not just for the diffusion but for identifying what the actual image is from the diffusion output too. So isn't that technically a GAN? Generate images from the diffusion model, then try to categorize them through an adversarial transformer network?
@erikp7378 2 years ago
@@timmyt1293 Actually there is no adversarial training in diffusion models in general (and in the Stable Diffusion model in particular). The conditioning is used only for guidance (classifier-free guidance in this case), and from a theoretical perspective diffusion models are closer to hierarchical variational autoencoders, where the encoders are fixed diffusion steps and the decoders are denoising steps with the trained noise estimation model.
@JadeNeoma 2 years ago
@@erikp7378 I wonder if you could implement Stable Diffusion inside a GAN. So have the generator define the parameters for the Stable Diffusion based on an input and then give that to the classifier mixed in with non-AI-generated images
@dibbidydoo4318 2 years ago
@@JadeNeoma I don't know how that would work.
@erikp7378 2 years ago
@@JadeNeoma It depends on which parameters you have in mind, but the main point is that the operations must remain differentiable in order to optimize the model. And in the case of hyperparameter inference that is not trivial in many cases (e.g. the number of steps)
@OliverHempel-r7p 10 months ago
Great video. Today Sora was launched, and your videos help to understand what's going on in the background. Many thanks!
@martinoandreascarpolini5128 1 year ago
[notebook error] Hello, thanks for the fantastic video. I noticed that as of today the notebook does not run since there are some errors. I do not know why, probably some library changed a bit. The first error is at line 50 of the cell with the first inference loop. Instead of 'i' there should be 't'. The second error appears at line 59. Now to access the image's tensor you have to write 'image["sample"]' instead of just 'image'.
@martinoandreascarpolini5128 1 year ago
same thing for the other inference loops
@enochsit 1 year ago
Thanks! this should be pinned
@Tymon0000 2 years ago
I generated thousands of images with Stable Diffusion. It's really fun and inspiring.
@3dlabs99 2 years ago
We need an entire "Frogs on stilts" channel.
@jaymalby 2 years ago
Well, xkcd did pick the number 4 by die roll. Seems a random enough seed to me.
@reinei1 2 years ago
I had to scroll far too much to see this mentioned, but yes I agree 4 seemed quite a good random seed there...
@vanderkarl3927 2 years ago
Seeing that GPT-2 vid reminded me: we haven't had Robert Miles on in a fair while. Is he just too busy?
@andybaldman 2 years ago
I love his content.
@vorlon478 2 years ago
13:47 reminds me of the wave function collapse algorithm.
@peekpen 1 year ago
I'll copy your transcript and feed it to OpenAI's playground and ask it to re-interpret your address for images but for my own audio interpolation in music. Brilliant.
@RelaxingSerbian 7 months ago
The notebook can still work with a few minor tweaks: The text prompt should be multiplied by the batch size; The scheduler step takes in "t" instead of "i", and now it prefers scaling via scheduler.scale_model_input(latent_model_input, t) rather than with explicit sigma. Also, torch.autocast did not work on my local machine for some reason. Anyway, thanks a lot for the code.
@johnnyw525 1 year ago
I didn't realise that this is basically the next evolution of the "AI Upscaling" technology that has been used in videogame mods: Take an image and then add detail until it looks like what I think it's supposed to. It's still mind-bending how it results in what it does, but AI Upscaling wasn't so scary, so I suppose this feels a bit less scary now.
@nocturne6320 2 years ago
Could you do a video about the different samplers? (eg. DDIM, Euler, Euler a, etc.) That part of the process is still a mystery for me
@havz0r 2 years ago
Ddim, euler, lms, heun and dpm all produce identical results. The ones with "a" at the end (euler a, dpm2 a) are ancestral samplers and produce different results
@nocturne6320 2 years ago
@@havz0r I meant how they work under the hood. They've already explained how the network generates images from noise, but not how the different samplers work
@miltiadiskoutsokeras9189 2 years ago
I don't know if this is more amazing or more frightening. Brilliant stuff.
@andybaldman 2 years ago
If you aren’t frightened, you aren’t paying attention.
@purplewine7362 2 years ago
@@andybaldman if you're frightened, you're a luddite
@andybaldman 2 years ago
@@purplewine7362 Or you've worked in the tech field long enough to know how dangerous this is, and how it will be used against people eventually. As happens with all tech.
@lolerskates876 1 year ago
Thank you for trying to fix the code after the API update broke it
@6DAMMK9 2 years ago
Thank you for the SCIENTIFIC video! Things got outta control after the "novelaileak", so it is very important to put out information that is as realistic as it can be. I'm quite sad about the sub-culture, but I still have hope that the artists / researchers will snap out of the chaos.
@dibbidydoo4318 2 years ago
what sub-culture?
@PunmasterSTP 2 years ago
Stable Diffusion in code? More like “Super great explanation that’s solid gold!” 👍
@RaydenLGX 2 years ago
So it is basically a morphing, blending and upscaling algorithm of compressed/encoded data?
@MoritzvonSchweinitz 2 years ago
You have an error at 7:23. I was taught that when a compsci chooses a random number, it is 42.
@Kleyguerth 2 years ago
xkcd 221. Random number is 4.
@bloody_albatross 1 year ago
If you mention another video please also link it in the description!
@alikaperdue 1 year ago
@14:47 - idea: hand draw your animation sequence. Give the first image and the text to the AI and get the result. Then hand the resulting image, your next hand-drawn frame and the text to generate the 2nd frame. Continue the process so that each new frame is a combination of the last and what you want it to look like combined. In this way the "flicker" might be reduced. But I haven't seen what you're talking about. I may be off.
@jytou 9 months ago
Excellent explanations, as always! Thanks!
@maltimoto 1 year ago
I don't understand at all how the result of this reconstruction process (remove noise) is stored. Sounds a bit like witchcraft to me. Remove some noise, here we go. I mean in which form is the noise reduction saved? In a database? Does it save pixels or what exactly?
@slimjimbigfoot589 2 years ago
Amazing, so Stable Diffusion helps unclutter all those extra pixels during the process of facial recognition.
@nkronert 2 years ago
This is literally the first episode of Computerphile ever that I didn't understand anything of what was explained. And judging from the comments I'm the only one. Looks like I totally missed the boat on this topic.
@dibbidydoo4318 2 years ago
what was confusing?
@nkronert 2 years ago
@@dibbidydoo4318 it wasn't actually confusing because there wasn't anything to confuse. I had literally never heard of these developments before.
@zwe1l1nkehaende 2 years ago
@@nkronert this is the followup video on the topic, check out the first one, where the whole thing is explained.
@nkronert 2 years ago
@@zwe1l1nkehaende thanks. I already found it. But I still don't really get it 😊 Doing some "best fit" on noise until a photorealistic image comes out still sounds like magic to me.
@TaranovskiAlex 2 years ago
Awesome explanation, thank you!
@cyndicorinne 1 year ago
12:34 beautiful cityscapes 🏙️
@acobster 1 year ago
> There are questions about ethics. There are questions about how these were trained. Maybe we deal with them another time. I really hope there is a discussion of this at some point. As a discipline that skews very white/male and enjoys relatively posh working conditions, it's very easy to insulate ourselves from the very real problems of the world. And because computers are so powerful it's also simple to automate oppression of many kinds, helping it continue to run smoothly. I think we have a responsibility to talk about these issues and I would love to see this channel model that in a constructive way.
@peterw1534 2 years ago
Wow this is actually pretty amazing. Fascinating stuff
@YeloPartyHat 2 years ago
Good timing with the NovelAI leaks
@FusionDeveloper 2 years ago
Thanks for this video. So the Steps is actually the Noise Level.
@yuxiang3147 1 year ago
Great video. However, could you explain what this line "latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)" does?
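One way to read that line, assuming the clean latents have roughly unit variance and independent Gaussian noise of standard deviation sigma has been added: the noisy latents then have variance of about 1 + sigma², so dividing by sqrt(sigma² + 1) brings the network's input back to roughly unit variance. A numerical sketch with made-up data (the value of `sigma` is hypothetical):

```python
import random
import statistics

rng = random.Random(0)
n = 20000
sigma = 5.0  # hypothetical noise level for one timestep

clean = [rng.gauss(0.0, 1.0) for _ in range(n)]        # unit-variance "latents"
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
noisy = [x + sigma * e for x, e in zip(clean, noise)]  # variance about 1 + sigma**2

scaled = [v / ((sigma**2 + 1) ** 0.5) for v in noisy]  # the line from the notebook

print(round(statistics.pvariance(noisy), 2))   # close to 26 for sigma = 5
print(round(statistics.pvariance(scaled), 2))  # close to 1
```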
@ZedaZ80 2 years ago
7:18 is clearly a reference to xkcd 221
@heurve 2 years ago
On line 50, i should be changed to t (as we need the FloatTensor):
50: latents = scheduler.step(noise_pred, t, latents)["prev_sample"]
@LinfordMellony 1 year ago
Mind giving a quick review of Bluewillow and which software it utilizes? I think you guys break down the whole infrastructure, which is actually very informative.
@bluesailormercury 1 year ago
Somebody asked that in a Discord AMA a couple of days ago. They're not telling. But it's very likely Stable Diffusion, using a finetuned custom model, or several. So it should be the same infrastructure
@pb-vj1qs 2 years ago
The code might have a bug, "TypeError: unsupported operand type(s) for /: 'DecoderOutput' and 'int'" on the line "image = (image / 2 + 0.5).clamp(0, 1)"
@alessandro_yt 2 years ago
Same case here :(
@pb-vj1qs 2 years ago
change a line before to image = vae.decode(latents).sample, the .sample fixes it but now trying to get it to display
@alessandro_yt 2 years ago
@@pb-vj1qs It worked now, thanks! The image is displayed here...
@WilfEsme 1 year ago
I'm using one of the most accessible image generators, called Bluewillow. Can you teach us a bit about how it works and what code is currently embedded in it?
@bluesailormercury 1 year ago
It's very likely using a finetuned Stable Diffusion model. Or a few of them. So it should be the same internal mechanism. I don't know about the code though
@astroid-ws4py 1 year ago
This is more about the data and less about the code, Neural Networks mechanism is pretty simple, What changes everything is the data... HUGE amounts of it.
@will_hunt 10 months ago
1:00 “There are questions about ethics… maybe we deal with them another time” AI in a nutshell
@amventures1 1 year ago
Can we add annotations along with the image in an image2image model? The annotations to tell us which part of the image needs to be regenerated. Like I want to change the background with the annotations to that background so it gives exactly the same person with a different background? Something like Photoshop Generative AI
@sebastianwette11 2 years ago
I wanted to ask if anyone got a similar error when running this line: "image = (image / 2 + 0.5).clamp(0, 1)" Error: "unsupported operand type(s) for /: 'DecoderOutput' and 'int'". Seems like the output of the VAE decoding the latents cannot be used together with an int. Can anyone help?
@bret44 2 years ago
a comment below has come up with the fix: change image = vae.decode(latents) to image = vae.decode(latents).sample
@semidemiurge 2 years ago
This was so helpful in understanding this new tech. thank you
@CrystalblueMage 2 years ago
If you can make images by removing noise from random noise. Can you make P solutions from NP solutions the same way by training on known P solutions having "noise" added to make them NP?
@grayaj23 2 years ago
"What amount of frog DO you want in this image?" I WANT ALL THE FROG.
@levii2748 2 years ago
I was waiting for this 🙏🙏🙏
@gaptastic 2 years ago
this video just put me on a wonderful path, thank you!
@sunaxes 5 months ago
Why do you need to noise up your guiding image almost all the way before doing diffusion? Isn't that gonna destroy the guidance? Or does the denoising fail if the noise is not somewhat Gaussian?
@t.michaeltracy2046 2 years ago
Great video, really informative. I was hoping to try out your Google Colab code, although it seems broken at the moment. Are there any updates regarding this announcement regarding the known bugs? "Note: There might be a handful of bugs at the moment. The developers of this stable diffusion implementation keep changing the api. Everyone should know not to make breaking api changes so regularly! I'll do a pass over the code and fix bugs as soon as I can. Am away this week :) thanks to Michael d for bringing this to my attention."
@briancunning423 2 years ago
Great explanation.
@carlmalia29 2 years ago
Love this tool, but I'm having an error when trying to noise an image to run the AI over a guide image. The add_noise def returns an error of "AttributeError: 'int' object has no attribute 'to'". It comes after the call line below. Any help would be amazing: latents = scheduler.add_noise(encoded, noise, start_timestep)
@Indrikmyneur 1 year ago
Well done, I just don't understand how the guiding works. What if I instruct it to create a complex image that certainly wasn't in any training data, with many complex relations about what should be where in the prompt? How can it be constructed as a whole instead of creating and merging the parts it may have encountered?
@PapaVikingCodes 2 years ago
Interviewer is the guy from Sonic State, right?
@Pikefish 2 years ago
4 is a random number, we know because we rolled a dice. Was that a PS3 cracking reference?
@gz6963 2 years ago
great video and very educational I'd love to hear you guys talk about textual inversion
@engineeranonymous 2 years ago
I think there is an error in line "image = vae.decode(latents)" It should be "image = vae.decode(latents).sample" ????
@mylittleparody2277 2 years ago
Thank you for this video, it's really interesting!
@the_proffesional1713 1 year ago
SD is banned on Colab, right? But some people have cracked or bypassed the ban, which lets you launch SD on Colab again, which is interesting. They probably changed something in the SD code to make it show up as an unknown process.
@watchyoutube-ge8xg 1 year ago
What does the training set look like? Where can I get it?
@gollolocura 2 years ago
Me: "I'd like to order Rabbit" SD: "What percentage of Frog would you like with your Rabbit"
@threeMetreJim 2 years ago
Depends on whether you are in Yorkshire, UK, or Paris, France.
@Thinknotix 1 year ago
Is there a way to use 2 image prompts instead of 2 text prompts to get a 50/50 blend?
@Jianju69 2 years ago
A hybrid frog/snake is properly called a *SNOG*, obviously.
@properjob2311 2 years ago
Frake news
@brym9159 2 years ago
Mike said link to code in description!
@Computerphile 2 years ago
Now sorted!
@ArcadianCatharsis 2 years ago
Such a fun and interesting tool. Wish it wasn't used to do bad things, like stealing people's artworks
@Lodinn 2 years ago
7:20 My man Mike knows that when you use a proper random function, the result would be 4. Guaranteed to be random!
@sawyermclane6039 2 years ago
When trying to run "pil_to_latent(im)", in the "Scheduling and Visualization" section, I'm getting "AttributeError: 'AutoencoderKLOutput' object has no attribute 'sample'". I've tried changing latent.mode() to latent.sample(), with no change.
@xormenterxormenter1883 2 years ago
Replace it with this line: return 0.18215 * latent.latent_dist.mode() # or .mean or .sample
@morgan0 2 years ago
so on the quality difference, dalle2 is 1024x but for some reason pretty heavily jpeg compressed, stable diffusion is 512x but (at least on replicate) much much less jpeg compressed, if at all (sometimes i’ve gotten stuff that looked compressed but it might’ve been from being trained on compressed images, not sure). so while it’s a lower resolution, i’ve found that it’s a higher quality image, but i’m sure there are hosted versions that are much lower quality. also i’m not sure what differs between them for inpainting but i’ve found that for stable diffusion i can’t just add a mask, i have to inpaint stuff myself and get it somewhat close, otherwise i get variations on that part when i was trying to get something else
@morgan0 2 years ago
oh and dalle2 is way way pricier than stable diffusion on replicate so i don’t know why they’re compressing the images so much, surely they should be able to afford storage for the images at the cost they charge
@deathstroyer 1 year ago
I would assume that's the imperfections resulting from the upsampling from 64x64 you're seeing
@morgan0 1 year ago
@@deathstroyer oh yeah the autoencoder vs directly diffusing the image. would be cool to see someone fork stable diffusion and add on a non-autoencoded diffusion final step to make the output higher quality
@morgan0 1 year ago
and it’s not a 64x image, it’s latent space
@pmo1972 2 years ago
Excellent tutorial. Thank you.
@ipechman 2 years ago
Where can I find the link for the Google Colab?
@Computerphile 2 years ago
Now in description -Sean
@ipechman 2 years ago
@@Computerphile Thank you!
@ZT1ST 2 years ago
So I know you briefly mentioned the ethics of using these in the previous video (Usually around the trained images as I understand) - does Stable Diffusion allow you to not just supply that original image like the rabbit image you provided there, but the *entire* training set for a local training process based *only* on images you've provided/made/created/got permission to train based off of?
@Nerdule 2 years ago
The trouble is that in order to specify "only include data you can learn from these specific images and no others", you'd need to retrain the entire network from zero, which costs six hundred thousand dollars worth of graphics card time.
@ShankarSivarajan 2 years ago
Another cool thing you can do is _negative prompts,_ that you can put in place of the "unconditioned" embedding.
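A small sketch of where the negative prompt enters, assuming the usual classifier-free guidance formula (the noise predictions below are made-up numbers; in the notebook they come from running the U-Net once with the "unconditioned" embedding and once with the prompt embedding):

```python
def guided_noise(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: push the prediction away from the unconditioned
    (or negative-prompt) result and towards the text-conditioned one."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]

# Made-up noise predictions for three latent values.
noise_for_prompt = [0.30, -0.10, 0.80]
noise_for_empty = [0.10, 0.00, 0.50]      # empty-string "unconditioned" prompt
noise_for_negative = [0.20, 0.40, 0.50]   # e.g. a prompt describing unwanted content

print(guided_noise(noise_for_empty, noise_for_prompt, 7.5))
print(guided_noise(noise_for_negative, noise_for_prompt, 7.5))
```

Swapping the empty-prompt prediction for the negative-prompt one changes the direction the result is pushed away from, which is why the unwanted content gets suppressed.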
@Onihikage 2 years ago
Yep, negative prompts are great for things like getting hands right. It turns out Stable Diffusion, at least the 1.4 model everyone's been using so far, has trouble identifying where a hand or finger is supposed to stop, so you often get hands with too many fingers or fingers coming out of fingers as it keeps trying to "complete" a partial finger. Including a negative prompt for "hands" or "too many fingers" tends to produce much better results.
@ShankarSivarajan 2 years ago
@@Onihikage Yes, that is precisely what I use it for too. I expect we got that advice from the same place.
@reecelawson2403 2 years ago
hi, could you guys make a video on what kernels are please?
@thecakeredux 2 years ago
Only a matter of time until someone adapts this to 3D models. I mean, there are millions of 3D models on the internet in the form of assets for all kinds of engines and frameworks, all with a description attached, too.
@PaulFishwick 2 years ago
I just watched this video. Obtained a Colab error on this statement: image = (image / 2 + 0.5).clamp(0, 1) . The error was: TypeError: unsupported operand type(s) for /: 'DecoderOutput' and 'int'
@uneek35 1 year ago
Would love to see a test to see how it works when it's trained with a limited dataset.