Dissecting SD3

  14,372 views

Latent Vision

days ago

How does SD3 work? Is it any good? No drama, no politics, only the technical side of things.
The SD3 Negative node is part of the Comfy Essentials: github.com/cubiq/ComfyUI_esse...
Free SD3 generations at OpenArt: openart.ai/create?ai_model=st...
Discord server: / discord
Github sponsorship: github.com/sponsors/cubiq
Support with paypal: www.paypal.me/matt3o
Twitter: / cubiq
00:00 Intro
00:35 Default workflow
05:46 Testing the negatives
08:44 Lying in the grass
12:40 Prompt adhesion
15:19 Noise
16:30 High resolutions
17:28 Control Nets
18:41 License

Comments: 203
@latentvision 11 days ago
Just a few clarifications, because you can't possibly cover everything in a 20-minute video:
1. "bad hands" has never worked, unless maybe the model is somehow trained for it. I used it in the video just to prove that it only generates noise, and you might have better luck with just "aaaaaaa".
2. A SAI engineer suggested that I not use negatives at all, because given the way SD3 is trained they have no meaning. From my tests, very strong negative concepts seem to have some kind of effect. Whether that's just coincidence or placebo remains to be proven, but it seems consistent.
3. Changing the resolution to weird numbers (1040) is a bit like running a different seed. It doesn't fix the model; for some reason I get fewer oddities at weird latent sizes, which is of course not recommended anyway. The point is: the training was meh.
4. Give SAI the benefit of the doubt. If things don't change I'll be the first to stop using it (and actually paying for the creative license).
@denisquarte7177 11 days ago
From SD2 to SDXL one might have guessed they learned something; apparently they did not. And honestly, who read the Forbes story about Emad and didn't at least expect something like this? They are done; it was always a question of when, not if. With comfy's blog post my hopes are up again. If SD3 doesn't work out, something else will.
@MichaelLochlann 9 days ago
it's weird they would intentionally not use negatives. they are really useful.
@havemoney 9 days ago
"A SAI engineer suggested me to not use negatives at all." >>> When I needed to draw a bathyscaphe at the bottom of the Mariana Trench, where the bathyscaphe should be in darkness, I used the negatives “Light from above, refractions, surface”. It didn’t always work, but it never worked without negatives.
@latentvision 8 days ago
@MichaelLochlann technically speaking the model should be able to do exactly what we ask without negatives... but yeah we are not there yet
@user-lk7ct8te7b 10 days ago
"Since at Stability AI they only do the missionary"...damn bro, fatality! 😆😆😆
@richgates 11 days ago
"Let's confuse the little bastard." I got a good laugh out of that. Thanks Matteo.
@le0t0rr3z 11 days ago
Best coverage of the subject so far. 100% informative. No time wasted. No drama. No bs. Yet somehow humble. 👏👏👏
@tamatlt7469 11 days ago
I know you don't consider yourself a youtuber, but what a great summary! As someone semi-casual, it is cool to learn what's good about it, what's strange about it, how to work with it in ComfyUI, and get some glimpses of what might happen next with the license, without any drama. Observations >> Opinions. To me your channel icon under a thumbnail is bigger clickbait than the thumbnail itself. Well done!
@latentvision 11 days ago
aaaw thanks :)
@Darkwing8707 11 days ago
"Carried by ants" This guy is a comedian
@PimentelES 11 days ago
I can't stop watching SD3 videos. It's like a beautiful house the community loved but the owner set it on fire for safety reasons. So bizarre
@latentvision 11 days ago
it's more like the owner stumbled while installing the fire extinguishers and accidentally set the house on fire with the cigar
@MikevomMars 11 days ago
Right - it's like watching a horrible road accident. You don't want to look, but you can't look away either 😐
@Cara.314 11 days ago
Safe for the consumer market. Not for the actual users
@Macatho 11 days ago
They didn't have enough money to build the model up to what it has to be in order to be commercially viable. People don't care enough about generating pretty images to pay for it. You need businesses to use the workflow in their production, and SD2/3 without a seriously well-trained LoRA isn't enough. They needed the bankroll to build brand recognition, and then they could probably live off SD5 or SD6. Blender gets a decent amount of funding, and that software is 30 years old; yes, you heard me, it was created in 1994... They got about fuck all for the first 10 years.
@4rrxw794 11 days ago
What 'safety'?🤔
@dadekennedy9712 11 days ago
I absolutely love watching your videos, I learn so much! Thanks for staying on point with what matters!
@Jackassik 11 days ago
I chuckled when you said "Whispering to it like a lover" but when you said the second part I almost spilled my drink :D Great video as always.
@AIPixelFusion 11 days ago
Lmao. Same. Totally unexpected and hilarious
@hidroman1993 11 days ago
No spoilers
@rhaedas9085 11 days ago
@AIPixelFusion I've come to expect exactly one hilarious quote. I think it works so well because he's good at dragging your attention into the details he's explaining, and then slips a small grenade into the mix. Almost like a classic jump scare, only funny.
@Jackassik 11 days ago
@hidroman1993 you mean I shouldn't discuss the video I just watched in the comments to the said video because I'm spoiling it?
@hidroman1993 11 days ago
That was a joke geez
@javierzzz4556 11 days ago
ControlNet and IPAdapter are the only reasons I would try SD3 again
@ArielTavori 11 days ago
If you consider the implications of some of the recent papers regarding the Platonic hypothesis, as well as grokking and other related concepts, it seems fairly expected that teaching these things anything untrue ("safety training") is going to be devastating to the model's capabilities, and likely result in a permanent shortfall relative to the technical potential. I suspect community models have gotten so good because of cracking and overcoming some of the toxic false training, but I'd bet even those models never reach the potential the architecture would allow, because they are fundamentally salted with falsehoods. "If you make a delicious sandwich, and then put some dog p00p in it, no matter how good the rest is or how many other ingredients it contains, what you've got there is a p00p sandwich!.." - someone smarter and funnier than me, who I can't remember or find with AI LOL
@tomschuelke7955 11 days ago
Thanks.. as always one of the best places to get information. For me, nevertheless, SD3 medium isn't anything more than a playground. I won't suggest it in my company for legal images as long as there's no real benefit beyond what I could already get from well-trained SDXL models. If I, as an architect, could see great advantages in prompting and generating very detailed, subtle and architecturally correct architecture, especially in very big city-planning scenes, it would be worth thinking about a paid license.. That's a weakness of nearly all models out there.. People don't care about architecture too much.. There you need far better precision, because you often already have a concept, architectural drawings and CAD screenshots to start from. But I think that remains to be seen in the future.
@latentvision 11 days ago
controlnets work very well on SD3 so that might come in handy in architectural applications. Also I'm pretty sure image conditioning will be good, so an SD3 IPAdapter might save us all
@lefourbe5596 11 days ago
i like that grounded take 👍
@blackcloud8218 11 days ago
very informative as expected of you, i rushed here right away when i saw that you posted about SD3 and i was not disappointed, appreciate your efforts
@JustFeral 11 days ago
Dude, you're fucking hilarious.
@swannschilling474 11 days ago
Thanks so much for this one, I hope things will be sorted out and change for the better! 😊
@runebinder 10 days ago
I've watched a few SD3 videos so far and this has been the best one by quite some distance. Very informative and balanced, thank you :)
@Billybuckets 11 days ago
What a rational and informative video. A+ as usual. I hope people get a handle on training soon and SD3 takes over as the dominant base after fixing what SAI messed up in the training. It’s like a fantastic car except for a bad transmission. Just replace the transmission and the car would be top notch.
@Difdauf 11 days ago
Well, if they fucked up the transmission on your new super car, you should start wondering what else is fucked up.
@AB-wf8ek 11 days ago
@Difdauf Yea, but this car is free
@chriscodling6573 4 days ago
Wasn't going to download SD3 but this video definitely changed my mind so I'll give it a try
@Art0691p 11 days ago
Excellent, informative video. Thanks.
@rajahaddadi2274 10 days ago
Thank you very much for your very engaging tutorials. You are raising the level of the relationship between users and software engineers, and your approach was good and reasonable. Thanks.
@baheth3elmy16 5 days ago
Thank you very much for the video!
@MarcSpctr 11 days ago
the negative prompt doesn't affect much because SD3 needs higher weights for keywords. in SD1.5 and SDXL we did stuff like (keyword:1.5); here we need to do stuff like (keyword:15) for it to show a significant effect. so increasing keyword weights in negative prompts does make a difference.
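The `(keyword:1.5)` notation in the comment above attaches a numeric weight to a piece of prompt text. As a rough illustration only, here is a minimal sketch of splitting a prompt into weighted parts. It assumes the simple, non-nested `(text:weight)` form; `parse_weights` is a hypothetical helper and not how ComfyUI itself parses prompts.

```python
import re

# Matches only the simple, non-nested "(text:weight)" form (an assumption).
WEIGHT_RE = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt):
    """Split a prompt into (text, weight) pairs; unweighted text gets 1.0."""
    parts = []
    pos = 0
    for match in WEIGHT_RE.finditer(prompt):
        # Plain text between the previous match and this one.
        before = prompt[pos:match.start()].strip(" ,")
        if before:
            parts.append((before, 1.0))
        parts.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    # Whatever is left after the last weighted group.
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, 1.0))
    return parts
```

For example, `parse_weights("photo, (blonde:15), blurry")` yields `[('photo', 1.0), ('blonde', 15.0), ('blurry', 1.0)]`. How a given weight actually scales the embedding depends on the UI and the model, which is the part the comment says behaves differently in SD3.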
@fingerling613 10 days ago
"Since at Stability AI they only do the missionary" lol
@latentvision 10 days ago
😉
@APrettyGoodChannel 11 days ago
The difference between the negatives which seem to work and those that don't seems to be due to the negative prompt cut-off at 10%. Words like blonde might be set in the image in the first 10%; words like artefacts would not. I accidentally finetuned an SD3 model with blank-prompt negatives instead of zeroes, and it worked fine in samples when the training setup was testing it that way, though not in comfy, which uses zeroes. So it seems you could retrain the model to use classical negative prompts if you wanted.
@murphylanga 11 days ago
Thank you, as always a great video from you. I am still a big fan of SDXL. The correct use of Clip G and Clip L and additional control via conditioning nodes and/or controlnet makes a lot possible. I have now automated a lot with the great nodes from Rgthree... and before I have a closer look at SD3 I still have to "explore" the many great possibilities of your IP adapter and incorporate them into the automation.
@AlistairKarim 11 days ago
Informative and occasionally hilarious. Love it. Hope you're doing well.
@XxXnonameAsDXxX 10 days ago
Give it a couple of months so we get custom models and other workarounds rolling. SDXL was the same, and I only started using it when Lightning came out; now Hyper is my go-to choice. I appreciate all the people jumping in to test it, and it looks very promising
@carleyprice3138 11 days ago
thank you for sharing your work, knowledge and experiences. very appreciated!
@adriantang5811 10 days ago
Very rational and informative video as usual. Thank you!
@Kryptonic83 11 days ago
great info as always, thanks
@ImmacHn 11 days ago
I don't use negatives much, not even in 1.5 🫠, I feel the "usual" negatives are mostly placebo.
@Jackassik 11 days ago
Same here. I did a lot of testing and it seems like placebo to me when it comes to things like "ugly, bad hands" etc. But it works well when you're trying to avoid things like "hand, blond girl, child" etc. Still, I usually just use a positive for this: "hands behind back, brunette, 30yo". I usually generate using LCM or Lightning, so it's 2x faster when I set cfg to 1 (which also ignores the negative).
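The remark about cfg 1 follows from the usual classifier-free guidance formula, which mixes the positive and negative predictions as `uncond + scale * (cond - uncond)`. A minimal sketch, using plain lists in place of tensors (the name `cfg_combine` and the sample numbers are illustrative, not taken from any sampler's code):

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy "predictions" for the positive and negative prompts.
cond = [0.5, -1.0, 2.0]
uncond = [0.25, 0.5, -1.5]

# At scale 1 the negative term cancels and only the positive prediction is left.
at_one = cfg_combine(cond, uncond, 1.0)

# At higher scales the result is pushed away from the negative prediction.
at_seven = cfg_combine(cond, uncond, 7.0)
```

Because the negative prediction drops out at a scale of 1, a sampler can skip the negative pass entirely, which would explain the roughly 2x speedup mentioned above.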
@AstroMelody_TV 10 days ago
BROOOO, I CRACKED SO HARD at 13:55, thanks for that xDD
@robadams2451 11 days ago
Thank you, interesting, factual and undramatic. It's always easy to accuse a company of making a poor decision, but Stability is in a difficult place; there might have been only bad options to choose between. I suppose they need a buyer for the company, but the plausible buyers will likely sit back and wait.
@latentvision 10 days ago
I'm still hoping huggingface will chime in
@generichuman_ 10 days ago
It would stand to reason that words like blurry and artifacts wouldn't work as well, because these are features that only become visible at later steps, when we're zeroing out the negative embedding. Concepts like blonde, asian, etc. are things that would be visible at the first steps and are thus affected.
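The cut-off described in these comments can be pictured as a simple per-step switch. The sketch below assumes, as the comments state, that the real negative embedding is used for the first 10% of steps and an all-zero embedding afterwards; `negative_for_step` and the list-based embedding are hypothetical stand-ins, not the actual ComfyUI workflow nodes.

```python
def negative_for_step(step, total_steps, negative_embedding, cutoff=0.1):
    """Pick the negative conditioning for one sampling step.

    Assumption: the real embedding applies only while step < cutoff * total_steps;
    after that it is replaced by zeros of the same length.
    """
    if step < cutoff * total_steps:
        return list(negative_embedding)
    return [0.0] * len(negative_embedding)


# With 20 steps and a 10% cut-off, only steps 0 and 1 see the real negative.
embedding = [0.3, -0.7, 1.2]
schedule = [negative_for_step(s, 20, embedding) for s in range(20)]
```

Under that assumption, a negative can only steer what is decided in the first couple of steps (overall composition, hair colour, subject), while late-appearing details such as blur or artifacts never see it.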
@brianckelley 10 days ago
@11:01 I had to rewind this to hear it multiple times to make sure I was hearing correctly. Damn, that's funny stuff!
@latentvision 10 days ago
😛
@vitalis 5 days ago
The real “safety trigger” should be the warning that all user prompts related to grass will create the most horrendous creatures. Those images would get an R18 warning at cinemas. In my opinion horror is more traumatic than nudity.
@MushroomFleet 9 days ago
11:00 hahaha excellent
@latentvision 8 days ago
😛
@mikerhinos 11 days ago
Thanks for the in-depth testing! I also had a day of testing with SD3 and at first I was amazed by the quality of my benchmark prompts. Then I tried this "girl on the grass" challenge and it's pretty much doable, but you have to go into the deep misty jungle of tinkering with every setting one by one, and depending on the image you want it's a total nightmare, because a slight change to one setting can totally destroy the image when you just wanted to correct a tiny detail. For some images like this challenge I had better chances using the DDIM sampler and DDIM Uniform scheduler, with a high CFG between 7 and 9, and a LOT of steps, between 75 and 125... For other images it's better with a CFG of 2, but overall I get better images with at least 50 steps, which makes it a pretty slow model, with nothing more than what Cascade is already doing. Cascade deserves so much love, to me... Anyway, with its current license/TOS SD3 is a complete no-go for me, so I didn't dig that much into it.
@latentvision 10 days ago
it's so true, cascade passed completely unnoticed. maybe because it was released too close to SD3. I wish I had the resources to do some decent training
@TheFutureThinker 11 days ago
Yes Matteo, I feel the same. This model has good potential; when I tested it, it followed prompt instructions pretty well for a base model. It's just that Stupidity AI has not put enough human bodies in the training dataset.
@latentvision 10 days ago
I still have faith! Hopefully 3.1 will be out soon :) Otherwise Lumina to the rescue :P
@TheFutureThinker 10 days ago
@latentvision hehe... I hope so, if the company's cloud usage debt can be paid off. Then it can go back to being a normal Stability AI. I was looking at Lumina and PixArt for images, and OpenSora for video. :)
@antoineberkani9747 8 days ago
Thank you for explaining things really well without getting sucked into the dumb reddit drama. Looking forward to seeing how this model evolves, although I've heard rumors it may be tricky to finetune.
@latentvision 8 days ago
Really? I've heard the opposite, the problem is that there's very little official documentation. Controlnets for example were developed in record time and they work pretty well... so maybe it won't be that bad
@michaelknight3745 10 days ago
You are the best. Very well explained. I see you using that RESCALE CFG node, but I cannot figure out what it is doing. If it's a multiplier, why not just put the result in the CFG of the KSampler?
@latentvision 10 days ago
rescaling is gradual based on the sigma, not fixed
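To see why the result cannot simply be folded into the sampler's CFG value, it may help to look at one common formulation of CFG rescaling: the guided prediction is scaled so its standard deviation matches that of the positive prediction, then blended with the unrescaled result. The sketch below is an assumption about the general idea, not the node's actual code; it uses plain lists, the name `rescale_cfg` is illustrative, and it leaves out the sigma-dependent handling the reply refers to.

```python
from statistics import pstdev


def rescale_cfg(cond, uncond, scale, multiplier=0.7):
    """Sketch of CFG rescaling on one prediction (lists stand in for tensors)."""
    # Ordinary classifier-free guidance.
    x_cfg = [u + scale * (c - u) for c, u in zip(cond, uncond)]

    std_cond = pstdev(cond)
    std_cfg = pstdev(x_cfg)
    if std_cfg == 0:
        return x_cfg  # nothing to rescale

    # Shrink the guided prediction back to the spread of the positive one.
    x_rescaled = [x * (std_cond / std_cfg) for x in x_cfg]

    # Blend rescaled and plain CFG results; multiplier=0 disables rescaling.
    return [multiplier * r + (1.0 - multiplier) * x
            for r, x in zip(x_rescaled, x_cfg)]
```

The correction factor `std_cond / std_cfg` is recomputed from the predictions at every step, so it changes over the course of sampling. That is consistent with the reply: the effect is not one fixed number that could be typed into the CFG field.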
@x14550x 11 days ago
My primary interest in SD, since November 22, has been in trying to achieve actual photorealism, and it has always been difficult. SD3 is quite remarkable in this regard, and I am enjoying finding its strengths and limitations.
@Mika43344 10 days ago
great video as usual!)
@roberamitiku5844 11 days ago
As always, amazing vid. So if SD3 is good with small details, can we use it for upscaling? Will it be better compared to 1.5/XL?
@latentvision 10 days ago
tiled upscaling seems to be working, but at the moment it seems very much optimized for 1024x1024 so you can't go crazy with the resolution
@michail_777 11 days ago
I still think that for animating an input video the best is an SD 1.5 model. But for regular animation, XL. Maybe SD 3 will be very good for video? We have to wait for AnimateDiff for SD 3. And for image generation, possibly a small algorithm will be needed that adjusts SD 3 for better generation.
@latentvision 10 days ago
well let's try to fix the images before thinking about animations :D
@michail_777 10 days ago
@latentvision 😄
@hashshashin000 11 days ago
Now I'll have nightmares for the next couple of weeks. Thanks Matteo for the cursed pictures.
@Ratinod 11 days ago
By the way, this model (probably thanks to the new VAE) is capable of generating pixel art on an ideal grid with a pixel size of 4x4, i.e. 1024x1024 -> 256x256! (The previous base models could not do pixel art with a correct grid at all, and the additionally trained ones could only generate with a pixel size of 8x8 (1024x1024 -> 128x128).)
@latentvision 10 days ago
right! I did some pixel art and it's amazing! Thanks for the heads up
@javierstauffenberg3414 11 days ago
I came for the insights, stayed for the jokes 😎
@sherpya 11 days ago
ok Matteo, I was just waiting for your tech information about sd3 😂
@GyattGPT 11 days ago
This is great. I wonder what specifically was impacted by the safety alignment. If just removing a word or changing the conditioning is able to get some good output, it might not be as big of a deal, with some fine-tuning to explicitly train away the censorship. More research is needed. Let's not fall into doomerism folks!
@latentvision 10 days ago
your conditioning must prompt a very simple composition. If you have people standing doing nothing it works very well. When the pose becomes more complicated the model just goes bonkers
@somebodyrandom2800 11 days ago
just a small note (which I've seen apply to many other videos): "Euler" is correctly pronounced "oi-ler", rather than "yoo-ler", which is incorrect and a mistake often overlooked. just something to keep in mind when making future videos.
@madrooky1398 11 days ago
One is just closer to an English (American) and one closer to a European (German) pronunciation. As a German native I don't see a mistake using either way.
@latentvision 11 days ago
📝
@OpenArtVietNam 11 days ago
Don't know what the content is, but sure to like it first
@0A01amir 11 days ago
Funny and very well put video
@beveresmoor 11 days ago
Hope you are right on this. I hate it when they spoil their own product's chance of success before it can catch on. 🤣
@zerorusher 11 days ago
Hi Matteo! I want to try to make a sort of guide for Llama 3 70b to rewrite/enhance text prompts in a way that makes SD3 happy. From your experience, what are the things SD3 struggles with the most? Your video gave me great insight into what kind of prompt works well with SD3.
@volcanowater 11 days ago
Have you taken a look at the perturbed SD3 model? According to some it gives better results
@hindihits9260 11 days ago
nice video, where can I find the fp16 version?
@taucalm 11 days ago
huggingface
@322ss 11 days ago
Stability AI seems to be in a messy situation, nevertheless - really informative video, thanks!
@user-lk7ct8te7b 10 days ago
I did a git pull to update and also used the comfy manager to update the ComfyUI_essentials nodes, but the SD3 negative conditioning node is not available?
@latentvision 10 days ago
mh try to delete the directory and reinstall
@leolis78 11 days ago
Matteo, thanks for the video, it is very, very interesting. I really like the model's inference of objects and its handling of light and textures (as long as they are not of the human figure, it works excellently). I felt very bad about the licensing issue; Stability should change it. I understand that they want to make money, but this decision was very wrong.
@Homopolitan_ai 11 days ago
Thanks, love ❤️
@latentvision 10 days ago
sorry, I'm taken 😛
@Homopolitan_ai 10 days ago
@latentvision Lucky them, I can still love you tho 😉
@afrosymphony8207 11 days ago
please when will prompt injection nodes officially drop??
@makadi86 10 days ago
is there a tile controlnet model for sdxl that works in comfy?
@latentvision 10 days ago
yes there are a few options, one is TTPlanet's. Check huggingface
@iraklipkhovelishvili1252 11 days ago
"Since at Stability AI they only do the missionary that means removing the line from the prompt." What a line!
@Specialissimus 11 days ago
That was a pretty elegant burn.
@MilesBellas 10 days ago
Hu-po analyses the SD3 technical paper: it's really advanced..... in his opinion. Did the developers abandon it before full completion?
@cowlevelcrypto2346 3 days ago
"less floating and now carried by ants" Lol
@latentvision 2 days ago
🐜🐜🐜🐜
@ooiirraa 10 days ago
I'm just in love with you and every one of your videos
@shiccup 11 days ago
what an amazing video
@blacksage81 11 days ago
Years ago, a popular joke was that Capcom could not count to three, when it came to the Street Fighter franchise. Now, I'm thinking that the new joke is, "Stability AI should avoid whole numbers, and stick to the .5's" Jokes aside I hope Stability gets something figured out, because if they don't I don't know what's going to happen to their earnings potential.
@IndieAuthorX 11 days ago
Maybe I should wait for a bigger model. I need a lineart controlnet and the ability to use negative prompts because I almost exclusively convert my hand drawn landscapes into different variations with SD. SDXL generally works great for me, but it does not have a good lineart controlnet. To create animations I usually still just use SD 1.5.
@CarlHemmer 10 days ago
I'd love to see how IPAdapter plays with SD3.
@latentvision 10 days ago
wouldn't we all?
@user-dj3rd4my5k 11 days ago
"In stability they only do the missionary" 🤣🤣😂🤣🤣
@latentvision 10 days ago
😛
@JayPeaTea 11 days ago
"avoid sensitive material", but that's all I... nevermind.
@latentvision 10 days ago
I see where you are going...
@alekmoth 10 days ago
How to copy nodes with links??
@piorewrzece 9 days ago
13:56❤
@qwertyuuytrewq825 11 days ago
It was interesting and funny )
@goodie2shoes 11 days ago
first! Been waiting for your view on things.
@tomschuelke7955 11 days ago
Naaa.... i am first
@goodie2shoes 11 days ago
@tomschuelke7955 I'll share my trophy with you ;-)
@PaulFidika 11 days ago
Serious question; is SD3 better than SDXL in any way, other than prompt-comprehension? The T5-XXL text-encoder results in much better prompt comprehension, but are there any advantages other than that? In terms of quality it feels like a downgrade compared to SDXL.
@latentvision 11 days ago
the thing that I like the most is how it handles fine noise, more than prompt comprehension
@x14550x 11 days ago
I've trained about 50 SDXL LoRAs and have probably 400,000 image generations in it. The photorealistic backgrounds are better in SD3 by every possible metric.
@latentvision 11 days ago
@x14550x I really see the potential of this thing. We need better models
@koctf3846 10 days ago
I'd like to see some analysis of the difference between the fp8 and fp16 versions of the text encoder
@latentvision 10 days ago
I can give you the TL;DR: use fp16
@ParrotfishSand 11 days ago
💯🙏
@generalawareness101 11 days ago
My VAE decode would not work; it gave me an error due to the 16 channels vs 4 channels. edit: Given groups=1, weight of size [512, 16, 3, 3], expected input[1, 4, 128, 128] to have 16 channels, but got 4 channels instead.
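The error text says a decoder layer built for 16 input channels (weight shape `[512, 16, 3, 3]`) received a latent with only 4 (`[1, 4, 128, 128]`). SD3's VAE works on 16-channel latents, so one plausible cause is a 4-channel latent, such as one made for SD1.5/SDXL, reaching the SD3 decoder. A minimal sketch of a pre-decode sanity check; `check_latent_channels` is a hypothetical helper, not part of ComfyUI or PyTorch:

```python
def check_latent_channels(latent_shape, vae_latent_channels):
    """Raise a readable error if a latent does not match the VAE's channel count.

    latent_shape is (batch, channels, height, width), as in the error message.
    """
    batch, channels, height, width = latent_shape
    if channels != vae_latent_channels:
        raise ValueError(
            f"latent has {channels} channels but the VAE expects "
            f"{vae_latent_channels}; was it created for a different model family?"
        )
    return True


# A 16-channel latent passes for a 16-channel VAE.
check_latent_channels((1, 16, 128, 128), 16)
```

Checking the latent shape at each stage of a workflow, for instance after every sampler or latent-generating node, narrows down where a 4-channel latent enters the pipeline.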
@latentvision 11 days ago
you probably need to update comfy
@generalawareness101 11 days ago
@latentvision Somehow I expected that response and my response is I updated last night.
@Bakobiibizo 11 days ago
@generalawareness101 then it's one of the models you're using, or somewhere along the pipeline you're changing the shape of your latents so they don't fit. pull anything extra off, just use the basic setup, and add one piece at a time to confirm it works before adding more
@generalawareness101 11 days ago
@Bakobiibizo yes, I thought this too. It hits post img preview so it shows fine but the next one goes weird. Nothing between but one more ksampler.
@DarkPhantomchannel 11 days ago
You're the greatest! Not only did you explain well how to "make the little tart work" XD, but you did it with irony too!
@latentvision 11 days ago
dodging the censorship with Italian. Excellent tactic :)
@Lorentz_Factor 11 days ago
So I am not so sure that it is because of the censorship. I believe that yes, it is due to how it was trained, but not for the reason people have thought. With some testing I can consistently get women or men lying in the grass, or lying in bed, without getting the lumpy ball of flesh with limbs in all directions. And how do you do that, you ask? Well, you use those negatives that SD3 is not so fond of. If the woman is lying in the grass, then in the negatives try inputting standing, sitting, crouching, and other poses which are not lying. I find that this works very consistently. Furthermore, feeding the L and G inputs differently from the T5 also handles this well. For the L and the G, use something like "a woman laying down", and then in the T5 describe the grass, where she is, and so on. You can also describe the grass and such in the L section while the G keeps only the description of the subject herself. Other variations of this work as well. In the T5 section, describe things much more explicitly and in detail, such as the position of her arms and legs; this further refines the output. But ultimately, without negatives listing the poses which are not lying, it still seems to fail. What I believe is that the vector associations between poses might be a little stronger than they should be in certain respects. This may be an unintentional result of the censorship training. So yes, but what I noticed on examining the outputs is that, for example, the woman seems to be in a standing position with her shoulders in the grass, while her abdomen is attempting to lie down and her legs are crisscrossed as if sitting. It often looks like multiple poses mushed together. I think the training may have blurred the distinctions between the various poses, so the model does not properly recognize that you can't sit or stand while lying down.
Oddly enough, another negative which I found to work is birth defect, and even more so spina bifida. Yes, I know that's strange, but at one point the output reminded me of a friend of mine who was born with a severe case of spina bifida that could not be operated on. Yes, it's wrong and kind of a messed up way of thinking, but oddly enough that negative cleans some things up. Of course that might just be confirmation bias and an unintended embedding output from the term, but it nonetheless seems to help. Falling back on just negating the poses that you are not describing works quite well, though.
@unsivilaudio 11 days ago
comment for the algo peeps, ty Matteo
@latentvision 11 days ago
lol okay thanks!
@MilesBellas 10 days ago
Nvidia could buy Stability AI, Autodesk and The Foundry to create a new free media development platform, complementing the massive hardware profits, that harnesses the pre-existing CG development paradigm of points, models, particles, textures, compositing etc ..... as an interface to AI rendering through ComfyUI etc.... ? Image Control = Essential to Output. Professional clients usually operate by specifics. 😊👍
@latentvision 10 days ago
nvidia would never make it open 😅
@En0834 11 days ago
Watching youtubers upload SD3 videos at this point in time is like seeing dead corpses taking a walk in the park
@timothywells8589 11 days ago
Thanks, I put off trying SD3 after waking up on release day to a s show on reddit. I will give this a go tonight, just hope I can find the model now that civit has taken it down 👀
@blackcloud8218 11 days ago
It's actually sad, because Stability AI is nothing without the Stable Diffusion community who train and refine their base models, and who are mostly on Civitai
@MsNathanv
@MsNathanv 11 күн бұрын
The lying on the grass problem looks to me like a problem with head rotation, not anything having to do with censorship. 1.5 has this problem as well (watch adetailer butcher upside-down heads.) At least as far as 1.5 goes, you can tell, it was trained on flipped images (so it has no conception of right or left) but it was never trained on rotated images. So it doesn't think of faces as things that can be oriented in any way but up, and as you rotate the head, you see SD getting worse and worse. (This has been something I've been thinking about with regards to adetailer workflows-- you could certainly rotate the image, possibly from openpose derived coordinates, before running the detailer, and then unrotate it to compose it back in.) When you go to 1040px, what happens is that you're getting a lot more pictures that are oriented with the head up (for whatever reason)-- but the images that don't orient that way are still screwed up.
@timothywells8589
@timothywells8589 11 күн бұрын
The way I've always tried to get around this is by generating the image face up like sd wants me to and then just rotating in krita or affinity when I'm doing touch ups and post processing.
@MsNathanv
@MsNathanv 11 күн бұрын
@@timothywells8589 Yeah, and I can do something similar with Blender->Comfy workflow (which might be easier, because I have off-screen compositional hints that won't get cropped with rotation, and I can make a camera that follows the exact screen-space rotation of my head bone, and I can easily make masks on the basis of what's offscreen.) But it doesn't always work to rotate the entire composition-- SD realizes it's a head after rotation, but now it also thinks the arm is a leg-- and going back and forth is certainly not as convenient as I'd like.
@latentvision 11 days ago
It's also that, but mostly bad training. The problems are not only with people lying down, but also with people sitting cross-legged. So it's not just a question of head orientation. The "censor" is something on top that didn't help either.
@MsNathanv 11 days ago
@@latentvision I'd say that the problem with orientation *is* a problem with training (although I'm not sure how I'd handle it myself-- if you rotate training images without hiring an army of captioners, you lose info about gravity, someplace SD is already weak.) If there were sufficient images of well-captioned, upside-down faces in the set, there wouldn't be this problem. Crossed limbs are a common problem in other versions as well, and crossed legs, especially, involve a rotation problem as well: SD would much rather draw legs that go down, not in to the middle and out to the sides. It might be interesting to see how well controlnets can compensate for the issues, although, personally, I don't see a reason right now to bother downloading any 3.0 checkpoints.
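The rotate, detail, unrotate idea discussed in this thread can be sketched in a few lines. This is a minimal sketch under stated assumptions, not anyone's actual workflow: the function names `head_roll_degrees` and `rotate_point` and the keypoint values are hypothetical, and it only covers the geometry (estimating head tilt from two OpenPose-style keypoints, then rotating coordinates there and back). The image rotation and the detailer pass themselves are left out.

```python
import math

def head_roll_degrees(neck, nose):
    """Angle in degrees by which the head deviates from upright.

    Points are (x, y) in image coordinates, where y grows downward.
    0 means the nose is directly above the neck; positive values are
    clockwise as seen on screen.
    """
    dx = nose[0] - neck[0]
    dy = nose[1] - neck[1]
    return math.degrees(math.atan2(dx, -dy))

def rotate_point(point, center, degrees):
    """Rotate a point around a center (positive = clockwise on screen)."""
    theta = math.radians(degrees)
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    x = center[0] + dx * math.cos(theta) - dy * math.sin(theta)
    y = center[1] + dx * math.sin(theta) + dy * math.cos(theta)
    return (x, y)

# Hypothetical keypoints for a head lying on its side:
# the nose is to the right of the neck instead of above it.
neck = (100.0, 200.0)
nose = (160.0, 200.0)

roll = head_roll_degrees(neck, nose)                    # tilt to undo
upright_nose = rotate_point(nose, neck, -roll)          # position after un-tilting
restored_nose = rotate_point(upright_nose, neck, roll)  # rotate back for compositing
```

In a real pipeline the same angle would be applied to the cropped face region before the detailer runs, and the inverse rotation afterwards when pasting the result back.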
@sirmeon1231 11 days ago
Finally a video showing the potential in SD3 as well, thank you! I like the great parts of the model and hope it can be fixed...
@mithrillis 8 hours ago
Honestly, I am not convinced by the "censorship" argument. From what I have seen, the model freaks out when trying to generate upside-down or heavily tilted faces, just like the older SD models, only more broken than they usually are. I do not see anything that cannot be explained by under-training, over-tuning or other training failures. I think they just rushed the 2B model to buy time for 8B training and it lacked the tuning needed to generate anything other than upright humans. To prove it is really "censorship", we need to do a lot more tests, like "does it matter if we generate humans or animals in unusual orientations", "can it do astronauts in space correctly", "does the gender of the subject matter", etc.
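The tests proposed above amount to a small prompt grid: every subject crossed with every orientation, so that failures can be compared across rows and columns. A minimal sketch of building such a grid follows. The subject and pose lists, the prompt template and the name `build_prompt_grid` are illustrative assumptions, and the sketch does not generate or score any images.

```python
from itertools import product

# Illustrative axes for the comparison; swap in whatever needs testing.
subjects = ["a woman", "a man", "a dog", "an astronaut"]
poses = ["standing upright", "lying on the grass", "upside down", "floating in space"]

def build_prompt_grid(subjects, poses):
    """Return one prompt per (subject, pose) pair, subjects varying slowest."""
    return [f"photo of {subject} {pose}" for subject, pose in product(subjects, poses)]

grid = build_prompt_grid(subjects, poses)
for prompt in grid:
    print(prompt)
```

Running every prompt with the same fixed set of seeds would make it easier to tell whether breakage follows the orientation, the subject, or both.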
@MilesBellas 10 days ago
Thumbnail = Trident ie Neptune Pitchfork = two thin prongs ? 😊😅
@latentvision 10 days ago
give me some slack I was in a hurry 😛
@MilesBellas 10 days ago
@@latentvision It's great ! 😁👍
@ArielTavori 11 days ago
"bad hands" and other arbitrary concepts like that never worked in any version. The concepts come from the training, and things like "extra fingers" are simply not significant labels that occur in any training datasets. Many experts have tried over and over to explain this idea since the early days, and I wish more YouTubers would make the effort to grasp it and explain it to viewers. I am amazed how many workflows I still see posted with 400 words in the negative prompt. There may be edge cases where some of that logic works out, but in general, most of the time with most models, you are better off with no negative prompt or with very few specific concepts relative to what you're trying to achieve. Fwiw, you are looking for opposing concepts. For example, if you're prompting for a vivid photo of a beautiful princess, you might try "ugly sketch of a dusty box of rocks, dim, faded". If you are prompting for a dystopian urban landscape, leaving your standard copy-and-paste negative prompt with a bunch of words like "ugly, distorted..." is going to dramatically limit your ability to achieve that, as well as drastically reduce the variety of your images.
@latentvision 11 days ago
Just to be clear, I've never used "bad hands" in my videos. I used it here because it is often used and I wanted to demonstrate that you are only adding random noise (it works the same way with "aaaaaa"). In the case of SD3 the negatives are randomly zeroed during training, making them virtually useless. Lykon said that SD3 doesn't listen to negatives at all and to just leave them empty, but from my testing something (very little) is actually understood (if the concept is strong enough).
@ArielTavori 11 days ago
@@latentvision absolutely! And thank you, and sorry if any of that sounded accusatory. I was simply reminded of this common issue, which I think is one of the single biggest in terms of how many people are affected by it, how few people realize it, and how easy and effective it would be for influencers to finally address this elephant in the room. I was just trying to do my part for the handful of people who might end up reading my comment and permanently improving their process. 🙏🏻 I frankly wasn't thinking about SD3 at all, as I'm lucky to be able to run 1.5 + IP Adapter on my 6 GB 1660 Ti lol
@latentvision 11 days ago
@@ArielTavori it's all good man 😎 Just wanted to clarify because it's hard to fit everything into a 19-minute video, especially when there are so many things to cover.
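For context on why the negative prompt behaves the way this thread describes: in the standard classifier-free guidance formula, the negative prompt takes the place of the unconditional prediction, and the result is pushed away from it and toward the positive prediction. A minimal sketch with toy numbers follows. The name `cfg_combine` and all values are illustrative assumptions, the lists stand in for real noise predictions, and nothing here is specific to how SD3 was trained.

```python
def cfg_combine(cond, neg, scale):
    """Classifier-free guidance: start at the negative prediction and
    move toward the positive one by a factor of `scale`."""
    return [n + scale * (c - n) for c, n in zip(cond, neg)]

# Toy stand-ins for the model's noise predictions.
cond = [1.0, 0.5, -0.25]       # prediction for the positive prompt
neg_empty = [0.0, 0.0, 0.0]    # prediction for an empty negative prompt
neg_words = [0.5, 0.25, 0.25]  # prediction for some negative prompt

# At scale 1.0 the negative term cancels out: only the positive prediction is left.
at_one_empty = cfg_combine(cond, neg_empty, 1.0)
at_one_words = cfg_combine(cond, neg_words, 1.0)

# At higher scales the output moves away from whatever the negative encodes,
# whether that is a meaningful opposing concept or effectively noise.
guided = cfg_combine(cond, neg_words, 5.0)
```

This is why an opposing concept in the negative can steer the image, while a string the model has no meaningful representation for only shifts the starting point in an arbitrary direction.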
@LouisGedo 11 days ago
19:35 Zzzzzzzzzzzactly! The irrational pitch forks raising is not warranted.
@styrke9272 10 days ago
The best YouTube video about SD3 is from someone who doesn't consider himself a YouTuber. Thanks for the info.
@En0834 12 hours ago
This aged like milk 💀
@latentvision 10 hours ago
it's still fresh
@Bakobiibizo 11 days ago
I got out of the art-for-money game back when SD1.5 killed my digital art side hustle, so I don't really care too much about the license as long as I have weights to play with. In my experiments I've found that IP adapters are completely ineffective and that you can get a bit of work out of control nets, but they are not perfect. One weird thing I found is that there is some kind of layering going on in the way the model generates. If you drop the CFG all the way down to 0.9-1.1 with a generation of a woman, you can find a completely different, clear image of a woman wearing what looks like a potato sack or something. Tick it up a bit more and it resolves into a completely different image. It's like they took prompts they considered sensitive and included pictures of women in potato sacks in the dataset to confuse the model. I think it happens early in the generation and then relaxes. But that's just anecdotal observation on my end, nothing rigorous.
@MrSongib 11 days ago
Weird model indeed. ty
@synthoelectro 11 days ago
What's with the newer versions of SD, why are faces and skin becoming so smooth? I don't get it. 1.5 isn't like that.
@moon47usaco 10 days ago
Thank you for the laugh. Like always, tellin' it like it is… As for the anatomy problem, I think it is just poorly trained. Asking for a person swimming will result in horrors. Why would it try to censor swimming people? I think the training just did not allow for uncommon poses. Perhaps all the synthetic training data was in normal portrait-style poses?
@latentvision 9 days ago
this version of SD3 is a demo, let's keep it at that. Hopefully we'll get a 3.1 soon
@moon47usaco 9 days ago
@@latentvision I am more excited about the CN working in Comfy and an AD model for what we have... =] I do not normally generate women in the grass any how... =p
@xr3kTx 7 days ago
Hey your discord link is outdated
@latentvision 7 days ago
don't know why that sometimes happens, but it's actually valid
@xr3kTx 7 days ago
@@latentvision I still can't use it :( had this happen also with another Discord also for SD related content...
@latentvision 6 days ago
@@xr3kTx where are you from? this happened before from Russia
@xr3kTx 6 days ago
@@latentvision Hungary