Higher quality images by prompting individual UNet blocks

20,383 views

Latent Vision

1 day ago

Comments
@AustinHartt · 6 months ago
This is pretty incredible. That's some serious fine control, and it's starting to feel like a real tool rather than a lottery game.
@beatemero6718 · 6 months ago
Dude..... Another game changer. Honestly, your work helps the most with composition. With further development of this node we could have specific prompt inputs for things like subject, style, maybe even background and such. I gave up on prompting and always prompt only the bare minimum, mainly using img2img and the different IPAdapter models. This will help a lot with prompting much more precisely. Thanks for all the work you are doing.
@nissessin · 1 month ago
Thanks!
@bobbobster8619 · 6 months ago
Hello, just want to say that I really appreciate how you deliver your ComfyUI concepts. Of the several channels I watch, I learn the most from yours. Keep up the great job! I look forward to your videos.
@ysy69 · 6 months ago
Amazing! I was reading an article titled "Anthropic's Breakthrough: Understanding Frontier AI" by Ignacio de Gregorio Nobleja, where they trained a sparse autoencoder (SAE) model that could 'dissect' neuron activations into more fine-grained data (the interpretable features) but also reconstruct the original data, going back to the original activations. Your R&D made me think about this article.
@neofuturist · 6 months ago
The architect strikes again!!
@AbsolutelyForward · 6 months ago
The part where you just changed the haircut (prompt) without losing the rest of the image made me realize the potential of this technique - fascinating ❤
@bgtubber · 6 months ago
So cool! Experiments like these push progress forward.
@aivideos322 · 6 months ago
You keep coming out with things you'd think the creators of the models would have thought of. Great work as always.
@joeljoonatankaartinen3469 · 6 months ago
It just might be that a creator who understands the model's structure and functioning to the degree required to get this idea doesn't actually exist. A lot of the progress has been iterative papers where the author takes an existing structure, changes a couple of things and makes a new paper from the results. Also, getting an understanding of how the model functions requires using the model a lot, which is not necessarily something the people who designed it are interested in doing. There can also be a kind of blindness that comes from being too deep in the models: you can end up with observations from early versions that are no longer true with later versions, that you never recheck, and that blind you to possibilities. It's very often the case that breakthroughs come from someone who doesn't yet understand, working towards understanding. Someone who doesn't understand has a more flexible mind and is thus more likely to discover something new than someone who already has an understanding.
@lefourbe5596 · 6 months ago
It's time for me to target-train my LoRA on specific layers depending on the subject!!! It's gonna be so GOOOOOD, THANK YOU!!!!
@alu3000 · 6 months ago
My thoughts exactly!
@Neront90 · 6 months ago
Hey, I want that too! :D
@hilbrandbos · 6 months ago
Can you already target a layer while training?
@lefourbe5596 · 6 months ago
@hilbrandbos Yes, in standard LoRA. I like LoCon better, but I found that Kohya SS doesn't let us target blocks (with LoCon).
@hilbrandbos · 6 months ago
@lefourbe5596 Oh, I have to look that feature up in Kohya then... Of course we first have to find out what block does what; it would be really handy if, when training a style, you could address only the style block.
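For anyone following this thread, a hedged sketch of the block-wise training knobs that kohya's sd-scripts appears to expose through network_args; the argument names follow my reading of the sd-scripts docs on hierarchical learning rates, and exact spellings and block counts may differ between versions:

```python
# Hedged sketch: block-wise LoRA training options for kohya sd-scripts'
# train_network.py. Treat the argument names and the 12/1/12 block split
# (SD 1.5 indexing) as assumptions to verify against your sd-scripts version.
network_args = [
    "down_lr_weight=0,0,0,0,0,0,0,0,0,0,0,0",  # freeze every down block
    "mid_lr_weight=0",                          # freeze the middle block
    "up_lr_weight=0,0,0,0,1,1,1,1,0,0,0,0",    # train only up blocks 4-7
]
# Passed on the command line roughly as:
#   train_network.py --network_module=networks.lora --network_args <each entry>
```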
@ArrowKnow · 6 months ago
Every time I run into a specific problem you release a video with the solution within a week. Thank you so much; just last night I was wondering how to do this to stop a bleeding issue! Excited to try this out. Keep up the amazing work.
@catalystzerova · 6 months ago
This is the first time I've badly wanted to contribute to a project
@latentvision · 6 months ago
DO IT!
@APrettyGoodChannel · 6 months ago
There was a paper a long time back about doing Textual Inversion this way, an embedding per block rather than an embedding for the whole model, which apparently gave much better results.
@ryo0ka936 · 6 months ago
I'd definitely like to see that in play!
@Nikki29oC · 5 months ago
Do you remember the title of the paper? I'd like to check it out. Thx ❤
@APrettyGoodChannel · 5 months ago
@Nikki29oC Unfortunately not, it was probably 1.5 years ago now.
@morenofranco9235 · 6 months ago
This is incredible. I will have to watch it two or three more times to get a real understanding. Thanks for the lesson.
@jasondulin7376 · 5 months ago
Just starting this video but want to say: you are a great teacher and your sense of humor is spot on!
@dadekennedy9712 · 6 months ago
This is absolutely amazing. I look forward to seeing the development of this process.
@davidb8057 · 6 months ago
Another gem from Matteo, thank you! It's indeed very promising.
@dck7048 · 6 months ago
Thank you for sharing your ideas Matteo. My bet is that this isn't something that has never been conceptualized before; it likely has, but like so many other breakthroughs it's locked behind closed source. This is probably the most exciting news I've seen for gen AI in a while, definitely the seed of something big. Great work!
@2shinrei · 6 months ago
Wow! Matteo, you deliver as always. This could be the next big thing for getting more control over inference. I'm excited to see how this will evolve.
@DarkYukon · 6 months ago
Great work. Good to see that fine-grained regional tools are progressing. First Omost and now this. Matteo, you really are a magician.
@Mranshumansinghr · 6 months ago
Wow. Just tried it a few minutes ago. It's like you are in control of the prompt. Genius!!
@latent-broadcasting · 6 months ago
I did some quick tests and at first glance I noticed fewer aberrations, better hands, and backgrounds that make more sense. E.g. I was getting a couch whose backrest sections had different sizes, and this fixed it; it also picks up the colors and the style of the photo much better with IPAdapter. I'll do better tests tomorrow. Thanks for sharing this!
@odw32 · 6 months ago
For the times when you want "fast results" rather than "fine-grained control", I could imagine it being interesting to split a single prompt into separate UNet block inputs, using some kind of text classification.
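A rough sketch of what that routing could look like, assuming a hypothetical label-to-block mapping (which would have to be found by experiment) and an off-the-shelf zero-shot classifier:

```python
# Sketch: route prompt fragments to UNet block inputs by zero-shot
# classification. The BLOCK_FOR_LABEL mapping is purely hypothetical.
from transformers import pipeline

BLOCK_FOR_LABEL = {
    "composition": "input_4",
    "subject": "input_8",
    "style": "output_0",
    "fine detail": "output_4",
}

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def route_prompt(prompt: str) -> dict:
    """Split a prompt on commas and assign each fragment to a UNet block."""
    routed = {block: [] for block in BLOCK_FOR_LABEL.values()}
    for fragment in (p.strip() for p in prompt.split(",") if p.strip()):
        best = classifier(fragment, candidate_labels=list(BLOCK_FOR_LABEL))["labels"][0]
        routed[BLOCK_FOR_LABEL[best]].append(fragment)
    return {block: ", ".join(frags) for block, frags in routed.items() if frags}
```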
@pseudomonarchiadaemonum4566 · 4 months ago
High quality content! Thanks Matteo!
@canibalcorps · 6 months ago
OMG, it's exactly what I was thinking about! Thank you for your efforts and your work. You're a genius.
@superlucky4499 · 6 months ago
This is incredible! Thank you for all your work!
@Sedtiny · 6 months ago
Matteo's creation of such precise tools elevates technology to the level of art. The specificity inherent in these tools rivals that of art itself.
@PulpetMaster · 6 months ago
Thank you for that. I'm learning how to make LoRAs/models and trying to understand the blocks during training, to speed things up and get better-quality model output. Can't wait for the findings from these tests!
@HestoySeghuro · 6 months ago
GOLD CONTENT. For real.
@jancijak9385 · 6 months ago
I posted long ago, when SD 1.5 came out, that what we lack in these models is control. OpenAI's answer with the DALL·E models was more natural-language prompting, which is a failure to understand the problem. When ControlNet and IPAdapter came out, it seemed like the right direction. There are other parts of the pipeline (encoding, embedding, latent operations) that could have more nodes to control the input/output. For example, you could have different scheduling for each UNet block, or take a UNet block from a different model. I would split all the UNet blocks into separate nodes.
@PamellaCardoso-pp5tr · 6 months ago
Yeah, I already do something similar to that in the latent space: I have a workflow with 3 separate advanced KSamplers where I manually adjust the steps handled by each stage (basically I'm scheduling by hand). Dividing the workload into 3 advanced KSamplers improves the quality of the generation BY A FUCKING LOT, and you can even add new details at specific areas by knowing at which step such concepts get added (small details usually show up closer to the end steps, while the composition is defined in the first 25% of the total steps). So separate schedulers for each UNet block would definitely improve AI generations a lot.
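A minimal sketch of that three-stage split, expressed as the start/end step settings you would give three KSampler (Advanced) nodes; the stage boundaries are the rough percentages described above, not canonical values:

```python
# Sketch: divide a 30-step run across three KSampler (Advanced) passes.
TOTAL_STEPS = 30
STAGES = {"composition": (0.00, 0.25), "refinement": (0.25, 0.70), "detail": (0.70, 1.00)}

for name, (start, end) in STAGES.items():
    start_step, end_step = round(start * TOTAL_STEPS), round(end * TOTAL_STEPS)
    print(f"{name}: start_at_step={start_step}, end_at_step={end_step}, "
          f"add_noise={'enable' if start_step == 0 else 'disable'}, "
          f"return_with_leftover_noise={'enable' if end_step < TOTAL_STEPS else 'disable'}")
```

Only the first sampler adds noise, and every stage except the last returns with leftover noise so the next stage can continue denoising the same latent.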
@the_infinity_time · 6 months ago
Gonna try it today. Thanks for all the work and the tutorials Matteo, you are awesome.
@Timbershield · 6 months ago
I played with block merging a long time ago and found that about 5 of the output blocks influenced body composition, hair and clothing, but I no longer have my list. From memory it was something like blocks 4, 6, 7 and 8 or 9, but I'll leave it to the experts, as you only seem to have 5 output blocks and not the 10 the model merger I used had.
@jibcot8541 · 6 months ago
This is so cool, the results look great! I will have a play with the block impacts.
@swannschilling474 · 6 months ago
Thank you so much for all the work and all the great content!! You are the best!! 🥰
@zef3k · 6 months ago
This is amazing. I've been wanting a way to interact with the UNet without having to throw random numbers at it that I don't know how to scale. Something like this for LoRA would be pretty amazing: applying it only to certain layers in a node like this. The conditioning here will make it easy to see where to focus.
@Seany06 · 6 months ago
Have you checked the LoRA block weight extension?
@zef3k · 6 months ago
@Seany06 I think so, but I didn't really have a grasp of what each block affects and when. Still don't really, but this definitely helps. I was also thinking of trying to apply ControlNet conditioning to specific inputs/outputs with this.
@fernandomasotto · 6 months ago
This is really interesting. Always looking where no one else does.
@MarcSpctr · 6 months ago
If this gets paired up with Omost, it's a whole different level of image generation we can achieve. Edit: Omost currently targets areas to determine the composition, but if, along with the area, it could also target specific blocks, that would be next-level.
@vitalis · 6 months ago
This channel is so underrated.
@latentvision · 6 months ago
I'm keeping a low profile
@urgyenrigdzin3775 · 6 months ago
Wow... this looks very powerful, thank you very much!
@generalawareness101 · 6 months ago
Very nice, and your findings match what I found with perturbed attention as I was playing around with it. Funny, as I was just saying I would love to do what this does, and here it is.
@PeppePascale_ · 6 months ago
1:51 I doubled over laughing. Thanks for the video, as always. The best.
@kovakavics · 6 months ago
Mate, this is crazy! Well done, and thank you!
@DreamStarter_1 · 6 months ago
Very interesting, thanks for the great work!
@amorgan5844 · 6 months ago
"And probably all Christian too"... the Irish catching strays for no reason 😂
@Mika43344 · 6 months ago
Great video as always! Better prompting, FINALLY!!!
@fgrbl · 6 months ago
Thanks!
@BubbleVolcano · 6 months ago
I played with PAG in April, and my feeling was that saturation increased a lot; I didn't expect it to change the content in addition to the brightness. It's kind of like a lora-block-weight operation, except applied directly to the checkpoint, right? There might be something to learn from it. I hope this turns out to be definitive, not metaphysics/superstition; we need more precision for AI.
@jccluaviz · 6 months ago
Amazing!!! Genius!!! A new masterpiece is coming.
@courtneyb6154 · 6 months ago
If anyone could prove that some AI artists should have copyright protection over their image generations, it would be you. Definitely no "Do It" button here. Amazing stuff, and thank you for taking the time to grind through all the possibilities and then breaking it down for the rest of us dummies ;-)
@nekodificador · 6 months ago
I knew I was smelling the week's game changer! How can I contribute?
@alu3000 · 6 months ago
Controlling weights per block in a LoRA was a game changer for me, but this takes it to another level!
@lefourbe5596 · 6 months ago
Any docs you can reference for that, please? :)
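On per-block LoRA weights, one hedged sketch of the idea: pruning a kohya-style LoRA file so only chosen UNet blocks survive. The lora_unet key prefix follows kohya's naming convention, and the block list here is purely hypothetical:

```python
# Sketch: keep only the LoRA weights that touch the targeted UNet blocks,
# plus all text-encoder weights. Block names assume kohya-style keys like
# "lora_unet_output_blocks_4_...". The TARGET_BLOCKS choice is hypothetical.
from safetensors.torch import load_file, save_file

TARGET_BLOCKS = ("output_blocks_4", "output_blocks_5")

def filter_lora(path_in: str, path_out: str) -> None:
    sd = load_file(path_in)
    kept = {k: v for k, v in sd.items()
            if not k.startswith("lora_unet") or any(b in k for b in TARGET_BLOCKS)}
    save_file(kept, path_out)
```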
@victorhansson3410 · 5 months ago
I laughed out loud at the sudden "they're all probably Christian too". Great video as always, Matteo.
@PixelPoetryxIA · 6 months ago
That is crazy! Good job
@Neront90 · 6 months ago
You are the tip of the open source spear.
@Firespark81 · 6 months ago
This is amazing!
@Foolsjoker · 6 months ago
Why is it that every time you post something, I'm like, "this, this is the direction SD will be heading in. Everyone is going to be using this." Oh, it's because that is exactly what happens. I cannot wait to try this.
@shadystranger2866 · 6 months ago
Astonishing! Thank you for your work! It would be great to have similar control over IPAdapter conditioning in the future
@shadystranger2866 · 6 months ago
I guess it would be great to try this thing with Stability's Revision technique
@latentvision · 6 months ago
It's kinda possible already with the undocumented "mad scientist" node :)
@Paulo-ut1li · 6 months ago
That's amazing work, Matteo!
@hackerthumb1551 · 6 months ago
Thank you for the excellent video and all your work :D You truly rock, dude ^^ After watching, I was wondering: theoretically, could the injection method be adapted to use ControlNets as an input too, using the injection to target the blocks you want the ControlNet applied to? I only ask because when using ControlNets I've observed input bleeding, similar to the prompt bleeding. It may be a way to apply modifications via ControlNet without losing as much consistency of the original character. Thank you for all your hard work and passion :)
@mariokotlar303 · 6 months ago
I feel I'm witnessing history in the making
@skycladsquirrel · 6 months ago
It would be awesome to be able to customize the labels of all the inputs. Great work Matteo
@DJVARAO · 6 months ago
You, man, are a hero. Thanks! 😁
@demondemons · 6 months ago
Your humour is amazing! :D
@styrke9272 · 6 months ago
You're always bringing good surprises
@kalicromatico · 6 months ago
Amazing, Matteo!!!!!! ❤❤❤❤
@lrkx_ · 6 months ago
Damn, this is genius!!!
@Lahouel · 6 months ago
Back to your BEST, Matteo. 👋👋👋👋👋
@RamonGuthrie · 6 months ago
When Matteo says suffer, I just think joy! 😄
@latentvision · 6 months ago
no kink shaming
@tomschuelke7955 · 6 months ago
Reminds me of scientists who use an MRI to identify the parts of the brain that react to language or specific words... so you are creating a sort of map of understanding.
@ProzacgodAI · 6 months ago
I wonder what a textual inversion would do with this. With something like a character turnaround, the details of the character can sometimes be lost. This makes me think you could use CharTurner on just one of these inputs, IPAdapter for a character reference, and the prompts to help guide it a bit.
@solidkundi · 6 months ago
You are the Satoshi of Stable Diffusion!
@ahminlaffet3555 · 3 months ago
I get a tensor size error when I try. Also, if I try to write a patched model with Save Checkpoint, the patch doesn't seem to get included in the result; I believe the error is still there, just being ignored. When I render the resulting model in a KSampler, it throws the tensor size error and won't continue. torch 2.4.1 and the current version: "stack expects each tensor to be equal size"
@beveresmoor · 6 months ago
Such an interesting finding. I have to try it myself. 👍😘
@dmcdcm · 6 months ago
I'd be interested to test out ControlNet conditioning sent to specific blocks
@Hearcharted · 6 months ago
You are a very smart person!
@latentvision · 6 months ago
I wish...
@Hearcharted · 6 months ago
@latentvision LOL Man, you are very humble too ;)
@tomaslindholm9780 · 6 months ago
I have actually been experimenting with Kohya Deep Shrink to increase resolution, doing blockwise (0-32) alterations. Now, with this, if I prompt for attention to body proportions on input 8, I seem to get much better output for full-body poses. In Kohya Deep Shrink, a downscale factor of 2 on block 3 with an end weight between 0.35 and 0.65 seems to do the trick, producing double-SDXL-resolution output.
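For reference, those settings mapped onto what appear to be the inputs of ComfyUI's PatchModelAddDownscale ("Kohya Deep Shrink") node; treat the field names as assumptions and verify against the actual node:

```python
# Hedged sketch of the Deep Shrink settings described above; the key names
# mirror what I believe the PatchModelAddDownscale node exposes.
deep_shrink = {
    "block_number": 3,        # apply the downscale inside block 3
    "downscale_factor": 2.0,  # early steps run at half latent resolution
    "start_percent": 0.0,
    "end_percent": 0.5,       # somewhere in the 0.35-0.65 range mentioned above
    "downscale_after_skip": True,
}
```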
@aronsingh9243 · 2 months ago
How are you generating the landscape without a prompt?
@bordignonjunior · 6 months ago
Amazing work!
@matthewharrison3813 · 6 months ago
How do the layers map to the denoise process? Might the latter layers be good for detail prompts?
@andreh4859 · 6 months ago
Cool stuff! Is it only working for Turbo models? I got an error at the KSampler
@throttlekitty1 · 6 months ago
How about a debug sort of node that takes a prompt and outputs a set of images for each UNet block in isolation? Maybe useful, but I suspect this is going to vary a lot between prompts and the finetune being used. I remember seeing heatmap-style visualizations for attention heads in the past; maybe that can be done here?
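A sketch of that brute-force block map: render the same prompt through one block at a time with a fixed seed. Here inject_prompt and sample are hypothetical stand-ins for the injection node and a sampler call, and the block list is a guess at the cross-attention blocks:

```python
# Sketch: per-block "probe" renders. Pass in whatever your pipeline provides
# for inject_prompt (the injection node) and sample (a sampler call).
def block_map(model, inject_prompt, sample, prompt="blue hair, oil painting"):
    blocks = ["input_4", "input_7", "input_8", "middle_0",
              "output_0", "output_1", "output_2", "output_3", "output_4", "output_5"]
    for block in blocks:
        patched = inject_prompt(model, block=block, prompt=prompt)
        image = sample(patched, seed=42)      # fixed seed so only the block varies
        image.save(f"block_map_{block}.png")  # compare the grid to map block roles
```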
@madmushroom8639 · 6 months ago
Yer a wizard, Matteo!
@jochaboon · 6 months ago
thanks!!
@wndrflx · 6 months ago
What does the ConditioningZeroOut do?
@Ymirheim · 6 months ago
I'm looking forward to experimenting with this! One thing stands out to me right out of the gate though: shouldn't I get the same render if I pass the same prompt to a regular sampler and to the "all" pin on the injector while using the same seed for both? What is the technical reason for getting two slightly different images rather than two exact duplicates?
@latentvision · 6 months ago
The results are very similar but not the same because the embeds are applied in different places
@SBaldo8 · 6 months ago
Do you think something going into input 4 still affects everything after it, or does this node by design affect only that specific part of the UNet? Very interesting stuff. I'd love to contribute, maybe by setting up a wildcard setup with random connections, or even just by donating my GPU time to your custom workflow and reporting results.
@mussabekovdaniyar8157 · 6 months ago
How does it work with ControlNet? Can we use separate ControlNets for each block for better influence?
@KDawg5000 · 6 months ago
Preventing color bleeding would be nice. If there were a way to tease out which blocks look at color for foreground objects vs background, that would be useful.
@Neront90 · 6 months ago
Yes, we need to download this and report our findings
@kmdcompelelct · 6 months ago
I'm wondering if this can be applied to training LoRAs and doing fine-tuning?
@latentvision · 6 months ago
It might help with fine-tuning, yeah
@BuffPuffer · 6 months ago
Yes, this is already possible with B-LoRA. Look it up.
@Falkonar · 6 months ago
Genius ❤
@allhailthealgorithm · 6 months ago
You might be able to use an LLM to automatically try prompting different blocks and a different model to analyze the outputs, like the RAM++ model...
@lonelyeyedlad769 · 6 months ago
Wonderful idea! Trying to test it out now. Ran into an error off the bat, haha. Have you ever seen this by chance? 'Error occurred when executing KSampler: stack expects each tensor to be equal size, but got [1, 231, 2048] at entry 0 and [1, 77, 2048] at entry 1'
@mdx-fm3vj · 6 months ago
It seems there is a certain max number of tokens per prompt; shortening each prompt fixes this (for me)
@latentvision · 6 months ago
Yeah, at the moment it only works with simple prompts (no concat or long prompts). I'll fix that if there's enough interest
@lonelyeyedlad769 · 6 months ago
@latentvision No worries. Thank you for the help!
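A minimal sketch of why this error happens and one possible fix, assuming the node stacks per-block text embeddings: long prompts get encoded in 77-token chunks, so different blocks can end up with embeddings of different lengths ([1, 231, 2048] vs [1, 77, 2048]) that torch.stack refuses to combine. Padding everything to the longest sequence is one way out:

```python
# Sketch: pad per-block conditioning tensors to a common token length so
# torch.stack works. Repeating the final (padding/EOS) token embedding is an
# assumption about what a reasonable fill value would be.
import torch

def pad_and_stack(embeds):
    # embeds: list of [1, tokens, dim] per-block conditioning tensors
    max_len = max(e.shape[1] for e in embeds)
    padded = []
    for e in embeds:
        if e.shape[1] < max_len:
            pad = e[:, -1:, :].expand(-1, max_len - e.shape[1], -1)
            e = torch.cat([e, pad], dim=1)
        padded.append(e)
    return torch.stack(padded)  # every entry is now [1, max_len, dim]
```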
@timothywcrane · 6 months ago
I know this is Stable Diffusion, but could this same architecture be put to use in CLIP/VQGAN? I have TBs of retro (in AI time) complete CLIP/VQGAN step reels with known seeds ;)
@omegablast2002 · 6 months ago
This is amazing
@christianholl7924 · 6 months ago
I guess the tokens will be concatenated into each block? Could they also be replaced?
@amorgan5844 · 6 months ago
Question: how much of an effect do LoRAs have on the inputs and outputs?
@latentvision · 6 months ago
How do you mean? We work on the cross attention; the LoRA weights have already been added at that point. The kind of influence it has depends on the LoRA.
@vintagegenious · 6 months ago
We could choose which blocks to apply LoRA weights to
@amorgan5844 · 6 months ago
@latentvision Like vintagegenious put it: LoRAs only influencing certain blocks. I haven't had time to test what I'm asking, so it may not be a very well structured question. I'll have some time this weekend and will come back 🤣
@amorgan5844 · 6 months ago
@vintagegenious Well put, that's what my brain was struggling to spit out 😂 Good job, mate.
@vintagegenious · 6 months ago
@amorgan5844 😁
@MrPaPaYa86 · 6 months ago
We need to come up with a system for systematic testing and reporting of block functions in Stable Diffusion, so they can be added to the model information on CivitAI
@latentvision · 6 months ago
ah! that would be great!
@xyem1928 · 5 months ago
It seems like it would be fairly easy to do. We'd just need to build a set of keywords along with what they represent (e.g. composition, pose, medium), then present the "base" image as well as the one with the block prompt, and have users rate 1) how different it is from the base and 2) how well the second one represents the keyword. This would identify which blocks are sensitive to which concept class and concept.
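A sketch of what that test harness could look like; generate, image_distance and rate_match are hypothetical callables (a render call, a perceptual metric such as LPIPS, and a human or model rating), and the keyword/block lists are placeholders:

```python
# Sketch: sweep blocks x concept keywords, log how much each block prompt
# changes the image and how well it expresses the keyword.
import csv
import itertools

def build_block_report(generate, image_distance, rate_match,
                       base_prompt="a woman standing in a park"):
    keywords = {"composition": "viewed from above", "pose": "arms crossed",
                "medium": "oil painting", "color": "red dress"}
    blocks = ["input_4", "input_8", "output_0", "output_4"]
    base = generate(base_prompt, seed=42)  # fixed seed for a fair comparison
    with open("block_report.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["block", "concept", "difference_vs_base", "keyword_match"])
        for block, (concept, kw) in itertools.product(blocks, keywords.items()):
            test = generate(base_prompt, seed=42, block_prompts={block: kw})
            writer.writerow([block, concept, image_distance(base, test),
                             rate_match(test, kw)])
```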
@Latentnaut · 2 months ago
Hey Matteo, is this working for Flux? And BTW, are we going to have Flux IPAdapters soon?
@MilesBellas · 6 months ago
A node to convert to an HDR file, using physically accurate light, would be amazing! Export .exr files too?!
@MannyGonzalez · 6 months ago
You are a true alchemist :D
@latentvision · 6 months ago
a mad one :P