This is pretty incredible. That's some serious fine control, and it's starting to feel like a real tool rather than a lottery game.
@beatemero6718 · 6 months ago
Dude..... Another game changer. Honestly, your work helps the most with composition. With further development of this node we could have specific prompt inputs for things like subject, style, maybe even background and such. I gave up on prompting and always prompt only the bare minimum, mainly using img2img and the different IPAdapter models. This will help a lot with prompting much more precisely. Thanks for all the work you are doing.
@nissessin · 1 month ago
Thanks!
@bobbobster8619 · 6 months ago
Hello, just want to say that I really appreciate the way you deliver your ComfyUI concepts. Of the several content creators I watch, I learn the most from yours. Keep up the great work! I look forward to your videos.
@ysy69 · 6 months ago
Amazing! I was reading an article titled "Anthropic's Breakthrough: Understanding Frontier AI" by Ignacio de Gregorio Nobleja, where they trained a sparse autoencoder (SAE) model that could 'dissect' neuron activations into more fine-grained data (the interpretable features) and also reconstruct the original data, going back to the original activations. Your R&D made me think about this article.
@neofuturist · 6 months ago
The architect strikes again!!
@AbsolutelyForward · 6 months ago
The part where you just changed the haircut (prompt) without losing the rest of the image made me realize the potential of this technique - fascinating ❤
@bgtubber · 6 months ago
So cool! Experiments like these push progress forward.
@aivideos322 · 6 months ago
You keep coming out with things you would think the creators of the models would have thought of. Great work as always
@joeljoonatankaartinen3469 · 6 months ago
It just might be that a creator who understands the model's structure and functioning to the degree required to get this idea doesn't actually exist. A lot of the progress has been iterative papers, where the author takes an existing structure, changes a couple of things, and makes a new paper from the results. Also, gaining an understanding of how the model functions requires using the model a lot, which is not necessarily something the people who designed it are interested in doing. There can also be a kind of blindness that comes from being too deep in the models: you can end up with observations from early versions that are no longer true for later versions, that you never recheck, and that blind you to possibilities. Very often, breakthroughs come from someone who doesn't yet understand, working towards understanding. Someone who doesn't understand has a more flexible mind and is thus more likely to discover something new than someone who already has an understanding.
@lefourbe5596 · 6 months ago
It's time for me to train my LoRA targeting specific layers depending on the subject!!! It's gonna be so GOOOOOD, THANK YOU!!!
@alu3000 · 6 months ago
my thoughts exactly!
@Neront90 · 6 months ago
Hey, I want that too! :D
@hilbrandbos · 6 months ago
Can you already target a layer while training?
@lefourbe5596 · 6 months ago
@@hilbrandbos Yes, in standard LoRA. I like LoCon better, but I found that Kohya SS doesn't let us target blocks (with LoCon).
@hilbrandbos · 6 months ago
@@lefourbe5596 Oh, I have to look that feature up in Kohya then... of course, we first have to find out which block does what; it would be really handy if, when training a style, you could address only the style block.
@ArrowKnow · 6 months ago
Every time I run into a specific problem, you release a video with the solution within a week. Thank you so much; just last night I was wondering how to do this to stop a bleeding issue! Excited to try this out. Keep up the amazing work.
@catalystzerova · 6 months ago
This is the first time I’ve badly wanted to contribute to a project
@latentvision · 6 months ago
DO IT!
@APrettyGoodChannel · 6 months ago
There was a paper a long time back about doing Textual Inversion this way, an embedding per block rather than an embedding for the whole model, which apparently gave much better results.
@ryo0ka936 · 6 months ago
I'd definitely like to see that in play!
@Nikki29oC · 5 months ago
Do you remember the title of the paper? I'd like to check it out. Thx❤
@APrettyGoodChannel · 5 months ago
@@Nikki29oC Unfortunately not, it was probably 1.5 years ago now.
@morenofranco9235 · 6 months ago
This is incredible. I will have to watch it two or three more times to get a real understanding. Thanks for the lesson.
@jasondulin7376 · 5 months ago
Just starting this video but want to say: You are a great teacher and your sense of humor is spot on!
@dadekennedy9712 · 6 months ago
This is absolutely amazing. I look forward to seeing the development of this process.
@davidb8057 · 6 months ago
Another gem from Matteo, thank you! It's indeed very promising.
@dck7048 · 6 months ago
Thank you for sharing your ideas, Matteo. My bet is that this isn't something that has never been conceptualized before: it likely has, but like so many other breakthroughs it's locked behind closed source. This is probably the most exciting news I've seen for gen AI in a while, definitely the seed of something big. Great work!
@2shinrei · 6 months ago
Wow! Matteo, you deliver as always. This could be the next big thing to get more control over inference. I'm excited to see how this will evolve.
@DarkYukon · 6 months ago
Great work. Good to see fine-grained regional tools progressing. First Omost and now this. Matteo, you really are a magician.
@Mranshumansinghr · 6 months ago
Wow. Just tried it a few minutes ago. It's like you are in control of the prompt. Genius!!
@latent-broadcasting · 6 months ago
I did some quick tests, and at first glance I noticed fewer aberrations, better hands, and backgrounds that make more sense. For example, I was getting a couch whose backrest sections were different sizes, and this fixed it; it also matches the colors and the style of the photo much better with IPAdapter. I'll run better tests tomorrow. Thanks for sharing this!
@odw32 · 6 months ago
For the times when you want "fast results" rather than "fine-grained control", it could be interesting to split a single prompt into separate UNet block inputs, using some kind of text classification.
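A minimal sketch of how such a splitter could work, with naive keyword matching standing in for the "text classification" (an LLM or zero-shot classifier could replace it). The block labels ("input4", "middle0", "output4") and the routing table are made-up placeholders for illustration, not the node's actual input names:

```python
# Route comma-separated prompt fragments to UNet block groups by keyword.
ROUTES = {
    "composition": ("input4",  {"wide shot", "close-up", "landscape", "portrait"}),
    "style":       ("output4", {"watercolor", "photo", "anime", "oil painting"}),
    "subject":     ("middle0", set()),  # fallback bucket: everything else
}

def split_prompt(prompt: str) -> dict[str, str]:
    """Assign each comma-separated fragment of the prompt to a block group."""
    buckets: dict[str, list[str]] = {block: [] for block, _ in ROUTES.values()}
    for fragment in (f.strip() for f in prompt.split(",")):
        target = ROUTES["subject"][0]  # default route
        for block, keywords in ROUTES.values():
            if any(k in fragment.lower() for k in keywords):
                target = block
                break
        buckets[target].append(fragment)
    return {block: ", ".join(frags) for block, frags in buckets.items() if frags}

print(split_prompt("wide shot, a knight on horseback, watercolor"))
# -> {'input4': 'wide shot', 'middle0': 'a knight on horseback', 'output4': 'watercolor'}
```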
@pseudomonarchiadaemonum4566 · 4 months ago
High-quality content! Thanks Matteo!
@canibalcorps · 6 months ago
OMG it's exactly what I was thinking about! Thank you for your efforts and your works. You're a genius.
@superlucky4499 · 6 months ago
This is incredible! Thank you for all your work!
@Sedtiny · 6 months ago
Matteo's creation of such precise tools elevates technology to the level of art. The specificity inherent in these tools rivals that of art itself
@PulpetMaster · 6 months ago
Thank you for that. I'm learning how to make LoRAs/models and trying to understand the blocks during training, to speed things up and make the model output better quality. Can't wait for the findings of the tests!
@HestoySeghuro · 6 months ago
GOLD CONTENT. For real.
@jancijak9385 · 6 months ago
I posted long ago, when SD 1.5 came out, that what we lack in these models is control. The answer from OpenAI's DALL-E models was more natural-language prompting, which is a failure to understand the problem. When ControlNet and IPAdapter came out, it seemed like the right direction. There are other parts of the pipeline (encoding, embedding, latent operations) that could have more nodes to control the input/output. For example, you could have different scheduling for each UNet block, or take a UNet block from a different model. I would split all the UNet blocks into separate nodes.
@PamellaCardoso-pp5tr · 6 months ago
Yeah, I already do something similar in the latent space: I have a workflow with 3 separate advanced KSamplers where I manually adjust the denoise range for each stage (basically I'm scheduling by hand). Dividing the workload into 3 advanced KSamplers improves generation quality BY A FUCKING LOT, and you can even add new details at specific stages, since you know at which steps certain concepts get added (small details usually show up closer to the end steps, while the composition is defined in the first 25% of the total steps). So separate schedulers for each UNet block would definitely improve AI generations a lot.
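A toy sketch of the three-phase split being described: one denoising loop broken into step ranges, each with its own conditioning, the way three chained advanced KSamplers with start/end steps would run. The phase boundaries and conditioning strings are illustrative, and the denoiser is a dummy stand-in for the real model:

```python
import torch

def denoiser(x: torch.Tensor, step: int, cond: str) -> torch.Tensor:
    # Dummy: just shrinks the latent; a real UNet would use `cond` and `step`.
    return x * 0.9

def run_phase(x: torch.Tensor, start: int, end: int, cond: str) -> torch.Tensor:
    for step in range(start, end):
        x = denoiser(x, step, cond)
    return x

total_steps = 20
x = torch.randn(1, 4, 64, 64)                               # initial noise latent
x = run_phase(x, 0, total_steps // 4, "composition")        # first 25%: layout
x = run_phase(x, total_steps // 4, 15, "subject")           # middle: main content
x = run_phase(x, 15, total_steps, "fine details, texture")  # end: small details
print(x.abs().mean())  # confirms the chain ran end to end
```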
@the_infinity_time · 6 months ago
Gonna try it today. Thanks for all the work and the tutorials, Matteo; you are awesome.
@Timbershield · 6 months ago
I played with block merging a long time ago and found that about 5 of the output blocks influenced body composition, hair, and clothing, but I no longer have my list. From memory it was something like blocks 4, 6, 7, and 8 or 9, but I'll leave it to the experts, as you only seem to have 5 output blocks and not the 10 the model merger I used had.
@jibcot8541 · 6 months ago
This is so cool, the results look great! I will have a play with the block impacts.
@swannschilling474 · 6 months ago
Thank you so much for all the work and all the great content!! You are the best!! 🥰
@zef3k · 6 months ago
This is amazing. I've been wanting a way to interact with the UNet without having to throw random numbers at it that I don't know how to scale. Something like this for LoRA would be pretty amazing: applying it only to certain layers in a node like this. The conditioning here will make it easy to see where to focus.
@Seany06 · 6 months ago
Have you checked the LoRA Block Weight extension?
@zef3k · 6 months ago
@@Seany06 I think so, but I didn't really have a grasp of what each block affects and when. Still don't, really, but this definitely helps. I was also thinking of trying to apply ControlNet conditioning to specific inputs/outputs with this.
@fernandomasotto · 6 months ago
This is really interesting. Always looking where no one else does.
@MarcSpctr · 6 months ago
If this gets paired up with Omost, it's a whole different level of image generation we can achieve. Edit: Omost currently targets areas to determine the composition, but if, along with the area, it could also target specific blocks, that would be just next-level.
@vitalis · 6 months ago
This channel is so underrated.
@latentvision · 6 months ago
I'm keeping a low profile
@urgyenrigdzin3775 · 6 months ago
Wow... this looks very powerful, thank you very much!
@generalawareness101 · 6 months ago
Very nice, and your findings match what I found with perturbed attention as I was playing around with it. Funny, as I was just saying I would love to do what this does, and here it is.
@PeppePascale_ · 6 months ago
1:51 I doubled over laughing. Thanks for the video, as always. The best.
@kovakavics · 6 months ago
Mate, this is crazy! Well done, and thank you!
@DreamStarter_1 · 6 months ago
Very interesting, thanks for the great work!
@amorgan5844 · 6 months ago
"And probably all Christian too".....the Irish catching strays for no reason 😂
@Mika43344 · 6 months ago
Great video as always! Better prompting, FINALLY!!!
@fgrbl · 6 months ago
Thanks!
@BubbleVolcano · 6 months ago
I played with PAG in April, and my feeling was that saturation increased a lot; I didn't expect it to change the content in addition to changing the brightness. This is kind of like a LoRA block-weight operation, but applied directly to the checkpoint, right? There might be something to learn from it. I hope this turns out to be definitive, not metaphysical/superstition. We need more precision for AI.
@jccluaviz · 6 months ago
Amazing!!! Genius!!! A new masterpiece is coming.
@courtneyb6154 · 6 months ago
If anyone could prove that some ai artists should have copyright protection over their image generations, it would be you. Definitely no "Do It" button here. Amazing stuff and thank you for taking the time to grind through all the possibilities and then breaking it down for the rest of us dummies ;-)
@nekodificador · 6 months ago
I knew I was smelling the week's game changer! How can I contribute?
@alu3000 · 6 months ago
Controlling weights per block in LoRA was a game changer for me, but this takes it to another level!
@lefourbe5596 · 6 months ago
Any doc you have as a reference for that, please? :)
@victorhansson3410 · 5 months ago
I laughed out loud at the sudden "they're all probably Christian too". Great video as always, Matteo.
@PixelPoetryxIA · 6 months ago
That is crazy! Good job
@Neront90 · 6 months ago
You are the tip of the open source spear.
@Firespark81 · 6 months ago
This is amazing!
@Foolsjoker · 6 months ago
Why is it every time you post something, I'm like, "this, this is the direction SD will be heading in. Everyone is going to be using this." Oh, it is because that is exactly what happens. I cannot wait to try this.
@shadystranger2866 · 6 months ago
Astonishing! Thank you for your work! It would be great to have similar control over IPAdapter conditioning in the future
@shadystranger2866 · 6 months ago
I guess it would be great to try this thing with Stability's Revision technique
@latentvision · 6 months ago
It's kinda possible already with the undocumented "mad scientist" node :)
@Paulo-ut1li · 6 months ago
That’s amazing work, Matteo!
@hackerthumb1551 · 6 months ago
Thank you for the excellent video and all your work :D You truly rock, dude ^^ After watching, I was wondering: theoretically, could the injection method be adapted to use ControlNets as an input as well, using the injection to target the blocks you want the ControlNet applied to? I only ask because when using ControlNets I've observed input bleeding, similar to the prompt bleeding. It might be a way to apply modifications via ControlNet without losing as much consistency in the original character. Thank you for all your hard work and passion :)
@mariokotlar303 · 6 months ago
I feel I'm witnessing history in the making
@skycladsquirrel · 6 months ago
It would be awesome to be able to customize the labels of all the inputs. Great work, Matteo.
@DJVARAO · 6 months ago
You, man, are a hero. Thanks! 😁
@demondemons · 6 months ago
Your humour is amazing! :D
@styrke9272 · 6 months ago
You're always bringing good surprises.
@kalicromatico · 6 months ago
Amazing, Matteo!!!!!! ❤❤❤❤
@lrkx_ · 6 months ago
Damn, this is genius!!!
@Lahouel · 6 months ago
Back to your BEST, Matteo. 👋👋👋👋👋
@RamonGuthrie · 6 months ago
When Matteo says suffer, I just think joy! 😄
@latentvision · 6 months ago
no kink shaming
@tomschuelke7955 · 6 months ago
Reminds me of scientists who use an MRI to identify the parts of the brain that react to language or specific words... so you are creating a sort of map of understanding...
@ProzacgodAI · 6 months ago
I wonder what a textual inversion would do with this. Take a character turnaround: in some cases the details of the character can be lost. This makes me think you could use CharTurner on just one of these inputs, IPAdapter for a character reference, and the prompts to help guide it a bit.
@solidkundi · 6 months ago
you are the Satoshi of Stable Diffusion!
@ahminlaffet3555 · 3 months ago
I get a tensor size error when I try. Also, if I try to write a patched model with Save Checkpoint, the patch does not seem to get included in the result; I believe the error is still there, just being ignored. When I render the resulting model in a KSampler, it throws the tensor size error and won't continue. torch 2.4.1 and the current version: "stack expects each tensor to be equal size".
@beveresmoor · 6 months ago
Such an interesting finding. I have to try it myself. 👍😘
@dmcdcm · 6 months ago
I'd be interested to test out ControlNet conditioning sent to specific blocks.
@Hearcharted · 6 months ago
You are a very smart person!
@latentvision · 6 months ago
I wish...
@Hearcharted · 6 months ago
@@latentvision LOL Man, you are very humble too ;)
@tomaslindholm9780 · 6 months ago
I have actually been experimenting with Kohya Deep Shrink to increase resolution, doing blockwise (0-32) alterations. Now, with this, if I prompt for attention to body proportions on in8, I seem to get much better output for full-body poses. In Kohya Deep Shrink, a downscale factor of 2 on block 3 with an ending weight between 0.35 and 0.65 seems to do the trick, producing double-SDXL-resolution output.
@aronsingh9243 · 2 months ago
How are you generating the landscape without a prompt?
@bordignonjunior · 6 months ago
Amazing work!
@matthewharrison3813 · 6 months ago
How do the layers map to the denoise process? Might the latter layers be good for detail prompts?
@andreh4859 · 6 months ago
Cool stuff! Does it only work with Turbo models? I got an error at the KSampler.
@throttlekitty1 · 6 months ago
How about a debug sort of node that takes a prompt and outputs a set of images for each UNet block in isolation? Maybe useful, but I suspect this will vary a lot between prompts and the finetune being used. I remember seeing heatmap-style visualizations for attention heads in the past; maybe that can be done here?
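A rough sketch of what such a debug sweep could look like: render one image per block, injecting the probe prompt into only that block at a fixed seed, then diff against a baseline. `render_with_injection` and the block labels are hypothetical placeholders for whatever actually queues the workflow (for example ComfyUI's HTTP API), not functions of the node pack:

```python
BLOCKS = ["input4", "input8", "middle0",
          "output0", "output1", "output2", "output3", "output4", "output5"]

def render_with_injection(base_prompt: str, block: str | None,
                          probe_prompt: str, seed: int) -> bytes:
    return b""  # placeholder: image bytes would come back from the backend

def block_sweep(base_prompt: str, probe_prompt: str, seed: int = 42) -> dict[str, bytes]:
    images = {"baseline": render_with_injection(base_prompt, None, "", seed)}
    for block in BLOCKS:
        images[block] = render_with_injection(base_prompt, block, probe_prompt, seed)
    return images  # diff each entry against "baseline" to see that block's effect
```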
@madmushroom8639 · 6 months ago
Yer a wizard, Matteo!
@jochaboon · 6 months ago
thanks!!
@wndrflx · 6 months ago
What does the ConditioningZeroOut do?
@Ymirheim · 6 months ago
I'm looking forward to experimenting with this! One thing that stands out to me right out of the gate, though: shouldn't I get the same render if I pass the same prompt to a regular sampler and to the "all" pin on the injector while using the same seed for both? What is the technical reason for getting two slightly different images rather than two exact duplicates?
@latentvision · 6 months ago
The results are very similar but not the same, because the embeds are applied in different places.
@SBaldo8 · 6 months ago
Do you think something going into input 4 still affects everything after it, or does this node by design affect only that specific part of the UNet? Very interesting stuff. I'd love to contribute, maybe by setting up a wildcard setup with random connections, or even just by donating my GPU time to run your custom workflow and reporting results.
@mussabekovdaniyar8157 · 6 months ago
How does it work with ControlNet? Can we use separate ControlNets for each block for better influence?
@KDawg5000 · 6 months ago
Preventing color bleeding would be nice. If there was a way to tease out which blocks look at color for foreground objects vs background, that would be useful.
@Neront90 · 6 months ago
Yes, we need to download this and report our findings.
@kmdcompelelct · 6 months ago
I'm wondering if this can be applied to training LoRAs and doing fine-tuning?
@latentvision · 6 months ago
It might help with fine-tuning, yeah.
@BuffPuffer · 6 months ago
Yes, this is already possible with B-LoRA. Look it up.
@Falkonar · 6 months ago
Genius ❤
@allhailthealgorithm · 6 months ago
You might be able to use an LLM to automatically try prompting different blocks and a different model to analyze the outputs, like the RAM++ model...
@lonelyeyedlad769 · 6 months ago
Wonderful idea! Trying to test it out now. Ran into an error off the bat aha. Have you ever seen this by chance? 'Error occurred when executing KSampler: stack expects each tensor to be equal size, but got [1, 231, 2048] at entry 0 and [1, 77, 2048] at entry 1'
@mdx-fm3vj · 6 months ago
It seems there is a certain max number of tokens per prompt; shortening each prompt fixes this (for me).
@latentvision · 6 months ago
yeah at the moment it only works with simple prompts (no concat or long prompts). I'll fix that if there's enough interest
@lonelyeyedlad769 · 6 months ago
@@latentvision No worries. Thank you for the help!
@timothywcrane · 6 months ago
I know this is Stable Diffusion, but could this same architecture be put to use in CLIP/VQGAN? I have TBs of retro (in AI time) complete CLIP/VQGAN step reels with known seeds ;)
@omegablast2002 · 6 months ago
this is amazing
@christianholl7924 · 6 months ago
I guess the tokens will be concatenated into each block? Could they also be replaced?
@amorgan5844 · 6 months ago
Question: how much of an effect do LoRAs have on the input and output?
@latentvision · 6 months ago
How do you mean? We work on the cross-attention; the LoRA weights have already been added at that point. The kind of influence it has depends on the LoRA.
@vintagegenious · 6 months ago
We could choose which blocks to apply LoRA weights to.
@amorgan5844 · 6 months ago
@latentvision Like vintage put it: LoRAs influencing only certain blocks. I haven't had time to test what I'm asking, so it may not be a very well-structured question. I'll have some time this weekend and will come back 🤣
@amorgan5844 · 6 months ago
@@vintagegenious well put, that's what my brain was struggling to spit out😂 good job mate.
@vintagegenious · 6 months ago
@@amorgan5844 😁
@MrPaPaYa86 · 6 months ago
We need to come up with a system for systematic testing and reporting of block functions in Stable Diffusion, so they can be added to the model information on CivitAI.
@latentvision · 6 months ago
ah! that would be great!
@xyem1928 · 5 months ago
It seems like it would be fairly easy to do. We'd just need to build a set of keywords along with what they represent (e.g. composition, pose, medium), then present the "base" image as well as the one with the block prompt, and have users rate 1) how different it is from the base and 2) how well the second one represents the keyword. This would identify which blocks are sensitive to which concept class and concept.
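A rough sketch of the record-keeping this crowd-testing scheme would need; all field names are invented for illustration. Each rating ties a keyword and its concept class to a block, and the aggregate score suggests which blocks are sensitive to which concepts:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Rating:
    keyword: str     # e.g. "watercolor"
    concept: str     # its concept class, e.g. "medium", "composition", "pose"
    block: str       # which block got the keyword injected, e.g. "output4"
    difference: int  # 1-5: how different from the base image
    fidelity: int    # 1-5: how well the result represents the keyword

def sensitivity(ratings: list[Rating]) -> dict[tuple[str, str], float]:
    """Mean difference*fidelity per (block, concept); high = block 'owns' the concept."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r in ratings:
        totals[(r.block, r.concept)] += r.difference * r.fidelity
        counts[(r.block, r.concept)] += 1
    return {key: totals[key] / counts[key] for key in totals}

print(sensitivity([Rating("watercolor", "medium", "output4", 5, 4),
                   Rating("watercolor", "medium", "input4", 2, 1)]))
```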
@Latentnaut · 2 months ago
Hey Matteo, is this working for Flux? And BTW, are we going to have Flux IPAdapters soon?
@MilesBellas · 6 months ago
A node to convert into an HDR file, using physically accurate light, would be amazing! Export .exr files too?!