Animatediff perfect scenes. Any background with conditional masking. ComfyUI Animation

9,853 views

Koala Nation

A day ago

Comments: 42
@Foolsjoker 10 months ago
I had been trying to do this workflow for almost a month, but I could never get the foreground and background to merge correctly. Obviously, mine had some major missing components compared to this. So glad you posted. Thank you!
@koalanation 10 months ago
Glad I could help! I tried several tricks: masking, inpainting, adding correction layers in the video editor... so it also took me a while to figure out how to do it the way I wanted.
@skaramicke 6 months ago
Couldn't you just reuse the mask from the compositing step when isolating background from foreground in the later stages?
@koalanation 6 months ago
The mask in the compositing step is only applied to one image. For the background/foreground, we are creating an individual mask for each of the video frames. The first is static, the second 'dynamic', so to say. I hope this resolves your doubts!
@skaramicke 6 months ago
@koalanation Yes, of course! Didn't think of that.
@lukaso2258 7 months ago
Hey Koala, thank you very much for this guide, exactly what I needed. I have one question: this works very well for merging two conditions into one scene, but what if I'm using a separate IPAdapter for the background and the character? I found a way to merge the two IPAdapter outputs, but I can't find a way to mask each IPA model for the character and the background. Do you see any solution for this? (In my workflow I'm doing a low-res output of the character and the background first, then upscaling both, and now I'm figuring out how to run it through another sampler and properly merge them together.) Thanks again for your work.
@koalanation 7 months ago
Nowadays, in the Apply IPAdapter node, there is the option to use 'attn_mask', so you can use the two separated images (foreground/background). This gives you more flexibility regarding the type of IP adapter, strength, use of batches... When I was preparing the video, that was still not possible. You can also use ControlNet with masks. So having different layers is possible in several ways. Results will be slightly different, though, depending on how you do it. Good luck!
@lukaso2258 7 months ago
@koalanation You are a legend, it works :) Thank you!
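As a rough illustration of the attn_mask idea discussed above: a minimal sketch, assuming the foreground mask is already available as a black-and-white image (the file names are hypothetical). The foreground mask would drive one IPAdapter and its inverse the other; white areas are where each adapter is allowed to act.

```python
from PIL import Image, ImageOps

# Hypothetical file names: white = character in the foreground mask.
fg_mask = Image.open("foreground_mask.png").convert("L")
bg_mask = ImageOps.invert(fg_mask)   # white = background

fg_mask.save("attn_mask_foreground.png")
bg_mask.save("attn_mask_background.png")
```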
@StyleofPI 4 months ago
If I change Load Image to Load Video, does it work? Video to video.
@koalanation 4 months ago
You can use the Load Video node too for the ControlNet reference images.
@dkamhaji 10 months ago
Hey yo! Super great video, and many interesting techniques going on here. I will definitely be integrating this into my workflow. I do have a question, though: I get that you are moving the character with the OpenPose animation, but how is the background (and camera) moving? Are you using some video input to drive that, or something else like a motion LoRA?
@koalanation 10 months ago
For the background, I have used this Pexels video as a base: tinyurl.com/yn4y8bdf I reversed the video in ezgif first. In the workflow, I tested a few preprocessors to see which ones work best, and adjusted how many frames per second match the foreground better. In this case, Zoe depth maps and MLSD work well. I set the frame frequency to one every 3 frames, starting from frame 90 (in a VHS Load Video node). To avoid running the preprocessors all the time during my tests, I extracted the same number of frames as in the OpenPose sequence, saved them as images, and used those in the final workflow.
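For reference, a rough standalone sketch of that frame selection outside ComfyUI, assuming OpenCV is available. The clip name and the total frame count are hypothetical; only the "skip to frame 90, keep one frame in three" logic mirrors the settings described above.

```python
import cv2

cap = cv2.VideoCapture("pexels_background.mp4")   # hypothetical file name
cap.set(cv2.CAP_PROP_POS_FRAMES, 90)               # start from frame 90

kept, index = 0, 0
max_frames = 48    # assumption: match the number of OpenPose frames
while kept < max_frames:
    ok, frame = cap.read()
    if not ok:
        break
    if index % 3 == 0:                             # keep one out of every 3 frames
        cv2.imwrite(f"bg_frame_{kept:04d}.png", frame)
        kept += 1
    index += 1
cap.release()
```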
@dkamhaji 8 months ago
Hello @koalanation! I'm building a workflow with similar intentions to yours here, but with a different slant. I'm just using seg masks to separate the BG from the character and applying separate masks to each IP adapter to influence the character and the background separately. Everything works great, except that I'm trying to apply the motion from the original input video to the new background created by the attention-masked IP adapter. Is there a world where we can discuss this further to try to find some possible solutions? I would love to share this with you.
@AIImaGENation 9 months ago
I run the workflow and get this error:
Error occurred when executing IPAdapterApply: Error(s) in loading state_dict for Resampler: size mismatch for proj_in.weight: copying a param with shape torch.Size([768, 1280]) from checkpoint, the shape in current model is torch.Size([768, 1664]).
File "/root/ComfyUI/execution.py", line 153, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all)
File "/root/ComfyUI/execution.py", line 83, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "/root/ComfyUI/execution.py", line 76, in map_node_over_list results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "/root/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus/IPAdapterPlus.py", line 426, in apply_ipadapter self.ipadapter = IPAdapter(
File "/root/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus/IPAdapterPlus.py", line 175, in __init__ self.image_proj_model.load_state_dict(ipadapter_model["image_proj"])
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}: \t{}'.format(
@AIImaGENation 9 months ago
Thanks, if you have time to help solve this problem.
@koalanation 9 months ago
Check which checkpoint, CLIP Vision and IP-Adapter models you are using. I think this error appears because you are using an SDXL model. Change the checkpoint, or the IP-Adapter model and/or the CLIP Vision model.
@Disco_Tek 10 months ago
Any idea how to keep consistent color for items like clothing in vid2vid? Also... you can rotoscope in ComfyUI now?
@koalanation 10 months ago
For clothing consistency, I think it should be possible by playing with masks and a SAM detector (with, for example, deepfashion2). But personally I have struggled to get the masks right for all frames (with other animations). I did a video using Track-Anything, which I think can track clothes nicely. I believe with the right workflow it should be possible to do nicer things, but playing with the masks is not straightforward, so I did not elaborate further. Regarding rotoscoping: yes, with SAM it is possible, but I find it easier and faster to use video editors (Premiere, DaVinci...). When rotoscoping, you may eventually need to correct some frames, and in ComfyUI that becomes a very tedious task. Track-Anything is more user-friendly for adjustments, but it is a pity it is not really maintained or integrated into ComfyUI (that I am aware of).
@Disco_Tek 10 months ago
@koalanation Thanks for the reply. Yeah, there has to be a way to get consistent clothing; some LoRAs have helped, but things like colors constantly want to shift. As for rotoscoping, I didn't know that was possible and normally just stick with RunwayML if I need to do it.
@aivideos322 10 months ago
@Disco_Tek With AnimateDiff, use a 24-frame context length and a context stride of 8; that works for 48 frames. Keep any text prompt short and don't repeat yourself: (wool scarf, red scarf) is not good, wool red scarf works, and do not mention the scarf again in the prompt. If you want to describe it better, reword the original, like: wool textured red long scarf. Prompting is very important, as is the model you choose.
@Disco_Tek 10 months ago
@aivideos322 I've been using a context length of 16 and an overlap of 12 lately with pretty good results. I will mess with prompts, though, the next time I try running without a LoRA. I'm usually then just using the upscaler to get me home. Any suggestions for color bleed when I add color to a clothing item, to prevent it from polluting the rest of the image?
@aivideos322 10 months ago
@Disco_Tek Colour bleed is a problem even for still images, and I have found no real solution to that. For upscaling, you can use tile/lineart/TemporalNet ControlNets to upscale reliably and fully denoise the video for double or triple sizes. You can even colour the video with a different model at this step, and give more details in this prompt, which tends to work better for colouring things. This step does not use the AnimateDiff model; it uses whatever model you want plus ControlNet, so it has more freedom to colour what your prompt says. I use Impact Pack nodes to turn batches into lists before the upscale, to lower the memory used and allow larger upscales. This does each frame one by one.
@TheNewOption 10 months ago
Damn, I'm behind on AI stuff. I haven't seen this UI, and is this a new version of SD?
@koalanation 10 months ago
Yep, everything moves quickly lately... but you will catch up, no worries.
@happytoilet1 10 months ago
Good stuff, many thanks. If the scene is not generated by SD, say it's a real photo taken by a camera, can SD still merge the character and the scene? Thank you, @koalanation
@koalanation 10 months ago
Hi, thanks to you! I think so... but take into account that the output is also affected by the model and the prompt you use. I am more of a fan of cartoon and anime animations, but if you use realistic models (such as Realistic Vision), I think you will get what you are aiming for. In the end, there is quite a bit of experimentation here. Also play with the weights of the adapters.
@happytoilet1 10 months ago
Thank you for your advice. Really appreciate it, @koalanation
@eyesta 6 months ago
Good video. Slightly different question: I made vid2vid in ComfyUI and my background changes, but I have a static background to replace it with. How do I render the model/character on a green background like you have in this video?
@koalanation 6 months ago
There are several custom nodes that do that. Check rembg, for example: github.com/Jcd1230/rembg-comfyui-node or the WAS Node Suite. However, these go frame by frame and you will need to review them. I made a video using segmentation with Track-Anything, but no one has developed a ComfyUI node/tool for it. It used to work very nicely, but I have not used it for a while: kzbin.info/www/bejne/fqC3n4euodyXe9ksi=Pnlr-YUo-YmRz8UL In the end, I think it is easier and faster to use video editing software with rotoscope features: Adobe Premiere, DaVinci Resolve or Runway.ml. I personally use Runway.ml, but choose what you prefer.
@eyesta 6 months ago
Thank you, @koalanation!
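For the frame-by-frame rembg approach mentioned above, a minimal sketch with the rembg Python package (which the linked custom node wraps) could look like this; the folder names are hypothetical and each cut-out is composited onto a green background, as asked in the question.

```python
from pathlib import Path
from PIL import Image
from rembg import remove

frames_dir = Path("frames_in")    # hypothetical input folder of extracted frames
out_dir = Path("frames_out")
out_dir.mkdir(exist_ok=True)

for frame_path in sorted(frames_dir.glob("*.png")):
    frame = Image.open(frame_path)
    cut_out = remove(frame)                          # RGBA with transparent background
    green = Image.new("RGBA", cut_out.size, (0, 255, 0, 255))
    green.paste(cut_out, mask=cut_out.split()[-1])   # composite onto green screen
    green.convert("RGB").save(out_dir / frame_path.name)
```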
@matthewma7886 8 months ago
Great workflow! That's what I'm looking for. But I run the workflow and get this error: Error occurred when executing ConditioningSetMaskAndCombine: too many values to unpack (expected 3). Does anyone know how to fix it? Thanks a lot :)
@matthewma7886 8 months ago
Tried several ways and finally found the reason: the bug comes from the GrowMaskWithBlur node and the blur_radius. If blur_radius is not 0, the error happens. I think there is a bug in this version of GrowMaskWithBlur. You can use a Gaussian blur mask or mask blur instead of the blur function until the next version.
@koalanation 8 months ago
Hi! Thanks for checking it out! I had some time to look at it. As you say, it seems the error comes from the GrowMaskWithBlur node. I checked the workflow and it seems like this node has changed: the numbers are swapped, the blur radius of 20 appears in the lerp alpha field, and there is no sigma parameter. I have changed the values according to what is shown in the video (blur radius 20, lerp alpha 1 and decay factor 1, no sigma anymore), and the workflow works for me.
@matthewma7886 8 months ago
@koalanation All right, bro. Greatly appreciate your work :-)
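As a standalone illustration of the workaround matthewma7886 describes above (blurring the mask outside the node, then keeping blur_radius at 0), a minimal OpenCV sketch could look like this; the file name is hypothetical and the radius of 20 simply mirrors the value used in the video.

```python
import cv2
import numpy as np

mask = cv2.imread("mask_frame_0001.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frame mask
mask = mask.astype(np.float32) / 255.0

blur_radius = 20                   # same value the workflow uses in the video
kernel = 2 * blur_radius + 1       # Gaussian kernel size must be odd
blurred = cv2.GaussianBlur(mask, (kernel, kernel), 0)

cv2.imwrite("mask_frame_0001_blurred.png", (blurred * 255).astype(np.uint8))
```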
@Cioccolata-m7l 10 months ago
Do you think this workflow will work with 8 GB of VRAM?
@koalanation 10 months ago
I understand that with AnimateDiff you need 10 GB, but I have read that people can also do it with less. In this workflow, though, we are in reality doing one render for the foreground and another for the background, so it actually takes longer... However, with LCM you can decrease the render time quite a lot. I just made a video about it, and I am really happy with how LCM works (with the right settings).
@Cioccolata-m7l 10 months ago
@koalanation Cool, right now I am using AnimateDiff with 2 ControlNets on only 8 GB 😅 I will try your workflow, though.
@SparkFlowAAA 8 months ago
Great tutorial and method!! I have an issue with ConditioningSetMask: Error occurred when executing ConditioningSetMask: too many values to unpack (expected 3). If you can help, that would be awesome. Thank you. Error log:
`Error occurred when executing ConditioningSetMask: too many values to unpack (expected 3)
File "/workspace/ComfyUI/execution.py", line 154, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all)
File "/workspace/ComfyUI/execution.py", line 84, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "/workspace/ComfyUI/execution.py", line 77, in map_node_over_list results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "/workspace/ComfyUI/nodes.py", line 209, in append _, h, w = mask.shape`
@koalanation 8 months ago
Hi! It seems there were some changes in the GrowMaskWithBlur node. Can you go to that node (at the bottom, in the Mask Foreground group) and make sure the values are: blur radius = 20, lerp alpha = 1.0 and decay factor = 1?
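The `_, h, w = mask.shape` line in the traceback above expects a 3-dimensional mask tensor, so the "too many values to unpack" error appears when the mask carries an extra dimension. A minimal PyTorch reproduction of that failure mode (the shapes are illustrative, not taken from the workflow):

```python
import torch

# A mask batch with an extra channel dimension: frames x channels x height x width
mask_4d = torch.ones(16, 1, 512, 512)

try:
    _, h, w = mask_4d.shape   # fails: too many values to unpack (expected 3)
except ValueError as err:
    print(err)

mask_3d = mask_4d.squeeze(1)  # drop the channel dim -> [16, 512, 512]
_, h, w = mask_3d.shape       # unpacks cleanly
print(h, w)                   # 512 512
```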