Creating an uber-realistic video animation from an avatar with Stable Diffusion

73,500 views

Render Realm

A day ago

This tutorial guides you through creating an avatar with ReadyPlayerMe, animating it in Mixamo, building a 3D scene around it in Blender, and feeding the rendered scene into Stable Diffusion (Automatic1111) to create a video animation with an uber-realistic custom model and the Deforum and ControlNet extensions.
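For a rough idea of what the Blender-to-Stable-Diffusion hand-off looks like in script form, here is a minimal, hypothetical Blender Python sketch that imports the Mixamo FBX and renders the animation to a PNG image sequence (the file path, resolution and frame range are placeholders, not values from the video):

```python
import bpy

# Import the animated avatar exported from Mixamo (path is a placeholder).
bpy.ops.import_scene.fbx(filepath="/path/to/mixamo_dance.fbx")

scene = bpy.context.scene
scene.frame_start = 1
scene.frame_end = 250                       # length of the Mixamo clip, adjust to taste
scene.render.image_settings.file_format = 'PNG'
scene.render.resolution_x = 768
scene.render.resolution_y = 512
scene.render.filepath = "//frames/"         # output folder, relative to the .blend file

# Render the whole animation as an image sequence to feed into img2img / Deforum.
bpy.ops.render.render(animation=True)
```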
To watch more great music videos created with Stable Diffusion, Unreal Engine and Blender, visit our YouTube music channel:
/ @-vero-
----------------------------
Chapters:
00:00 Intro
00:20 Creating an avatar with ReadyPlayerMe
01:45 Preparing the avatar for upload to Mixamo
03:00 Animating the avatar in Mixamo and importing it into Blender
05:41 Importing a 3D background scene from Sketchfab into Blender and rendering the scene
10:19 Preparing Automatic1111 for a hyper-realistic render
12:13 Using the Deforum and ControlNet extensions for rendering the animation
14:47 Creating the final video
Link to the ReadyPlayerMe website for creating free 3d avatars:
readyplayer.me/
Link to the Mixamo website:
www.mixamo.com/
Link to Sketchfab for downloading free 3d models:
sketchfab.com/feed
Downloading custom models for Stable Diffusion:
civitai.com/
Deforum extension - download and info:
github.com/deforum-art/deforu...
ControlNet extension - download and info:
github.com/Mikubill/sd-webui-...
Downloading models for the ControlNet extension:
huggingface.co/lllyasviel/Con...
Download Blender:
www.blender.org/download/
-----------------------------------------
Local installation guide for Automatic1111 on a Windows PC:
stable-diffusion-art.com/inst...
and on a Mac with Apple Silicon:
stable-diffusion-art.com/inst...
Some great YouTube channels covering Stable Diffusion:
/ @oliviosarikas
/ @nerdyrodent
/ @sebastiankamph
/ @promptmuse
#stablediffusion #automatic1111 #deforum #controlnet #blender #mixamo #tutorial #readyplayerme

Comments: 94
@STVinMotion · a year ago
I think that showing the outcome at the beginning of the video and only then starting to explain how it's done can be a kind of "hook" for viewers. Keep it up!
@-RenderRealm- · a year ago
Thanks, I will :-)
@darkgenix · a year ago
What in the Jennifer Lawrence is going on here?
@-RenderRealm- · a year ago
It wasn't about Jennifer Lawrence specifically; using a prompt with a celebrity (who is surely included in the Stable Diffusion model) helps improve the consistency across a sequence of frames.
@jimconner3983 · a year ago
lol i read that wrong . . dirty mind
@RM_VFX · a year ago
Jennifer Lawrence is so innovative, she reinvents herself in every frame 😂
@ireincarnatedasacontentcreator · a year ago
Thank you so much for the tutorial and for giving sources as well.
@ArtisteImprevisible · a year ago
Great video, man, thanks for sharing!
@richardglady3009 · a year ago
Extremely informative and very well done. Thank you.
@FrankJonen · a year ago
Looks like the best way to prepare this is to render the character against a green background and render the background separately, then just key it out as normal.
@-RenderRealm- · a year ago
Yes, green-screening the character in Blender and making two render passes in SD, one with the character and one with the background, would also be an option. I used this method in one of my previous tutorials about creating an audio-reactive music video with Stable Diffusion. Makes sense!
@OriBengal · a year ago
Thanks- very comprehensive.
@EdnaKerr · a year ago
Super video and excellent teaching. It even looks easy to do. I am going to try.🤩
@EdnaKerr · a year ago
I sent a message on your music channel. Please have a look.
@mraahauge · a year ago
This is great. Thank you.
@gamingdayshuriken4192 · a year ago
Nice Thx Good Work !
@izzaauliyairabby5421 · a year ago
Thanks
@z1mt0n1x2 · a year ago
man that's a trippy video
@jadonx · a year ago
I've been and sat in that cafe in France (Saint-Guilhem-le-Désert), one of the most beautiful places in the world!
@taportnaya2136 · a year ago
wow
@aribjarnason · a year ago
Thanks a lot for this awesome tutorial and all your work. Would the EbSynth program not be a good step in the process to make the video look more consistent?
@-RenderRealm- · a year ago
Yes, it might, and I've already installed the app, but I haven't had the time yet to get deeper into it. There's also an extension for Automatic1111 available under Extensions -> Available -> Load from, which helps you through the process. Maybe I'll make a video about it if I think it can be helpful. I'm currently working on another solution for improving consistency: green-screening the character in Blender and rendering it separately from the background scene, then making two render passes with batch img2img + two ControlNets, one for the character and one for the background, and putting them together again in my video editor. It looks promising, and I'm going to make a quick tutorial about it soon, together with a short review of RunwayML Gen1.
@PayxiGandia · a year ago
very trippy
@-RenderRealm- · a year ago
Yeah, Automatic1111 is quite capable of producing great images, but there are still major issues with temporal coherence in videos, which makes them look kind of trippy. But I think I'm about to crack that nut... I'm just working on a new video where I'll be addressing this issue (and hopefully can provide some ideas on how to solve it). I'm also checking out some other stuff in this regard, like RunwayML Gen1, which is far from being a perfect solution, but it gave me some conceptual ideas on how to deal with the coherence issues in Automatic1111. Well, we shall see...
@AZSDXC8954 · a year ago
The true future will come once Stable Diffusion is able to output consistent results for both backgrounds and characters.
@-RenderRealm- · a year ago
Right, temporal coherence is still one of the big issues with this technology. Still, it's getting better over time, and I'm trying to find some workarounds by green-screening the character from the background and rendering it separately, which seems to produce better results in most cases. I also tried RunwayML Gen 1, but in my view that's not yet suitable for producing professional stuff, due to its severe time limitations for rendering clips. Maybe I'll make a video about it, if I think I can add a valuable contribution to this topic.
@coloryvr · a year ago
BIG FANX for this great, helpful, complex, motivating and inspiring video!
@-RenderRealm- · a year ago
Thanks :-)
@RobertWildling · 5 months ago
Great intro to that topic. But did you also manage to get a good video result, where the costume stays the same, as well as the background? If so, is there any chance for a follow-up video?
@-RenderRealm- · 5 months ago
I'm just working on that issue! There've been a lot of developments in Stable Diffusion since I posted this video, and it has now become possible to produce stable, flicker-free animations of any person, just by using a Blender animation or a video input and a single facial photo of that person. I'm planning to post another tutorial in the coming week, this time using ComfyUI instead of Auto1111, as it's more versatile... I just need to solve a few minor issues before I'm ready.
@RobertWildling · 5 months ago
@-RenderRealm- Looking very much forward to that one. Time to learn ComfyUI!
@PRepublicOfChina · a year ago
This is incredible. But also insanely long and complicated. I hope someday they can make an AI that can generate this video all in one app. Something like ModelScope text-to-video synthesis 2. Or Stable WarpFusion. Or Runway Gen 1. Also, you made this very long and complex. I think you could just use a video game like Blade & Soul to create the avatar, dance and background scene, record a video of the video game, and then input that video into Stable Diffusion. Using a video game could have saved you 100 of those steps.
@-RenderRealm- · a year ago
I know, it's still a long process, but that's where we stand now. I'm working on a video to compare the pros and cons of Runway Gen 1 and Automatic1111 for video creation in a quick step-by-step guide, also giving some ideas on how to deal with the Auto1111 temporal coherence issues. You are right that using a video game as an input could accelerate the process, but since I'm not so much into video gaming, I had to go the more tedious way and create the input footage myself. But that's good advice you gave!
@keleborn9151 · a year ago
Such a program has already been created by Epic Games. The MetaHuman mobile application allows you to record your movements as animation and voice acting in 5 minutes.
@marassisportsinc.9195 · a year ago
👍
@EditArtDesign · a year ago
😎
@pieterkirkham5555 · a year ago
You could export the depth map in Blender to get an even better output.
@-RenderRealm- · a year ago
That's a good idea, thanks! I'll try it out.
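As a rough illustration of that suggestion, here is a minimal Blender Python sketch that enables the Z pass and writes a normalized depth image sequence through the compositor, which could then be used by ControlNet's depth model (the output folder is hypothetical, and the render-layer socket is called "Depth" in recent Blender versions, "Z" in older ones):

```python
import bpy

scene = bpy.context.scene
bpy.context.view_layer.use_pass_z = True   # enable the depth (Z) pass for rendering

scene.use_nodes = True
tree = scene.node_tree

render_layers = tree.nodes.new("CompositorNodeRLayers")
normalize = tree.nodes.new("CompositorNodeNormalize")    # map raw depth into 0..1
file_out = tree.nodes.new("CompositorNodeOutputFile")
file_out.base_path = "//depth_frames/"                   # hypothetical output folder

# "Depth" socket name applies to recent Blender releases ("Z" in older ones).
tree.links.new(render_layers.outputs["Depth"], normalize.inputs[0])
tree.links.new(normalize.outputs[0], file_out.inputs[0])

bpy.ops.render.render(animation=True)
```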
@alexi_space · a year ago
Can you make another version where you show how to create a custom person model, for example from a Midjourney image?
@gatomio9739 · a year ago
Interesting 🤔
@avatarjoker9402 · a year ago
Bro, which Mac do you use? M1 or Intel?
@tobygilbert-sl7ew · a year ago
Dude, that was so helpful, but I have a question: I'm starting to learn Blender and want to see if my PC is suitable for it or not. The CPU is a Core i5-10400F and the GPU is a 6900 XT 16GB. Is that good for making animations, or should I upgrade?
@-RenderRealm- · a year ago
Well, Blender requires some effort to get started, but once you're familiar with it, it's a wonderful tool for producing 3D renders of any kind. I'm also using Unreal Engine for cinematic 3D renders and, while it's a monster in terms of memory and GPU requirements, it's also at the top of my list. And both Blender and Unreal are completely free. Yet, you don't need any of these tools for Stable Diffusion; you can create your input videos in any other way you like, be it with a simple smartphone camera, or even by recording scenes from a game if you are into computer gaming. Stable Diffusion / Automatic1111 works very well with NVIDIA RTX graphics cards. Besides my MacBook Pro M1 Max, I also own a middle-class PC with 32GB of RAM and an NVIDIA RTX 3060 12GB GPU, so nothing fancy, and that works pretty, pretty well, even beating my Mac in terms of performance. To my knowledge, AMD cards are not as well supported as NVIDIA, but they should also be able to get along with Automatic1111. Here's an article I found on GitHub addressing this topic: github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs I would just give it a try with what you have, and consider an upgrade only if you are not satisfied with it.
@tobygilbert-sl7ew · a year ago
@@-RenderRealm-your
@tobygilbert-sl7ew · a year ago
@-RenderRealm- Thanks for taking the time to respond; it was so helpful.
@santhoshmg2319 · 11 months ago
Awesome 👍 Please tell me how to create facial expressions in my model.
@thetest4514 · a year ago
Results are a little bit meh, but I like the idea. Thanks.
@omnigeddon · a year ago
Nice, but I figure a 3D scan could work better. And how does face dance make their stuff???
@-RenderRealm- · a year ago
Well, the Deforum extension still has some issues with temporal inconsistencies. You can get better results if you set the Strength and CFG Scale even lower, so Stable Diffusion sticks closer to the original video. I also found that batch img2img and ControlNet, with low strength and scale, make the scene stay more consistent.
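For anyone who prefers scripting that batch img2img pass instead of clicking through the web UI, here is a minimal sketch against the Automatic1111 API with deliberately low denoising strength and CFG scale. It assumes the webui was started with the --api flag; the folder names, prompt and values are illustrative only:

```python
import base64
import glob
import os
import requests

API = "http://127.0.0.1:7860/sdapi/v1/img2img"  # default local Automatic1111 address

os.makedirs("sd_frames", exist_ok=True)

for frame in sorted(glob.glob("blender_frames/*.png")):  # hypothetical input folder
    with open(frame, "rb") as f:
        init_image = base64.b64encode(f.read()).decode()

    payload = {
        "init_images": [init_image],
        "prompt": "photo of a woman dancing in a street cafe, photorealistic",  # illustrative prompt
        "denoising_strength": 0.35,  # low strength keeps the output close to the input frame
        "cfg_scale": 5,              # low CFG scale reduces drift between frames
        "steps": 20,
    }
    result = requests.post(API, json=payload).json()
    out_path = os.path.join("sd_frames", os.path.basename(frame))
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result["images"][0]))
```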
@11305205219 · a year ago
*Runway Gen2 can maybe do it easily*
@BRAVEN32m12 · a year ago
I think it won't be too much longer before text-to-video.
@shadowprince7149 · a year ago
Why is the background changing? Is it possible to make it with a constant background?
@-RenderRealm- · a year ago
Yes, that's possible, it just requires a little trick. In Blender, render the scene with only the character on a green-screen background, hiding the rest of the scene, and feed this video into Stable Diffusion (if you don't know how to create a green-screen effect in Blender, take a short look at my tutorial about creating an audio-reactive music video, at timestamp 9:25). Next, unhide the scene in Blender again, but hide the character, and render it again. Then import the background render from Blender and the SD-rendered video with the character into your video-editing software (Final Cut, DaVinci Resolve, Premiere), with the character placed in front, and remove the green screen from the character with a keyer - and it's done. I hope that explanation is understandable; if you have any further questions, just ask!
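As a rough sketch of those two render passes in Blender Python (the collection names "Character" and "Environment" are made up; adapt them to however your scene is organized):

```python
import bpy

# Turn the world into a flat green background for keying later.
world = bpy.context.scene.world
world.use_nodes = True
world.node_tree.nodes["Background"].inputs["Color"].default_value = (0.0, 1.0, 0.0, 1.0)

# Pass 1: character only, against the green world.
bpy.data.collections["Character"].hide_render = False
bpy.data.collections["Environment"].hide_render = True
bpy.context.scene.render.filepath = "//character_pass/"
bpy.ops.render.render(animation=True)

# Pass 2: environment only; the character is hidden instead
# (you may want to restore the original world color for this pass).
bpy.data.collections["Character"].hide_render = True
bpy.data.collections["Environment"].hide_render = False
bpy.context.scene.render.filepath = "//environment_pass/"
bpy.ops.render.render(animation=True)
```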
@maertscisum7243 · a year ago
Is it possible to remove the defective frames?
@-RenderRealm- · a year ago
Sure! If only single frames are defective, either just delete them from your output sequence or replace them with the frame before or after. If there are more defective frames in a sequence, import all frames into your video editing software as an image sequence, then delete the defective frames and interpolate the missing parts. It depends on which software you are using, but I think most video editing apps are capable of doing that.
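If you'd rather do the simple "replace with the previous frame" fix outside a video editor, a tiny Python sketch like this works on an image sequence (the frame indices and naming pattern are made up for illustration):

```python
import shutil

bad_frames = [37, 121]  # hypothetical indices of defective frames, found by eye

for n in bad_frames:
    prev_frame = f"sd_frames/frame_{n - 1:05d}.png"  # assumes a frame_00001.png naming scheme
    bad_frame = f"sd_frames/frame_{n:05d}.png"
    shutil.copyfile(prev_frame, bad_frame)  # overwrite the bad frame with the previous one
```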
@yajuvendrasinghrajpurohit7888 · a year ago
It's the same as the Corridor Crew video, right?
@-RenderRealm- · a year ago
Sorry, I'm not familiar with a Corridor Crew video. If you'd like me to take a look, please send me a link.
@yajuvendrasinghrajpurohit7888 · a year ago
@-RenderRealm- kzbin.info/www/bejne/lWqviWx-iLaejdE this one. I ain't criticizing your video, just curious.
@-RenderRealm- · a year ago
Thanks for the link! Looks like a professional tool for SD video creation, definitely worth taking a closer look at, though I'm not sure about the real costs of using that tool. They seem to work a lot with green-screening the characters, which surely helps keep them more consistent. I've also done that in one of my previous videos about creating an audio-reactive music video; you then just need two render passes, one for the character and one for the background, and then merge them together in video-editing software like Final Cut or Premiere. Again, thanks, I'm going to play around with it for a bit and see what I can do with it!
@homer3189 · a year ago
Very cool, but the end result is still unwatchable, no? Of course this technology is the start of something amazing once the glitches are worked out.
@-RenderRealm- · a year ago
Yes, we're still at the beginning, but it's just amazing how fast the technology and the tools are developing. It's just great fun being part of it already at this early stage, watching it grow and adding some humble contributions to it.
@ferreroman2913 · a year ago
Not a great result, but still amazing for learning how to do animations.
@curiaproduction2.020 · a year ago
Sorry, but when I scroll down, it no longer goes down to ControlNet to put in the image sequence.
@-RenderRealm- · a year ago
Please take a look at the ControlNet settings in the Settings tab: Settings -> ControlNet, and make sure that the "Do not append detectmap to output" box is checked. If not, please check it and restart the webui. Also make sure that the latest ControlNet version is installed (Extensions -> Check for Updates). If nothing helps, try removing the whole ControlNet folder from your stable-diffusion-webui/extensions folder and reinstalling ControlNet (Extensions -> Available -> Load from -> ControlNet -> Install). Hope that helps; if not, leave me another note!
@dandelionk3779 · a year ago
I still can't generate any asset in Blender using Stable Diffusion, even though I already followed the installation. After I input the prompt, it doesn't generate anything. Can you help?
@-RenderRealm- · a year ago
Can you tell me a bit more about the problem, please? Is it that you can't create and export an image sequence in Blender, or is the problem that you can't render this image sequence in Stable Diffusion/Automatic1111? I would like to help you if I can!
@dandelionk3779 · a year ago
@-RenderRealm- Do you have DMs or something I can contact you with...?
@-RenderRealm- · a year ago
No, I'm not using DMs, but if you want you can simply send me an email at my channel address (blndrrndr@gmail.com) and attach the Blender file, so I can take a look at it.
@cris4529 · a year ago
You could just render the animations directly in Blender.
@charlesneely · a year ago
Would have been nice if you gave us the full clip, dude 😎 That's like waving Jenna Jameson, retired pornstar, in front of us and not telling us where to go find it.
@patrickmros · a year ago
I'm sorry, but this is terrible. There are much better solutions for this than using Deforum. You should take a look at img2img batch processing. You can use that with multiple ControlNets like depth, canny, pose and landmark all at the same time. And there's no need to generate a video file first; img2img takes a folder with images as an input. That should give you good consistency. And then look at a flicker-removal tutorial for the free version of DaVinci Resolve. You will be amazed at how much better the result is.
@-RenderRealm- · a year ago
Batch img2img can give you slightly better results with low strength and scale, but temporal consistency is still a big issue with all these tools. I still think that Deforum is a great extension with many possibilities, like math functions and prompt shifting, but it's fairly complex to turn this great variety of functions into meaningful results. ControlNet has been a great improvement, no matter whether you use it together with batch img2img or with Deforum - I think it's a must-have for most use cases. For reducing temporal inconsistency it also seems to help to use less detailed background scenes, or even to separate the character from the background by green-screening it and using separate render passes with slightly different settings for the character and the background, before putting them together again in video editing software. DaVinci Resolve is great for pre- and post-processing, though I tend to prefer Final Cut Pro as long as I'm on my Mac, as it has some pretty good tools for stabilizing and improving the optical flow of a clip. Still, my main focus at this time is on tweaking the settings in Stable Diffusion in order to improve temporal consistency. I'm also looking into some new scripts and extensions, and there are some promising concepts and ideas coming up that try to address these issues, but I still haven't been able to find a convincing overall solution. Well, it's a steep learning curve and all the available tools still have flaws, but I think it's worth dealing with them, as well as sharing your thoughts and ideas with others, no matter how imperfect they may still be.
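For reference, the multi-ControlNet batch approach described in this exchange can also be scripted through the same img2img API call sketched earlier in the thread. The snippet below builds two ControlNet units (depth plus openpose) per frame; the field names follow the Mikubill sd-webui-controlnet API but may differ between versions, and the model names are placeholders for whatever is installed:

```python
import base64

def controlnet_units(frame_path: str) -> dict:
    """Build two ControlNet units (depth + openpose) for one frame (illustrative)."""
    with open(frame_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()
    return {
        "controlnet": {
            "args": [
                {"input_image": img_b64, "module": "depth",
                 "model": "control_sd15_depth", "weight": 0.8},      # placeholder model name
                {"input_image": img_b64, "module": "openpose",
                 "model": "control_sd15_openpose", "weight": 0.7},   # placeholder model name
            ]
        }
    }

# In the earlier img2img sketch, attach the units per frame with:
# payload["alwayson_scripts"] = controlnet_units(frame)
```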
@Zaroak · a year ago
That's Jennifer Lawrence?....
@-RenderRealm- · a year ago
Yes, I tried to use a celebrity name to improve the overall temporal consistency of the character. It's not about her as a person, just as a stronger guidance for Stable Diffusion than, for example, "a beautiful blonde woman".
@bomar920 · a year ago
That's a long process for a few seconds of video. We've got a long way to go 😢
@-RenderRealm- · a year ago
Well, it could also be done by simply feeding a dancing video into Stable Diffusion instead of creating one in Mixamo and Blender, but this tutorial was also meant to describe how to combine different technologies and tools to create something new. Yes, there's still a long way to go, but the Stable Diffusion tools are advancing so rapidly that I'm very confident it's going to get a lot easier as we move forward.
@bomar920 · a year ago
@-RenderRealm- Direct me to a tutorial for that if there is one, as I am new to Stable Diffusion. This feels like when the iPhone first came out. There is excitement all over.
@-RenderRealm- · a year ago
There are some good basic tutorials about Stable Diffusion on YouTube that I would recommend watching as a starting point: kzbin.info/www/bejne/ennEfWhshZuZa68 kzbin.info/www/bejne/mYfOfqGpoMicfrc kzbin.info/www/bejne/aZTZgWqvf9Sni68 I've also listed some good channels covering various Stable Diffusion topics at the end of my video description. If you have any specific questions, don't hesitate to ask me!
@bomar920 · a year ago
@-RenderRealm- Thanks, I appreciate your reply. I'll go and check out the starter videos. I have already learned the basics, but of course it took me forever.
@NightRiderCZ · a year ago
Where is the uber-realistic video animation????... Maybe you mean totally under-realistic... maybe clickbait???
@-RenderRealm- · a year ago
No, surely no clickbait. I think the model I used produces very realistic images; the only issue with Deforum is the rather low temporal consistency, which is especially visible in the background scene. I'm just trying to find a way to fix this issue, by green-screening the character and rendering it separately from the background, with the background scene rendered at very low scale and strength, then putting them together again in my video editing software and removing the green screen from the character. I wish there were some built-in tools in Stable Diffusion for keeping a higher consistency across subsequent frames, but the tools seem to get better and better with each new version, so I'm pretty confident that we're on the right path. Just a few months ago nothing like this would have been possible to make, and the technology is advancing rapidly.
@eduacademia · a year ago
Very complex method, and still too early for animation, but nice try. Congrats!
@-RenderRealm- · a year ago
Yes, the temporal consistency still needs some improvement, but I'm working on it!
@ExplorerOfTheGalaxy · 2 months ago
Not realistic, and not uber either.
@matbeedotcom · a year ago
Until the generation AI supports static-ish scenes, we need an analyzer that can tell us "it's too different" and keep generating.
@-RenderRealm- · a year ago
That would be a great step forward in improving temporal consistency in Stable Diffusion videos. Until then, we need to figure out some creative workarounds for this. I'm just trying to use frame interpolation for creating an SD animation... if it turns out to be a viable solution, I might post another tutorial describing this method.
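One crude way to experiment with interpolation on an output image sequence is ffmpeg's motion-compensated minterpolate filter, called here from Python. This is a generic smoothing pass, not the Deforum-internal interpolation the reply refers to; it assumes ffmpeg is installed and the frame naming pattern is illustrative:

```python
import subprocess

# Blend extra in-between frames into the Stable Diffusion output to soften flicker.
subprocess.run([
    "ffmpeg",
    "-framerate", "15",                        # original SD output rate (illustrative)
    "-i", "sd_frames/frame_%05d.png",          # hypothetical frame naming pattern
    "-vf", "minterpolate=fps=30:mi_mode=mci",  # motion-compensated interpolation up to 30 fps
    "interpolated.mp4",
], check=True)
```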
@matbeedotcom · a year ago
@-RenderRealm- Yes, we need a LangChain for generative AI.
@aegisgfx · a year ago
I see where we're going with this, but I have to say that until we have temporal cohesion it's kind of useless. What's the good of a video where the dress and face change 60 times per second?
@-RenderRealm- · a year ago
Right, temporal consistency is still an issue with Stable Diffusion, but the tools are getting better rapidly, so I guess these issues will be only temporal, too ;-) I'm just working on another video where I'm trying to deal with temporal inconsistencies by interpolating the input frames from a video with low strength and scale, hoping it will be more consistent - the Deforum extension doesn't provide this feature yet, but maybe it can be done with the video-input mode and a frame-to-frame interpolation with mathematical functions in the prompts (just like the Deforum interpolation mode works in the background, but with a video input and not just interpolating a series of prompts). Well, let's see how it works out... the whole technology is still a work in progress, but I believe we'll get there rather sooner than later.
@aegisgfx · a year ago
@-RenderRealm- Well, I think they're going to solve the temporal issue pretty quickly here, and then what's going to happen is that 3D programs will become a thing of the past. Programs like Blender and Maya, we will literally look at them and say "yeah, that's the way we used to do things..." They will be nothing but relics of the past.
@-RenderRealm- · a year ago
True :-) I still love my "old" 3D tools like Blender and Unreal (I never worked with Maya) and hope they will integrate the new AI technologies in a meaningful form some time in the future... but maybe they will just become relics of the past, as you said. No matter how it turns out, the way we create digital artworks will change dramatically. These are fascinating times we're living in!