It looks great. The question is whether the final animation is served over the RTSP port or some WebSocket, or do you have to implement that part yourself?
@Jeet3D · 3 years ago
This looks great! I wonder if the underlying FLAME model's vertex deformation for the speech animation can be translated to real-time-engine-friendly blendshape values per frame, in real time. Like having a FACS-based blendshape model be driven by VOCA under the hood.
@MichaelBlackMPI · 3 years ago
I honestly think our learned blendshapes are preferable to FACS unless you need FACS for some reason. But to animate speech, there is no need for FACS. Unfortunately the inference model is not designed for real time. It takes a chunk of speech and processes it all at once. It is not implemented in an autoregressive way.
@Jeet3D · 3 years ago
@@MichaelBlackMPI Ah, thanks for the reply! It's a bit disappointing to know that it is not real-time (yet 😉). So you mean to say there is actually blendshape curve data available once the speech output is processed? And can that be mapped directly to another custom character that has those exact blendshapes?
@MichaelBlackMPI · 3 years ago
@@Jeet3D The blendshapes are those of the FLAME model. These are learned from 3D face scans. So, yes, you get blendshape curves but they are in FLAME format. Transferring these to a model with different blendshapes would be a bit of an effort.
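To make the FLAME blendshape curves more concrete, here is a minimal sketch (not the official FLAME/VOCA code) of how per-frame expression coefficients could drive a FLAME-topology mesh. The array names `template`, `expr_basis`, and `expr_curves` are placeholders for whatever you load from the FLAME model file and the VOCA output; the exact file layout differs.

```python
# Minimal sketch: applying per-frame expression blendshape coefficients
# to a FLAME-topology template. All array names are placeholders.
import numpy as np

def animate_expressions(template, expr_basis, expr_curves):
    """Return a (T, V, 3) array of animated vertices.

    FLAME is a linear blendshape model, so each frame is simply
    template + sum_k coeff[k] * basis[:, :, k] (jaw/pose handled separately).
    """
    num_frames = expr_curves.shape[0]
    frames = np.empty((num_frames,) + template.shape)
    for t in range(num_frames):
        # contract (V, 3, K) basis with (K,) coefficients -> (V, 3) offsets
        offsets = np.tensordot(expr_basis, expr_curves[t], axes=([2], [0]))
        frames[t] = template + offsets
    return frames
```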
@Jeet3D · 3 years ago
@@MichaelBlackMPI I see, thanks for the clarification. I think I'll have to give it a try to see what the FLAME blendshapes are like, and whether there is a way to retarget those to a custom model. My idea is to use VOCA to drive the speech animation of a real-time avatar on a custom model (it can work sentence by sentence; it doesn't need to be purely real-time and can tolerate a delay of about 0.5–1 s). We already have other animations running in additive mode for the rest of the face, so if we can just map the blendshape curve data for the speech, we can play it additively so they all work simultaneously.
@DieterV3 · 1 year ago
Is there any way to create these animations online to use as video for your website?🤔
@Will_Huff · 4 years ago
Does this work in real time? Or does the audio need to be analyzed/processed first?
@MichaelBlackMPI · 4 years ago
This version is not an on-line method. The audio needs to be processed in advance.
@idkidk1774 · 3 years ago
Sir, I am not able to install it on Ubuntu under WSL on Windows 10. Please help.
@EmmanuelMadu · 5 years ago
This is incredible, you guys are so underrated!!!
@radioreactivity3561 · 2 years ago
Does it take into account tongue movement and generate it?
@MichaelBlackMPI · 2 years ago
No, unfortunately not. We did not have a method to scan/capture the tongue.
@actually_romanoff · 5 years ago
Nice work, but can it do Arnold's "I'll be back" ?
@0609Bhuwan · 5 years ago
Simply amazing!!! Congratulations to the team!! This is a real breakthrough.
@fennadikketetten1990 · 4 years ago
I always find it quite strange that the teeth and tongue are not included in 3D models of speech, as they are quite important for the sounds that are made and therefore for realism.
@JoshPurple · 5 years ago
Exceptional!! Congrats! That's HUGE :) !
@GotUpLateWithMoon · 2 years ago
Thank you so much!
@mridulsharma7740 · 4 years ago
I noticed the Winston Churchill mesh was from TurboSquid. Does this mean I can use .obj/mesh files generated from other algorithms provided by the institute? Like creating a mesh from SMPL-X and then using VOCA to synthesize a speaking style for the avatar? I was wondering if this is possible.
@MichaelBlackMPI · 4 years ago
We fit our mesh to the bust of Churchill so that it is in FLAME topology. FLAME is consistent with SMPL-X. So yes, if you register FLAME to your mesh, you can animate with VOCA. We will be releasing some code to help people do mesh registration.
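As a practical side note, before trying to animate a registered mesh with VOCA it helps to confirm it is actually in FLAME topology. A small sketch follows; it assumes `trimesh` as a dependency and a local path to the FLAME template mesh, neither of which is mandated by the VOCA repo.

```python
# Sketch: sanity-check that a custom mesh shares FLAME's topology
# (same vertex count and face connectivity as the FLAME template).
import numpy as np
import trimesh

def is_flame_topology(custom_mesh_path, flame_template_path):
    """True if both meshes have identical vertex shape and faces."""
    custom = trimesh.load(custom_mesh_path, process=False)
    flame = trimesh.load(flame_template_path, process=False)
    return (custom.vertices.shape == flame.vertices.shape
            and np.array_equal(custom.faces, flame.faces))
```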
@oaom5734 · 4 years ago
@@MichaelBlackMPI I would love to use my own avatar as well. Could you please share the link if you put the mesh registration code somewhere?
@bijoyboban · 5 years ago
Great work, team. We are trying to do something like what Kurdo Bakur mentioned: get a real-time response from a VOCA-driven model based on an AI chatbot in Python. Real-time is expensive and tough. Do you have any suggestions?
@MichaelBlackMPI · 5 years ago
What you will need is a realtime version of DeepSpeech (or equivalent) that streams features from audio. Animating the mesh in realtime is no problem. So if you have a realtime method to extract deep features from audio, you could retrain everything to achieve your goal. Our code (including training code) is here github.com/TimoBolkart/voca so it should be possible.
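For anyone exploring that real-time direction, a rough sketch of what the streaming loop might look like is below. `feature_extractor` and `voca_decoder` are hypothetical stand-ins for a streaming DeepSpeech-style front end and a retrained per-window decoder; they are not existing APIs in the VOCA repo.

```python
# Rough sketch of a streaming pipeline (hypothetical; released VOCA is offline).
import numpy as np

CHUNK_SAMPLES = 8000  # e.g. 0.5 s of 16 kHz audio per chunk (assumption)

def run_stream(audio_chunks, feature_extractor, voca_decoder, template):
    """Consume raw audio chunks and yield animated FLAME-topology vertices."""
    for chunk in audio_chunks:                    # iterable of 1-D sample arrays
        feats = feature_extractor.process(chunk)  # streaming deep speech features
        offsets = voca_decoder.predict(feats)     # (V, 3) per-vertex displacements
        yield template + offsets                  # one frame, ready to render
```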
@shoemakerleve9 · 3 years ago
Bijoy, do you have any updates on this?
@kurdobakur7047 · 5 years ago
Can I use my Python-built AI (chatbot) with VOCA?
@MichaelBlackMPI · 5 years ago
You would need to put your model into correspondence with our FLAME head model. If you do that, then you should be able to use VOCA to drive your model.
@weima3908 · 5 years ago
@@MichaelBlackMPI Amazing work! I notice you share code to convert even a head image to a FLAME model. Conversely, can I create a realistic-looking face back from the FLAME head model?
@MichaelBlackMPI · 5 years ago
@@weima3908 Do you mean a rendered image with realistic texture? Not yet. We are working on providing people with a high-quality texture model also. So stay tuned.
@Lacuna-x9l · 4 years ago
Can I combine a Python AI chatbot and a facial expression controller model with VOCA to create an intelligent digital human?
@MichaelBlackMPI · 4 years ago
You would need to do some work to make the audio processing on-line. Right now we process the audio first.
@manleonardo · 5 years ago
I love your work, but every time I watch it, it looks so real that it's creepy...