VOCA: Capture, Learning, and Synthesis of 3D Speaking Styles

34,628 views

Michael Black


Comments: 32
@kondi06 • 2 years ago
It looks great. The question is whether the final animation is delivered via an RTSP port or some web socket, or whether you have to do something yourself?
@Jeet3D • 3 years ago
This looks great! I wonder if the underlying FLAME model's vertex deformations for the speech animation can be translated into real-time-engine-friendly blendshape values per frame, in real time. Like having a FACS-based blendshape model driven by VOCA under the hood.
@MichaelBlackMPI • 3 years ago
I honestly think our learned blendshapes are preferable to FACS unless you need FACS for some reason. But to animate speech, there is no need for FACS. Unfortunately the inference model is not designed for real time. It takes a chunk of speech and processes it all at once. It is not implemented in an autoregressive way.
@Jeet3D • 3 years ago
@@MichaelBlackMPI Ah, thanks for the reply! It's a bit disappointing to know that it is not real-time (yet 😉). So you mean there is actually blendshape curve data available once the speech output is processed? And can that be mapped directly to another custom character that has those exact blendshapes?
@MichaelBlackMPI • 3 years ago
@@Jeet3D The blendshapes are those of the FLAME model. These are learned from 3D face scans. So, yes, you get blendshape curves but they are in FLAME format. Transferring these to a model with different blendshapes would be a bit of an effort.
@Jeet3D • 3 years ago
@@MichaelBlackMPI I see, thanks for the clarification. I think I'll have to give it a try to see what the FLAME blendshapes are like, and whether there is a way to retarget them to a custom model. My idea is to use VOCA to give a real-time avatar its speech animation on a custom model (it can be done sentence by sentence; it doesn't need to be purely real-time and can tolerate a delay of about 0.5-1 s). We already have other animations running in additive mode for the rest of the face, so if we can just map the blendshape curve data for the speech, we can play it additively and they can all work simultaneously.
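FLAME is a linear model, so the "blendshape curves" discussed in this exchange are per-frame coefficient vectors that scale a learned basis, and the additive playback described above amounts to summing coefficient tracks. A minimal sketch under those assumptions (toy array shapes; the function and variable names are illustrative, not the VOCA or FLAME API):

```python
import numpy as np

def evaluate_linear_blendshapes(template, basis, coeffs):
    """Vertices for one frame of a FLAME-style linear model.
    template: (V, 3) neutral mesh; basis: (V, 3, K) learned blendshapes;
    coeffs: (K,) blendshape weights for this frame (one curve sample)."""
    return template + np.einsum('vdk,k->vd', basis, coeffs)

# Toy dimensions only; FLAME itself has 5023 vertices.
V, K, T = 5023, 100, 240                      # vertices, components, frames
template = np.zeros((V, 3))
basis = np.random.randn(V, 3, K) * 1e-3
base_curves = np.zeros((T, K))                # other facial animation, additive
speech_curves = np.random.randn(T, K) * 0.1   # stand-in for speech curve data

for t in range(T):
    # Additive playback: the speech weights simply add to the base track.
    verts = evaluate_linear_blendshapes(template, basis,
                                        base_curves[t] + speech_curves[t])
```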
@DieterV3 • 1 year ago
Is there any way to create these animations online to use as video for your website?🤔
@Will_Huff • 4 years ago
Does this work in real time? Or does the audio need to be analyzed/processed first?
@MichaelBlackMPI • 4 years ago
This version is not an on-line method. The audio needs to be processed in advance.
@idkidk1774 • 3 years ago
Sir, I am not able to install it on Ubuntu (WSL, Windows 10). Please help.
@EmmanuelMadu • 5 years ago
This is incredible. You guys are so underrated!!!
@radioreactivity3561 • 2 years ago
Does it take tongue movement into account and generate it?
@MichaelBlackMPI • 2 years ago
No, unfortunately not. We did not have a method to scan/capture the tongue.
@actually_romanoff • 5 years ago
Nice work, but can it do Arnold's "I'll be back" ?
@0609Bhuwan • 5 years ago
Simply amazing!!! Congratulations to the team!! This is a real breakthrough.
@fennadikketetten1990 • 4 years ago
I always find it quite strange that the teeth and tongue are not included in 3D models of speech, as they are quite important for the sounds that are made, and therefore for realism.
@JoshPurple • 5 years ago
Exceptional!! Congrats! That's HUGE :) !
@GotUpLateWithMoon • 2 years ago
Thank you so much!
@mridulsharma7740 • 4 years ago
I noticed the Winston Churchill mesh was from TurboSquid. Does this mean I can use .obj/mesh files generated from other algorithms provided by the institute? Like creating a mesh from SMPL-X and then using VOCA to synthesize a speaking style for the avatar? I was wondering if this is possible.
@MichaelBlackMPI • 4 years ago
We fit our mesh to the bust of Churchill so that it is in FLAME topology. FLAME is consistent with SMPL-X. So yes, if you register FLAME to your mesh, you can animate with VOCA. We will be releasing some code to help people do mesh registration.
@oaom5734 • 4 years ago
@@MichaelBlackMPI I would love to use my own avatar as well. Could you please share the link if you put the mesh registration code somewhere?
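Registration of the kind Michael describes is a nonrigid fit in FLAME topology, but it typically starts from a rigid alignment between the custom mesh and the FLAME template. A minimal sketch of just that first step (Kabsch/Procrustes), assuming you already have corresponding landmark points on both meshes; this is illustrative and not the registration code the authors mention releasing:

```python
import numpy as np

def rigid_align(src_pts, dst_pts):
    """Least-squares rotation R and translation t such that
    R @ src + t best matches dst (Kabsch/Procrustes method)."""
    src_mean, dst_mean = src_pts.mean(axis=0), dst_pts.mean(axis=0)
    src_c, dst_c = src_pts - src_mean, dst_pts - dst_mean
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Example with 5 hypothetical facial landmarks, target offset by a translation.
src = np.random.randn(5, 3)                   # landmarks on the custom mesh
dst = src + np.array([0.0, 0.5, 0.0])         # toy target (template landmarks)
R, t = rigid_align(src, dst)                  # recovers R ~ I, t ~ (0, 0.5, 0)
aligned = src @ R.T + t
```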
@bijoyboban • 5 years ago
Great work, team. We are trying to do something like what Kurdo Bakur mentioned: get a real-time response from a VOCA-driven model based on an AI chatbot in Python. Real time is expensive and tough. Do you have any suggestions?
@MichaelBlackMPI • 5 years ago
What you will need is a realtime version of DeepSpeech (or equivalent) that streams features from audio. Animating the mesh in realtime is no problem. So if you have a realtime method to extract deep features from audio, you could retrain everything to achieve your goal. Our code (including training code) is here github.com/TimoBolkart/voca so it should be possible.
@shoemakerleve9 • 3 years ago
Bijoy, do you have any updates on this?
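For anyone pursuing the real-time route Michael outlines: the missing piece is a streaming audio frontend; the mesh side can then consume fixed-size chunks, with roughly the 0.5-1 s delay mentioned earlier in the thread. A rough sketch of that chunked loop, where extract_features (a streaming DeepSpeech-style frontend) and predict_frames (a retrained, chunk-wise decoder) are hypothetical placeholders, not functions in the VOCA repository:

```python
import numpy as np

CHUNK = 16000 // 2  # 0.5 s of audio at 16 kHz

def stream_animation(audio_stream, extract_features, predict_frames):
    """Consume an iterable of audio sample arrays and yield animation
    frames chunk by chunk, trading the chunk size for latency."""
    buffer = np.zeros(0, dtype=np.float32)
    for samples in audio_stream:
        buffer = np.concatenate([buffer, samples])
        while len(buffer) >= CHUNK:
            chunk, buffer = buffer[:CHUNK], buffer[CHUNK:]
            feats = extract_features(chunk)      # hypothetical streaming frontend
            for frame in predict_frames(feats):  # hypothetical chunk-wise decoder
                yield frame
```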
@kurdobakur7047 • 5 years ago
Can I use my Python-built AI chatbot with VOCA?
@MichaelBlackMPI • 5 years ago
You would need to put your model into correspondence with our FLAME head model. If you do that, then you should be able to use VOCA to drive your model.
@weima3908 • 5 years ago
@@MichaelBlackMPI Amazing work! I notice you share code to convert even a head image to a FLAME model. Conversely, can I create a realistic-looking face back from the FLAME head model?
@MichaelBlackMPI • 5 years ago
@@weima3908 Do you mean a rendered image with realistic texture? Not yet. We are working on providing people with a high-quality texture model also. So stay tuned.
@Lacuna-x9l • 4 years ago
Can I merge a Python AI chatbot and a facial-expression controller model with VOCA to create an intelligent digital human?
@MichaelBlackMPI • 4 years ago
You would need to do some work to make the audio processing on-line. Right now we process the audio first.
@manleonardo • 5 years ago
I love your work, but every time I watch it, it looks so real that it's creepy...
@Qubot • 5 years ago
Skynet soon