VOCA: Capture, Learning, and Synthesis of 3D Speaking Styles

34,628 views

Michael Black


Comments: 32
@kondi06 • 2 years ago
It looks great. The question is whether the final animation is delivered via an RTSP port or some web socket, or whether you have to do something yourself?
@Jeet3D • 3 years ago
This looks great! I wonder if the underlying FLAME model's vertex deformations for the speech animation can be translated into real-time-engine-friendly blendshape values per frame, in real time. Like having a FACS-based blendshape model driven by VOCA under the hood.
@MichaelBlackMPI • 3 years ago
I honestly think our learned blendshapes are preferable to FACS unless you need FACS for some reason. But to animate speech, there is no need for FACS. Unfortunately the inference model is not designed for real time. It takes a chunk of speech and processes it all at once. It is not implemented in an autoregressive way.
@Jeet3D • 3 years ago
@@MichaelBlackMPI Ah, thanks for the reply! It's a bit disappointing to know that it is not real-time (yet 😉). So you mean there is actually blendshape curve data available once the speech output is processed? And can that be mapped directly to another custom character that has those exact blendshapes?
@MichaelBlackMPI • 3 years ago
@@Jeet3D The blendshapes are those of the FLAME model. These are learned from 3D face scans. So, yes, you get blendshape curves but they are in FLAME format. Transferring these to a model with different blendshapes would be a bit of an effort.
@Jeet3D • 3 years ago
@@MichaelBlackMPI I see, thanks for the clarification. I think I'll have to give it a try to see what the FLAME blendshapes are like, and whether there is a way to retarget them to a custom model. My idea is to use VOCA to give a real-time avatar its speech animation on a custom model (it can be done sentence by sentence; it doesn't need to be purely real-time and can tolerate a delay of about 0.5-1 s). We already have other animations running in additive mode for the rest of the face, so if we can just map the blendshape curve data for the speech, we can play it additively and they can all work simultaneously.
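FLAME is a linear model, so the "blendshape curves" discussed in this exchange are per-frame coefficient vectors that scale a learned basis, and the additive playback described above amounts to summing coefficient tracks. A minimal sketch under those assumptions (toy array shapes; the function and variable names are illustrative, not the VOCA or FLAME API):

```python
import numpy as np

def evaluate_linear_blendshapes(template, basis, coeffs):
    """Vertices for one frame of a FLAME-style linear model.
    template: (V, 3) neutral mesh; basis: (V, 3, K) learned blendshapes;
    coeffs: (K,) blendshape weights for this frame (one curve sample)."""
    return template + np.einsum('vdk,k->vd', basis, coeffs)

# Toy dimensions only; FLAME itself has 5023 vertices.
V, K, T = 5023, 100, 240                      # vertices, components, frames
template = np.zeros((V, 3))
basis = np.random.randn(V, 3, K) * 1e-3
base_curves = np.zeros((T, K))                # other facial animation, additive
speech_curves = np.random.randn(T, K) * 0.1   # stand-in for speech curve data

for t in range(T):
    # Additive playback: the speech weights simply add to the base track.
    verts = evaluate_linear_blendshapes(template, basis,
                                        base_curves[t] + speech_curves[t])
```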
@DieterV3 • 1 year ago
Is there any way to create these animations online to use as video for your website?🤔
@Will_Huff • 4 years ago
Does this work in real time? Or does the audio need to be analyzed/processed first?
@MichaelBlackMPI • 4 years ago
This version is not an on-line method. The audio needs to be processed in advance.
@idkidk1774 • 3 years ago
Sir, I am not able to install it on Ubuntu (WSL, Windows 10). Please help.
@EmmanuelMadu • 5 years ago
This is incredible. You guys are so underrated!!!
@radioreactivity3561 • 2 years ago
Does it take tongue movement into account and generate it?
@MichaelBlackMPI • 2 years ago
No, unfortunately not. We did not have a method to scan/capture the tongue.
@actually_romanoff • 5 years ago
Nice work, but can it do Arnold's "I'll be back" ?
@0609Bhuwan • 5 years ago
Simply amazing!!! Congratulations to the team!! This is a real breakthrough.
@fennadikketetten1990 • 4 years ago
I always find it quite strange that the teeth and tongue are not included in 3D models of speech, as they are quite important for the sounds that are made, and therefore for realism.
@JoshPurple • 5 years ago
Exceptional!! Congrats! That's HUGE :) !
@GotUpLateWithMoon • 2 years ago
Thank you so much!
@mridulsharma7740 • 4 years ago
I noticed the Winston Churchill mesh was from TurboSquid. Does this mean I can use .obj/mesh files generated from other algorithms provided by the institute? Like creating a mesh from SMPL-X and then using VOCA to synthesize a speaking style for the avatar? I was wondering if this is possible.
@MichaelBlackMPI • 4 years ago
We fit our mesh to the bust of Churchill so that it is in FLAME topology. FLAME is consistent with SMPL-X. So yes, if you register FLAME to your mesh, you can animate with VOCA. We will be releasing some code to help people do mesh registration.
@oaom5734 • 4 years ago
@@MichaelBlackMPI I would love to use my own avatar as well. Could you please share the link if you put the mesh registration code somewhere?
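Registration of the kind Michael describes is a nonrigid fit in FLAME topology, but it typically starts from a rigid alignment between the custom mesh and the FLAME template. A minimal sketch of just that first step (Kabsch/Procrustes), assuming you already have corresponding landmark points on both meshes; this is illustrative and not the registration code the authors mention releasing:

```python
import numpy as np

def rigid_align(src_pts, dst_pts):
    """Least-squares rotation R and translation t such that
    R @ src + t best matches dst (Kabsch/Procrustes method)."""
    src_mean, dst_mean = src_pts.mean(axis=0), dst_pts.mean(axis=0)
    src_c, dst_c = src_pts - src_mean, dst_pts - dst_mean
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Example with 5 hypothetical facial landmarks, target offset by a translation.
src = np.random.randn(5, 3)                   # landmarks on the custom mesh
dst = src + np.array([0.0, 0.5, 0.0])         # toy target (template landmarks)
R, t = rigid_align(src, dst)                  # recovers R ~ I, t ~ (0, 0.5, 0)
aligned = src @ R.T + t
```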
@bijoyboban • 5 years ago
Great work, team. We are trying to do something like what Kurdo Bakur mentioned: get a real-time response from a VOCA-driven model based on an AI chatbot in Python. Real time is expensive and tough. Do you have any suggestions?
@MichaelBlackMPI • 5 years ago
What you will need is a realtime version of DeepSpeech (or equivalent) that streams features from audio. Animating the mesh in realtime is no problem. So if you have a realtime method to extract deep features from audio, you could retrain everything to achieve your goal. Our code (including training code) is here github.com/TimoBolkart/voca so it should be possible.
@shoemakerleve9 • 3 years ago
Bijoy, do you have any updates on this?
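For anyone pursuing the real-time route Michael outlines: the missing piece is a streaming audio frontend; the mesh side can then consume fixed-size chunks, with roughly the 0.5-1 s delay mentioned earlier in the thread. A rough sketch of that chunked loop, where extract_features (a streaming DeepSpeech-style frontend) and predict_frames (a retrained, chunk-wise decoder) are hypothetical placeholders, not functions in the VOCA repository:

```python
import numpy as np

CHUNK = 16000 // 2  # 0.5 s of audio at 16 kHz

def stream_animation(audio_stream, extract_features, predict_frames):
    """Consume an iterable of audio sample arrays and yield animation
    frames chunk by chunk, trading the chunk size for latency."""
    buffer = np.zeros(0, dtype=np.float32)
    for samples in audio_stream:
        buffer = np.concatenate([buffer, samples])
        while len(buffer) >= CHUNK:
            chunk, buffer = buffer[:CHUNK], buffer[CHUNK:]
            feats = extract_features(chunk)      # hypothetical streaming frontend
            for frame in predict_frames(feats):  # hypothetical chunk-wise decoder
                yield frame
```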
@kurdobakur7047 • 5 years ago
Can I use my Python-built AI chatbot with VOCA?
@MichaelBlackMPI • 5 years ago
You would need to put your model into correspondence with our FLAME head model. If you do that, then you should be able to use VOCA to drive your model.
@weima3908 • 5 years ago
@@MichaelBlackMPI Amazing work! I notice you share code to convert even a head image to a FLAME model. Conversely, can I create a realistic-looking face back from the FLAME head model?
@MichaelBlackMPI • 5 years ago
@@weima3908 Do you mean a rendered image with realistic texture? Not yet. We are working on providing people with a high-quality texture model also. So stay tuned.
@Lacuna-x9l • 4 years ago
Can I merge a Python AI chatbot and a facial-expression controller model with VOCA to create an intelligent digital human?
@MichaelBlackMPI • 4 years ago
You would need to do some work to make the audio processing on-line. Right now we process the audio first.
@manleonardo • 5 years ago
I love your work, but every time I watch it, it looks so real that it's creepy...
@Qubot • 5 years ago
Skynet soon