DECA: Learning an Animatable Detailed 3D Face Model from In-the-Wild Images (SIGGRAPH 2021)

16,611 views

Michael Black

3 years ago

While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer from several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression. Our model, DECA (Detailed Expression Capture and Animation), is trained to robustly produce a UV displacement map from a low-dimensional latent representation that consists of person-specific detail parameters and generic expression parameters, while a regressor is trained to predict detail, shape, albedo, expression, pose, and illumination parameters from a single image. To enable this, we introduce a novel detail-consistency loss that disentangles person-specific details from expression-dependent wrinkles. This disentanglement allows us to synthesize realistic person-specific wrinkles by controlling expression parameters while keeping person-specific details unchanged. DECA is learned from in-the-wild images with no paired 3D supervision and achieves state-of-the-art shape reconstruction accuracy on two benchmarks. Qualitative results on in-the-wild data demonstrate DECA's robustness and its ability to disentangle identity- and expression-dependent details, enabling animation of reconstructed faces.
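For a concrete picture of the pipeline the abstract describes, here is a minimal PyTorch-style sketch: an encoder regresses the coarse FLAME and rendering parameters from one image, and a detail decoder maps the person-specific detail code plus expression and jaw-pose parameters to a UV displacement map. All module names, layer choices, and code dimensions below are illustrative assumptions, not the released implementation (see the GitHub link below for the real code).

import torch
import torch.nn as nn

class CoarseEncoder(nn.Module):
    """Hypothetical stand-in for DECA's image regressor."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        # Stand-in backbone; the actual model uses a ResNet-style encoder.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # shape(100) + expression(50) + pose(6) + albedo(50) + light(27) + camera(3) + detail(128)
        self.dims = [100, 50, 6, 50, 27, 3, 128]
        self.head = nn.Linear(feat_dim, sum(self.dims))

    def forward(self, img):
        return torch.split(self.head(self.backbone(img)), self.dims, dim=-1)

class DetailDecoder(nn.Module):
    """Maps (detail code, expression, jaw pose) to a UV displacement map."""
    def __init__(self, z_dim=128 + 50 + 3, uv_size=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 64 * 8 * 8),
            nn.Unflatten(1, (64, 8, 8)),
            nn.Upsample(scale_factor=uv_size // 8),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),
        )

    def forward(self, detail, expr, jaw):
        return self.net(torch.cat([detail, expr, jaw], dim=-1))  # (B, 1, uv, uv)

img = torch.rand(1, 3, 224, 224)
shape, expr, pose, albedo, light, cam, detail = CoarseEncoder()(img)
# Assumption: the first 3 pose dims are global rotation, the last 3 the jaw.
displacements = DetailDecoder()(detail, expr, pose[:, 3:])
print(displacements.shape)  # torch.Size([1, 1, 256, 256])

In training, the detail-consistency loss mentioned above is imposed by exchanging detail codes between images of the same person: the swapped renderings must still look like that person, which forces expression-driven wrinkles into the expression parameters and keeps static, identity-specific detail in the detail code.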
Authors: Yao Feng*, Haiwen Feng*, Michael J. Black, and Timo Bolkart (*authors contributed equally)
Project: deca.is.tue.mpg.de
Code: github.com/YadiraF/DECA
pdf: files.is.tue.mpg.de/black/pape...
Supplemental: files.is.tue.mpg.de/black/pape...
Bibtex:
@article{Feng:SIGGRAPH:2021,
title = {Learning an Animatable Detailed {3D} Face Model from In-the-Wild Images},
author = {Feng, Yao and Feng, Haiwen and Black, Michael J. and Bolkart, Timo},
journal = {ACM Trans. Graphics (ToG), Proc. SIGGRAPH},
volume = {40},
number = {4},
pages = {88:1--88:13},
month = aug,
year = {2021},
month_numeric = {8}
}

Comments: 17
@Abcd-kw4cp
@Abcd-kw4cp 1 month ago
I'm trying to test the code by running it in Colab. What should I upload to Colab as the input image to see the output? By the way, nice work!!
@drpchankh
@drpchankh 2 years ago
Great work! Would love to see those reconstructed 3D face features used as additional input to an existing face GAN decoder model.
@MichaelBlackMPI
@MichaelBlackMPI 2 years ago
It's a great idea and would be easy to do. There are already several works that use such ideas to condition GANs, including our work on GIF (ps.is.tuebingen.mpg.de/publications/gif-3dv-2020).
@sarvagyagupta1744
@sarvagyagupta1744 3 years ago
Instead of the displacement map, have you tried normal maps? If you did, was there any difference in the results?
@3dgiftz
@3dgiftz 2 years ago
Awesome
@funsensei
@funsensei 1 year ago
Is there any tutorial on how to run this on a personal image? I have no background in the field, but I am curious about it. I'd like to have a first-hand experience.
@bobthornton9280
@bobthornton9280 2 years ago
What could it do with this scene? kzbin.info/www/bejne/imG8ioiVjLyChcU What I'd like to see is just the shots where Locke (Terry O'Quinn) and Jack (Matthew Fox) are yelling at each other, 01:25-02:40 in the link. I'd just like to see if this software could reconstruct the performances. I'd love to see old movies and television shows converted into 3D, or even light fields, using reconstruction software like this, since body and facial reconstruction software would provide highly accurate depth maps. I think movies would be the litmus test: you have a billion lighting scenarios and samples of all expressions. This would be a way to preserve movies.
@ericmlevy
@ericmlevy 11 months ago
Michael, are you aware if anyone has retrained DECA on synthetic data?
@MichaelBlackMPI
@MichaelBlackMPI 11 months ago
No, sorry. Note that we don't use any 3D data in training here (as an academic exercise). I am sure that a bit of 3D data would improve accuracy significantly. For a real-world system, I'd definitely do this.
@ericmlevy
@ericmlevy 11 months ago
@MichaelBlackMPI I'd love to see how DECA would do using SPIGA instead of FAN for the landmarks on the real-world dataset. I reached out to some vendors of synthetic data; I expect it'll be prohibitively expensive for our small operation.
@OmarHesham
@OmarHesham 3 years ago
Good effort but the resulting mesh does not resemble the input image at all. The likeness is just not there. To verify this, test your result against a 3D scan of the same person as a ground truth.
@MichaelBlackMPI
@MichaelBlackMPI 3 years ago
We do exactly that! We evaluate on the NoW challenge, which has ground truth 3D scans. DECA is currently the most accurate method on NoW. What may be surprising is that some methods look good when rendered in 2D but the 3D is actually quite wrong. DECA may look less detailed but the 3D is more accurate than other recent methods.
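(For context, here is a toy sketch of the kind of scan-to-mesh evaluation a benchmark like NoW performs. This is an illustrative simplification, not the official benchmark code: the real protocol measures point-to-surface distance after rigid alignment, whereas this approximates it with nearest-vertex distances.)

import numpy as np
from scipy.spatial import cKDTree

def scan_to_mesh_error_mm(scan_points, mesh_vertices):
    # Approximate point-to-surface distance by the distance from each
    # ground-truth scan point to its nearest predicted mesh vertex.
    dists, _ = cKDTree(mesh_vertices).query(scan_points)
    return dists.mean(), np.median(dists)

scan = np.random.rand(5000, 3) * 100   # placeholder ground-truth scan points (mm)
pred = np.random.rand(3000, 3) * 100   # placeholder predicted mesh vertices (mm)
print(scan_to_mesh_error_mm(scan, pred))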
@OmarHesham
@OmarHesham 3 years ago
@MichaelBlackMPI True! You guys did it best for sure. It's a step in the right direction. I hope likeness reconstruction becomes accurate in the future, though. It is a lot to ask to recover accurate 3D from only one image. These amazing innovations definitely take time :)
@ericmlevy
@ericmlevy 1 year ago
@MichaelBlackMPI What would you recommend as the latest and greatest for fitting a FLAME model to a video with temporal stability? I'm looking for something that can be used commercially.
@MichaelBlackMPI
@MichaelBlackMPI 1 year ago
@ericmlevy The latest works that are good for this include EMOCA and MICA. Both are available for commercial use through Meshcapade.com. Contact sales@meshcapade.com for more info.
@ericmlevy
@ericmlevy 1 year ago
@MichaelBlackMPI Thank you.