Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures
Evangelos Ntavelis, Sean Wu, Mohamad Shahbazi, Fabio Maninchedda, Dmitry Kostiaev + 18 more
TLDR
HeadsUp reconstructs high-quality 3D Gaussian heads from multi-view captures using a scalable feed-forward method, generalizing to new identities.
Key contributions
- Proposes HeadsUp, a scalable feed-forward method for high-quality 3D Gaussian head reconstruction.
- Employs UV-parameterized 3D Gaussians, decoupling input resolution from Gaussian count for efficiency.
- Trained on a massive 10,000+ subject dataset, an order of magnitude larger than prior work.
- Achieves SOTA quality and generalizes to novel identities without test-time optimization.
Why it matters
This paper introduces a highly scalable and efficient method for 3D head reconstruction, trained on a dataset an order of magnitude larger than those used in prior work. It achieves state-of-the-art quality and generalizes to novel identities without test-time optimization, enabling downstream applications such as generating and animating new 3D identities and advancing realistic digital human creation.
Original Abstract
We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians from the number and resolution of input images, enabling training with many high-resolution input views. We train and evaluate our model on an internal dataset with more than 10,000 subjects, which is an order of magnitude larger than existing multi-view human head datasets. HeadsUp achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization. We extensively analyze the scaling behavior of our model across identities, views, and model capacity, revealing practical insights for quality-compute trade-offs. Finally, we highlight the strength of our latent space by showcasing two downstream applications: generating novel 3D identities and animating the 3D heads with expression blendshapes.
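To make the decoupling idea concrete, here is a minimal toy sketch (not the authors' code; all function names, parameter counts, and dimensions are illustrative assumptions) of how an encoder-decoder could map a variable number of input views to a fixed-size UV map of Gaussian parameters, so the Gaussian count depends only on the UV resolution:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_views(views, latent_dim=256):
    """Toy stand-in for the encoder: pool all views into one latent.
    Real models would use a learned network; here, a random projection."""
    pooled = np.stack([v.mean(axis=(0, 1)) for v in views]).mean(axis=0)
    W = rng.standard_normal((pooled.size, latent_dim))
    return pooled @ W

def decode_to_uv_gaussians(latent, uv_res=64):
    """Toy stand-in for the decoder: latent -> UV map of Gaussian params.
    Each UV texel holds 11 illustrative params: 3 position offsets
    (relative to a neutral head template), 3 scales, a 4D rotation
    quaternion, and 1 opacity (colors omitted for brevity)."""
    n_params = 11
    W = rng.standard_normal((latent.size, uv_res * uv_res * n_params))
    return (latent @ W).reshape(uv_res, uv_res, n_params)

# Three "input views" of different resolutions; the number of Gaussians
# (uv_res * uv_res) stays fixed regardless of view count or resolution.
views = [rng.random((h, w, 3)) for h, w in [(512, 512), (1024, 768), (256, 256)]]
latent = encode_views(views)
uv_gaussians = decode_to_uv_gaussians(latent)
print(uv_gaussians.shape)  # (64, 64, 11): 4096 Gaussians for any input set
```

The key property this illustrates is the one the abstract claims: adding more or higher-resolution views changes only the encoder's input, not the size of the decoded Gaussian set.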