EG3D

Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. For this purpose, we introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.

Training

Full instructions on training the models can be found at https://github.com/NVlabs/eg3d.

Dataset

FFHQ (512x512): We realigned and recropped the original FFHQ dataset, with light augmentations.

FFHQ-Rebalanced (512x512): On top of the cropping and augmentations used for FFHQ, we also duplicate images of less frequently seen poses several times.

AFHQv2 (512x512): We use the cats-train subset of the AFHQv2 dataset released with StyleGAN3, and apply further augmentation during training.

ShapeNet Cars (128x128): We render randomly sampled views of many synthetic cars from the ShapeNet dataset.

The datasets as well as more information can be found at https://github.com/NVlabs/eg3d.

Performance

Results and estimated training duration vary greatly depending on available resources and training options. See the README for several examples and base recommendations for training.

Requirements:

Linux is required for performance and compatibility reasons.

1–8 high-end NVIDIA GPUs with at least 12 GB of memory. We have done all testing and development using RTX 3090 and A100 GPUs.

64-bit Python 3.9 and PyTorch 1.11.0 (or later). See https://pytorch.org for PyTorch install instructions.

CUDA toolkit 11.1 or later.

GCC 7 or later (Linux) or Visual Studio (Windows) compilers. Recommended GCC version depends on CUDA version.
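The version and memory requirements above can be checked before starting a run. The following is a minimal sketch, not part of the EG3D repository: the helper names (version_tuple, meets_minimum, check_environment) are hypothetical, and torch.version.cuda reports the CUDA version PyTorch was built against, which is only a stand-in for the installed CUDA toolkit version.

```python
import platform
import sys


def version_tuple(text):
    """Turn a string such as '1.11.0+cu113' into (1, 11, 0).

    Parsing stops at the first non-numeric component; the result is
    padded with zeros to three components so tuples compare cleanly.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def meets_minimum(found, minimum):
    """Numeric (not lexical) comparison of two dotted version strings."""
    return version_tuple(found) >= version_tuple(minimum)


def check_environment():
    """Return a dict of requirement name -> bool for the current machine."""
    report = {
        "linux": platform.system() == "Linux",
        "python>=3.9 (64-bit)": sys.version_info >= (3, 9) and sys.maxsize > 2**32,
    }
    try:
        import torch
    except ImportError:
        report["pytorch>=1.11.0"] = False
        return report

    report["pytorch>=1.11.0"] = meets_minimum(torch.__version__, "1.11.0")

    # Assumption: the CUDA version PyTorch was built with approximates
    # the toolkit version. It is None on CPU-only builds.
    cuda = torch.version.cuda
    report["cuda>=11.1"] = cuda is not None and meets_minimum(cuda, "11.1")

    # Threshold in decimal gigabytes, since a card sold as 12 GB may
    # report slightly less than 12 GiB of total memory.
    min_bytes = 12 * 10**9
    gpus = torch.cuda.device_count() if torch.cuda.is_available() else 0
    report["gpu>=12GB"] = gpus > 0 and all(
        torch.cuda.get_device_properties(i).total_memory >= min_bytes
        for i in range(gpus)
    )
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}")
```

The check does not cover the compiler requirement; GCC can be inspected separately with gcc --version.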

Input

Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels. See dataset_tool.py for specific options and instructions for dataset creation.
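As an illustration of the archive layout described above, here is a minimal sketch that packs PNG files and a dataset.json into an uncompressed ZIP and reads the labels back. It is not a replacement for dataset_tool.py. The function names are hypothetical, and the dataset.json schema used here (a "labels" key holding [file name, label] pairs) is an assumption based on the StyleGAN family of dataset tools; check dataset_tool.py for the authoritative format.

```python
import json
import zipfile


def write_dataset_zip(zip_path, images, labels):
    """Write PNG byte strings and a dataset.json into an uncompressed ZIP.

    images: dict mapping archive file name -> PNG file contents (bytes)
    labels: dict mapping the same file names -> list of floats
    """
    names = sorted(images)
    # ZIP_STORED keeps every archive member uncompressed.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.writestr(name, images[name])
        # Assumed schema: {"labels": [[file name, label vector], ...]}
        metadata = {"labels": [[name, labels[name]] for name in names]}
        zf.writestr("dataset.json", json.dumps(metadata))


def read_dataset_labels(zip_path):
    """Return {file name: label} after checking the archive is uncompressed."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.compress_type != zipfile.ZIP_STORED:
                raise ValueError(f"{info.filename} is stored compressed")
        with zf.open("dataset.json") as f:
            metadata = json.load(f)
    return {name: label for name, label in metadata["labels"]}
```

The sketch takes already-encoded PNG bytes as input; writing the PNG files themselves without compression is left to the image library used to produce them.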

Output

Information about the output, with examples, is available on the project page: https://nvlabs.github.io/eg3d/.

