GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

Project Page | Paper | Hugging Face

Xiao Fu*, Wei Yin*, Mu Hu*, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long
ECCV, 2024

Figure: Point Cloud Rendering Using Depth | Image Relighting Using Normal | Hair-level Details

News

[2024/7/05] Check out our Metric3D v2, a state-of-the-art (SOTA) depth and normal estimation model in terms of accuracy.
[2024/7/02] Paper accepted to ECCV'24.
[2024/4/16] Released GeoWizard V2, which produces more robust and three-dimensional normal maps.
[2024/3/25] Thanks to Kijai for creating a ComfyUI version of GeoWizard.
[2024/3/19] Released the paper, project page, and code.

🛠️ Setup

We tested our code in the following environment: Ubuntu 22.04, Python 3.9.18, CUDA 11.8.

Clone this repository.

git clone git@github.com:fuxiao0719/GeoWizard.git
cd GeoWizard

Install packages

conda create -n geowizard python=3.9
conda activate geowizard
pip install -r requirements.txt
cd geowizard

🤖 Usage

Run inference for depth & normal

Place your images in a directory such as input/example (where we have prepared several example cases) and run the inference command below. The depth and normal outputs will be stored in output/example.

python run_infer.py \
    --input_dir ${input path} \
    --output_dir ${output path} \
    --ensemble_size ${ensemble size} \
    --denoise_steps ${denoising steps} \
    --seed ${seed} \
    --domain ${data type}
# e.g.
python run_infer.py \
    --input_dir input/example \
    --output_dir output \
    --ensemble_size 3 \
    --denoise_steps 10 \
    --seed 0 \
    --domain "indoor"

Inference settings:
- --domain: data type. Options: "indoor", "outdoor", and "object". "object" works best for background-free objects, such as those in Objaverse. We find that "indoor" suits most scenarios. Default: "indoor".
- --ensemble_size and --denoise_steps: trade off speed against accuracy; more ensemble members and more denoising steps give higher accuracy at the cost of longer inference. Defaults: 3 and 10, respectively (for academic comparison, please set them to 10 and 50).
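To make the speed/accuracy trade-off more concrete, here is a minimal NumPy sketch of how an ensemble of predictions could be merged. It rests on our own assumptions, not necessarily on what run_infer.py actually does: each ensemble member is an affine-invariant depth map that is min-max normalized and merged by a per-pixel median, normal maps are merged by averaging and renormalizing, and every member runs the full denoising schedule, so the number of denoiser calls scales roughly as ensemble_size × denoise_steps. The function names are hypothetical.

```python
import numpy as np


def ensemble_depth(preds):
    """Merge (N, H, W) affine-invariant depth maps: min-max normalize, then median."""
    preds = np.asarray(preds, dtype=np.float64)
    lo = preds.min(axis=(1, 2), keepdims=True)
    hi = preds.max(axis=(1, 2), keepdims=True)
    normed = (preds - lo) / np.maximum(hi - lo, 1e-8)  # each member mapped to [0, 1]
    return np.median(normed, axis=0)


def ensemble_normals(preds):
    """Merge (N, H, W, 3) unit normal maps by averaging and renormalizing."""
    mean = np.asarray(preds, dtype=np.float64).mean(axis=0)
    return mean / np.maximum(np.linalg.norm(mean, axis=-1, keepdims=True), 1e-8)


def approx_denoiser_calls(ensemble_size, denoise_steps):
    """Assumed cost model: every ensemble member runs the full denoising schedule."""
    return ensemble_size * denoise_steps
```

Under that cost model, the default setting (3, 10) needs about 30 denoiser calls per image, while the academic setting (10, 50) needs about 500.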

Run inference for depth & normal (v2)

We additionally trained a v2 model with some architectural modifications (the CLIP image embedding is replaced with three types of text embeddings). It generates more realistic and three-dimensional normal maps on some uncommon images (e.g., cartoon-style images).

python run_infer_v2.py \
    --input_dir ${input path} \
    --output_dir ${output path} \
    --ensemble_size ${ensemble size} \
    --denoise_steps ${denoising steps} \
    --seed ${seed} \
    --domain "indoor"
# e.g.
python run_infer_v2.py \
    --input_dir input/example \
    --output_dir output \
    --ensemble_size 3 \
    --denoise_steps 10 \
    --seed 0 \
    --domain "indoor"

python run_infer_v2.py \
    --input_dir input/example_object \
    --output_dir output_object \
    --ensemble_size 3 \
    --denoise_steps 10 \
    --seed 0 \
    --domain "object"
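If you drive inference from Python (for example, to sweep several input folders), a small helper can assemble the arguments and reject unsupported domains early. This is a hypothetical wrapper around the flags documented above; build_infer_cmd and ACADEMIC are illustrative names, not part of the repository.

```python
import shlex

DOMAINS = ("indoor", "outdoor", "object")
# Settings recommended above for academic comparison.
ACADEMIC = {"ensemble_size": 10, "denoise_steps": 50}


def build_infer_cmd(input_dir, output_dir, *, version=1, ensemble_size=3,
                    denoise_steps=10, seed=0, domain="indoor"):
    """Return the argument list for run_infer.py (v1) or run_infer_v2.py (v2)."""
    if domain not in DOMAINS:
        raise ValueError(f"domain must be one of {DOMAINS}, got {domain!r}")
    if version not in (1, 2):
        raise ValueError(f"version must be 1 or 2, got {version!r}")
    script = "run_infer.py" if version == 1 else "run_infer_v2.py"
    return [
        "python", script,
        "--input_dir", str(input_dir),
        "--output_dir", str(output_dir),
        "--ensemble_size", str(ensemble_size),
        "--denoise_steps", str(denoise_steps),
        "--seed", str(seed),
        "--domain", domain,
    ]


# Example: print the v2 "object" command shown above as a shell string.
print(shlex.join(build_infer_cmd("input/example_object", "output_object",
                                 version=2, domain="object")))
```

To actually run it, pass the returned list to subprocess.run(cmd, check=True) from inside the geowizard directory; keeping the arguments as a list avoids shell-quoting problems with paths that contain spaces.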

Run inference for 3D reconstruction using BiNI algorithm

First, put the generated depth and normal .npy files under the folder bini/data, along with the segmented foreground mask (mask.png; if no mask is provided, the whole image is used as the mask). We provide two examples of the expected data structure. Then run the following command.

cd ../bini

python bilateral_normal_integration_numpy.py \
    --path ${input path} \
    -k ${k} \
    --iter ${iterations} \
    --tol ${tol}

# e.g. (paper setting)
python bilateral_normal_integration_numpy.py --path data/test_1 -k 2 --iter 50 --tol 1e-5
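BiNI itself is more involved, but the basic idea of normal integration can be shown with a much simpler method. The sketch below is plain least-squares integration under an orthographic camera, not BiNI. It assumes normals satisfy n ∝ (-∂z/∂x, -∂z/∂y, 1) with x along image columns and y along rows, and it uses a dense solver, so it only suits tiny images. As we understand BiNI, it additionally uses bilateral, discontinuity-aware weights whose sigmoid steepness is set by -k, and refines them iteratively (presumably bounded by --iter and --tol). integrate_normals_ls is a hypothetical name.

```python
import numpy as np


def integrate_normals_ls(normal, mask=None):
    """Recover relative depth from a normal map by plain least squares (not BiNI).

    normal: (H, W, 3) unit normals, assumed n ∝ (-dz/dx, -dz/dy, 1),
            with x along columns and y along rows (orthographic camera).
    mask:   (H, W) bool array; None uses the whole image as the mask.
    Returns an (H, W) depth map, zero-mean inside the mask and NaN outside it.
    """
    h, w, _ = normal.shape
    if mask is None:
        mask = np.ones((h, w), dtype=bool)
    nz = np.clip(normal[..., 2], 1e-6, None)  # guard against grazing normals
    p = -normal[..., 0] / nz  # dz/dx
    q = -normal[..., 1] / nz  # dz/dy

    # Map each masked pixel to the index of its unknown depth value.
    idx = -np.ones((h, w), dtype=int)
    n_unknowns = int(mask.sum())
    idx[mask] = np.arange(n_unknowns)

    rows, rhs = [], []
    for i in range(h):
        for j in range(w):
            if not mask[i, j]:
                continue
            # Forward differences: z[i, j+1] - z[i, j] = p and z[i+1, j] - z[i, j] = q.
            for di, dj, g in ((0, 1, p[i, j]), (1, 0, q[i, j])):
                ii, jj = i + di, j + dj
                if ii < h and jj < w and mask[ii, jj]:
                    row = np.zeros(n_unknowns)
                    row[idx[ii, jj]] = 1.0
                    row[idx[i, j]] = -1.0
                    rows.append(row)
                    rhs.append(g)

    depth = np.full((h, w), np.nan)
    if not rows:  # a single isolated pixel has no gradient constraints
        depth[mask] = 0.0
        return depth
    z, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    depth[mask] = z - z.mean()  # depth is only defined up to an additive constant
    return depth
```

A planar normal map integrates back to the same plane up to a constant offset, which is a quick sanity check for any integration method.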

Training

We provide two training scripts, train_depth_normal.py and train_depth_normal_v2.py; modify the configs for your setup before running them. By default we train on 8 GPUs; if you have fewer computing resources, switch the config from 8gpu.yaml to 1gpu.yaml. Our dataloader format is provided in dataloader/mix_loader.py, and we encourage you to train on your own customized datasets.

cd training/scripts

# v1 model
sh train_depth_normal.sh

# v2 model
sh train_depth_normal_v2.sh
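Training on several datasets at once usually means sampling from each source at some ratio. The pure-Python sketch below shows one way to interleave datasets by sampling weight; it is a hypothetical illustration, not the contents of dataloader/mix_loader.py, and the dataset names and weights are placeholders.

```python
import itertools
import random
from collections import Counter
from typing import Any, Dict, Iterator, Sequence, Tuple


def mixed_samples(datasets: Dict[str, Sequence[Any]],
                  weights: Dict[str, float],
                  seed: int = 0) -> Iterator[Tuple[str, Any]]:
    """Yield (dataset_name, sample) pairs forever, choosing the source by weight."""
    rng = random.Random(seed)  # seeded so the mixing order is reproducible
    names = list(datasets)
    probs = [weights[name] for name in names]
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        data = datasets[name]
        yield name, data[rng.randrange(len(data))]


# Hypothetical example: an "indoor" and an "outdoor" set mixed 3:1.
toy = {"indoor": list(range(100)), "outdoor": list(range(100, 150))}
draws = list(itertools.islice(mixed_samples(toy, {"indoor": 3, "outdoor": 1}), 2000))
print(Counter(name for name, _ in draws))
```

Sampling the source per item, rather than concatenating datasets, keeps the mixing ratio independent of each dataset's size.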

📚 Related Work

We also encourage readers to check out these exciting concurrent works.

Marigold: a finetuned diffusion model for estimating monocular depth.
Wonder3D: generates multi-view normal maps and color images and reconstructs high-fidelity textured meshes.
HyperHuman: a latent structural diffusion model and a structure-guided refiner for high-resolution human generation.
GenPercept: a finetuned UNet for many downstream image understanding tasks.
Metric3D v2: a discriminative metric depth and surface normal estimator.
IC-Light: text-conditioned relighting model and background-conditioned relighting model.

🔗 Citation & License

@inproceedings{fu2024geowizard,
  title={GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image},
  author={Fu, Xiao and Yin, Wei and Hu, Mu and Wang, Kaixuan and Ma, Yuexin and Tan, Ping and Shen, Shaojie and Lin, Dahua and Long, Xiaoxiao},
  booktitle={ECCV},
  year={2024}
}

The GeoWizard project is released under the CC BY 4.0 License.
