marigoldmonodepth.github.io - Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation


We present Marigold, a diffusion model and associated fine-tuning protocol for monocular depth estimation. Its core principle is to leverage the rich visual knowledge stored in modern generative image models. Our model, derived from Stable Diffusion and fine-tuned with synthetic data, transfers zero-shot to unseen data, offering state-of-the-art monocular depth estimation results.

The gallery below presents several images from the internet and compares Marigold with the previous state-of-the-art method LeReS. Use the slider and gestures to reveal details on both sides.

Starting from a pretrained Stable Diffusion model, we encode the image $x$ and the depth map $d$ into the latent space using the original Stable Diffusion VAE. We fine-tune only the U-Net, optimizing the standard diffusion objective with respect to the depth latent code. Image conditioning is achieved by concatenating the two latent codes before feeding them into the U-Net, whose first layer is modified to accept the concatenated input.
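The conditioning mechanism above can be sketched in PyTorch. This is a minimal illustration, not the released implementation: `ToyVAEEncoder` and `ToyUNet` are hypothetical stand-ins for the frozen Stable Diffusion VAE and the fine-tuned U-Net, and the noising step omits the diffusion timestep schedule for brevity. What it does show faithfully is the structure described in the text: both inputs pass through the same encoder, the two latent codes are concatenated along the channel dimension, the denoiser's first layer is widened to accept the doubled channel count, and the loss is the standard noise-prediction objective on the depth latent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_C = 4  # Stable Diffusion latents have 4 channels

class ToyVAEEncoder(nn.Module):
    """Hypothetical stand-in for the frozen SD VAE encoder:
    maps a 3-channel input to a 4-channel latent at 1/8 resolution."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, LATENT_C, kernel_size=8, stride=8)

    def forward(self, x):
        return self.conv(x)

class ToyUNet(nn.Module):
    """Hypothetical stand-in denoiser. Its first layer takes 2 * LATENT_C
    channels, mirroring the modification that accepts concatenated latents."""
    def __init__(self):
        super().__init__()
        self.first = nn.Conv2d(2 * LATENT_C, 32, 3, padding=1)  # widened input layer
        self.out = nn.Conv2d(32, LATENT_C, 3, padding=1)        # predicts noise on the depth latent

    def forward(self, z):
        return self.out(torch.relu(self.first(z)))

def depth_diffusion_loss(vae, unet, image, depth):
    """One fine-tuning step in the style described above (simplified)."""
    with torch.no_grad():                      # the VAE stays frozen
        z_img = vae(image)                     # image latent (conditioning signal)
        z_dep = vae(depth.repeat(1, 3, 1, 1))  # single-channel depth replicated to 3 channels
    noise = torch.randn_like(z_dep)
    z_noisy = z_dep + noise                    # simplified noising; the real method uses a timestep schedule
    pred = unet(torch.cat([z_img, z_noisy], dim=1))  # channel-wise concatenation of the two latents
    return F.mse_loss(pred, noise)                   # standard diffusion (noise-prediction) objective

vae, unet = ToyVAEEncoder(), ToyUNet()
img = torch.randn(2, 3, 64, 64)   # dummy RGB batch
dep = torch.randn(2, 1, 64, 64)   # dummy depth batch
loss = depth_diffusion_loss(vae, unet, img, dep)
print(loss.item())
```

Because only the first convolution changes shape, the rest of the pretrained U-Net can be reused as-is, which is what lets fine-tuning start from the full visual prior of Stable Diffusion rather than from scratch.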
