DATENeRF: Depth-Aware Text-based Editing of NeRFs

[Figure: side-by-side comparisons of Instruct-NeRF2NeRF and Ours for the edit prompts "a red buffalo plaid shirt", "a starry night canvas", "a metallic vase", "a b&w checkered pattern table", "a teddy bear with a rainbow", and "a Corgi".]

Recent diffusion models have demonstrated impressive capabilities for text-based 2D image editing. However, applying similar ideas to edit a NeRF scene remains challenging because editing 2D frames individually does not produce multiview-consistent results.

We make the key observation that the geometry of a NeRF scene provides a way to unify these 2D edits. We leverage this geometry in a depth-conditioned ControlNet to improve the consistency of individual 2D image edits. Furthermore, we propose an inpainting scheme that uses the NeRF scene depth to propagate 2D edits across images while staying robust to errors and resampling issues.
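As a rough illustration of how scene depth can carry an edit from one view to another, the sketch below forward-warps an edited image into a second viewpoint. It is not the paper's implementation: the function name `reproject_edit`, the pinhole camera with shared intrinsics `K`, the 4x4 camera-to-world poses, the z-depth convention, and the nearest-pixel splatting are all assumptions made for this example. Destination pixels that receive no source pixel are reported as invalid; these are the disoccluded regions that would still need inpainting.

```python
import numpy as np

def reproject_edit(src_rgb, src_depth, K, src_c2w, dst_c2w):
    """Forward-warp src_rgb into the destination view using per-pixel depth.

    Assumptions (illustrative only): pinhole intrinsics K shared by both views,
    4x4 camera-to-world poses, and depth measured along the camera z-axis.
    Returns (warped, valid), where valid marks destination pixels that were hit.
    """
    h, w = src_depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Unproject every source pixel to a 3D point in the source camera frame.
    cam = (np.linalg.inv(K) @ pix.T) * src_depth.reshape(1, -1)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])

    # Source camera -> world -> destination camera.
    dst_cam = np.linalg.inv(dst_c2w) @ src_c2w @ cam_h
    z = dst_cam[2]
    proj = K @ dst_cam[:3]

    # Keep only points in front of the destination camera.
    front = z > 1e-6
    z = z[front]
    colors = src_rgb.reshape(-1, 3)[front]
    x = np.round(proj[0, front] / z).astype(int)
    y = np.round(proj[1, front] / z).astype(int)

    # Keep only points that land inside the destination image.
    inside = (x >= 0) & (x < w) & (y >= 0) & (y < h)
    x, y, z, colors = x[inside], y[inside], z[inside], colors[inside]

    # Z-buffer: sort near-to-far, then keep the first (nearest) hit per pixel.
    order = np.argsort(z, kind="stable")
    _, first = np.unique((y * w + x)[order], return_index=True)
    keep = order[first]

    warped = np.zeros_like(src_rgb)
    valid = np.zeros((h, w), dtype=bool)
    warped[y[keep], x[keep]] = colors[keep]
    valid[y[keep], x[keep]] = True
    return warped, valid
```

Nearest-pixel splatting of this kind leaves holes and aliasing when the viewpoint changes significantly, which is one form of the resampling issues mentioned above.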

We demonstrate that this leads to more consistent, realistic and detailed editing results compared to previous state-of-the-art text-based NeRF editing methods.

Overview: Our inputs are a NeRF (with its posed input images), per-view editing masks, and an edit text prompt. We use the NeRF depth to condition the inpainting of the masked region. We then reproject this edited result to a subsequent viewpoint and apply a hybrid inpainting scheme that first inpaints only the disoccluded regions and then refines the entire masked region. This is done by changing the inpainting masks (indicated by the blue and orange blocks on the right side) during diffusion.
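The mask switching described in the overview can be sketched as a per-step mask selector. This is a minimal illustration, not the authors' code: the function name `hybrid_inpaint_mask`, the boolean-mask representation, and the `switch_frac` parameter (the fraction of denoising steps after which the mask changes) are hypothetical, and the source does not state when the switch happens.

```python
import numpy as np

def hybrid_inpaint_mask(edit_mask, reprojected_valid, step, num_steps, switch_frac=0.5):
    """Pick the inpainting mask for one denoising step.

    edit_mask:         HxW bool, region the user wants edited in this view.
    reprojected_valid: HxW bool, pixels already covered by the reprojected edit.
    switch_frac:       hypothetical switch point, as a fraction of all steps.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step must satisfy 0 <= step < num_steps")

    # Pixels inside the edit region with no reprojected content: disocclusions.
    disoccluded = edit_mask & ~reprojected_valid

    if step < int(switch_frac * num_steps):
        # Early steps: fill only the disoccluded pixels, keep reprojected ones.
        return disoccluded
    # Later steps: refine the entire masked region.
    return edit_mask.copy()
```

In a diffusion inpainting loop, the returned mask would decide which pixels or latents are regenerated at each step and which are held to the reprojected content.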

Instruct-NeRF2NeRF + masks