Guide-and-Rescale
Description
Official Implementation for "Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing"
Languages
- Jupyter Notebook 93.4%
- Python 6.6%
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing (ECCV 2024)
Despite recent advances in large-scale text-to-image generative models, manipulating real images with these models remains a challenging problem. The main limitations of existing editing methods are that they either fail to perform with consistent quality on a wide range of image edits, or require time-consuming hyperparameter tuning or fine-tuning of the diffusion model to preserve the image-specific appearance of the input image. Most of these approaches use source image information by caching intermediate features and inserting them into the generation process as they are. However, this technique causes feature misalignment in the model, which leads to inconsistent results. We propose a novel approach built upon a diffusion sampling process that is modified via a guidance mechanism. In this work, we explore a self-guidance technique to preserve the overall structure of the input image and the appearance of its local regions that should not be edited. In particular, we explicitly introduce layout-preserving energy functions aimed at preserving the local and global structures of the source image. Additionally, we propose a noise rescaling mechanism that preserves the noise distribution by balancing the norms of classifier-free guidance and our proposed guiders during generation. This leads to more consistent and better editing results. Such a guiding approach requires neither fine-tuning of the diffusion model nor an exact inversion process. As a result, the proposed method provides a fast and high-quality editing mechanism. In our experiments, we show through human evaluation and quantitative analysis that the proposed method produces the desired edits, which are preferred by human evaluators, and also achieves a better trade-off between editing quality and preservation of the original image.
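A simplified sketch of the norm-balancing idea behind noise rescaling is given below. It is not the exact rule from the paper: we assume, purely for illustration, that the guider term is rescaled so that its norm equals a fixed fraction (ratio) of the norm of the classifier-free guidance term. The function rescaled_noise and its parameters w_cfg and ratio are hypothetical.

```python
import numpy as np


def rescaled_noise(eps_uncond, eps_cond, guider_grad, w_cfg=7.5, ratio=0.5, eps=1e-8):
    """Combine classifier-free guidance with a guider term of controlled norm.

    Hypothetical illustration (not the paper's formula): the guider term is
    rescaled so that its norm equals `ratio` times the norm of the CFG term.
    """
    # Classifier-free guidance term
    cfg_term = w_cfg * (eps_cond - eps_uncond)
    cfg_norm = np.linalg.norm(cfg_term)
    guider_norm = np.linalg.norm(guider_grad)
    # Scale the guider so that ||scaled_guider|| is (almost exactly) ratio * ||cfg_term||
    scaled_guider = guider_grad * (ratio * cfg_norm / (guider_norm + eps))
    return eps_uncond + cfg_term + scaled_guider, scaled_guider, cfg_term


rng = np.random.default_rng(0)
shape = (4, 8, 8)
eps_u, eps_c = rng.standard_normal(shape), rng.standard_normal(shape)
g = 100.0 * rng.standard_normal(shape)  # a raw guider gradient with a much larger norm
noise, scaled, cfg = rescaled_noise(eps_u, eps_c, g)
print(np.linalg.norm(scaled) / np.linalg.norm(cfg))  # close to 0.5
```

With this kind of rescaling, a guider with a very large raw gradient cannot dominate the predicted noise.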

Setup
This code uses a pre-trained Stable Diffusion model from the Diffusers library. We ran our code with Python 3.8.5, PyTorch 2.3.0 and Diffusers 0.17.1 on an NVIDIA A100 GPU with 40 GB of memory.
To set up the environment, run:
```bash
conda env create -f sd_env.yaml
```
This creates the Conda environment described in sd_env.yaml, which you can then activate and use.
Quickstart
We provide examples of applying our pipeline to real image editing in Colab.
You can try the Gradio demo in HF Spaces.
We also provide a Jupyter notebook to try the Guide-and-Rescale pipeline on your own server.
Method Diagram
Overall scheme of the proposed method Guide-and-Rescale. First, our method applies classic DDIM inversion to the source real image. Then it performs real image editing via the classical denoising process. At every denoising step, the noise term is modified by a guider that uses the latents $z_t$ from the current generation process and the time-aligned DDIM latents $z^*_t$.
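A toy version of this loop can make the data flow concrete. This is a minimal NumPy sketch under strong assumptions: the noise schedule, the noise predictor toy_model and the guider l2_guider are hypothetical stand-ins, not the repository's code or the paper's guiders, and the guider output is simply added to the predicted noise with a fixed scale.

```python
import numpy as np

T = 10
# Toy cumulative schedule: alpha_bar[0] is (almost) clean, alpha_bar[T] is the noisiest level.
alpha_bar = np.linspace(0.99, 0.1, T + 1)


def ddim_step(z, eps, a_from, a_to):
    """Deterministic DDIM update between two noise levels (works in both directions)."""
    z0_pred = (z - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * z0_pred + np.sqrt(1.0 - a_to) * eps


def ddim_inversion(z0, model):
    """Map the source latent to noise, keeping every intermediate latent z*_t."""
    latents = [z0]
    for t in range(T):
        eps = model(latents[-1], t)
        latents.append(ddim_step(latents[-1], eps, alpha_bar[t], alpha_bar[t + 1]))
    return latents


def guided_denoising(inv_latents, model, guider, scale=1.0):
    """Denoise from z*_T; at each step the guider modifies the noise term."""
    z = inv_latents[-1]
    for t in range(T, 0, -1):
        eps = model(z, t - 1)
        # The guider uses the current latent z_t and the time-aligned inversion latent z*_t.
        eps = eps + scale * guider(z, inv_latents[t])
        z = ddim_step(z, eps, alpha_bar[t], alpha_bar[t - 1])
    return z


def toy_model(z, t):
    """Hypothetical stand-in for the noise prediction of a diffusion model."""
    return 0.1 * np.tanh(z) + 0.01 * t


def l2_guider(z, z_star):
    """Gradient w.r.t. z of the toy energy 0.5 * ||z - z_star||^2."""
    return z - z_star


z_src = np.random.default_rng(0).standard_normal((4, 8, 8))
inv = ddim_inversion(z_src, toy_model)
z_edit = guided_denoising(inv, toy_model, l2_guider, scale=0.5)
```

Because two DDIM steps with the same noise are exact inverses, a constant noise predictor together with a zero guider reconstructs z_src exactly; with a real model the inversion is only approximate, which is where guidance toward $z^*_t$ helps.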
Guiders
In our work we propose specific guiders, i.e. guidance signals suitable for editing. The code for these guiders can be found in diffusion_core/guiders/opt_guiders.py.
Every guider is defined as a separate class that inherits from the parent class BaseGuider. A template for defining a new guider class looks as follows:
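```python
class SomeGuider(BaseGuider):
    patched: bool
    forward_hooks: list

    # Implement grad_fn (gradient computed directly) or calc_energy
    # (energy that is backpropagated), depending on the guider type.
    def grad_fn(self, data_dict):  # or: def calc_energy(self, data_dict)
        ...

    def model_patch(self, model):
        ...

    def single_output_clear(self):
        ...
```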
grad_fn or calc_energy
The guider class contains a boolean property. This property is True when the guider does not require any backpropagation over its outputs to obtain the gradient w.r.t. the current latent (for example, as in classifier-free guidance). In this case, the child class contains a function grad_fn, where the gradient w.r.t. the current latent is computed algorithmically.
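A guider of this kind can be sketched as follows. This is a hypothetical, self-contained example: it does not inherit from the repository's BaseGuider, and the data_dict keys uncond and cur_trg are assumptions. It mimics the classifier-free guidance direction, which is obtained directly from stored noise predictions without any backpropagation.

```python
import numpy as np


class CFGLikeGuider:
    """Hypothetical guider whose 'gradient' is computed directly (no backprop).

    It returns the classifier-free guidance direction eps_trg - eps_uncond.
    The data_dict keys used here are assumptions made for this sketch.
    """

    patched = False  # no intermediate layer outputs are needed
    forward_hooks = []

    def grad_fn(self, data_dict):
        # The direction is computed algorithmically from stored predictions
        return data_dict["cur_trg"] - data_dict["uncond"]


data_dict = {
    "uncond": np.zeros((4, 8, 8)),
    "cur_trg": np.ones((4, 8, 8)),
}
direction = CFGLikeGuider().grad_fn(data_dict)
```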
When the gradient has to be estimated with backpropagation and this property is False (for example, when the norm of the difference of attention maps is used for guidance), the child class contains a function calc_energy, where the desired energy function output is calculated. This output is then used for backpropagation.
The grad_fn and calc_energy functions receive a dictionary (data_dict) as input. In this dictionary we store all objects (the diffusion model instance, prompts, current latent, etc.) that might be useful for the guiders in the current pipeline.
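For energy-based guiders, the general pattern can be sketched with PyTorch autograd: calc_energy returns a scalar computed from entries of data_dict, and torch.autograd.grad backpropagates it to the current latent. The toy attention layer, the AttentionL2Guider class and the data_dict keys (latent, attn_cur, attn_inv) are hypothetical; they only illustrate the idea of comparing attention maps and are not the repository's guiders.

```python
import torch


class AttentionL2Guider:
    """Hypothetical energy-based guider: its gradient comes from backpropagation.

    The energy is the squared L2 distance between attention maps computed from
    the current latent and from the inversion latent. The data_dict keys are
    assumptions made for this sketch.
    """

    def calc_energy(self, data_dict):
        diff = data_dict["attn_cur"] - data_dict["attn_inv"]
        return (diff ** 2).sum()


def toy_attention(latent, proj):
    """Stand-in for a layer whose attention map depends on the latent."""
    return (latent @ proj).softmax(dim=-1)  # (tokens, keys), each row sums to 1


torch.manual_seed(0)
proj = torch.randn(8, 5, dtype=torch.float64)
z_t = torch.randn(16, 8, dtype=torch.float64, requires_grad=True)  # current latent
z_star_t = torch.randn(16, 8, dtype=torch.float64)                 # inversion latent

data_dict = {
    "latent": z_t,
    "attn_cur": toy_attention(z_t, proj),
    "attn_inv": toy_attention(z_star_t, proj),
}
energy = AttentionL2Guider().calc_energy(data_dict)
# Backpropagate the scalar energy to get its gradient w.r.t. the current latent
(grad,) = torch.autograd.grad(energy, data_dict["latent"])
```

A small step of the latent against this gradient lowers the energy, which is what makes it usable as a guidance signal.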
model_patch and patched
When the guider requires outputs of intermediate layers of the diffusion model to estimate the energy function/gradient, we define a function model_patch in this guider's class and set the property patched equal to True. We will further refer to such guiders as patched guiders.
This function patches the desired layers of the diffusion model and retrieves the necessary output from these layers. This output is then stored in a property of the guider class object. This way it can be accessed by the editing pipeline and stored in data_dict for further use in the calc_energy/grad_fn functions.
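The patching idea can be sketched in plain Python by wrapping a layer's forward method. Everything here is hypothetical: ToyLayer and ToyModel stand in for the diffusion model, and the attribute name captured is our choice for the property that holds the recorded output. The repository may patch its layers differently.

```python
import numpy as np


class ToyLayer:
    """Stand-in for an intermediate layer of the diffusion model."""

    def __init__(self, scale):
        self.scale = scale

    def forward(self, x):
        return np.tanh(self.scale * x)


class ToyModel:
    """Stand-in for the diffusion model: a stack of layers."""

    def __init__(self):
        self.layers = [ToyLayer(0.5), ToyLayer(2.0)]

    def __call__(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x


class FeatureGuider:
    """Hypothetical patched guider that records the output of one layer."""

    patched = True

    def __init__(self, layer_idx=0):
        self.layer_idx = layer_idx
        self.captured = None  # filled in during the forward pass

    def model_patch(self, model):
        layer = model.layers[self.layer_idx]
        original_forward = layer.forward

        def patched_forward(x):
            out = original_forward(x)
            self.captured = out  # keep the intermediate output for later use
            return out

        # Replace the layer's forward with the recording version
        layer.forward = patched_forward


model = ToyModel()
guider = FeatureGuider(layer_idx=0)
guider.model_patch(model)
x = np.linspace(-1.0, 1.0, 5)
y = model(x)
```

After the forward pass, guider.captured holds the first layer's output, while the model's final output is unchanged by the patch.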
forward_hooks
In the editing pipeline we conduct 4 diffusion model forward passes:
- unconditional, from the current latent $z_t$
- cur_inv: conditional on the initial prompt, from the current latent $z_t$
- inv_inv: conditional on the initial prompt, from the corresponding inversion latent $z^*_t$
- cur_trg: conditional on the prompt describing the editing result, from the current latent $z_t$
We store the unconditional prediction in data_dict, as well as the outputs of the cur_inv and cur_trg forward passes, for further use in classifier-free guidance.
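The four passes and what is kept from them can be sketched as follows. The noise predictor noise_pred, the key uncond and the standard classifier-free guidance formula toward the target prompt are assumptions for illustration; the pass names cur_inv, inv_inv and cur_trg are reused as dictionary keys only for readability.

```python
import numpy as np


def noise_pred(latent, prompt):
    """Hypothetical stand-in for the diffusion model's noise prediction."""
    shift = {None: 0.0, "source": 0.1, "target": 0.3}[prompt]
    return 0.5 * latent + shift


def run_passes(z_t, z_star_t, w_cfg=7.5):
    data_dict = {}
    # 1) unconditional, from the current latent z_t
    data_dict["uncond"] = noise_pred(z_t, None)
    # 2) cur_inv: initial (source) prompt, current latent z_t
    data_dict["cur_inv"] = noise_pred(z_t, "source")
    # 3) inv_inv: initial (source) prompt, inversion latent z*_t. Its noise
    #    prediction is not stored; the pass matters for patched guiders' outputs.
    noise_pred(z_star_t, "source")
    # 4) cur_trg: target (edit) prompt, current latent z_t
    data_dict["cur_trg"] = noise_pred(z_t, "target")
    # Standard classifier-free guidance toward the target prompt
    eps = data_dict["uncond"] + w_cfg * (data_dict["cur_trg"] - data_dict["uncond"])
    return data_dict, eps


dd, eps = run_passes(np.ones((2, 2)), np.zeros((2, 2)), w_cfg=2.0)
```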
However, when the guider is patched, we also have its output to store in data_dict. In the forward_hooks property of the guider class we define the list of forward passes (chosen from cur_inv, inv_inv and cur_trg) for which we need to store this output.
After a specific forward pass is conducted, we can access the output of the guider and store it in data_dict if the forward pass is listed in forward_hooks. We store it under a key that specifies the current forward pass.
This way we avoid storing unnecessary outputs in data_dict and can distinguish outputs from different forward passes by their keys.
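The selection and keying logic can be sketched in a few lines. RecordingGuider, store_guider_outputs, the attribute captured and the key format guidername_passname are all hypothetical; only the idea of filtering by forward_hooks and keying by the forward pass comes from the description above.

```python
class RecordingGuider:
    """Hypothetical patched guider; `captured` plays the role of its stored output."""

    patched = True

    def __init__(self, name, forward_hooks):
        self.name = name
        self.forward_hooks = forward_hooks  # forward passes whose output is kept
        self.captured = None


def store_guider_outputs(pass_name, guiders, data_dict):
    """Copy guider outputs into data_dict after the forward pass `pass_name`."""
    for guider in guiders:
        if guider.patched and pass_name in guider.forward_hooks:
            # The key names both the guider and the pass, so outputs of the
            # same guider from different passes do not overwrite each other.
            data_dict[f"{guider.name}_{pass_name}"] = guider.captured


guiders = [
    RecordingGuider("self_attn", forward_hooks=["cur_inv", "inv_inv"]),
    RecordingGuider("features", forward_hooks=["cur_trg"]),
]
data_dict = {}
for pass_name in ["cur_inv", "inv_inv", "cur_trg"]:
    for g in guiders:
        g.captured = f"{g.name} output from {pass_name}"  # pretend the patched layer fired
    store_guider_outputs(pass_name, guiders, data_dict)

print(sorted(data_dict))
# ['features_cur_trg', 'self_attn_cur_inv', 'self_attn_inv_inv']
```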
single_output_clear
This is only relevant for patched guiders.
When the data from the output property of the guider class object has been stored in data_dict, we need to empty that property to avoid exceeding the memory limit. For this purpose we define a single_output_clear function. It returns an empty output value, for example an empty list.
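The effect of clearing can be sketched with a guider that appends one array per patched layer call. LayerListGuider, its record method and the attribute captured are hypothetical. Resetting the attribute after each pass keeps the list from growing across passes, while the list already stored in data_dict is unaffected because data_dict keeps its own reference to it.

```python
import numpy as np


class LayerListGuider:
    """Hypothetical patched guider that appends one array per patched layer call."""

    patched = True

    def __init__(self):
        self.captured = self.single_output_clear()

    def single_output_clear(self):
        # Return an empty container of the same kind the guider fills
        return []

    def record(self, features):
        self.captured.append(features)


guider = LayerListGuider()
data_dict = {}
for pass_name in ["cur_inv", "inv_inv", "cur_trg"]:
    for _ in range(3):  # pretend three patched layers fired during this pass
        guider.record(np.zeros((64, 64)))
    data_dict[pass_name] = guider.captured
    # Reset so the next pass starts empty; the stored list stays in data_dict
    guider.captured = guider.single_output_clear()
```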
References & Acknowledgments
This repository was built on top of Prompt-to-Prompt.