
diffengine.models.editors — diffengine 1.0.0 documentation

diffengine.models.editors

Package Contents

Classes
class diffengine.models.editors.AMUSEd(tokenizer, text_encoder, vae, transformer, model='amused/amused-512', loss=None, transformer_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, data_preprocessor=None, vae_batch_size=8, *, finetune_text_encoder=False, gradient_checkpointing=False, enable_xformers=False)[source]

Bases: mmengine.model.BaseModel

aMUSEd.

Args:

tokenizer (dict): Config of tokenizer.

text_encoder (dict): Config of text encoder.

vae (dict): Config of vae.

transformer (dict): Config of transformer.

model (str): Pretrained model name. Defaults to "amused/amused-512".

loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).

transformer_lora_config (dict, optional): The LoRA config dict for the Transformer, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for the Text Encoder, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

prior_loss_weight (float): The weight of the prior preservation loss. Only used when training DreamBooth with class images. Defaults to 1.0.

data_preprocessor (dict, optional): The pre-process config of SDDataPreprocessor.

vae_batch_size (int): The batch size of the vae. Defaults to 8.

finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. Defaults to False.

gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.

enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.

A configuration sketch follows this list.
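The snippet below is a minimal, hedged sketch of how these keyword arguments might be combined in an mmengine-style config dict. The registry name "AMUSEd" is assumed from the class name, and the tokenizer / text_encoder / vae / transformer sub-configs are left as empty placeholders because their contents are not documented on this page; only the loss and LoRA dict forms shown above are used verbatim.

# Hedged configuration sketch (not a verified diffengine config).
# Sub-module configs are placeholders; replace them with real configs.
model = dict(
    type="AMUSEd",                                   # assumed registry name
    model="amused/amused-512",                       # documented default
    tokenizer=dict(),                                # placeholder
    text_encoder=dict(),                             # placeholder
    vae=dict(),                                      # placeholder
    transformer=dict(),                              # placeholder
    loss=dict(type="L2Loss", loss_weight=1.0),       # documented default
    transformer_lora_config=dict(type="LoRA", r=4),  # LoRA / LoHa / LoKr, PEFT-style
    finetune_text_encoder=False,
    gradient_checkpointing=True,                     # save memory, slower backward pass
    vae_batch_size=8,
)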

property device: torch.device

Get device information.

Returns:

torch.device

Return type:

device.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=12, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]): The prompt or prompts to guide the image generation.

negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 12.

output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments. A usage sketch follows the return type below.

Parameters:
  • prompt (list[str]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]
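A hedged usage sketch of the documented signature: editor is assumed to be an already constructed AMUSEd instance, and the prompt text is arbitrary.

# Usage sketch; `editor` is an assumed, already-built AMUSEd model.
images = editor.infer(
    prompt=["a pixel-art castle at sunset"],   # arbitrary example prompt
    negative_prompt="low quality",
    height=512,
    width=512,
    num_inference_steps=12,                    # documented default
    output_type="pil",                         # or "latent"
)
# Per the return annotation above, `images` is a list of numpy.ndarray,
# one entry per prompt.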

val_step(data)[source]

Val step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

test_step(data)[source]

Test step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

_forward_vae(img, num_batches)[source]

Forward vae.

Parameters:
  • img (torch.Tensor) –

  • num_batches (int) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • tokenizer (dict) –

  • text_encoder (dict) –

  • vae (dict) –

  • transformer (dict) –

  • model (str) –

  • loss (dict | None) –

  • transformer_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • prior_loss_weight (float) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • vae_batch_size (int) –

  • finetune_text_encoder (bool) –

  • gradient_checkpointing (bool) –

  • enable_xformers (bool) –

class diffengine.models.editors.AMUSEdPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

AMUSEdPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.

Args:

data (dict): Data returned by the dataloader.

training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list
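The stacking behaviour described above can be illustrated with a small, self-contained sketch; this is conceptual only and is not the preprocessor's actual code.

import torch

# Conceptual illustration of "stack the input tensor list to a batch tensor
# at the first dimension" (not the actual preprocessor implementation).
imgs = [torch.randn(3, 512, 512) for _ in range(4)]  # hypothetical per-sample tensors
batch = torch.stack(imgs, dim=0)                      # shape: (4, 3, 512, 512)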

class diffengine.models.editors.DeepFloydIF(tokenizer, scheduler, text_encoder, unet, model='DeepFloyd/IF-I-XL-v1.0', loss=None, unet_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, tokenizer_max_length=77, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, *, finetune_text_encoder=False, gradient_checkpointing=False, enable_xformers=False)[source]

Bases: mmengine.model.BaseModel

DeepFloyd/IF.

Args:

tokenizer (dict): Config of tokenizer.

scheduler (dict): Config of scheduler.

text_encoder (dict): Config of text encoder.

unet (dict): Config of unet.

model (str): Pretrained model name of stable diffusion. Defaults to 'DeepFloyd/IF-I-XL-v1.0'.

loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).

unet_lora_config (dict, optional): The LoRA config dict for Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for the Text Encoder, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

prior_loss_weight (float): The weight of the prior preservation loss. Only used when training DreamBooth with class images. Defaults to 1.0.

tokenizer_max_length (int): The max length of the tokenizer. Defaults to 77.

prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None. If None, the default prediction type of the scheduler (noise_scheduler.config.prediction_type) is used. Defaults to None.

data_preprocessor (dict, optional): The pre-process config of SDDataPreprocessor.

noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').

timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').

input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.

finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. Defaults to False.

gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.

enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.

A configuration sketch follows this list.
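A hedged sketch of the noise-related options documented above. Only the documented defaults and the recommended input-perturbation value are used; the registry name and the empty sub-module configs are assumptions.

# Hedged configuration sketch (placeholders for undocumented sub-configs).
model = dict(
    type="DeepFloydIF",                          # assumed registry name
    model="DeepFloyd/IF-I-XL-v1.0",              # documented default
    tokenizer=dict(),                            # placeholder
    scheduler=dict(),                            # placeholder
    text_encoder=dict(),                         # placeholder
    unet=dict(),                                 # placeholder
    noise_generator=dict(type="WhiteNoise"),     # documented default
    timesteps_generator=dict(type="TimeSteps"),  # documented default
    input_perturbation_gamma=0.1,                # recommended value for Input Perturbation
    prediction_type=None,                        # fall back to the scheduler's setting
    tokenizer_max_length=77,
)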

property device: torch.device

Get device information.

Returns:

torch.device

Return type:

device.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]): The prompt or prompts to guide the image generation.

negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 50.

output_type (str): The output format of the generated image. Choose between 'pil' and 'pt'. Defaults to 'pil'.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]

val_step(data)[source]

Val step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

test_step(data)[source]

Test step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

loss(model_pred, noise, latents, timesteps, weight=None)[source]

Calculate loss.

Parameters:
  • model_pred (torch.Tensor) –

  • noise (torch.Tensor) –

  • latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • weight (torch.Tensor | None) –

Return type:

dict[str, torch.Tensor]
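The default loss config is dict(type='L2Loss', loss_weight=1.0). The function below is a conceptual stand-in that illustrates what such a weighted mean-squared error between the model prediction and its target (the sampled noise under epsilon-prediction) computes; it is not diffengine's implementation, and the helper name is hypothetical.

import torch
import torch.nn.functional as F

def l2_loss_sketch(model_pred: torch.Tensor,
                   target: torch.Tensor,
                   weight: torch.Tensor | None = None,
                   loss_weight: float = 1.0) -> dict:
    """Conceptual stand-in for dict(type='L2Loss', loss_weight=1.0); not diffengine's code."""
    if weight is None:
        loss = F.mse_loss(model_pred.float(), target.float())
    else:
        # Broadcast a per-sample weight over the non-batch dimensions,
        # e.g. for prior-preservation terms.
        w = weight.view(-1, *([1] * (model_pred.dim() - 1)))
        loss = (F.mse_loss(model_pred.float(), target.float(), reduction="none") * w).mean()
    return {"loss": loss * loss_weight}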

_preprocess_model_input(latents, noise, timesteps)[source]

Preprocess model input.

Parameters:
  • latents (torch.Tensor) –

  • noise (torch.Tensor) –

  • timesteps (torch.Tensor) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • tokenizer (dict) –

  • scheduler (dict) –

  • text_encoder (dict) –

  • unet (dict) –

  • model (str) –

  • loss (dict | None) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • prior_loss_weight (float) –

  • tokenizer_max_length (int) –

  • prediction_type (str | None) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • noise_generator (dict | None) –

  • timesteps_generator (dict | None) –

  • input_perturbation_gamma (float) –

  • finetune_text_encoder (bool) –

  • gradient_checkpointing (bool) –

  • enable_xformers (bool) –

class diffengine.models.editors.DistillSDXL(*args, model_type, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Distill Stable Diffusion XL.

Args:
model_type (str): The type of model to use. Choose from sd_tiny, sd_small.

unet_lora_config (dict, optional): The LoRA config dict for Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for the Text Encoder, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. This should be False when training ControlNet. Defaults to False.

A configuration sketch follows this list.
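A hedged configuration sketch of the distillation-specific arguments; the inherited StableDiffusionXL arguments (tokenizers, text encoders, vae, unet, scheduler) are omitted, and the registry name is assumed from the class name.

# Hedged sketch; inherited StableDiffusionXL arguments are omitted.
model = dict(
    type="DistillSDXL",                  # assumed registry name
    model_type="sd_small",               # or "sd_tiny"
    unet_lora_config=None,               # optionally dict(type="LoRA", r=4)
    text_encoder_lora_config=None,
    finetune_text_encoder=False,         # keep False when training ControlNet
)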

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

_prepare_student()[source]
Return type:

None

_cast_hook()[source]
Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • model_type (str) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • finetune_text_encoder (bool) –

class diffengine.models.editors.ESDXL(*args, finetune_text_encoder=False, pre_compute_text_embeddings=True, height=1024, width=1024, negative_guidance=1.0, train_method='full', prediction_type=None, data_preprocessor=None, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Stable Diffusion XL Erasing Concepts from Diffusion Models.

Args:

height (int): Image height. Defaults to 1024.

width (int): Image width. Defaults to 1024.

negative_guidance (float): Negative guidance for the loss. Defaults to 1.0.

train_method (str): Training method. Choose from full, xattn, noxattn, selfattn. Defaults to full.

A configuration sketch follows this list.
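A hedged sketch of the concept-erasing options above; the remaining StableDiffusionXL arguments are omitted and the registry name is assumed from the class name.

# Hedged sketch of the documented ESDXL-specific options.
model = dict(
    type="ESDXL",                # assumed registry name
    height=1024,
    width=1024,
    negative_guidance=1.0,       # strength of the erasing objective
    train_method="xattn",        # one of: full, xattn, noxattn, selfattn
)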

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

_freeze_unet()[source]
Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

train(*, mode=True)[source]

Convert the model into training mode.

Parameters:

mode (bool) –

Return type:

None

abstract _preprocess_model_input(latents, noise, timesteps)[source]

Preprocess model input.

Parameters:
  • latents (torch.Tensor) –

  • noise (torch.Tensor) –

  • timesteps (torch.Tensor) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • finetune_text_encoder (bool) –

  • pre_compute_text_embeddings (bool) –

  • height (int) –

  • width (int) –

  • negative_guidance (float) –

  • train_method (str) –

  • prediction_type (str | None) –

  • data_preprocessor (dict | torch.nn.Module | None) –

class diffengine.models.editors.ESDXLDataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

ESDXLDataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.

Args:

data (dict): Data returned by dataloader training (bool): Whether to enable training time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

Union[dict, list]

class diffengine.models.editors.StableDiffusionXLInstructPix2Pix(*args, zeros_image_embeddings_prob=0.1, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, data_preprocessor=None, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Stable Diffusion XL Instruct Pix2Pix.

Args:
zeros_image_embeddings_prob (float): The probability of generating zero image embeddings. Defaults to 0.1.

unet_lora_config (dict, optional): The LoRA config dict for Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for the Text Encoder, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. This should be False when training ControlNet. Defaults to False.

data_preprocessor (dict, optional): The pre-process config of SDControlNetDataPreprocessor.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

infer(prompt, condition_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]): The prompt or prompts to guide the image generation.

condition_image (List[Union[str, Image.Image]]): The condition image for ControlNet.

negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 50.

output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments. A usage sketch follows the return type below.

Parameters:
  • prompt (list[str]) –

  • condition_image (list[str | PIL.Image.Image]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]
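A hedged usage sketch of the documented signature: editor is assumed to be a constructed StableDiffusionXLInstructPix2Pix instance, and the image path is a hypothetical placeholder.

# Usage sketch; `editor` and the file path are assumptions.
images = editor.infer(
    prompt=["make the sky look like a sunset"],
    condition_image=["path/to/source.png"],   # str path or PIL.Image.Image
    num_inference_steps=50,                   # documented default
    output_type="pil",
)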

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • zeros_image_embeddings_prob (float) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • finetune_text_encoder (bool) –

  • data_preprocessor (dict | torch.nn.Module | None) –

class diffengine.models.editors.IPAdapterXL(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Stable Diffusion XL IP-Adapter.

Args:

image_encoder (dict): The image encoder config.

image_projection (dict): The image projection config.

feature_extractor (dict): The feature extractor config.

pretrained_adapter (str, optional): Path to a pretrained IP-Adapter. Defaults to None.

pretrained_adapter_subfolder (str, optional): Subfolder of the pretrained IP-Adapter. Defaults to ''.

pretrained_adapter_weights_name (str, optional): Weights file name of the pretrained IP-Adapter. Defaults to ''.

unet_lora_config (dict, optional): The LoRA config dict for Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for the Text Encoder, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. This should be False when training ControlNet. Defaults to False.

zeros_image_embeddings_prob (float): The probability of generating zero image embeddings. Defaults to 0.1.

data_preprocessor (dict, optional): The pre-process config of SDControlNetDataPreprocessor.

hidden_states_idx (int): Index of the hidden states to be used. Defaults to -2.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_ip_adapter()[source]

Set IP-Adapter for model.

Return type:

None

infer(prompt, example_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]): The prompt or prompts to guide the image generation.

example_image (List[Union[str, Image.Image]]): The image prompt or prompts to guide the image generation.

negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 50.

output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments. A usage sketch follows the return type below.

Parameters:
  • prompt (list[str]) –

  • example_image (list[str | PIL.Image.Image]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]
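A hedged usage sketch of the image-prompt path: editor is assumed to be a constructed IPAdapterXL instance, and the reference file name is a hypothetical placeholder.

# Usage sketch; `editor` and the reference image path are assumptions.
images = editor.infer(
    prompt=["a dog in the style of the reference image"],
    example_image=["reference.png"],   # image prompt (str path or PIL.Image.Image)
    num_inference_steps=50,
)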

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • image_encoder (dict) –

  • image_projection (dict) –

  • feature_extractor (dict) –

  • pretrained_adapter (str | None) –

  • pretrained_adapter_subfolder (str) –

  • pretrained_adapter_weights_name (str) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • finetune_text_encoder (bool) –

  • zeros_image_embeddings_prob (float) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • hidden_states_idx (int) –

class diffengine.models.editors.IPAdapterXLPlus(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)[source]

Bases: IPAdapterXL

Stable Diffusion XL IP-Adapter Plus.

Parameters:
  • image_encoder (dict) –

  • image_projection (dict) –

  • feature_extractor (dict) –

  • pretrained_adapter (str | None) –

  • pretrained_adapter_subfolder (str) –

  • pretrained_adapter_weights_name (str) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • finetune_text_encoder (bool) –

  • zeros_image_embeddings_prob (float) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • hidden_states_idx (int) –

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

class diffengine.models.editors.IPAdapterXLDataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

IPAdapterXLDataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.

Args:

data (dict): Data returned by the dataloader.

training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list

class diffengine.models.editors.TimmIPAdapterXLPlus(*args, image_encoder, image_projection, feature_extractor, pretrained_adapter=None, pretrained_adapter_subfolder='', pretrained_adapter_weights_name='', unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, zeros_image_embeddings_prob=0.1, data_preprocessor=None, hidden_states_idx=-2, **kwargs)[source]

Bases: diffengine.models.editors.ip_adapter.ip_adapter_xl.IPAdapterXLPlus

Stable Diffusion XL IP-Adapter Plus.

Parameters:
  • image_encoder (dict) –

  • image_projection (dict) –

  • feature_extractor (dict) –

  • pretrained_adapter (str | None) –

  • pretrained_adapter_subfolder (str) –

  • pretrained_adapter_weights_name (str) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • finetune_text_encoder (bool) –

  • zeros_image_embeddings_prob (float) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • hidden_states_idx (int) –

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

infer(prompt, example_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]): The prompt or prompts to guide the image generation.

example_image (List[Union[str, Image.Image]]): The image prompt or prompts to guide the image generation.

negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 50.

output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • example_image (list[str | PIL.Image.Image]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (list | None) –

  • mode (str) –

Return type:

dict

class diffengine.models.editors.KandinskyV22Prior(tokenizer, scheduler, text_encoder, image_encoder, prior, decoder_model='kandinsky-community/kandinsky-2-2-decoder', prior_model='kandinsky-community/kandinsky-2-2-prior', loss=None, prior_lora_config=None, prior_loss_weight=1.0, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, *, gradient_checkpointing=False, enable_xformers=False)[source]

Bases: mmengine.model.BaseModel

KandinskyV22 Prior.

Args:

tokenizer (dict): Config of tokenizer.

scheduler (dict): Config of scheduler.

text_encoder (dict): Config of text encoder.

image_encoder (dict): Config of image encoder.

prior (dict): Config of prior.

decoder_model (str): Pretrained model name of the decoder. Defaults to "kandinsky-community/kandinsky-2-2-decoder".

prior_model (str): Pretrained model name of the prior. Defaults to "kandinsky-community/kandinsky-2-2-prior".

loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).

prior_lora_config (dict, optional): The LoRA config dict for the Prior, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

prior_loss_weight (float): The weight of the prior preservation loss. Only used when training DreamBooth with class images. Defaults to 1.0.

data_preprocessor (dict, optional): The pre-process config of SDDataPreprocessor.

noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').

timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').

input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.

gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.

enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.

property device: torch.device

Get device information.

Returns:

torch.device

Return type:

device.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]): The prompt or prompts to guide the image generation.

negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 50.

output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]

val_step(data)[source]

Val step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

test_step(data)[source]

Test step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

loss(model_pred, noise, latents, timesteps, weight=None)[source]

Calculate loss.

Parameters:
  • model_pred (torch.Tensor) –

  • noise (torch.Tensor) –

  • latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • weight (torch.Tensor | None) –

Return type:

dict[str, torch.Tensor]

_preprocess_model_input(latents, noise, timesteps)[source]

Preprocess model input.

Parameters:
  • latents (torch.Tensor) –

  • noise (torch.Tensor) –

  • timesteps (torch.Tensor) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • tokenizer (dict) –

  • scheduler (dict) –

  • text_encoder (dict) –

  • image_encoder (dict) –

  • prior (dict) –

  • decoder_model (str) –

  • prior_model (str) –

  • loss (dict | None) –

  • prior_lora_config (dict | None) –

  • prior_loss_weight (float) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • noise_generator (dict | None) –

  • timesteps_generator (dict | None) –

  • input_perturbation_gamma (float) –

  • gradient_checkpointing (bool) –

  • enable_xformers (bool) –

class diffengine.models.editors.KandinskyV22Decoder(scheduler, image_encoder, vae, unet, decoder_model='kandinsky-community/kandinsky-2-2-decoder', prior_model='kandinsky-community/kandinsky-2-2-prior', loss=None, unet_lora_config=None, prior_loss_weight=1.0, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, gradient_checkpointing=False, enable_xformers=False)[source]

Bases: mmengine.model.BaseModel

KandinskyV22 Decoder.

Args:

scheduler (dict): Config of scheduler.

image_encoder (dict): Config of image encoder.

vae (dict): Config of vae.

unet (dict): Config of unet.

decoder_model (str): Pretrained model name of the decoder. Defaults to "kandinsky-community/kandinsky-2-2-decoder".

prior_model (str): Pretrained model name of the prior. Defaults to "kandinsky-community/kandinsky-2-2-prior".

loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).

unet_lora_config (dict, optional): The LoRA config dict for Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

prior_loss_weight (float): The weight of the prior preservation loss. Only used when training DreamBooth with class images. Defaults to 1.0.

prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None. If None, the default prediction type of the scheduler is used. Defaults to None.

data_preprocessor (dict, optional): The pre-process config of SDDataPreprocessor.

noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').

timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').

input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.

vae_batch_size (int): The batch size of the vae. Defaults to 8.

gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.

enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.

property device: torch.device

Get device information.

Returns:

torch.device

Return type:

device.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]): The prompt or prompts to guide the image generation.

negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 50.

output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]

val_step(data)[source]

Val step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

test_step(data)[source]

Test step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

loss(model_pred, noise, latents, timesteps, weight=None)[source]

Calculate loss.

Parameters:
  • model_pred (torch.Tensor) –

  • noise (torch.Tensor) –

  • latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • weight (torch.Tensor | None) –

Return type:

dict[str, torch.Tensor]

_preprocess_model_input(latents, noise, timesteps)[source]

Preprocess model input.

Parameters:
  • latents (torch.Tensor) –

  • noise (torch.Tensor) –

  • timesteps (torch.Tensor) –

Return type:

torch.Tensor

_forward_vae(img, num_batches)[source]

Forward vae.

Parameters:
  • img (torch.Tensor) –

  • num_batches (int) –

Return type:

torch.Tensor
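vae_batch_size controls how many images are pushed through the VAE at once. The sketch below only illustrates the idea of chunked encoding to bound peak memory; it is not the actual _forward_vae implementation, and vae_encode is a hypothetical callable.

import torch

def encode_in_chunks(vae_encode, img: torch.Tensor, vae_batch_size: int = 8) -> torch.Tensor:
    """Conceptual sketch of chunked VAE encoding (not diffengine's code)."""
    # Splitting the batch into chunks of `vae_batch_size` bounds peak memory.
    latents = [vae_encode(part) for part in img.split(vae_batch_size, dim=0)]
    return torch.cat(latents, dim=0)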

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • scheduler (dict) –

  • image_encoder (dict) –

  • vae (dict) –

  • unet (dict) –

  • decoder_model (str) –

  • prior_model (str) –

  • loss (dict | None) –

  • unet_lora_config (dict | None) –

  • prior_loss_weight (float) –

  • prediction_type (str | None) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • noise_generator (dict | None) –

  • timesteps_generator (dict | None) –

  • input_perturbation_gamma (float) –

  • vae_batch_size (int) –

  • gradient_checkpointing (bool) –

  • enable_xformers (bool) –

class diffengine.models.editors.KandinskyV22DecoderDataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

KandinskyV22DecoderDataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.

Args:

data (dict): Data returned by the dataloader.

training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list

class diffengine.models.editors.KandinskyV3(tokenizer, scheduler, text_encoder, vae, unet, model='kandinsky-community/kandinsky-3', loss=None, unet_lora_config=None, prior_loss_weight=1.0, tokenizer_max_length=128, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, gradient_checkpointing=False, enable_xformers=False)[source]

Bases: mmengine.model.BaseModel

KandinskyV3.

Args:

tokenizer (dict): Config of tokenizer.

scheduler (dict): Config of scheduler.

text_encoder (dict): Config of text encoder.

vae (dict): Config of vae.

unet (dict): Config of unet.

model (str): Pretrained model name. Defaults to "kandinsky-community/kandinsky-3".

loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).

unet_lora_config (dict, optional): The LoRA config dict for Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

prior_loss_weight (float): The weight of the prior preservation loss. Only used when training DreamBooth with class images. Defaults to 1.0.

tokenizer_max_length (int): The max length of the tokenizer. Defaults to 128.

prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None. If None, the default prediction type of the scheduler is used. Defaults to None.

data_preprocessor (dict, optional): The pre-process config of SDDataPreprocessor.

noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').

timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').

input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.

vae_batch_size (int): The batch size of the vae. Defaults to 8.

gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.

enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.

property device: torch.device

Get device information.

Returns:

torch.device

Return type:

device.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]): The prompt or prompts to guide the image generation.

negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 50.

output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]

val_step(data)[source]

Val step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

test_step(data)[source]

Test step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

loss(model_pred, noise, latents, timesteps, weight=None)[source]

Calculate loss.

Parameters:
  • model_pred (torch.Tensor) –

  • noise (torch.Tensor) –

  • latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • weight (torch.Tensor | None) –

Return type:

dict[str, torch.Tensor]

_preprocess_model_input(latents, noise, timesteps)[source]

Preprocess model input.

Parameters:
  • latents (torch.Tensor) –

  • noise (torch.Tensor) –

  • timesteps (torch.Tensor) –

Return type:

torch.Tensor

_forward_vae(img, num_batches)[source]

Forward vae.

Parameters:
  • img (torch.Tensor) –

  • num_batches (int) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • tokenizer (dict) –

  • scheduler (dict) –

  • text_encoder (dict) –

  • vae (dict) –

  • unet (dict) –

  • model (str) –

  • loss (dict | None) –

  • unet_lora_config (dict | None) –

  • prior_loss_weight (float) –

  • tokenizer_max_length (int) –

  • prediction_type (str | None) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • noise_generator (dict | None) –

  • timesteps_generator (dict | None) –

  • input_perturbation_gamma (float) –

  • vae_batch_size (int) –

  • gradient_checkpointing (bool) –

  • enable_xformers (bool) –

class diffengine.models.editors.LatentConsistencyModelsXL(*args, timesteps_generator=None, num_ddim_timesteps=50, w_min=3.0, w_max=15.0, ema_type='ExponentialMovingAverage', ema_momentum=0.05, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Stable Diffusion XL Latent Consistency Models.

Args:
timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='DDIMTimeSteps').

num_ddim_timesteps (int): Number of DDIM timesteps. Defaults to 50.

w_min (float): Minimum guidance scale. Defaults to 3.0.

w_max (float): Maximum guidance scale. Defaults to 15.0.

ema_type (str): The type of EMA. Defaults to 'ExponentialMovingAverage'.

ema_momentum (float): The EMA momentum. Defaults to 0.05.

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, height=None, width=None, num_inference_steps=4, guidance_scale=1.0, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]): The prompt or prompts to guide the image generation.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 4.

guidance_scale (float): The guidance scale. Defaults to 1.0.

output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments. A usage sketch follows the return type below.

Parameters:
  • prompt (list[str]) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • guidance_scale (float) –

  • output_type (str) –

Return type:

list[numpy.ndarray]
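A hedged usage sketch: latent consistency models are typically sampled with very few steps, and only the documented signature is used; editor is assumed to be a constructed LatentConsistencyModelsXL instance.

# Usage sketch; `editor` is an assumed, already-built model.
images = editor.infer(
    prompt=["a watercolor painting of a lighthouse"],
    num_inference_steps=4,   # signature default; LCMs target few-step sampling
    guidance_scale=1.0,      # documented default
    output_type="pil",
)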

loss(model_pred, gt, timesteps, weight=None)[source]

Calculate loss.

Parameters:
  • model_pred (torch.Tensor) –

  • gt (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • weight (torch.Tensor | None) –

Return type:

dict[str, torch.Tensor]

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

_predicted_origin(model_output, timesteps, sample)[source]

Predict the origin of the model output.

Args:

model_output (torch.Tensor): The model output.

timesteps (torch.Tensor): The timesteps.

sample (torch.Tensor): The sample.

Parameters:
  • model_output (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • sample (torch.Tensor) –

Return type:

torch.Tensor

Parameters:
  • timesteps_generator (dict | None) –

  • num_ddim_timesteps (int) –

  • w_min (float) –

  • w_max (float) –

  • ema_type (str) –

  • ema_momentum (float) –

class diffengine.models.editors.PixArtAlpha(tokenizer, scheduler, text_encoder, vae, transformer, model='PixArt-alpha/PixArt-XL-2-1024-MS', loss=None, transformer_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, tokenizer_max_length=120, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, finetune_text_encoder=False, gradient_checkpointing=False, enable_xformers=False)[source]

Bases: mmengine.model.BaseModel

PixArt Alpha.

Args:

tokenizer (dict): Config of tokenizer.

scheduler (dict): Config of scheduler.

text_encoder (dict): Config of text encoder.

vae (dict): Config of vae.

transformer (dict): Config of transformer.

model (str): Pretrained model name of stable diffusion. Defaults to 'PixArt-alpha/PixArt-XL-2-1024-MS'.

loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).

transformer_lora_config (dict, optional): The LoRA config dict for the Transformer, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for the Text Encoder, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

prior_loss_weight (float): The weight of the prior preservation loss. Only used when training DreamBooth with class images. Defaults to 1.0.

tokenizer_max_length (int): The max length of the tokenizer. Defaults to 120.

prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None. If None, the default prediction type of the scheduler is used. Defaults to None.

data_preprocessor (dict, optional): The pre-process config of PixArtAlphaDataPreprocessor.

noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').

timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').

input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.

vae_batch_size (int): The batch size of the vae. Defaults to 8.

finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. Defaults to False.

gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.

enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.

A configuration sketch follows this list.
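A hedged configuration sketch highlighting the PixArt-specific pieces (the 120-token tokenizer limit and the dedicated data preprocessor); sub-module configs are placeholders and the registry names are assumed from the class names.

# Hedged sketch; placeholders for undocumented sub-configs.
model = dict(
    type="PixArtAlpha",                               # assumed registry name
    model="PixArt-alpha/PixArt-XL-2-1024-MS",         # documented default
    tokenizer=dict(),                                 # placeholder
    scheduler=dict(),                                 # placeholder
    text_encoder=dict(),                              # placeholder
    vae=dict(),                                       # placeholder
    transformer=dict(),                               # placeholder
    tokenizer_max_length=120,                         # documented default
    transformer_lora_config=dict(type="LoRA", r=4),   # PEFT-style LoRA
    data_preprocessor=dict(type="PixArtAlphaDataPreprocessor"),
    gradient_checkpointing=True,
)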

property device: torch.device

Get device information.

Returns:

torch.device

Return type:

device.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]): The prompt or prompts to guide the image generation.

negative_prompt (Optional[str]): The prompt or prompts not to guide the image generation. Defaults to None.

height (int, optional): The height in pixels of the generated image. Defaults to None.

width (int, optional): The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps. Defaults to 50.

output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]

val_step(data)[source]

Val step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

test_step(data)[source]

Test step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

loss(model_pred, noise, latents, timesteps, weight=None)[source]

Calculate loss.

Parameters:
  • model_pred (torch.Tensor) –

  • noise (torch.Tensor) –

  • latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • weight (torch.Tensor | None) –

Return type:

dict[str, torch.Tensor]

_preprocess_model_input(latents, noise, timesteps)[source]

Preprocess model input.

Parameters:
  • latents (torch.Tensor) –

  • noise (torch.Tensor) –

  • timesteps (torch.Tensor) –

Return type:

torch.Tensor

_forward_vae(img, num_batches)[source]

Forward vae.

Parameters:
  • img (torch.Tensor) –

  • num_batches (int) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.

data_samples (Optional[list], optional): The data samples. Defaults to None.

mode (str, optional): The mode. Defaults to "loss".

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict
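A hedged sketch of the mode switch on forward(): with mode="loss" the call returns a dict of loss tensors. The batch layout ({"inputs": ..., "data_samples": ...}) is assumed from the usual mmengine data-preprocessor convention and is not documented on this page.

# Hedged training-step sketch; the batch keys are assumptions.
losses = model(inputs=batch["inputs"],
               data_samples=batch.get("data_samples"),
               mode="loss")
total_loss = sum(losses.values())   # e.g. {"loss": tensor(...)}
total_loss.backward()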

Parameters:
  • tokenizer (dict) –

  • scheduler (dict) –

  • text_encoder (dict) –

  • vae (dict) –

  • transformer (dict) –

  • model (str) –

  • loss (dict | None) –

  • transformer_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • prior_loss_weight (float) –

  • tokenizer_max_length (int) –

  • prediction_type (str | None) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • noise_generator (dict | None) –

  • timesteps_generator (dict | None) –

  • input_perturbation_gamma (float) –

  • vae_batch_size (int) –

  • finetune_text_encoder (bool) –

  • gradient_checkpointing (bool) –

  • enable_xformers (bool) –

class diffengine.models.editors.PixArtAlphaDataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

PixArtAlphaDataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list to a batch tensor at the first dimension.

Args:

data (dict): Data returned by the dataloader.

training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list

class diffengine.models.editors.SSD1B(tokenizer_one, tokenizer_two, scheduler, text_encoder_one, text_encoder_two, vae, teacher_unet, student_unet, model='stabilityai/stable-diffusion-xl-base-1.0', loss=None, unet_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, finetune_text_encoder=False, gradient_checkpointing=False, pre_compute_text_embeddings=False, enable_xformers=False, student_weight_from_teacher=False)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

SSD1B.

Refer to official implementation: https://github.com/segmind/SSD-1B/blob/main/distill_sdxl.py

Args:

tokenizer_one (dict): Config of tokenizer one.

tokenizer_two (dict): Config of tokenizer two.

scheduler (dict): Config of scheduler.

text_encoder_one (dict): Config of text encoder one.

text_encoder_two (dict): Config of text encoder two.

vae (dict): Config of vae.

teacher_unet (dict): Config of teacher unet.

student_unet (dict): Config of student unet.

model (str): Pretrained model name of stable diffusion xl. Defaults to 'stabilityai/stable-diffusion-xl-base-1.0'.

vae_model (str, optional): Path to a pretrained VAE model with better numerical stability. More details: https://github.com/huggingface/diffusers/pull/4038. Defaults to None.

loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).

unet_lora_config (dict, optional): The LoRA config dict for Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for the Text Encoder, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; the other keys follow the PEFT config (https://github.com/huggingface/peft). Defaults to None.

prior_loss_weight (float): The weight of the prior preservation loss. Only used when training DreamBooth with class images. Defaults to 1.0.

prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None. If None, the default prediction type of the scheduler (noise_scheduler.config.prediction_type) is used. Defaults to None.

data_preprocessor (dict, optional): The pre-process config of SDXLDataPreprocessor.

noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').

timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').

input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.

vae_batch_size (int): The batch size of the vae. Defaults to 8.

finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. Defaults to False.

gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.

pre_compute_text_embeddings (bool): Whether to pre-compute text embeddings to save memory. Defaults to False.

enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.

student_weight_from_teacher (bool): Whether to initialize the student model with the teacher model's weights. Defaults to False.

A configuration sketch follows this list.
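A hedged sketch of the distillation-specific arguments above; the teacher/student UNet sub-configs and the remaining tokenizer, text-encoder, VAE and scheduler configs are placeholders, and the registry name is assumed from the class name.

# Hedged sketch; placeholders for undocumented sub-configs.
model = dict(
    type="SSD1B",                                      # assumed registry name
    model="stabilityai/stable-diffusion-xl-base-1.0",  # documented default
    teacher_unet=dict(),                               # placeholder: frozen teacher UNet
    student_unet=dict(),                               # placeholder: smaller student UNet
    student_weight_from_teacher=True,                  # initialize student from teacher weights
    pre_compute_text_embeddings=False,
    gradient_checkpointing=True,
)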

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

_cast_hook()[source]
Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

_forward_vae(img, num_batches)[source]

Forward vae.

Parameters:
  • img (torch.Tensor) –

  • num_batches (int) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.
data_samples (Optional[list], optional): The data samples.

Defaults to None.

mode (str, optional): The mode. Defaults to “loss”.

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • tokenizer_one (dict) –

  • tokenizer_two (dict) –

  • scheduler (dict) –

  • text_encoder_one (dict) –

  • text_encoder_two (dict) –

  • vae (dict) –

  • teacher_unet (dict) –

  • student_unet (dict) –

  • model (str) –

  • loss (dict | None) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • prior_loss_weight (float) –

  • prediction_type (str | None) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • noise_generator (dict | None) –

  • timesteps_generator (dict | None) –

  • input_perturbation_gamma (float) –

  • vae_batch_size (int) –

  • finetune_text_encoder (bool) –

  • gradient_checkpointing (bool) –

  • pre_compute_text_embeddings (bool) –

  • enable_xformers (bool) –

  • student_weight_from_teacher (bool) –

class diffengine.models.editors.StableDiffusion(tokenizer, scheduler, text_encoder, vae, unet, model='runwayml/stable-diffusion-v1-5', loss=None, unet_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, finetune_text_encoder=False, gradient_checkpointing=False, enable_xformers=False)[source]

Bases: mmengine.model.BaseModel

Stable Diffusion.

Args:

tokenizer (dict): Config of tokenizer.
scheduler (dict): Config of scheduler.
text_encoder (dict): Config of text encoder.
vae (dict): Config of vae.
unet (dict): Config of unet.
model (str): pretrained model name of stable diffusion.

Defaults to ‘runwayml/stable-diffusion-v1-5’.

loss (dict): Config of loss. Defaults to

dict(type='L2Loss', loss_weight=1.0).

unet_lora_config (dict, optional): The LoRA config dict for Unet.

Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for

Text Encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

prior_loss_weight (float): The weight of prior preservation loss.

It works when training dreambooth with class images.

prediction_type (str): The prediction_type that shall be used for

training. Choose between ‘epsilon’ or ‘v_prediction’, or leave it as None. If left as None, the default prediction type of the scheduler will be used. Defaults to None.

data_preprocessor (dict, optional): The pre-process config of

SDDataPreprocessor.

noise_generator (dict, optional): The noise generator config.

Defaults to dict(type='WhiteNoise').

timesteps_generator (dict, optional): The timesteps generator config.

Defaults to dict(type='TimeSteps').

input_perturbation_gamma (float): The gamma of input perturbation.

The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.

vae_batch_size (int): The batch size of vae. Defaults to 8.
finetune_text_encoder (bool, optional): Whether to fine-tune text

encoder. Defaults to False.

gradient_checkpointing (bool): Whether or not to use gradient

checkpointing to save memory at the expense of slower backward pass. Defaults to False.

enable_xformers (bool): Whether or not to enable memory efficient

attention. Defaults to False.
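The LoRA options above take PEFT-style config dicts. A hedged sketch of what such dicts might look like (the target_modules choice is illustrative and not taken from this page; keys other than type follow PEFT's LoraConfig):

unet_lora_config = dict(
    type="LoRA",            # one of LoRA, LoHa, LoKr, as documented above
    r=4,
    lora_alpha=4,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # hypothetical selection
)
text_encoder_lora_config = dict(type="LoRA", r=4)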

property device: torch.device

Get device information.

Returns:

torch.device

Return type:

device.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]):

The prompt or prompts to guide the image generation.

negative_prompt (Optional[str]):

The prompt or prompts to guide the image generation. Defaults to None.

height (int, optional):

The height in pixels of the generated image. Defaults to None.

width (int, optional):

The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps.

Defaults to 50.

output_type (str): The output format of the generated image.

Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]
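A minimal usage sketch of infer, assuming a constructed StableDiffusion instance named model; the prompt text and sizes are illustrative:

images = model.infer(
    ["a photo of a dog in a bucket"],
    negative_prompt="low quality",
    height=512,
    width=512,
    num_inference_steps=50,
    output_type="pil",
)
# Per the documented return type, `images` is a list of numpy.ndarray,
# one entry per prompt.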

val_step(data)[source]

Val step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

test_step(data)[source]

Test step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

loss(model_pred, noise, latents, timesteps, weight=None)[source]

Calculate loss.

Parameters:
  • model_pred (torch.Tensor) –

  • noise (torch.Tensor) –

  • latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • weight (torch.Tensor | None) –

Return type:

dict[str, torch.Tensor]
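The regression target of this loss depends on the scheduler's prediction type (see prediction_type above). A hedged sketch of the usual target selection with a diffusers-style scheduler; the helper name is hypothetical:

import torch.nn.functional as F

def l2_diffusion_loss(scheduler, model_pred, noise, latents, timesteps):
    # Select the regression target according to the scheduler's prediction type.
    if scheduler.config.prediction_type == "epsilon":
        target = noise
    elif scheduler.config.prediction_type == "v_prediction":
        # diffusers schedulers expose get_velocity() for v-prediction targets.
        target = scheduler.get_velocity(latents, noise, timesteps)
    else:
        msg = f"Unknown prediction type {scheduler.config.prediction_type}"
        raise ValueError(msg)
    return {"loss": F.mse_loss(model_pred.float(), target.float())}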

_preprocess_model_input(latents, noise, timesteps)[source]

Preprocess model input.

Parameters:
  • latents (torch.Tensor) –

  • noise (torch.Tensor) –

  • timesteps (torch.Tensor) –

Return type:

torch.Tensor
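A hedged sketch of what the noising step with input perturbation (input_perturbation_gamma above) conceptually looks like, assuming a diffusers-style scheduler with add_noise(); the function name is hypothetical:

import torch

def add_noise_with_perturbation(scheduler, latents, noise, timesteps, gamma=0.0):
    if gamma > 0:
        # Input Perturbation: perturb the noise before adding it to the latents.
        noise = noise + gamma * torch.randn_like(noise)
    return scheduler.add_noise(latents, noise, timesteps)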

_forward_vae(img, num_batches)[source]

Forward vae.

Parameters:
  • img (torch.Tensor) –

  • num_batches (int) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.
data_samples (Optional[list], optional): The data samples.

Defaults to None.

mode (str, optional): The mode. Defaults to “loss”.

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • tokenizer (dict) –

  • scheduler (dict) –

  • text_encoder (dict) –

  • vae (dict) –

  • unet (dict) –

  • model (str) –

  • loss (dict | None) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • prior_loss_weight (float) –

  • prediction_type (str | None) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • noise_generator (dict | None) –

  • timesteps_generator (dict | None) –

  • input_perturbation_gamma (float) –

  • vae_batch_size (int) –

  • finetune_text_encoder (bool) –

  • gradient_checkpointing (bool) –

  • enable_xformers (bool) –

class diffengine.models.editors.SDDataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

SDDataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list into a batch tensor along the first dimension.

Args:

data (dict): Data returned by the dataloader.
training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list
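A small sketch of the documented stacking behaviour; the 'img' and 'text' keys are assumptions about the dataloader output, not taken from this page:

import torch

data = {"inputs": {"img": [torch.rand(3, 512, 512) for _ in range(4)],
                   "text": ["a dog", "a cat", "a bird", "a fish"]}}
# forward() stacks the list of image tensors into one batch tensor.
data["inputs"]["img"] = torch.stack(data["inputs"]["img"], dim=0)  # (4, 3, 512, 512)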

class diffengine.models.editors.StableDiffusionControlNet(*args, controlnet_model=None, transformer_layers_per_block=None, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, data_preprocessor=None, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion.StableDiffusion

Stable Diffusion ControlNet.

Args:
controlnet_model (str, optional): Path to pretrained ControlNet model.

If None, the ControlNet is initialized from the Unet. Defaults to None.

transformer_layers_per_block (List[int], optional):

The number of layers per block in the transformer. More details: https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small. Defaults to None.

unet_lora_config (dict, optional): The LoRA config dict for Unet.

Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for

Text Encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

finetune_text_encoder (bool, optional): Whether to fine-tune text

encoder. This should be False when training ControlNet. Defaults to False.

data_preprocessor (dict, optional): The pre-process config of

SDControlNetDataPreprocessor.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, condition_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]):

The prompt or prompts to guide the image generation.

condition_image (List[Union[str, Image.Image]]):

The condition image for ControlNet.

negative_prompt (Optional[str]):

The prompt or prompts to guide the image generation. Defaults to None.

height (int, optional):

The height in pixels of the generated image. Defaults to None.

width (int, optional):

The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps.

Defaults to 50.

output_type (str): The output format of the generated image.

Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • condition_image (list[str | PIL.Image.Image]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]
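A minimal usage sketch, assuming a constructed StableDiffusionControlNet instance named model; the prompt and the conditioning-image path are illustrative:

images = model.infer(
    ["a room with blue walls"],
    condition_image=["path/to/canny_edges.png"],
    num_inference_steps=50,
)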

_forward_compile(noisy_latents, timesteps, encoder_hidden_states, inputs)[source]

Forward function for torch.compile.

Parameters:
  • noisy_latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • encoder_hidden_states (torch.Tensor) –

  • inputs (dict) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.
data_samples (Optional[list], optional): The data samples.

Defaults to None.

mode (str, optional): The mode. Defaults to “loss”.

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • controlnet_model (str | None) –

  • transformer_layers_per_block (list[int] | None) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • finetune_text_encoder (bool) –

  • data_preprocessor (dict | torch.nn.Module | None) –

class diffengine.models.editors.SDControlNetDataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

SDControlNetDataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list into a batch tensor along the first dimension.

Args:

data (dict): Data returned by the dataloader.
training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list

class diffengine.models.editors.SDInpaintDataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

SDInpaintDataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list into a batch tensor along the first dimension.

Args:

data (dict): Data returned by the dataloader.
training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list

class diffengine.models.editors.StableDiffusionInpaint(*args, model='runwayml/stable-diffusion-inpainting', data_preprocessor=None, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion.StableDiffusion

Stable Diffusion Inpaint.

Args:
model (str): pretrained model name of stable diffusion.

Defaults to ‘runwayml/stable-diffusion-inpainting’.

data_preprocessor (dict, optional): The pre-process config of

SDInpaintDataPreprocessor.

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

infer(prompt, image, mask, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]):

The prompt or prompts to guide the image generation.

image (List[Union[str, Image.Image]]):

The image for inpainting.

mask (List[Union[str, Image.Image]]):

The mask for inpainting.

negative_prompt (Optional[str]):

The prompt or prompts to guide the image generation. Defaults to None.

height (int, optional):

The height in pixels of the generated image. Defaults to None.

width (int, optional):

The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps.

Defaults to 50.

output_type (str): The output format of the generated image.

Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • image (list[str | PIL.Image.Image]) –

  • mask (list[str | PIL.Image.Image]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]
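A minimal usage sketch, assuming a constructed StableDiffusionInpaint instance named model; file paths are illustrative, and by the usual inpainting convention white pixels in the mask mark the region to repaint:

images = model.infer(
    ["a bench in a park"],
    image=["path/to/photo.png"],
    mask=["path/to/mask.png"],
    num_inference_steps=50,
)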

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.
data_samples (Optional[list], optional): The data samples.

Defaults to None.

mode (str, optional): The mode. Defaults to “loss”.

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • model (str) –

  • data_preprocessor (dict | torch.nn.Module | None) –

class diffengine.models.editors.StableDiffusionXL(tokenizer_one, tokenizer_two, scheduler, text_encoder_one, text_encoder_two, vae, unet, model='stabilityai/stable-diffusion-xl-base-1.0', loss=None, unet_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, prediction_type=None, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, vae_batch_size=8, *, finetune_text_encoder=False, gradient_checkpointing=False, pre_compute_text_embeddings=False, enable_xformers=False)[source]

Bases: mmengine.model.BaseModel

Stable Diffusion XL. https://huggingface.co/papers/2307.01952

Args:

tokenizer_one (dict): Config of tokenizer one.
tokenizer_two (dict): Config of tokenizer two.
scheduler (dict): Config of scheduler.
text_encoder_one (dict): Config of text encoder one.
text_encoder_two (dict): Config of text encoder two.
vae (dict): Config of vae.
unet (dict): Config of unet.
model (str): pretrained model name of stable diffusion xl.

Defaults to ‘stabilityai/stable-diffusion-xl-base-1.0’.

loss (dict): Config of loss. Defaults to

dict(type='L2Loss', loss_weight=1.0).

unet_lora_config (dict, optional): The LoRA config dict for Unet.

Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for

Text Encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

prior_loss_weight (float): The weight of prior preservation loss.

It works when training dreambooth with class images.

prediction_type (str): The prediction_type that shall be used for

training. Choose between ‘epsilon’ or ‘v_prediction’, or leave it as None. If left as None, the default prediction type of the scheduler (noise_scheduler.config.prediction_type) is used. Defaults to None.

data_preprocessor (dict, optional): The pre-process config of

SDXLDataPreprocessor.

noise_generator (dict, optional): The noise generator config.

Defaults to dict(type='WhiteNoise').

timesteps_generator (dict, optional): The timesteps generator config.

Defaults to dict(type='TimeSteps').

input_perturbation_gamma (float): The gamma of input perturbation.

The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.

vae_batch_size (int): The batch size of vae. Defaults to 8.
finetune_text_encoder (bool, optional): Whether to fine-tune text

encoder. Defaults to False.

gradient_checkpointing (bool): Whether or not to use gradient

checkpointing to save memory at the expense of slower backward pass. Defaults to False.

pre_compute_text_embeddings (bool): Whether or not to pre-compute text

embeddings to save memory. Defaults to False.

enable_xformers (bool): Whether or not to enable memory efficient

attention. Defaults to False.

property device: torch.device

Get device information.

Returns:

torch.device

Return type:

device.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]):

The prompt or prompts to guide the image generation.

negative_prompt (Optional[str]):

The prompt or prompts to guide the image generation. Defaults to None.

height (int, optional):

The height in pixels of the generated image. Defaults to None.

width (int, optional):

The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps.

Defaults to 50.

output_type (str): The output format of the generated image.

Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]

encode_prompt(text_one, text_two)[source]

Encode prompt.

Args:

text_one (torch.Tensor): Token ids from tokenizer one.
text_two (torch.Tensor): Token ids from tokenizer two.

Returns:

tuple[torch.Tensor, torch.Tensor]: Prompt embeddings

Parameters:
  • text_one (torch.Tensor) –

  • text_two (torch.Tensor) –

Return type:

tuple[torch.Tensor, torch.Tensor]
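A hedged usage sketch of encode_prompt; tokenizer_one and tokenizer_two stand for the two Hugging Face tokenizers this class is configured with, and the unpacking names are assumptions (only the call shape and the tuple return type are taken from the documentation above):

text = ["a photo of a dog"]
ids_one = tokenizer_one(text, padding="max_length", truncation=True,
                        return_tensors="pt").input_ids
ids_two = tokenizer_two(text, padding="max_length", truncation=True,
                        return_tensors="pt").input_ids
prompt_embeds, pooled_prompt_embeds = model.encode_prompt(ids_one, ids_two)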

val_step(data)[source]

Val step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

test_step(data)[source]

Test step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

loss(model_pred, noise, latents, timesteps, weight=None)[source]

Calculate loss.

Parameters:
  • model_pred (torch.Tensor) –

  • noise (torch.Tensor) –

  • latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • weight (torch.Tensor | None) –

Return type:

dict[str, torch.Tensor]

_preprocess_model_input(latents, noise, timesteps)[source]

Preprocess model input.

Parameters:
  • latents (torch.Tensor) –

  • noise (torch.Tensor) –

  • timesteps (torch.Tensor) –

Return type:

torch.Tensor

_forward_vae(img, num_batches)[source]

Forward vae.

Parameters:
  • img (torch.Tensor) –

  • num_batches (int) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.
data_samples (Optional[list], optional): The data samples.

Defaults to None.

mode (str, optional): The mode. Defaults to “loss”.

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • tokenizer_one (dict) –

  • tokenizer_two (dict) –

  • scheduler (dict) –

  • text_encoder_one (dict) –

  • text_encoder_two (dict) –

  • vae (dict) –

  • unet (dict) –

  • model (str) –

  • loss (dict | None) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • prior_loss_weight (float) –

  • prediction_type (str | None) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • noise_generator (dict | None) –

  • timesteps_generator (dict | None) –

  • input_perturbation_gamma (float) –

  • vae_batch_size (int) –

  • finetune_text_encoder (bool) –

  • gradient_checkpointing (bool) –

  • pre_compute_text_embeddings (bool) –

  • enable_xformers (bool) –

class diffengine.models.editors.SDXLDataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

SDXLDataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list into a batch tensor along the first dimension.

Args:

data (dict): Data returned by the dataloader.
training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list

class diffengine.models.editors.SDXLControlNetDataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

SDXLControlNetDataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list into a batch tensor along the first dimension.

Args:

data (dict): Data returned by the dataloader.
training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list

class diffengine.models.editors.StableDiffusionXLControlNet(*args, controlnet_model=None, transformer_layers_per_block=None, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, data_preprocessor=None, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Stable Diffusion XL ControlNet.

Args:
controlnet_model (str, optional): Path to pretrained ControlNet model.

If None, the ControlNet is initialized from the Unet. Defaults to None.

transformer_layers_per_block (List[int], optional):

The number of layers per block in the transformer. More details: https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small. Defaults to None.

unet_lora_config (dict, optional): The LoRA config dict for Unet.

Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for

Text Encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

finetune_text_encoder (bool, optional): Whether to fine-tune text

encoder. This should be False when training ControlNet. Defaults to False.

data_preprocessor (dict, optional): The pre-process config of

SDXLControlNetDataPreprocessor.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, condition_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]):

The prompt or prompts to guide the image generation.

condition_image (List[Union[str, Image.Image]]):

The condition image for ControlNet.

negative_prompt (Optional[str]):

The prompt or prompts to guide the image generation. Defaults to None.

height (int, optional):

The height in pixels of the generated image. Defaults to None.

width (int, optional):

The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps.

Defaults to 50.

output_type (str): The output format of the generated image.

Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • condition_image (list[str | PIL.Image.Image]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]

_forward_compile(noisy_latents, timesteps, prompt_embeds, unet_added_conditions, inputs)[source]

Forward function for torch.compile.

Parameters:
  • noisy_latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • prompt_embeds (torch.Tensor) –

  • unet_added_conditions (dict) –

  • inputs (dict) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.
data_samples (Optional[list], optional): The data samples.

Defaults to None.

mode (str, optional): The mode. Defaults to “loss”.

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • controlnet_model (str | None) –

  • transformer_layers_per_block (list[int] | None) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • finetune_text_encoder (bool) –

  • data_preprocessor (dict | torch.nn.Module | None) –

class diffengine.models.editors.StableDiffusionXLDPO(*args, beta_dpo=5000, loss=None, data_preprocessor=None, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Stable Diffusion XL DPO.

Args:

beta_dpo (int): DPO KL Divergence penalty. Defaults to 5000.
loss (dict, optional): The loss config. Defaults to None.
data_preprocessor (dict, optional): The pre-process config of

SDXLDPODataPreprocessor.

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

loss(model_pred, ref_pred, noise, latents, timesteps, weight=None)[source]

Calculate loss.

Parameters:
  • model_pred (torch.Tensor) –

  • ref_pred (torch.Tensor) –

  • noise (torch.Tensor) –

  • latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • weight (torch.Tensor | None) –

Return type:

dict[str, torch.Tensor]
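A hedged sketch of the Diffusion-DPO objective that beta_dpo parameterizes, following the Diffusion-DPO formulation rather than this library's exact code; it assumes the batch holds the preferred ('win') samples in the first half and the rejected ('lose') samples in the second half:

import torch.nn.functional as F

def dpo_loss(model_pred, ref_pred, target, beta_dpo=5000):
    # Per-sample squared errors over the latent dimensions (4D latents assumed).
    model_err = (model_pred - target).pow(2).mean(dim=[1, 2, 3])
    ref_err = (ref_pred - target).pow(2).mean(dim=[1, 2, 3])
    model_w, model_l = model_err.chunk(2)
    ref_w, ref_l = ref_err.chunk(2)
    # Reward lower error on the preferred sample relative to the frozen reference.
    inside = -beta_dpo * ((model_w - model_l) - (ref_w - ref_l))
    return {"loss": -F.logsigmoid(inside).mean()}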

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.
data_samples (Optional[list], optional): The data samples.

Defaults to None.

mode (str, optional): The mode. Defaults to “loss”.

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • beta_dpo (int) –

  • loss (dict | None) –

  • data_preprocessor (dict | torch.nn.Module | None) –

class diffengine.models.editors.SDXLDPODataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

SDXLDPODataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list into a batch tensor along the first dimension.

Args:

data (dict): Data returned by the dataloader.
training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list

class diffengine.models.editors.StableDiffusionXLInpaint(*args, model='diffusers/stable-diffusion-xl-1.0-inpainting-0.1', data_preprocessor=None, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Stable Diffusion XL Inpaint.

Args:
model (str): pretrained model name of stable diffusion.

Defaults to ‘diffusers/stable-diffusion-xl-1.0-inpainting-0.1’.

data_preprocessor (dict, optional): The pre-process config of

SDXLInpaintDataPreprocessor.

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

infer(prompt, image, mask, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]):

The prompt or prompts to guide the image generation.

image (List[Union[str, Image.Image]]):

The image for inpainting.

mask (List[Union[str, Image.Image]]):

The mask for inpainting.

negative_prompt (Optional[str]):

The prompt or prompts to guide the image generation. Defaults to None.

height (int, optional):

The height in pixels of the generated image. Defaults to None.

width (int, optional):

The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps.

Defaults to 50.

output_type (str): The output format of the generated image.

Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • image (list[str | PIL.Image.Image]) –

  • mask (list[str | PIL.Image.Image]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.
data_samples (Optional[list], optional): The data samples.

Defaults to None.

mode (str, optional): The mode. Defaults to “loss”.

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • model (str) –

  • data_preprocessor (dict | torch.nn.Module | None) –

class diffengine.models.editors.SDXLInpaintDataPreprocessor(non_blocking=False)[source]

Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor

SDXLInpaintDataPreprocessor.

Parameters:

non_blocking (Optional[bool]) –

forward(data, training=False)[source]

Preprocesses the data into the model input format.

After the data pre-processing of cast_data(), forward will stack the input tensor list into a batch tensor along the first dimension.

Args:

data (dict): Data returned by the dataloader.
training (bool): Whether to enable training-time augmentation.

Returns:

dict or list: Data in the same format as the model input.

Parameters:
  • data (dict) –

  • training (bool) –

Return type:

dict | list

class diffengine.models.editors.StableDiffusionXLT2IAdapter(*args, adapter, unet_lora_config=None, text_encoder_lora_config=None, finetune_text_encoder=False, timesteps_generator=None, data_preprocessor=None, **kwargs)[source]

Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL

Stable Diffusion XL T2I Adapter.

Args:

adapter (dict): The adapter config.
unet_lora_config (dict, optional): The LoRA config dict for Unet.

Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for

Text Encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

finetune_text_encoder (bool, optional): Whether to fine-tune text

encoder. This should be False when training the adapter. Defaults to False.

timesteps_generator (dict, optional): The timesteps generator config.

Defaults to dict(type='CubicSamplingTimeSteps').

data_preprocessor (dict, optional): The pre-process config of

SDControlNetDataPreprocessor.
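The default timesteps generator here is CubicSamplingTimeSteps. A hedged sketch of cubic timestep sampling as used in the T2I-Adapter reference training code (an assumption about what this generator does; the function name is hypothetical):

import torch

def cubic_sample_timesteps(batch_size, num_train_timesteps=1000):
    # Bias sampling toward later, noisier timesteps, where adapter
    # conditioning has the most influence.
    u = torch.rand(batch_size)
    t = (1 - u ** 3) * num_train_timesteps
    return t.long().clamp(0, num_train_timesteps - 1)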

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

set_xformers()[source]

Set xformers for model.

Return type:

None

infer(prompt, condition_image, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]):

The prompt or prompts to guide the image generation.

condition_image (List[Union[str, Image.Image]]):

The condition image for ControlNet.

negative_prompt (Optional[str]):

The prompt or prompts to guide the image generation. Defaults to None.

height (int, optional):

The height in pixels of the generated image. Defaults to None.

width (int, optional):

The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps.

Defaults to 50.

output_type (str): The output format of the generated image.

Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • condition_image (list[str | PIL.Image.Image]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]

_forward_compile(noisy_latents, timesteps, prompt_embeds, unet_added_conditions, inputs)[source]

Forward function for torch.compile.

Parameters:
  • noisy_latents (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • prompt_embeds (torch.Tensor) –

  • unet_added_conditions (dict) –

  • inputs (dict) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict.
data_samples (Optional[list], optional): The data samples.

Defaults to None.

mode (str, optional): The mode. Defaults to “loss”.

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • adapter (dict) –

  • unet_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • finetune_text_encoder (bool) –

  • timesteps_generator (dict | None) –

  • data_preprocessor (dict | torch.nn.Module | None) –

class diffengine.models.editors.WuerstchenPriorModel(tokenizer, scheduler, text_encoder, image_encoder, prior, decoder_model='warp-ai/wuerstchen', prior_model='warp-ai/wuerstchen-prior', loss=None, prior_lora_config=None, text_encoder_lora_config=None, prior_loss_weight=1.0, data_preprocessor=None, noise_generator=None, timesteps_generator=None, input_perturbation_gamma=0.0, *, finetune_text_encoder=False, gradient_checkpointing=False)[source]

Bases: mmengine.model.BaseModel

Wuerstchen Prior. https://arxiv.org/abs/2306.00637

Args:

tokenizer (dict): Config of tokenizer.
scheduler (dict): Config of scheduler.
text_encoder (dict): Config of text encoder.
image_encoder (dict): Config of image encoder.
prior (dict): Config of prior.
decoder_model (str): pretrained decoder model name of Wuerstchen.

Defaults to ‘warp-ai/wuerstchen’.

prior_model (str): pretrained prior model name of Wuerstchen.

Defaults to ‘warp-ai/wuerstchen-prior’.

loss (dict): Config of loss. Defaults to

dict(type='L2Loss', loss_weight=1.0).

prior_lora_config (dict, optional): The LoRA config dict for Prior.

Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

text_encoder_lora_config (dict, optional): The LoRA config dict for

Text Encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, LoKr. Other configs are the same as the config of PEFT (https://github.com/huggingface/peft). Defaults to None.

prior_loss_weight (float): The weight of prior preservation loss.

It works when training dreambooth with class images.

data_preprocessor (dict, optional): The pre-process config of

SDDataPreprocessor.

noise_generator (dict, optional): The noise generator config.

Defaults to dict(type='WhiteNoise').

timesteps_generator (dict, optional): The timesteps generator config.

Defaults to dict(type='WuerstchenRandomTimeSteps').

input_perturbation_gamma (float): The gamma of input perturbation.

The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.

finetune_text_encoder (bool, optional): Whether to fine-tune text

encoder. Defaults to False.

gradient_checkpointing (bool): Whether or not to use gradient

checkpointing to save memory at the expense of slower backward pass. Defaults to False.

property device: torch.device

Get device information.

Returns:

torch.device

Return type:

device.

set_lora()[source]

Set LORA for model.

Return type:

None

prepare_model()[source]

Prepare model for training.

Disable gradient for some models.

Return type:

None

train(*, mode=True)[source]

Convert the model into training mode.

Parameters:

mode (bool) –

Return type:

None

infer(prompt, negative_prompt=None, height=None, width=None, num_inference_steps=50, output_type='pil', **kwargs)[source]

Inference function.

Args:
prompt (List[str]):

The prompt or prompts to guide the image generation.

negative_prompt (Optional[str]):

The prompt or prompts to guide the image generation. Defaults to None.

height (int, optional):

The height in pixels of the generated image. Defaults to None.

width (int, optional):

The width in pixels of the generated image. Defaults to None.

num_inference_steps (int): Number of inference steps.

Defaults to 50.

output_type (str): The output format of the generated image.

Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.

**kwargs: Other arguments.

Parameters:
  • prompt (list[str]) –

  • negative_prompt (str | None) –

  • height (int | None) –

  • width (int | None) –

  • num_inference_steps (int) –

  • output_type (str) –

Return type:

list[numpy.ndarray]
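A minimal usage sketch, assuming a constructed WuerstchenPriorModel instance named model; inference presumably pairs the trained prior with the pretrained decoder given by decoder_model, but only the call signature above is taken from this page:

images = model.infer(
    ["an anthropomorphic cat dressed as a firefighter"],
    num_inference_steps=50,
    output_type="pil",
)
# Returns a list of numpy.ndarray, one per prompt.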

val_step(data)[source]

Val step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

test_step(data)[source]

Test step.

Parameters:

data (Union[tuple, dict, list]) –

Return type:

list

loss(model_pred, noise, timesteps, weight=None)[source]

Calculate loss.

Parameters:
  • model_pred (torch.Tensor) –

  • noise (torch.Tensor) –

  • timesteps (torch.Tensor) –

  • weight (torch.Tensor | None) –

Return type:

dict[str, torch.Tensor]

_preprocess_model_input(latents, noise, timesteps)[source]

Preprocess model input.

Parameters:
  • latents (torch.Tensor) –

  • noise (torch.Tensor) –

  • timesteps (torch.Tensor) –

Return type:

torch.Tensor

forward(inputs, data_samples=None, mode='loss')[source]

Forward function.

Args:

inputs (dict): The input dict. data_samples (Optional[list], optional): The data samples.

Defaults to None.

mode (str, optional): The mode. Defaults to “loss”.

Returns:

dict: The loss dict.

Parameters:
  • inputs (dict) –

  • data_samples (Optional[list]) –

  • mode (str) –

Return type:

dict

Parameters:
  • tokenizer (dict) –

  • scheduler (dict) –

  • text_encoder (dict) –

  • image_encoder (dict) –

  • prior (dict) –

  • decoder_model (str) –

  • prior_model (str) –

  • loss (dict | None) –

  • prior_lora_config (dict | None) –

  • text_encoder_lora_config (dict | None) –

  • prior_loss_weight (float) –

  • data_preprocessor (dict | torch.nn.Module | None) –

  • noise_generator (dict | None) –

  • timesteps_generator (dict | None) –

  • input_perturbation_gamma (float) –

  • finetune_text_encoder (bool) –

  • gradient_checkpointing (bool) –

