diffengine.models.editors
Bases: mmengine.model.BaseModel
aMUSEd.
Args:
    tokenizer (dict): Config of tokenizer.
    text_encoder (dict): Config of text encoder.
    vae (dict): Config of vae.
    transformer (dict): Config of transformer.
    model (str): Pretrained model name. Defaults to "amused/amused-512".
    loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
    transformer_lora_config (dict, optional): The LoRA config dict for the Transformer, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, in the same format as transformer_lora_config. Defaults to None.
    prior_loss_weight (float): The weight of the prior preservation loss. It takes effect when training DreamBooth with class images.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
    vae_batch_size (int): The batch size of the vae. Defaults to 8.
    finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. Defaults to False.
    gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
    enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
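The loss and LoRA options above are plain config dicts in the mmengine style; a minimal sketch (the registry type name 'AMUSEd' is an assumption, the dict values follow the defaults documented above):

    model = dict(
        type='AMUSEd',  # assumed registry name for this editor
        model='amused/amused-512',
        loss=dict(type='L2Loss', loss_weight=1.0),
        transformer_lora_config=dict(type='LoRA', r=4),
        gradient_checkpointing=True,
    )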
Get device information. Returns: torch.device.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 12.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
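A usage sketch of the inference call (the method name `infer` is an assumption based on this reference; the argument names follow the list above):

    images = model.infer(
        prompt=['a photo of a corgi'],
        height=512,
        width=512,
        num_inference_steps=12,
        output_type='pil',
    )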
Val step. Parameters: data (tuple | dict | list). Returns: list.
Test step. Parameters: data (tuple | dict | list). Returns: list.
Forward vae. Parameters: img (torch.Tensor), num_batches (int). Returns: torch.Tensor.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
AMUSEdPreprocessor.
Parameters: non_blocking (bool, optional).
Preprocesses the data into the model input format. After the data pre-processing of cast_data(), forward() stacks the input tensor list into a batch tensor along the first dimension.

Args:
    data (dict): Data returned by the dataloader.
    training (bool): Whether to enable training-time augmentation.

Returns: dict or list – data in the same format as the model input.
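A minimal sketch of the stacking behavior described above (the key layout of the data dict is an assumption, not the exact source):

    import torch
    from mmengine.model.base_model.data_preprocessor import BaseDataPreprocessor

    class ToyPreprocessor(BaseDataPreprocessor):
        def forward(self, data: dict, training: bool = False) -> dict:
            data = self.cast_data(data)  # move tensors to the target device
            # Stack per-sample image tensors into one batch tensor.
            data['inputs']['img'] = torch.stack(data['inputs']['img'], dim=0)
            return data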
Bases: mmengine.model.BaseModel
DeepFloyd/IF.
Args:
    tokenizer (dict): Config of tokenizer.
    scheduler (dict): Config of scheduler.
    text_encoder (dict): Config of text encoder.
    unet (dict): Config of unet.
    model (str): Pretrained model name of stable diffusion. Defaults to 'DeepFloyd/IF-I-XL-v1.0'.
    loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
    unet_lora_config (dict, optional): The LoRA config dict for the Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, in the same format as unet_lora_config. Defaults to None.
    prior_loss_weight (float): The weight of the prior preservation loss. It takes effect when training DreamBooth with class images.
    tokenizer_max_length (int): The max length of the tokenizer. Defaults to 77.
    prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None, in which case the scheduler default (noise_scheduler.config.prediction_type) is used. Defaults to None.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
    noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
    timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
    input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
    finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. Defaults to False.
    gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
    enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
Get device information. Returns: torch.device.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'pt'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
Val step. Parameters: data (tuple | dict | list). Returns: list.
Test step. Parameters: data (tuple | dict | list). Returns: list.
Calculate loss. Parameters: model_pred (torch.Tensor), noise (torch.Tensor), latents (torch.Tensor), timesteps (torch.Tensor), weight (torch.Tensor | None). Returns: dict[str, torch.Tensor].
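A rough sketch of the default L2 loss named above, under the assumption that the training target is the added noise (epsilon prediction); all names here are illustrative:

    import torch
    import torch.nn.functional as F

    def l2_loss(model_pred, noise, weight=None, loss_weight=1.0):
        loss = F.mse_loss(model_pred.float(), noise.float(), reduction='none')
        if weight is not None:  # optional per-sample weighting
            loss = loss * weight
        return {'loss': loss_weight * loss.mean()}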
Preprocess model input. Parameters: latents (torch.Tensor), noise (torch.Tensor), timesteps (torch.Tensor). Returns: torch.Tensor.
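A minimal sketch of this step, assuming it adds scheduler noise to the latents with the optional input perturbation documented above (gamma); the function name is illustrative:

    import torch

    def preprocess_model_input(scheduler, latents, noise, timesteps, gamma=0.0):
        if gamma > 0:  # input perturbation (recommended gamma: 0.1)
            noise = noise + gamma * torch.randn_like(noise)
        return scheduler.add_noise(latents, noise, timesteps)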
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Distill Stable Diffusion XL.
Args:
    model_type (str): The type of model to use. Choose from sd_tiny or sd_small.
    unet_lora_config (dict, optional): The LoRA config dict for the Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, in the same format as unet_lora_config. Defaults to None.
    finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. This should be False when training ControlNet. Defaults to False.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Erasing Concepts from Diffusion Models (ESD) for Stable Diffusion XL.
Args:
    height (int): Image height. Defaults to 1024.
    width (int): Image width. Defaults to 1024.
    negative_guidance (float): Negative guidance for the loss. Defaults to 1.0.
    train_method (str): Training method. Choose from full, xattn, noxattn, or selfattn. Defaults to full.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Convert the model into training mode. Parameters: mode (bool). Returns: None.
Preprocess model input. Parameters: latents (torch.Tensor), noise (torch.Tensor), timesteps (torch.Tensor). Returns: torch.Tensor.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Parameters: finetune_text_encoder (bool), pre_compute_text_embeddings (bool), height (int), width (int), negative_guidance (float), train_method (str), prediction_type (str | None), data_preprocessor (dict | torch.nn.Module | None).
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
ESDXLDataPreprocessor.
Parameters: non_blocking (bool, optional).
Preprocesses the data into the model input format. After the data pre-processing of cast_data(), forward() stacks the input tensor list into a batch tensor along the first dimension.

Args:
    data (dict): Data returned by the dataloader.
    training (bool): Whether to enable training-time augmentation.

Returns: dict or list – data in the same format as the model input.
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL Instruct Pix2Pix.
Args:
    zeros_image_embeddings_prob (float): The probability of generating zero image embeddings. Defaults to 0.1.
    unet_lora_config (dict, optional): The LoRA config dict for the Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, in the same format as unet_lora_config. Defaults to None.
    finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. This should be False when training ControlNet. Defaults to False.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    condition_image (list[str | PIL.Image.Image]): The condition image (here, the image to be edited).
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
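A usage sketch for image-editing inference (the method name `infer` and the file name are assumptions based on this reference):

    images = model.infer(
        prompt=['make it snowy'],
        condition_image=['input.png'],
        num_inference_steps=50,
    )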
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL IP-Adapter.
Args:
    image_encoder (dict): The image encoder config.
    image_projection (dict): The image projection config.
    feature_extractor (dict): The feature extractor config.
    pretrained_adapter (str, optional): Path to a pretrained IP-Adapter. Defaults to None.
    pretrained_adapter_subfolder (str, optional): Subfolder of the pretrained IP-Adapter. Defaults to ''.
    pretrained_adapter_weights_name (str, optional): Weights name of the pretrained IP-Adapter. Defaults to ''.
    unet_lora_config (dict, optional): The LoRA config dict for the Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, in the same format as unet_lora_config. Defaults to None.
    finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. This should be False when training ControlNet. Defaults to False.
    zeros_image_embeddings_prob (float): The probability of generating zero image embeddings. Defaults to 0.1.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
    hidden_states_idx (int): Index of the hidden states to use. Defaults to -2.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set IP-Adapter for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    example_image (list[str | PIL.Image.Image]): The image prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
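A usage sketch with an image prompt (the method name `infer` and the file name are assumptions based on this reference):

    images = model.infer(
        prompt=['a dog in the style of the reference'],
        example_image=['style_ref.png'],
        num_inference_steps=50,
    )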
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: IPAdapterXL
Stable Diffusion XL IP-Adapter Plus.
Parameters: image_encoder (dict), image_projection (dict), feature_extractor (dict), pretrained_adapter (str | None), pretrained_adapter_subfolder (str), pretrained_adapter_weights_name (str), unet_lora_config (dict | None), text_encoder_lora_config (dict | None), finetune_text_encoder (bool), zeros_image_embeddings_prob (float), data_preprocessor (dict | torch.nn.Module | None), hidden_states_idx (int). See IPAdapterXL above for descriptions.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
IPAdapterXLDataPreprocessor.
Parameters: non_blocking (bool, optional).
Preprocesses the data into the model input format. After the data pre-processing of cast_data(), forward() stacks the input tensor list into a batch tensor along the first dimension.

Args:
    data (dict): Data returned by the dataloader.
    training (bool): Whether to enable training-time augmentation.

Returns: dict or list – data in the same format as the model input.
Bases: diffengine.models.editors.ip_adapter.ip_adapter_xl.IPAdapterXLPlus
Stable Diffusion XL IP-Adapter Plus.
Parameters: image_encoder (dict), image_projection (dict), feature_extractor (dict), pretrained_adapter (str | None), pretrained_adapter_subfolder (str), pretrained_adapter_weights_name (str), unet_lora_config (dict | None), text_encoder_lora_config (dict | None), finetune_text_encoder (bool), zeros_image_embeddings_prob (float), data_preprocessor (dict | torch.nn.Module | None), hidden_states_idx (int). See IPAdapterXL above for descriptions.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    example_image (list[str | PIL.Image.Image]): The image prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: mmengine.model.BaseModel
KandinskyV22 Prior.
Args:
    tokenizer (dict): Config of tokenizer.
    scheduler (dict): Config of scheduler.
    text_encoder (dict): Config of text encoder.
    image_encoder (dict): Config of image encoder.
    prior (dict): Config of prior.
    decoder_model (str): Pretrained model name of the decoder. Defaults to "kandinsky-community/kandinsky-2-2-decoder".
    prior_model (str): Pretrained model name of the prior. Defaults to "kandinsky-community/kandinsky-2-2-prior".
    loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
    prior_lora_config (dict, optional): The LoRA config dict for the Prior, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    prior_loss_weight (float): The weight of the prior preservation loss. It takes effect when training DreamBooth with class images.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
    noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
    timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
    input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
    gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
    enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
Get device information. Returns: torch.device.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
Val step. Parameters: data (tuple | dict | list). Returns: list.
Test step. Parameters: data (tuple | dict | list). Returns: list.
Calculate loss. Parameters: model_pred (torch.Tensor), noise (torch.Tensor), latents (torch.Tensor), timesteps (torch.Tensor), weight (torch.Tensor | None). Returns: dict[str, torch.Tensor].
Preprocess model input. Parameters: latents (torch.Tensor), noise (torch.Tensor), timesteps (torch.Tensor). Returns: torch.Tensor.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: mmengine.model.BaseModel
KandinskyV22 Decoder.
Args:
    scheduler (dict): Config of scheduler.
    image_encoder (dict): Config of image encoder.
    vae (dict): Config of vae.
    unet (dict): Config of unet.
    decoder_model (str): Pretrained model name of the decoder. Defaults to "kandinsky-community/kandinsky-2-2-decoder".
    prior_model (str): Pretrained model name of the prior. Defaults to "kandinsky-community/kandinsky-2-2-prior".
    loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
    unet_lora_config (dict, optional): The LoRA config dict for the Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    prior_loss_weight (float): The weight of the prior preservation loss. It takes effect when training DreamBooth with class images.
    prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None, in which case the scheduler default is used. Defaults to None.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
    noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
    timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
    input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
    vae_batch_size (int): The batch size of the vae. Defaults to 8.
    gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
    enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
Get device information. Returns: torch.device.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
Val step. Parameters: data (tuple | dict | list). Returns: list.
Test step. Parameters: data (tuple | dict | list). Returns: list.
Calculate loss. Parameters: model_pred (torch.Tensor), noise (torch.Tensor), latents (torch.Tensor), timesteps (torch.Tensor), weight (torch.Tensor | None). Returns: dict[str, torch.Tensor].
Preprocess model input. Parameters: latents (torch.Tensor), noise (torch.Tensor), timesteps (torch.Tensor). Returns: torch.Tensor.
Forward vae. Parameters: img (torch.Tensor), num_batches (int). Returns: torch.Tensor.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
KandinskyV22DecoderDataPreprocessor.
Parameters: non_blocking (bool, optional).
Preprocesses the data into the model input format. After the data pre-processing of cast_data(), forward() stacks the input tensor list into a batch tensor along the first dimension.

Args:
    data (dict): Data returned by the dataloader.
    training (bool): Whether to enable training-time augmentation.

Returns: dict or list – data in the same format as the model input.
Bases: mmengine.model.BaseModel
KandinskyV3.
Args:
    tokenizer (dict): Config of tokenizer.
    scheduler (dict): Config of scheduler.
    text_encoder (dict): Config of text encoder.
    vae (dict): Config of vae.
    unet (dict): Config of unet.
    model (str): Pretrained model name. Defaults to "kandinsky-community/kandinsky-3".
    loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
    unet_lora_config (dict, optional): The LoRA config dict for the Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    prior_loss_weight (float): The weight of the prior preservation loss. It takes effect when training DreamBooth with class images.
    tokenizer_max_length (int): The max length of the tokenizer. Defaults to 128.
    prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None, in which case the scheduler default is used. Defaults to None.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
    noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
    timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
    input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
    vae_batch_size (int): The batch size of the vae. Defaults to 8.
    gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
    enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
Get device information. Returns: torch.device.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
Val step. Parameters: data (tuple | dict | list). Returns: list.
Test step. Parameters: data (tuple | dict | list). Returns: list.
Calculate loss. Parameters: model_pred (torch.Tensor), noise (torch.Tensor), latents (torch.Tensor), timesteps (torch.Tensor), weight (torch.Tensor | None). Returns: dict[str, torch.Tensor].
Preprocess model input. Parameters: latents (torch.Tensor), noise (torch.Tensor), timesteps (torch.Tensor). Returns: torch.Tensor.
Forward vae. Parameters: img (torch.Tensor), num_batches (int). Returns: torch.Tensor.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL Latent Consistency Models.
Args:
    timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='DDIMTimeSteps').
    num_ddim_timesteps (int): Number of DDIM timesteps. Defaults to 50.
    w_min (float): Minimum guidance scale. Defaults to 3.0.
    w_max (float): Maximum guidance scale. Defaults to 15.0.
    ema_type (str): The type of EMA. Defaults to 'ExponentialMovingAverage'.
    ema_momentum (float): The EMA momentum. Defaults to 0.05.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    guidance_scale (float): The guidance scale. Defaults to 1.0.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
Calculate loss. Parameters: model_pred (torch.Tensor), gt (torch.Tensor), timesteps (torch.Tensor), weight (torch.Tensor | None). Returns: dict[str, torch.Tensor].
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Predict the origin of the model output.

Args:
    model_output (torch.Tensor): The model output.
    timesteps (torch.Tensor): The timesteps.
    sample (torch.Tensor): The sample.

Returns: torch.Tensor
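For the epsilon-prediction case, the predicted origin follows the standard DDPM algebra; a hedged sketch (alphas_cumprod is assumed to come from the scheduler, and the names are illustrative):

    def predicted_origin(model_output, timesteps, sample, alphas_cumprod):
        # x_0 = (x_t - sqrt(1 - a_t) * eps) / sqrt(a_t)
        alpha_t = alphas_cumprod[timesteps].view(-1, 1, 1, 1)
        return (sample - (1 - alpha_t).sqrt() * model_output) / alpha_t.sqrt()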
Bases: mmengine.model.BaseModel
PixArt Alpha.
Args:
    tokenizer (dict): Config of tokenizer.
    scheduler (dict): Config of scheduler.
    text_encoder (dict): Config of text encoder.
    vae (dict): Config of vae.
    transformer (dict): Config of transformer.
    model (str): Pretrained model name of stable diffusion. Defaults to 'PixArt-alpha/PixArt-XL-2-1024-MS'.
    loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
    transformer_lora_config (dict, optional): The LoRA config dict for the Transformer, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, in the same format as transformer_lora_config. Defaults to None.
    prior_loss_weight (float): The weight of the prior preservation loss. It takes effect when training DreamBooth with class images.
    tokenizer_max_length (int): The max length of the tokenizer. Defaults to 120.
    prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None, in which case the scheduler default is used. Defaults to None.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
    noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
    timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
    input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
    vae_batch_size (int): The batch size of the vae. Defaults to 8.
    finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. Defaults to False.
    gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
    enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
Get device information. Returns: torch.device.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
Val step. Parameters: data (tuple | dict | list). Returns: list.
Test step. Parameters: data (tuple | dict | list). Returns: list.
Calculate loss. Parameters: model_pred (torch.Tensor), noise (torch.Tensor), latents (torch.Tensor), timesteps (torch.Tensor), weight (torch.Tensor | None). Returns: dict[str, torch.Tensor].
Preprocess model input. Parameters: latents (torch.Tensor), noise (torch.Tensor), timesteps (torch.Tensor). Returns: torch.Tensor.
Forward vae. Parameters: img (torch.Tensor), num_batches (int). Returns: torch.Tensor.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
PixArtAlphaDataPreprocessor.
Parameters: non_blocking (bool, optional).
Preprocesses the data into the model input format. After the data pre-processing of cast_data(), forward() stacks the input tensor list into a batch tensor along the first dimension.

Args:
    data (dict): Data returned by the dataloader.
    training (bool): Whether to enable training-time augmentation.

Returns: dict or list – data in the same format as the model input.
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
SSD1B.
Refer to the official implementation: https://github.com/segmind/SSD-1B/blob/main/distill_sdxl.py
Args:
    tokenizer_one (dict): Config of tokenizer one.
    tokenizer_two (dict): Config of tokenizer two.
    scheduler (dict): Config of scheduler.
    text_encoder_one (dict): Config of text encoder one.
    text_encoder_two (dict): Config of text encoder two.
    vae (dict): Config of vae.
    teacher_unet (dict): Config of the teacher unet.
    student_unet (dict): Config of the student unet.
    model (str): Pretrained model name of stable diffusion xl. Defaults to 'stabilityai/stable-diffusion-xl-base-1.0'.
    vae_model (str, optional): Path to a pretrained VAE model with better numerical stability. More details: https://github.com/huggingface/diffusers/pull/4038. Defaults to None.
    loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
    unet_lora_config (dict, optional): The LoRA config dict for the Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, in the same format as unet_lora_config. Defaults to None.
    prior_loss_weight (float): The weight of the prior preservation loss. It takes effect when training DreamBooth with class images.
    prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None, in which case the scheduler default (noise_scheduler.config.prediction_type) is used. Defaults to None.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
    noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
    timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
    input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
    vae_batch_size (int): The batch size of the vae. Defaults to 8.
    finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. Defaults to False.
    gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
    pre_compute_text_embeddings (bool): Whether to pre-compute text embeddings to save memory. Defaults to False.
    enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
    student_weight_from_teacher (bool): Whether to initialize the student model from the teacher weights. Defaults to False.
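The core of this teacher-to-student distillation, sketched roughly (output-matching term only; the official script also uses feature-level losses, the SDXL conditioning kwargs are omitted, and all names are illustrative):

    import torch
    import torch.nn.functional as F

    student_pred = student_unet(noisy_latents, timesteps,
                                encoder_hidden_states).sample
    with torch.no_grad():  # the teacher is frozen
        teacher_pred = teacher_unet(noisy_latents, timesteps,
                                    encoder_hidden_states).sample
    distill_loss = F.mse_loss(student_pred, teacher_pred)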
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Forward vae. Parameters: img (torch.Tensor), num_batches (int). Returns: torch.Tensor.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: mmengine.model.BaseModel
Stable Diffusion.
Args:
    tokenizer (dict): Config of tokenizer.
    scheduler (dict): Config of scheduler.
    text_encoder (dict): Config of text encoder.
    vae (dict): Config of vae.
    unet (dict): Config of unet.
    model (str): Pretrained model name of stable diffusion. Defaults to 'runwayml/stable-diffusion-v1-5'.
    loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
    unet_lora_config (dict, optional): The LoRA config dict for the Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, in the same format as unet_lora_config. Defaults to None.
    prior_loss_weight (float): The weight of the prior preservation loss. It takes effect when training DreamBooth with class images.
    prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None, in which case the scheduler default is used. Defaults to None.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
    noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
    timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
    input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
    vae_batch_size (int): The batch size of the vae. Defaults to 8.
    finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. Defaults to False.
    gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
    enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
Get device information. Returns: torch.device.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
Val step. Parameters: data (tuple | dict | list). Returns: list.
Test step. Parameters: data (tuple | dict | list). Returns: list.
Calculate loss. Parameters: model_pred (torch.Tensor), noise (torch.Tensor), latents (torch.Tensor), timesteps (torch.Tensor), weight (torch.Tensor | None). Returns: dict[str, torch.Tensor].
Preprocess model input. Parameters: latents (torch.Tensor), noise (torch.Tensor), timesteps (torch.Tensor). Returns: torch.Tensor.
Forward vae. Parameters: img (torch.Tensor), num_batches (int). Returns: torch.Tensor.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDDataPreprocessor.
Parameters: non_blocking (bool, optional).
Preprocesses the data into the model input format. After the data pre-processing of cast_data(), forward() stacks the input tensor list into a batch tensor along the first dimension.

Args:
    data (dict): Data returned by the dataloader.
    training (bool): Whether to enable training-time augmentation.

Returns: dict or list – data in the same format as the model input.
Bases: diffengine.models.editors.stable_diffusion.StableDiffusion
Stable Diffusion ControlNet.
Args:
    controlnet_model (str, optional): Path to a pretrained ControlNet model. If None, the ControlNet is initialized from the Unet. Defaults to None.
    transformer_layers_per_block (list[int], optional): The number of layers per block in the transformer. More details: https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small. Defaults to None.
    unet_lora_config (dict, optional): The LoRA config dict for the Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, in the same format as unet_lora_config. Defaults to None.
    finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. This should be False when training ControlNet. Defaults to False.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    condition_image (list[str | PIL.Image.Image]): The condition image for ControlNet.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
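A usage sketch with a control image (the method name `infer` and the file name are assumptions based on this reference):

    images = model.infer(
        prompt=['a room with wooden floors'],
        condition_image=['canny_edges.png'],
        num_inference_steps=50,
    )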
Forward function for torch.compile. Parameters: noisy_latents (torch.Tensor), timesteps (torch.Tensor), encoder_hidden_states (torch.Tensor), inputs (dict). Returns: torch.Tensor.
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDControlNetDataPreprocessor.
Parameters: non_blocking (bool, optional).
Preprocesses the data into the model input format. After the data pre-processing of cast_data(), forward() stacks the input tensor list into a batch tensor along the first dimension.

Args:
    data (dict): Data returned by the dataloader.
    training (bool): Whether to enable training-time augmentation.

Returns: dict or list – data in the same format as the model input.
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDInpaintDataPreprocessor.
Parameters: non_blocking (bool, optional).
Preprocesses the data into the model input format. After the data pre-processing of cast_data(), forward() stacks the input tensor list into a batch tensor along the first dimension.

Args:
    data (dict): Data returned by the dataloader.
    training (bool): Whether to enable training-time augmentation.

Returns: dict or list – data in the same format as the model input.
Bases: diffengine.models.editors.stable_diffusion.StableDiffusion
Stable Diffusion Inpaint.
Args:
    model (str): Pretrained model name of stable diffusion. Defaults to 'runwayml/stable-diffusion-v1-5'.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    image (list[str | PIL.Image.Image]): The image for inpainting.
    mask (list[str | PIL.Image.Image]): The mask for inpainting.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
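A usage sketch for inpainting (the method name `infer` and the file names are assumptions based on this reference):

    images = model.infer(
        prompt=['a red bench'],
        image=['photo.png'],
        mask=['mask.png'],
        num_inference_steps=50,
    )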
Forward function.

Args:
    inputs (dict): The input dict.
    data_samples (list, optional): The data samples. Defaults to None.
    mode (str, optional): The mode. Defaults to "loss".

Returns: dict – the loss dict.
Bases: mmengine.model.BaseModel
Stable Diffusion XL (https://huggingface.co/papers/2307.01952).
Args:
    tokenizer_one (dict): Config of tokenizer one.
    tokenizer_two (dict): Config of tokenizer two.
    scheduler (dict): Config of scheduler.
    text_encoder_one (dict): Config of text encoder one.
    text_encoder_two (dict): Config of text encoder two.
    vae (dict): Config of vae.
    unet (dict): Config of unet.
    model (str): Pretrained model name of stable diffusion xl. Defaults to 'stabilityai/stable-diffusion-xl-base-1.0'.
    loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
    unet_lora_config (dict, optional): The LoRA config dict for the Unet, e.g. dict(type="LoRA", r=4). type is chosen from LoRA, LoHa, or LoKr; the other keys are the same as in the PEFT config (https://github.com/huggingface/peft). Defaults to None.
    text_encoder_lora_config (dict, optional): The LoRA config dict for the text encoder, in the same format as unet_lora_config. Defaults to None.
    prior_loss_weight (float): The weight of the prior preservation loss. It takes effect when training DreamBooth with class images.
    prediction_type (str): The prediction_type to use for training. Choose between 'epsilon' and 'v_prediction', or leave None, in which case the scheduler default (noise_scheduler.config.prediction_type) is used. Defaults to None.
    data_preprocessor (dict, optional): The pre-process config for the data preprocessor.
    noise_generator (dict, optional): The noise generator config. Defaults to dict(type='WhiteNoise').
    timesteps_generator (dict, optional): The timesteps generator config. Defaults to dict(type='TimeSteps').
    input_perturbation_gamma (float): The gamma of input perturbation. The recommended value is 0.1 for Input Perturbation. Defaults to 0.0.
    vae_batch_size (int): The batch size of the vae. Defaults to 8.
    finetune_text_encoder (bool, optional): Whether to fine-tune the text encoder. Defaults to False.
    gradient_checkpointing (bool): Whether to use gradient checkpointing to save memory at the expense of a slower backward pass. Defaults to False.
    pre_compute_text_embeddings (bool): Whether to pre-compute text embeddings to save memory. Defaults to False.
    enable_xformers (bool): Whether to enable memory-efficient attention. Defaults to False.
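Putting the options above together, a minimal training config sketch in the mmengine style used throughout this page (the registry type name 'StableDiffusionXL' is an assumption; the nested dicts follow the documented defaults):

    model = dict(
        type='StableDiffusionXL',  # assumed registry name for this editor
        model='stabilityai/stable-diffusion-xl-base-1.0',
        loss=dict(type='L2Loss', loss_weight=1.0),
        unet_lora_config=dict(type='LoRA', r=4),
        noise_generator=dict(type='WhiteNoise'),
        timesteps_generator=dict(type='TimeSteps'),
        gradient_checkpointing=True,
    )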
Get device information. Returns: torch.device.
Set LoRA for the model. Returns: None.
Prepare the model for training; disable gradients for some submodules. Returns: None.
Set xformers for the model. Returns: None.
Inference function.

Args:
    prompt (list[str]): The prompt or prompts to guide the image generation.
    negative_prompt (str, optional): The prompt or prompts not to guide the image generation. Defaults to None.
    height (int, optional): The height in pixels of the generated image. Defaults to None.
    width (int, optional): The width in pixels of the generated image. Defaults to None.
    num_inference_steps (int): Number of inference steps. Defaults to 50.
    output_type (str): The output format of the generated image. Choose between 'pil' and 'latent'. Defaults to 'pil'.
    **kwargs: Other arguments.

Returns: list[numpy.ndarray]
Encode prompt.

Args:
    text_one (torch.Tensor): Token ids from tokenizer one.
    text_two (torch.Tensor): Token ids from tokenizer two.

Returns: tuple[torch.Tensor, torch.Tensor] – prompt embeddings.
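A rough sketch of the dual-encoder encoding this implies, following the usual SDXL recipe rather than the exact source (penultimate hidden states concatenated; encoder two is assumed to be a CLIPTextModelWithProjection whose first output is the pooled embedding):

    import torch

    out_one = text_encoder_one(text_one, output_hidden_states=True)
    out_two = text_encoder_two(text_two, output_hidden_states=True)
    pooled_embeds = out_two[0]  # pooled projection, used for added conditioning
    prompt_embeds = torch.cat(
        [out_one.hidden_states[-2], out_two.hidden_states[-2]], dim=-1)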
Val step. Parameters: data (tuple | dict | list). Returns: list.
Test step. Parameters: data (tuple | dict | list). Returns: list.
Calculate loss.
model_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
dict[str, torch.Tensor]
Preprocess model input.
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
torch.Tensor
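A sketch of what this preprocessing step typically does in diffusion training, including the input_perturbation_gamma option documented above; the names and the diffusers-style add_noise call are assumptions about this implementation:

    import torch

    # Hypothetical sketch: standard forward-noising with optional input
    # perturbation applied only to the noise fed to the model.
    def preprocess_model_input(scheduler, latents, noise, timesteps,
                               input_perturbation_gamma: float = 0.0):
        if input_perturbation_gamma > 0:
            # Perturb the model input, not the training target.
            noise = noise + input_perturbation_gamma * torch.randn_like(noise)
        return scheduler.add_noise(latents, noise, timesteps)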
Forward vae.
img (torch.Tensor) –
num_batches (int) –
torch.Tensor
Forward function.
Args:¶
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples.
Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:¶
- dict: The loss dict.
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
dict
tokenizer_one (dict) –
tokenizer_two (dict) –
scheduler (dict) –
text_encoder_one (dict) –
text_encoder_two (dict) –
vae (dict) –
unet (dict) –
model (str) –
loss (dict | None) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
prior_loss_weight (float) –
prediction_type (str | None) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
vae_batch_size (int) –
finetune_text_encoder (bool) –
gradient_checkpointing (bool) –
pre_compute_text_embeddings (bool) –
enable_xformers (bool) –
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDXLDataPreprocessor.
non_blocking (Optional[bool]) –
Preprocesses the data into the model input format.
After the data pre-processing of cast_data()
, forward
will stack the input tensor list to a batch tensor at the first dimension.
Args:¶
- data (dict): Data returned by the dataloader.
- training (bool): Whether to enable training-time augmentation.
Returns:¶
- dict or list: Data in the same format as the model input.
data (dict) –
training (bool) –
dict | list
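A small illustration of the documented stacking behavior; the "inputs"/"img" keys are assumptions:

    import torch

    # Hypothetical: after cast_data(), per-sample tensors are stacked into a
    # single batch tensor at dimension 0.
    def stack_inputs(data: dict) -> dict:
        imgs = data["inputs"]["img"]                      # list of (C, H, W) tensors
        data["inputs"]["img"] = torch.stack(imgs, dim=0)  # one (N, C, H, W) batch
        return data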
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDXLControlNetDataPreprocessor.
non_blocking (Optional[bool]) –
Preprocesses the data into the model input format.
After the data pre-processing of cast_data()
, forward
will stack the input tensor list to a batch tensor at the first dimension.
Args:¶
- data (dict): Data returned by the dataloader.
- training (bool): Whether to enable training-time augmentation.
Returns:¶
- dict or list: Data in the same format as the model input.
data (dict) –
training (bool) –
dict | list
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL ControlNet.
Args:¶
- controlnet_model (str, optional): Path to a pretrained ControlNet model.
If None, the ControlNet is initialized from the UNet. Defaults to None.
- transformer_layers_per_block (List[int], optional):
The number of transformer layers per block. More details: https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small. Defaults to None.
- unet_lora_config (dict, optional): The LoRA config dict for the UNet.
Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; other fields are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for the
text encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; other fields are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- finetune_text_encoder (bool, optional): Whether to fine-tune the text
encoder. This should be False when training ControlNet. Defaults to False.
- data_preprocessor (dict, optional): The pre-processing config.
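A minimal config sketch for ControlNet training, assuming the registry key StableDiffusionXLControlNet; all elided fields are assumptions:

    # Hypothetical config sketch for the ControlNet editor.
    model = dict(
        type="StableDiffusionXLControlNet",   # registry key is an assumption
        model="stabilityai/stable-diffusion-xl-base-1.0",
        controlnet_model=None,                # None: initialize from the UNet
        transformer_layers_per_block=None,
        finetune_text_encoder=False,          # keep False for ControlNet
    )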
Set LoRA for the model.
None
Prepare model for training.
Disable gradient for some models.
None
Set xformers for model.
None
Inference function.
Args:¶
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- condition_image (List[Union[str, Image.Image]]):
The condition image for ControlNet.
- negative_prompt (Optional[str]):
The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.
**kwargs: Other arguments.
prompt (list[str]) –
condition_image (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
list[numpy.ndarray]
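A usage sketch with a condition image; the editor instance, the method name infer, and the file path are assumptions:

    # Hypothetical call: condition_image accepts paths or PIL images per the docs.
    images = editor.infer(
        prompt=["a red sports car"],
        condition_image=["path/to/canny_edges.png"],
        num_inference_steps=50,
    )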
Forward function for torch.compile.
noisy_latents (torch.Tensor) –
timesteps (torch.Tensor) –
prompt_embeds (torch.Tensor) –
unet_added_conditions (dict) –
inputs (dict) –
torch.Tensor
Forward function.
Args:¶
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples.
Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:¶
- dict: The loss dict.
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
dict
controlnet_model (str | None) –
transformer_layers_per_block (list[int] | None) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
data_preprocessor (dict | torch.nn.Module | None) –
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL DPO.
Args:¶
- beta_dpo (int): DPO KL divergence penalty. Defaults to 5000.
- loss (dict, optional): The loss config. Defaults to None.
- data_preprocessor (dict, optional): The pre-processing config.
Prepare model for training.
Disable gradient for some models.
None
Calculate loss.
model_pred (torch.Tensor) –
ref_pred (torch.Tensor) –
noise (torch.Tensor) –
latents (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
dict[str, torch.Tensor]
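For orientation, a sketch of the Diffusion-DPO objective this loss typically computes, following the published formulation; the batch layout (preferred/rejected halves) and names are assumptions about this implementation:

    import torch
    import torch.nn.functional as F

    # Hypothetical sketch: compare per-sample MSE of the trained model against
    # a frozen reference, scaled by beta_dpo.
    def dpo_loss(model_mse: torch.Tensor, ref_mse: torch.Tensor,
                 beta_dpo: float = 5000.0) -> torch.Tensor:
        model_w, model_l = model_mse.chunk(2)   # preferred / rejected halves
        ref_w, ref_l = ref_mse.chunk(2)
        inside = -0.5 * beta_dpo * ((model_w - model_l) - (ref_w - ref_l))
        return -F.logsigmoid(inside).mean()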
Forward function.
Args:¶
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples.
Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:¶
- dict: The loss dict.
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
dict
beta_dpo (int) –
loss (dict | None) –
data_preprocessor (dict | torch.nn.Module | None) –
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDXLDataPreprocessor.
non_blocking (Optional[bool]) –
Preprocesses the data into the model input format.
After the data pre-processing of cast_data()
, forward
will stack the input tensor list to a batch tensor at the first dimension.
Args:¶
- data (dict): Data returned by the dataloader.
- training (bool): Whether to enable training-time augmentation.
Returns:¶
- dict or list: Data in the same format as the model input.
data (dict) –
training (bool) –
dict | list
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL Inpaint.
Args:¶
- model (str): Pretrained model name of Stable Diffusion XL inpainting.
Defaults to 'diffusers/stable-diffusion-xl-1.0-inpainting-0.1'.
- data_preprocessor (dict, optional): The pre-processing config.
Prepare model for training.
Disable gradient for some models.
None
Inference function.
Args:¶
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- image (List[Union[str, Image.Image]]):
The image for inpainting.
- mask (List[Union[str, Image.Image]]):
The mask for inpainting.
- negative_prompt (Optional[str]):
The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.
**kwargs: Other arguments.
prompt (list[str]) –
image (list[str | PIL.Image.Image]) –
mask (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
list[numpy.ndarray]
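A usage sketch of the inpainting signature; the editor instance, the method name infer, the file paths, and the mask convention are assumptions:

    # Hypothetical call: image and mask accept paths or PIL images per the docs.
    images = editor.infer(
        prompt=["a stuffed animal sitting on a bench"],
        image=["path/to/photo.png"],
        mask=["path/to/mask.png"],   # masked region is repainted (assumed convention)
        num_inference_steps=50,
    )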
Forward function.
Args:¶
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples.
Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:¶
- dict: The loss dict.
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
dict
model (str) –
data_preprocessor (dict | torch.nn.Module | None) –
Bases: mmengine.model.base_model.data_preprocessor.BaseDataPreprocessor
SDXLInpaintDataPreprocessor.
non_blocking (Optional[bool]) –
Preprocesses the data into the model input format.
After the data pre-processing of cast_data()
, forward
will stack the input tensor list to a batch tensor at the first dimension.
Args:¶
- data (dict): Data returned by the dataloader.
- training (bool): Whether to enable training-time augmentation.
Returns:¶
- dict or list: Data in the same format as the model input.
data (dict) –
training (bool) –
dict | list
Bases: diffengine.models.editors.stable_diffusion_xl.StableDiffusionXL
Stable Diffusion XL T2I Adapter.
Args:¶
- adapter (dict): The adapter config.
- unet_lora_config (dict, optional): The LoRA config dict for the UNet.
Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; other fields are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for the
text encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; other fields are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- finetune_text_encoder (bool, optional): Whether to fine-tune the text
encoder. This should be False when training the adapter. Defaults to False.
- timesteps_generator (dict, optional): The timesteps generator config.
Defaults to dict(type='CubicSamplingTimeSteps').
- data_preprocessor (dict, optional): The pre-processing config.
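A minimal config sketch for adapter training, assuming the registry key StableDiffusionXLT2IAdapter; the adapter sub-config is elided because its fields are not shown on this page:

    # Hypothetical config sketch for the T2I Adapter editor.
    model = dict(
        type="StableDiffusionXLT2IAdapter",   # registry key is an assumption
        # adapter=...,                        # required adapter config
        timesteps_generator=dict(type="CubicSamplingTimeSteps"),
        finetune_text_encoder=False,
    )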
Set LoRA for the model.
None
Prepare model for training.
Disable gradient for some models.
None
Set xformers for model.
None
Inference function.
Args:¶
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- condition_image (List[Union[str, Image.Image]]):
The condition image for the T2I Adapter.
- negative_prompt (Optional[str]):
The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.
**kwargs: Other arguments.
prompt (list[str]) –
condition_image (list[str | PIL.Image.Image]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
list[numpy.ndarray]
Forward function for torch.compile.
noisy_latents (torch.Tensor) –
timesteps (torch.Tensor) –
prompt_embeds (torch.Tensor) –
unet_added_conditions (dict) –
inputs (dict) –
torch.Tensor
Forward function.
Args:¶Returns:¶inputs (dict): The input dict. data_samples (Optional[list], optional): The data samples.
Defaults to None.
mode (str, optional): The mode. Defaults to “loss”.
dict: The loss dict.
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
dict
adapter (dict) –
unet_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
finetune_text_encoder (bool) –
timesteps_generator (dict | None) –
data_preprocessor (dict | torch.nn.Module | None) –
Bases: mmengine.model.BaseModel
Wuerstchen Prior (https://arxiv.org/abs/2306.00637).
Args:¶
- tokenizer (dict): Config of tokenizer.
- scheduler (dict): Config of scheduler.
- text_encoder (dict): Config of text encoder.
- image_encoder (dict): Config of image encoder.
- prior (dict): Config of prior.
- decoder_model (str): Pretrained decoder model name of Wuerstchen.
Defaults to 'warp-ai/wuerstchen'.
- prior_model (str): Pretrained prior model name of Wuerstchen.
Defaults to 'warp-ai/wuerstchen-prior'.
- loss (dict): Config of loss. Defaults to dict(type='L2Loss', loss_weight=1.0).
- prior_lora_config (dict, optional): The LoRA config dict for the Prior.
Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; other fields are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- text_encoder_lora_config (dict, optional): The LoRA config dict for the
text encoder. Example: dict(type="LoRA", r=4). type is chosen from LoRA, LoHa or LoKr; other fields are the same as the PEFT config (https://github.com/huggingface/peft). Defaults to None.
- prior_loss_weight (float): The weight of the prior preservation loss.
It applies when training DreamBooth with class images.
- data_preprocessor (dict, optional): The pre-processing config.
- noise_generator (dict, optional): The noise generator config.
Defaults to dict(type='WhiteNoise').
- timesteps_generator (dict, optional): The timesteps generator config.
Defaults to dict(type='WuerstchenRandomTimeSteps').
- input_perturbation_gamma (float): The gamma of input perturbation.
The recommended value is 0.1 for input perturbation. Defaults to 0.0.
- finetune_text_encoder (bool, optional): Whether to fine-tune the text
encoder. Defaults to False.
- gradient_checkpointing (bool): Whether to use gradient checkpointing to
save memory at the expense of a slower backward pass. Defaults to False.
Get device information.
Returns:¶
- torch.device: The device.
Set LoRA for the model.
None
Prepare model for training.
Disable gradient for some models.
None
Convert the model into training mode.
mode (bool) –
None
Inference function.
Args:¶
- prompt (List[str]):
The prompt or prompts to guide the image generation.
- negative_prompt (Optional[str]):
The prompt or prompts not to guide the image generation. Defaults to None.
- height (int, optional):
The height in pixels of the generated image. Defaults to None.
- width (int, optional):
The width in pixels of the generated image. Defaults to None.
- num_inference_steps (int): Number of inference steps.
Defaults to 50.
- output_type (str): The output format of the generated image.
Choose between ‘pil’ and ‘latent’. Defaults to ‘pil’.
**kwargs: Other arguments.
prompt (list[str]) –
negative_prompt (str | None) –
height (int | None) –
width (int | None) –
num_inference_steps (int) –
output_type (str) –
list[numpy.ndarray]
Val step.
data (Union[tuple, dict, list]) –
list
Test step.
data (Union[tuple, dict, list]) –
list
Calculate loss.
model_pred (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
weight (torch.Tensor | None) –
dict[str, torch.Tensor]
Preprocess model input.
latents (torch.Tensor) –
noise (torch.Tensor) –
timesteps (torch.Tensor) –
torch.Tensor
Forward function.
Args:¶
- inputs (dict): The input dict.
- data_samples (Optional[list], optional): The data samples.
Defaults to None.
- mode (str, optional): The mode. Defaults to "loss".
Returns:¶
- dict: The loss dict.
inputs (dict) –
data_samples (Optional[list]) –
mode (str) –
dict
tokenizer (dict) –
scheduler (dict) –
text_encoder (dict) –
image_encoder (dict) –
prior (dict) –
decoder_model (str) –
prior_model (str) –
loss (dict | None) –
prior_lora_config (dict | None) –
text_encoder_lora_config (dict | None) –
prior_loss_weight (float) –
data_preprocessor (dict | torch.nn.Module | None) –
noise_generator (dict | None) –
timesteps_generator (dict | None) –
input_perturbation_gamma (float) –
finetune_text_encoder (bool) –
gradient_checkpointing (bool) –
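A minimal config sketch for the Wuerstchen prior editor; the registry key is an assumption, and the required sub-model configs are elided because their wrapper types are not shown on this page:

    # Hypothetical config sketch for the Wuerstchen prior editor.
    model = dict(
        type="WuerstchenPriorModel",          # registry key is an assumption
        # tokenizer=..., scheduler=..., text_encoder=...,
        # image_encoder=..., prior=...,
        decoder_model="warp-ai/wuerstchen",
        prior_model="warp-ai/wuerstchen-prior",
        loss=dict(type="L2Loss", loss_weight=1.0),
        timesteps_generator=dict(type="WuerstchenRandomTimeSteps"),
    )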