Stable Diffusion XL 1.0
Overview
The DataCrunch Inference Service offers a Stable Diffusion XL 1.0 endpoint for generating images from text prompts. This page describes the endpoint's features, its three generation pipelines (base model only, Ensemble of Expert Denoisers, and base model followed by the refiner), and its request parameters.
Endpoint features
Safety Filter: Option to enable or disable a safety filter for content moderation.
Style Templates: Support for applying predefined styles to the generated images.
Limited LoRA Support: Load a fine-tuned LoRA by ID, or a supported public LoRA by name, to adjust the model.
Examples of API Usage
The following examples demonstrate how to interact with the service using different features.
Simple Base SDXL (No Refiner)
To generate an image without the refining process, set is_ensemble=false and refiner=false.
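The request below is a minimal sketch of a base-only call in Python, using only the standard library. Several details are not stated on this page and are assumptions here: the endpoint URL, the Bearer-token header, and the JSON request body. The names ENDPOINT_URL, API_KEY, base_only_payload and send are hypothetical. The two mode flags are applied after any overrides, so they cannot be switched on by accident.

```python
import json
import urllib.request

# Hypothetical values: use the URL and credentials from your DataCrunch account.
ENDPOINT_URL = "https://example.invalid/sdxl/generate"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"  # hypothetical placeholder


def base_only_payload(prompt: str, **overrides) -> dict:
    """Build a request body for plain base SDXL (no ensemble, no refiner).

    Extra parameters (e.g. num_inference_steps, seed) go in **overrides;
    the mode flags are applied last so overrides cannot change the mode.
    """
    return {"prompt": prompt, **overrides, "is_ensemble": False, "refiner": False}


def send(payload: dict) -> bytes:
    """POST the payload as JSON and return the raw response body.

    The Bearer authorization scheme is an assumption, not documented here.
    """
    request = urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    body = base_only_payload("a lighthouse at dusk, oil painting", seed=7)
    print(json.dumps(body, indent=2))  # inspect the body before calling send(body)
```

Keeping payload construction separate from the HTTP call makes it easy to check the request body before spending any inference time.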
Ensemble of Expert Denoisers
To run in the Ensemble of Expert Denoisers mode, set is_ensemble=true and refiner=false.
The base model first denoises for num_inference_steps * (1 - refiner_ratio) steps. The refiner model then runs the remaining num_inference_steps * refiner_ratio steps.
For detailed information on parameters and their effects, refer to the Ensemble of Expert Denoisers documentation.
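The step split described above can be checked with a small helper. This is a sketch only. The function name ensemble_step_split is hypothetical, and rounding the refiner share to a whole step is an assumption, since the service may round differently or derive the split from timestep cutoffs. Computing the base share as the remainder keeps the two counts summing to num_inference_steps.

```python
def ensemble_step_split(num_inference_steps: int = 50,
                        refiner_ratio: float = 0.2) -> tuple[int, int]:
    """Approximate (base_steps, refiner_steps) when is_ensemble=true.

    Defaults match the documented defaults (50 steps, refiner_ratio 0.2).
    Rounding to whole steps is an assumption, not documented behavior.
    """
    if not 0.0 <= refiner_ratio <= 1.0:
        raise ValueError("refiner_ratio must be between 0 and 1")
    refiner_steps = round(num_inference_steps * refiner_ratio)
    base_steps = num_inference_steps - refiner_steps  # the base model runs the rest
    return base_steps, refiner_steps


print(ensemble_step_split(40, 0.1))  # (36, 4)
print(ensemble_step_split())         # (40, 10) with the default values
```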
Refine the Denoised Base Image
This two-step pipeline first runs the full denoising process with the base model. It then applies the refiner model to the base model's output in an image-to-image pipeline.
To enable this pipeline, set is_ensemble to false and refiner to true.
The number of steps for each model, the base and the refiner, is controlled independently by num_inference_steps and num_inference_steps_refiner, respectively. Likewise, guidance_scale and guidance_scale_refiner set separate guidance values for each phase.
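A request body for this base-then-refiner pipeline might look like the sketch below. It assumes the same JSON body as the other requests. The helper name refine_payload is hypothetical, and its defaults mirror the documented defaults (50 steps each, guidance 4.0 for the base model and 1.0 for the refiner).

```python
import json


def refine_payload(
    prompt: str,
    num_inference_steps: int = 50,
    num_inference_steps_refiner: int = 50,
    guidance_scale: float = 4.0,
    guidance_scale_refiner: float = 1.0,
) -> dict:
    """Build a request body for the two-step base-then-refiner pipeline."""
    return {
        "prompt": prompt,
        "is_ensemble": False,  # not the Ensemble of Expert Denoisers pipeline
        "refiner": True,       # refine the base output in an image-to-image pass
        "num_inference_steps": num_inference_steps,                  # base model steps
        "num_inference_steps_refiner": num_inference_steps_refiner,  # refiner steps
        "guidance_scale": guidance_scale,                            # base model guidance
        "guidance_scale_refiner": guidance_scale_refiner,            # refiner guidance
    }


print(json.dumps(refine_payload("a red fox in snow",
                                num_inference_steps=40,
                                num_inference_steps_refiner=20), indent=2))
```

Because the two phases are configured independently, you can shorten the refiner pass without changing how long the base model denoises.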
API Specification
Generates images based on text prompts
Request parameters for image generation
Text prompt for image generation
Height of the output image
Width of the output image
Number of inference steps. Higher values can lead to more detailed images.
Scaling factor for guidance. A higher value increases adherence to the prompt but may reduce image diversity.
Number of images to generate per prompt
Seed for random number generator. Using the same seed with the same parameters will produce the same image.
Text for negative prompt to guide the model on what to avoid. Helps in refining the results by specifying undesired elements.
Base64-encoded seed image string. The model uses this as a starting point for image generation.
Determines the level of modification applied to the seed image. A lower value results in minimal changes, while a higher value leads to more significant alterations.
Scaling factor for the guidance in the refiner model. Affects how closely the refiner output adheres to the prompt.
Indicates whether to use the refiner model for additional image processing.
Specifies the number of inference steps for the refiner model. Higher values can lead to finer details and better adherence to the prompt.
Defines the scheduler algorithm used for image generation. Different schedulers can affect the quality and characteristics of the output.
Specifies a particular style to be applied to the generated images. Useful for achieving consistent aesthetics across different prompts.
Determines whether to use the Ensemble of Expert Denoisers pipeline, in which the base and refiner models split the denoising steps between them.
The fraction of inference steps dedicated to the refiner model when 'is_ensemble' is true. A higher ratio gives more weight to the refiner's output.
Specifies the aspect ratio for the output image. This overrides the individual height and width settings.
Identifier for a specific LoRA model to be used for image generation. Allows for selection of fine-tuned models.
Name of the LoRA model to be loaded. This option allows for the selection of publicly available LoRA models.
Specifies the output image format, such as JPEG or PNG.
Indicates whether to apply a safety filter to the generated images, which can help in filtering out NSFW content.
Specifies the timestep spacing method for the scheduler. Different methods can affect the progression of image generation. The possible values are trailing, leading and linspace.
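The seed_image and strength parameters together enable image-to-image generation. The sketch below shows one way to produce the Base64 string from raw image bytes. It assumes plain Base64 without a data-URI prefix; the page does not say whether a prefix is accepted. The helper names are hypothetical. The strength check uses the documented range [0.05, 1.0].

```python
import base64
from pathlib import Path


def img2img_fields(image_bytes: bytes, strength: float = 0.2) -> dict:
    """Return the seed_image and strength fields for an image-to-image request."""
    if not 0.05 <= strength <= 1.0:
        raise ValueError("strength must be in [0.05, 1.0]")
    return {
        # Plain Base64 of the raw file bytes (no "data:image/...;base64," prefix assumed).
        "seed_image": base64.b64encode(image_bytes).decode("ascii"),
        "strength": strength,
    }


def img2img_fields_from_file(path: str, strength: float = 0.2) -> dict:
    """Read an image file from disk and encode it as a seed image."""
    return img2img_fields(Path(path).read_bytes(), strength)
```

Merge the returned dictionary into a request body that also contains prompt. A low strength keeps the output close to the seed image.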
API Parameters
prompt (str, required): Prompt text.
height (int, optional): Height of the output image. Setting aspect_ratio overrides this value. Defaults to 1024.
width (int, optional): Width of the output image. Setting aspect_ratio overrides this value. Defaults to 1024.
num_inference_steps (int, optional): Number of inference (denoising) steps. Defaults to 50.
guidance_scale (float, optional): Scaling factor for guidance; controls how closely the output follows the text prompt. Defaults to 4.0.
num_images_per_prompt (int, optional): Number of images to generate per prompt. Defaults to 1.
seed (int, optional): Seed for the random number generator. Defaults to 42.
negative_prompt (str, optional): Negative prompt text.
seed_image (str, optional): Base64-encoded seed image string.
strength (float, optional, range: [0.05, 1.0]): How much noise is added to the seed_image before generation. Defaults to 0.2.
scheduler (str, optional): Scheduler to use. Supported schedulers: DDIM, K_EULER, EulerA, DPMSolverMultistep, KarrasDPM, PNDM, HeunDiscrete. Defaults to DDIM.
timestep_spacing (str, optional): Timestep spacing for the scheduler. Supported values: linspace, trailing, leading. Defaults to linspace.
guidance_scale_refiner (float, optional): Scaling factor for refiner guidance (the refiner counterpart of guidance_scale). Defaults to 1.0.
refiner (bool, optional): Whether to use the refiner model. Defaults to false.
num_inference_steps_refiner (int, optional): Number of inference steps for the refiner, applied when is_ensemble=false. Defaults to 50.
style_selected (str, optional): Apply the specified style to the provided prompt; see supported styles.
is_ensemble (bool, optional): Whether to use the Ensemble of Expert Denoisers pipeline. Defaults to false.
refiner_ratio (float, optional): Requires is_ensemble=true. The fraction of the num_inference_steps steps to run the refiner for. For example, if num_inference_steps=40 and refiner_ratio=0.1, the base model runs for 40 * (1 - 0.1) = 36 steps and the refiner for 40 * 0.1 = 4 steps. Values above 0.2 start to produce unnatural-looking images. Defaults to 0.2.
aspect_ratio (str, optional): Aspect ratio of the output image. Setting this value overrides the width and height values.
lora_id (str, optional): ID of a fine-tuned LoRA to load (the LoRA file must exist on the DataCrunch platform).
lora_name (str, optional): Public LoRA to load. Currently only lora_name="offset" is supported (corresponding to "sd_xl_offset_example-lora_1.0.safetensors").
safety_filter (bool, optional): Whether to use the NSFW filter. Defaults to true.
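The documented constraints in this list can be checked on the client before a request is sent. The sketch below covers the supported schedulers, timestep spacings, the strength range, the only public LoRA name, and the rule that refiner_ratio requires is_ensemble=true. The function name validate_request is hypothetical, and this check is an assumption-level convenience: the service remains the authority on what it accepts.

```python
# Values copied from the parameter reference above.
SCHEDULERS = {"DDIM", "K_EULER", "EulerA", "DPMSolverMultistep",
              "KarrasDPM", "PNDM", "HeunDiscrete"}
TIMESTEP_SPACINGS = {"linspace", "trailing", "leading"}
PUBLIC_LORAS = {"offset"}


def validate_request(payload: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    if not payload.get("prompt"):
        problems.append("prompt is required")
    scheduler = payload.get("scheduler", "DDIM")
    if scheduler not in SCHEDULERS:
        problems.append(f"unsupported scheduler: {scheduler}")
    spacing = payload.get("timestep_spacing", "linspace")
    if spacing not in TIMESTEP_SPACINGS:
        problems.append(f"unsupported timestep_spacing: {spacing}")
    strength = payload.get("strength", 0.2)
    if not 0.05 <= strength <= 1.0:
        problems.append("strength must be in [0.05, 1.0]")
    if "refiner_ratio" in payload and not payload.get("is_ensemble", False):
        problems.append("refiner_ratio requires is_ensemble=true")
    lora_name = payload.get("lora_name")
    if lora_name is not None and lora_name not in PUBLIC_LORAS:
        problems.append(f"unsupported lora_name: {lora_name}")
    return problems


print(validate_request({"prompt": "a castle", "scheduler": "LMS"}))
```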