Stable Diffusion XL 1.0

Overview

The DataCrunch Inference Service provides a Stable Diffusion XL 1.0 endpoint for generating images from text prompts. This page describes the supported generation modes, the request parameters, and the response format.

Endpoint features

Examples of API Usage

The following examples show how to call the endpoint in each of its generation modes.

Simple Base SDXL (No Refiner)

To generate an image with the base model only, without any refinement, set is_ensemble=false and refiner=false.

curl -X POST https://inference.datacrunch.io/v1/images/stable-diffusion-xl/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "prompt": "A cat with a hat",
    "height": 512,
    "width": 512,
    "num_inference_steps": 50,
    "num_images_per_prompt": 1,
    "seed": 42,
    "refiner": false,
    "is_ensemble": false
}'
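The same base-only request can be sent from JavaScript. The sketch below assumes a runtime with a global fetch (for example Node.js 18+); the helpers `buildGenerateRequest` and `generate` are hypothetical names, and the body mirrors the curl example above.

```javascript
// Endpoint URL as used in the curl examples.
const SDXL_URL = "https://inference.datacrunch.io/v1/images/stable-diffusion-xl/generate";

// Hypothetical helper: builds the fetch() arguments for a base-only SDXL request.
function buildGenerateRequest(apiKey, prompt) {
  const body = {
    prompt,
    height: 512,
    width: 512,
    num_inference_steps: 50,
    num_images_per_prompt: 1,
    seed: 42,
    refiner: false,     // no refiner pass
    is_ensemble: false, // no Ensemble of Expert Denoisers
  };
  return {
    url: SDXL_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: sends the request and returns the parsed JSON body.
async function generate(apiKey, prompt) {
  const { url, init } = buildGenerateRequest(apiKey, prompt);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${await response.text()}`);
  }
  return response.json();
}
```

Usage: `generate("<your_api_key>", "A cat with a hat").then((data) => console.log(data.results.seed_used));`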

Ensemble of Expert Denoisers

To run in Ensemble of Expert Denoisers mode, set is_ensemble=true and refiner=false.

The base model first denoises for num_inference_steps * (1 - refiner_ratio) steps; the refiner model then runs the remaining num_inference_steps * refiner_ratio steps.

For detailed information on parameters and their effects, refer to the Ensemble of Expert Denoisers documentation.

curl -X POST https://inference.datacrunch.io/v1/images/stable-diffusion-xl/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "prompt": "A cat with a hat",
    "height": 512,
    "width": 512,
    "num_inference_steps": 50,
    "num_images_per_prompt": 1,
    "seed": 42,
    "refiner": false,
    "is_ensemble": true,
    "refiner_ratio": 0.2
}'
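The step split described above can be computed before sending a request, for example to log how many steps each model will run. This is a sketch with a hypothetical `splitEnsembleSteps` helper; it assumes the refiner's share is rounded to the nearest whole step, which this page does not specify.

```javascript
// Hypothetical helper: splits num_inference_steps between the base and refiner
// models in Ensemble of Expert Denoisers mode.
// Assumption: the refiner's share is rounded to the nearest whole step.
function splitEnsembleSteps(numInferenceSteps, refinerRatio = 0.2) {
  if (!Number.isInteger(numInferenceSteps) || numInferenceSteps <= 0) {
    throw new RangeError("num_inference_steps must be a positive integer");
  }
  if (refinerRatio < 0 || refinerRatio > 1) {
    throw new RangeError("refiner_ratio must be between 0 and 1");
  }
  const refinerSteps = Math.round(numInferenceSteps * refinerRatio);
  const baseSteps = numInferenceSteps - refinerSteps; // the base model runs the rest
  return { baseSteps, refinerSteps };
}

// The request above: 50 steps with refiner_ratio 0.2.
console.log(splitEnsembleSteps(50, 0.2)); // { baseSteps: 40, refinerSteps: 10 }
```

With num_inference_steps=40 and refiner_ratio=0.1 the helper returns 36 base steps and 4 refiner steps, matching the worked example in the refiner_ratio parameter description.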

Refine the Denoised Base Image

In this two-step pipeline, the base model first fully denoises the image. The refiner model is then applied to the base model's output in an image-to-image pipeline.

To enable this pipeline, set is_ensemble to false and refiner to true.

The step counts for the base and refiner models are set independently by num_inference_steps and num_inference_steps_refiner, respectively. Likewise, guidance_scale and guidance_scale_refiner set a separate guidance scale for each phase.

curl -X POST https://inference.datacrunch.io/v1/images/stable-diffusion-xl/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "prompt": "A cat with a hat",
    "height": 512,
    "width": 512,
    "num_inference_steps": 40,
    "num_inference_steps_refiner": 10,
    "guidance_scale": 4,
    "guidance_scale_refiner": 4,
    "num_images_per_prompt": 1,
    "seed": 42,
    "refiner": true,
    "is_ensemble": false
}'
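The three modes differ only in the refiner and is_ensemble flags. The sketch below keeps the documented combinations in one lookup table; `MODES` and `withMode` are hypothetical names, and the combination refiner=true with is_ensemble=true is left out because this page does not describe it.

```javascript
// Flag combinations for the three generation modes described on this page.
const MODES = {
  base: { refiner: false, is_ensemble: false },    // Simple Base SDXL (No Refiner)
  ensemble: { refiner: false, is_ensemble: true }, // Ensemble of Expert Denoisers
  refine: { refiner: true, is_ensemble: false },   // Refine the Denoised Base Image
};

// Hypothetical helper: merges the mode flags into a request body.
// The mode flags take precedence over any flags already present in `body`.
function withMode(mode, body) {
  const flags = MODES[mode];
  if (!flags) {
    throw new Error(`Unknown mode "${mode}"; expected one of ${Object.keys(MODES).join(", ")}`);
  }
  return { ...body, ...flags };
}

// Values taken from the refine curl example above.
const refineBody = withMode("refine", {
  prompt: "A cat with a hat",
  num_inference_steps: 40,
  num_inference_steps_refiner: 10,
  guidance_scale: 4,
  guidance_scale_refiner: 4,
});
```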

API Specification

Generates images based on text prompts

POST https://inference.datacrunch.io/v1/images/stable-diffusion-xl/generate
Body

Request parameters for image generation

prompt (required)

Text prompt for image generation.

height

Height of the output image.

width

Width of the output image.

num_inference_steps

Number of denoising steps. Higher values can produce more detailed images, at the cost of longer generation time.

guidance_scale

Guidance scale. Higher values make the output follow the prompt more closely but may reduce image diversity.

num_images_per_prompt

Number of images to generate per prompt.

seed

Seed for the random number generator. The same seed with the same parameters produces the same image.

negative_prompt

Text describing elements the model should avoid in the generated image.

seed_image

Base64-encoded image that the model uses as the starting point for generation.

strength

How much the seed image is modified. Lower values make minimal changes; higher values make more significant alterations.

guidance_scale_refiner

Guidance scale for the refiner model. Controls how closely the refiner output follows the prompt.

refiner

Whether to run the refiner model as an additional processing step.

num_inference_steps_refiner

Number of inference steps for the refiner model. Higher values can produce finer details and better adherence to the prompt.

scheduler

Scheduler algorithm used for image generation. Different schedulers affect the quality and characteristics of the output.

style_selected

Style to apply to the generated images. Useful for keeping a consistent aesthetic across different prompts.

is_ensemble

Whether to use the Ensemble of Expert Denoisers pipeline for image generation, which can produce more detailed images.

refiner_ratio

Fraction of the inference steps assigned to the refiner model when is_ensemble is true. A higher ratio gives the refiner more influence over the output.

aspect_ratio

Aspect ratio of the output image. Overrides the height and width settings.

lora_id

Identifier of a fine-tuned LoRA model to use for image generation.

lora_name

Name of a publicly available LoRA model to load.

image_type

Output image format, such as JPEG or PNG.

safety_filter

Whether to apply a safety filter to the generated images to help filter out NSFW content.

timestep_spacing

Timestep spacing method for the scheduler, which affects how the denoising steps are distributed. Possible values: trailing, leading, linspace.
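seed_image and strength together turn a request into an image-to-image job. The sketch below produces the Base64 string with Node.js's built-in Buffer class; `buildImageToImageBody` is a hypothetical helper, and the strength bounds come from the API Parameters list ([0.05, 1.0], default 0.2). This page does not say whether a data-URL prefix is expected, so the sketch assumes a bare Base64 string.

```javascript
// Hypothetical helper: builds an image-to-image request body from raw image bytes.
// `imageBytes` may be a Buffer (e.g. from fs.readFileSync("input.png")) or a Uint8Array.
function buildImageToImageBody(prompt, imageBytes, strength = 0.2) {
  // Documented range for strength is [0.05, 1.0].
  if (typeof strength !== "number" || strength < 0.05 || strength > 1.0) {
    throw new RangeError("strength must be in [0.05, 1.0]");
  }
  return {
    prompt,
    // Assumption: a bare Base64 string without a "data:image/...;base64," prefix.
    seed_image: Buffer.from(imageBytes).toString("base64"),
    strength,
  };
}

// Usage (Node.js):
// const fs = require("fs");
// const body = buildImageToImageBody("A cat with a hat", fs.readFileSync("cat.png"), 0.3);
```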

Response

Successful Response

Body
error (required): Whether an error occurred
error_message: Error message
results (required): Generation results
Request
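// Run as an ES module or inside an async function (uses top-level await).
const apiKey = "<your_api_key>";
const response = await fetch('https://inference.datacrunch.io/v1/images/stable-diffusion-xl/generate', {
    method: 'POST',
    headers: {
      "Content-Type": "application/json",
      // Same bearer token as in the curl examples
      "Authorization": `Bearer ${apiKey}`
    },
    body: JSON.stringify({
      "prompt": "A cat with a hat"
    }),
});
if (!response.ok) {
  throw new Error(`Request failed with HTTP ${response.status}`);
}
const data = await response.json();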
Response
{
  "error": false,
  "error_message": "text",
  "results": {
    "output_image": [
      "text"
    ],
    "elapsed_time": 0.123,
    "seed_used": 42,
    "is_ensemble": false,
    "refiner_ratio": 0.2,
    "aspect_ratio": "1.0",
    "scheduler": "DDIM",
    "strength": 0.2,
    "usage": {
      "num_images_per_prompt": 1,
      "num_iterations": 50
    },
    "image_dimensions": [
      1024,
      1024
    ],
    "lora_id": "text",
    "lora_name": "text",
    "has_nsfw_concept": [
      false
    ]
  }
}
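A client should check the error flag before reading results. The sketch below is a hypothetical `extractImages` helper; it assumes each entry of results.output_image is a Base64-encoded image in the requested image_type and that has_nsfw_concept holds one flag per image. The example response above only shows placeholder strings, so both points are assumptions.

```javascript
// Hypothetical helper: validates a response body and decodes the generated images.
// Assumption: results.output_image holds Base64-encoded image data.
function extractImages(data) {
  if (data.error) {
    throw new Error(`Generation failed: ${data.error_message}`);
  }
  const { output_image: images, has_nsfw_concept: nsfw = [] } = data.results;
  return images.map((b64, i) => ({
    bytes: Buffer.from(b64, "base64"), // decoded image bytes
    nsfw: Boolean(nsfw[i]),            // assumed to be a per-image safety-filter flag
  }));
}

// Usage (Node.js); the file extension should match the requested image_type.
// const fs = require("fs");
// extractImages(data).forEach((img, i) => fs.writeFileSync(`out_${i}.png`, img.bytes));
```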

API Parameters

  • prompt (str, required): Prompt text.

  • height (int, optional): Height of the output image. Setting aspect_ratio overrides this value. Defaults to 1024.

  • width (int, optional): Width of the output image. Setting aspect_ratio overrides this value. Defaults to 1024.

  • num_inference_steps (int, optional): Number of inference (denoising) steps. Defaults to 50.

  • guidance_scale (float, optional): Scaling factor for guidance. Specifies how much to follow the text prompt. Defaults to 4.0.

  • num_images_per_prompt (int, optional): Number of images to generate per prompt. Defaults to 1.

  • seed (int, optional): Seed for random number generator. Defaults to 42.

  • negative_prompt (str, optional): Negative prompt text.

  • seed_image (str, optional): Base64-encoded seed image string.

  • strength (float, optional, Range: [0.05, 1.0]): How much noise is added to the seed_image before generation. Defaults to 0.2.

  • scheduler (str, optional): Scheduler to use. Supported schedulers: DDIM, K_EULER, EulerA, DPMSolverMultistep, KarrasDPM, PNDM, HeunDiscrete. Defaults to DDIM.

  • timestep_spacing (str, optional): Timestep spacing method for the scheduler. Supported values: linspace, trailing, leading. Defaults to linspace.

  • guidance_scale_refiner (float, optional): Scaling factor for refiner guidance (corresponds to guidance_scale). Defaults to 1.0.

  • refiner (bool, optional): Whether to use the refiner model. Defaults to false.

  • num_inference_steps_refiner (int, optional): Number of inference steps for the refiner, applied when is_ensemble=false. Defaults to 50.

  • style_selected (str, optional): Style to apply to the provided prompt; see supported styles.

  • is_ensemble (bool, optional): Whether to use the Ensemble of Expert Denoisers pipeline. Defaults to false.

  • refiner_ratio (float, optional): Requires is_ensemble=true. The fraction of num_inference_steps for which the refiner runs. For example, if num_inference_steps=40 and refiner_ratio=0.1, the base model runs for 40 * (1-0.1) = 36 steps and the refiner for 40 * 0.1 = 4 steps. Values above 0.2 start to produce unnatural-looking images. Defaults to 0.2.

  • aspect_ratio (str, optional): Aspect ratio of the output image. Setting this value overrides the width and height values.

  • lora_id (str, optional): Finetuned LoRA ID to load (LoRA file must exist on DataCrunch platform).

  • lora_name (str, optional): Public LoRA to load. The only currently supported value is lora_name="offset" (corresponding to "sd_xl_offset_example-lora_1.0.safetensors").

  • safety_filter (bool, optional): Whether to use NSFW filter. Defaults to true.
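The defaults and constraints in the list above can be encoded client-side, for example to log the effective parameters of a request. The sketch below is a hypothetical `resolveParams` helper that only uses what this list states: the defaults, the strength range, the supported scheduler and timestep_spacing values, and the rule that refiner_ratio requires is_ensemble=true. The service applies its own defaults and validation.

```javascript
// Defaults as listed in API Parameters.
const DEFAULTS = {
  height: 1024,
  width: 1024,
  num_inference_steps: 50,
  guidance_scale: 4.0,
  num_images_per_prompt: 1,
  seed: 42,
  strength: 0.2,
  scheduler: "DDIM",
  timestep_spacing: "linspace",
  guidance_scale_refiner: 1.0,
  refiner: false,
  num_inference_steps_refiner: 50,
  is_ensemble: false,
  refiner_ratio: 0.2,
  safety_filter: true,
};

const SCHEDULERS = ["DDIM", "K_EULER", "EulerA", "DPMSolverMultistep", "KarrasDPM", "PNDM", "HeunDiscrete"];
const TIMESTEP_SPACINGS = ["linspace", "trailing", "leading"];

// Hypothetical helper: returns the effective parameters implied by the documented
// defaults, throwing if a documented constraint is violated.
function resolveParams(params) {
  if (typeof params.prompt !== "string" || params.prompt.length === 0) {
    throw new Error("prompt is required");
  }
  if ("refiner_ratio" in params && params.is_ensemble !== true) {
    throw new Error("refiner_ratio requires is_ensemble=true");
  }
  const merged = { ...DEFAULTS, ...params };
  if (merged.strength < 0.05 || merged.strength > 1.0) {
    throw new RangeError("strength must be in [0.05, 1.0]");
  }
  if (!SCHEDULERS.includes(merged.scheduler)) {
    throw new Error(`Unsupported scheduler: ${merged.scheduler}`);
  }
  if (!TIMESTEP_SPACINGS.includes(merged.timestep_spacing)) {
    throw new Error(`Unsupported timestep_spacing: ${merged.timestep_spacing}`);
  }
  return merged;
}
```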
