Stable Diffusion XL 1.0

Overview

The DataCrunch Inference Service provides a Stable Diffusion XL 1.0 endpoint for generating images from text prompts. This page shows example requests for each supported pipeline and lists the API parameters.

Endpoint features

Examples of API Usage

The following examples demonstrate how to interact with the service using different features.

Simple Base SDXL (No Refiner)

To generate an image with the base model only, without the refiner, set is_ensemble=false and refiner=false.

curl -X POST https://inference.datacrunch.io/v1/images/stable-diffusion-xl/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "prompt": "A cat with a hat",
    "height": 512,
    "width": 512,
    "num_inference_steps": 50,
    "num_images_per_prompt": 1,
    "seed": 42,
    "refiner": false,
    "is_ensemble": false
}'
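The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names build_request and generate are hypothetical, and because the response format is not described on this page, the body is returned as raw bytes.

```python
import json
import urllib.request

# Endpoint URL copied from the curl example above.
API_URL = "https://inference.datacrunch.io/v1/images/stable-diffusion-xl/generate"

# Same JSON body as the curl example (base model only, no refiner).
BASE_ONLY_PAYLOAD = {
    "prompt": "A cat with a hat",
    "height": 512,
    "width": 512,
    "num_inference_steps": 50,
    "num_images_per_prompt": 1,
    "seed": 42,
    "refiner": False,
    "is_ensemble": False,
}


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request. Hypothetical helper."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def generate(api_key: str, payload: dict, timeout: float = 120.0) -> bytes:
    """Send the request and return the raw response body.

    The response schema is not documented here, so nothing is decoded.
    The timeout value is an arbitrary assumption.
    """
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling generate("<your_api_key>", BASE_ONLY_PAYLOAD) would send the request; build_request alone can be used to inspect the headers and body without touching the network.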

Ensemble of Expert Denoisers

To run in Ensemble of Expert Denoisers mode, set is_ensemble=true and refiner=false.

The base model first denoises for num_inference_steps * (1 - refiner_ratio) steps. The refiner model then runs the remaining num_inference_steps * refiner_ratio steps.
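The step split described above can be checked with a few lines of Python. This is an illustrative sketch only: split_steps is a hypothetical helper, and rounding the refiner share to the nearest integer is an assumption, since the service's rounding behavior is not documented here.

```python
from typing import Tuple


def split_steps(num_inference_steps: int, refiner_ratio: float) -> Tuple[int, int]:
    """Return (base_steps, refiner_steps) for the ensemble pipeline.

    Assumption: the refiner share is rounded to the nearest integer and
    the base model gets the remainder, so the two always sum to the total.
    """
    if not 0.0 <= refiner_ratio <= 1.0:
        raise ValueError("refiner_ratio must be between 0 and 1")
    refiner_steps = round(num_inference_steps * refiner_ratio)
    base_steps = num_inference_steps - refiner_steps
    return base_steps, refiner_steps


# Values from the ensemble request on this page: 50 steps, ratio 0.2.
print(split_steps(50, 0.2))  # (40, 10)
```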

For detailed information on parameters and their effects, refer to the Ensemble of Expert Denoisers documentation.

curl -X POST https://inference.datacrunch.io/v1/images/stable-diffusion-xl/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "prompt": "A cat with a hat",
    "height": 512,
    "width": 512,
    "num_inference_steps": 50,
    "num_images_per_prompt": 1,
    "seed": 42,
    "refiner": false,
    "is_ensemble": true,
    "refiner_ratio": 0.2
}'

Refine the Denoised Base Image

This two-step pipeline first runs the full denoising process with the base model, then applies the refiner model to the base model's output in an image-to-image pipeline.

To enable this pipeline, set is_ensemble to false and refiner to true.

The number of steps for the base model and for the refiner is controlled independently by num_inference_steps and num_inference_steps_refiner, respectively. Each phase also uses its own guidance value: guidance_scale for the base model and guidance_scale_refiner for the refiner.

curl -X POST https://inference.datacrunch.io/v1/images/stable-diffusion-xl/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "prompt": "A cat with a hat",
    "height": 512,
    "width": 512,
    "num_inference_steps": 40,
    "num_inference_steps_refiner": 10,
    "guidance_scale": 4,
    "guidance_scale_refiner": 4,
    "num_images_per_prompt": 1,
    "seed": 42,
    "refiner": true,
    "is_ensemble": false
}'
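The three examples above differ mainly in the refiner and is_ensemble flags. The following sketch collects those combinations in one place; the mode labels ("base", "ensemble", "refine") and the payload_for_mode helper are hypothetical names introduced for illustration, not part of the API.

```python
# Flag combinations taken from the three curl examples above.
# The dictionary keys are hypothetical labels, not API values.
MODE_FLAGS = {
    "base": {"refiner": False, "is_ensemble": False},
    "ensemble": {"refiner": False, "is_ensemble": True},
    "refine": {"refiner": True, "is_ensemble": False},
}


def payload_for_mode(prompt: str, mode: str, **params) -> dict:
    """Build a request body for one of the three documented pipelines."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_FLAGS)}")
    payload = {"prompt": prompt, **params}
    # Apply the flags last so they cannot be overridden by accident.
    payload.update(MODE_FLAGS[mode])
    return payload


# Body equivalent to the "Refine the Denoised Base Image" example.
refine_payload = payload_for_mode(
    "A cat with a hat",
    "refine",
    num_inference_steps=40,
    num_inference_steps_refiner=10,
    guidance_scale=4,
    guidance_scale_refiner=4,
)
```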

API Specification

API Parameters

  • prompt (str, required): Prompt text.

  • height (int, optional): Height of the output image. Setting aspect_ratio overrides this value. Defaults to 1024.

  • width (int, optional): Width of the output image. Setting aspect_ratio overrides this value. Defaults to 1024.

  • num_inference_steps (int, optional): Number of inference (denoising) steps. Defaults to 50.

  • guidance_scale (float, optional): Scaling factor for guidance. Specifies how much to follow the text prompt. Defaults to 4.0.

  • num_images_per_prompt (int, optional): Number of images to generate per prompt. Defaults to 1.

  • seed (int, optional): Seed for random number generator. Defaults to 42.

  • negative_prompt (str, optional): Negative prompt text.

  • seed_image (str, optional): Base64-encoded seed image string.

  • strength (float, optional, Range: [0.05, 1.0]): How much noise is added to the seed_image before generation. Defaults to 0.2.

  • scheduler (str, optional): Scheduler to use. Supported schedulers: DDIM, K_EULER, EulerA, DPMSolverMultistep, KarrasDPM, PNDM, HeunDiscrete. Defaults to DDIM.

  • timestep_spacing (str, optional): Timestep spacing for the scheduler. Supported values: linspace, trailing, leading. Defaults to linspace.

  • guidance_scale_refiner (float, optional): Scaling factor for refiner guidance (corresponds to guidance_scale). Defaults to 1.0.

  • refiner (bool, optional): Whether to use the refiner model. Defaults to false.

  • num_inference_steps_refiner (int, optional): Number of inference steps for refiner, applied when is_ensemble=false. Defaults to 50.

  • style_selected (str, optional): Apply the specified style to the provided prompt; see supported styles.

  • is_ensemble (bool, optional): Whether to use the Ensemble of Expert Denoisers pipeline. Defaults to false.

  • refiner_ratio (float, optional): Requires is_ensemble=true. The fraction of the num_inference_steps steps to run the refiner for. For example, if num_inference_steps=40, and refiner_ratio=0.1 then the base model will run for 40 * (1-0.1) = 36 steps, and the refiner for 40 * 0.1 = 4 steps. Values over 0.2 start to produce unnatural-looking images. Defaults to 0.2.

  • aspect_ratio (str, optional): Aspect ratio of the output image. Setting this value overrides the width and height values.

  • lora_id (str, optional): Finetuned LoRA ID to load (LoRA file must exist on DataCrunch platform).

  • lora_name (str, optional): Public LoRA to load. Currently only lora_name="offset" is supported (corresponding to "sd_xl_offset_example-lora_1.0.safetensors").

  • safety_filter (bool, optional): Whether to use NSFW filter. Defaults to true.
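
Some of the constraints listed above (the strength range, the supported scheduler and timestep_spacing values, and the is_ensemble requirement for refiner_ratio) can be checked on the client before sending a request. The following is a minimal sketch under stated assumptions: validate_payload and encode_seed_image are hypothetical helpers, and it assumes seed_image expects a plain Base64 string without a data-URI prefix, which this page does not specify.

```python
import base64

# Supported values copied from the parameter list above.
SUPPORTED_SCHEDULERS = {
    "DDIM", "K_EULER", "EulerA", "DPMSolverMultistep",
    "KarrasDPM", "PNDM", "HeunDiscrete",
}
SUPPORTED_TIMESTEP_SPACING = {"linspace", "trailing", "leading"}


def encode_seed_image(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes for the seed_image parameter.

    Assumption: a plain Base64 string is expected (no data-URI prefix).
    """
    return base64.b64encode(image_bytes).decode("ascii")


def validate_payload(payload: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    if not isinstance(payload.get("prompt"), str):
        errors.append("prompt (str) is required")
    strength = payload.get("strength")
    if strength is not None and not 0.05 <= strength <= 1.0:
        errors.append("strength must be in the range [0.05, 1.0]")
    scheduler = payload.get("scheduler")
    if scheduler is not None and scheduler not in SUPPORTED_SCHEDULERS:
        errors.append(f"unsupported scheduler: {scheduler}")
    spacing = payload.get("timestep_spacing")
    if spacing is not None and spacing not in SUPPORTED_TIMESTEP_SPACING:
        errors.append(f"unsupported timestep_spacing: {spacing}")
    if "refiner_ratio" in payload and not payload.get("is_ensemble", False):
        errors.append("refiner_ratio requires is_ensemble=true")
    return errors


# Example: an image-to-image body built from placeholder bytes.
img2img_payload = {
    "prompt": "A cat with a hat",
    "seed_image": encode_seed_image(b"<raw image bytes>"),
    "strength": 0.2,
    "scheduler": "DDIM",
}
```

These checks only mirror what the parameter list states; the service may enforce additional rules.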
