Overview

With our Containers service, you can create your own inference endpoints to serve your models while paying only for the compute that is in active use.

We support loading containers from any registry and are quite flexible about how the container is built.

You can deploy your first container by following the guide: Quick: Deploy with vLLM

Serverless Containers pricing

Price is calculated in 10-minute intervals for the currently running replicas of your container. The number of currently running replicas will depend on your Scaling and health-checks settings.

See here for pricing.
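
To make the interval billing concrete, here is a minimal sketch of the cost calculation. It assumes usage is rounded up to the next 10-minute interval and uses a placeholder hourly rate; see the pricing page for the actual per-GPU prices.

```python
import math

# Illustrative only: the hourly rate below is a placeholder,
# not an actual DataCrunch price.
HOURLY_RATE_PER_REPLICA = 1.50   # currency units per replica-hour
INTERVAL_MINUTES = 10

def billed_cost(active_minutes: float, replicas: int) -> float:
    # Assumption: usage is rounded up to the next 10-minute interval.
    intervals = math.ceil(active_minutes / INTERVAL_MINUTES)
    billed_hours = intervals * INTERVAL_MINUTES / 60
    return billed_hours * HOURLY_RATE_PER_REPLICA * replicas

# Example: 2 replicas active for 25 minutes are billed as 30 minutes each.
print(billed_cost(active_minutes=25, replicas=2))  # 1.5
```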

Features

  • Scale to hundreds of GPUs when needed with our battle-tested inference cluster

  • Scale to zero when idle, so you only pay while your container is running

  • Support for any container registry, using either registry-specific authentication methods or a vanilla Docker config.json-style auth (see the config.json sketch after this list)

  • Both manual and request queue-based autoscaling, with adjustable scaling sensitivity

  • Logging and metrics in the dashboard

  • RESTful API and Python SDK for managing your deployments (see the API sketch after this list)

  • Support for async / polling requests
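
As an illustration of the Docker config.json-style auth mentioned above, the snippet below builds a minimal credential blob in Python. The registry URL, username, and token are placeholders; substitute the values for your own registry.

```python
import base64
import json

# Placeholders: substitute your own registry, username, and access token.
registry = "registry.example.com"
username = "my-user"
token = "my-access-token"

# Docker config.json-style auth: base64-encoded "username:password".
auth = base64.b64encode(f"{username}:{token}".encode()).decode()
docker_config = {"auths": {registry: {"auth": auth}}}

print(json.dumps(docker_config, indent=2))
```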

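The sketch below shows the idea of managing deployments over the RESTful API using Python's requests library. The base URL, route, and token handling are assumptions made for the example, not the exact DataCrunch API schema; refer to the DataCrunch API reference for the real endpoints and payloads.

```python
import os
import requests

# Illustrative only: the base URL, route, and auth header below are
# assumptions for this sketch, not the exact DataCrunch API schema.
# See the DataCrunch API reference for the real endpoints.
API_BASE = "https://api.datacrunch.io/v1"
TOKEN = os.environ["DATACRUNCH_API_TOKEN"]  # hypothetical token variable

headers = {"Authorization": f"Bearer {TOKEN}"}

# List existing container deployments (hypothetical route).
resp = requests.get(f"{API_BASE}/container-deployments", headers=headers)
resp.raise_for_status()
print(resp.json())
```
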
Coming soon

  • Shared storage between the Containers and Cloud GPU instances

  • Batch jobs
