Overview

With our Containers service, you can create your own inference endpoints to serve your models while paying only for the compute that is in active use.

We support pulling container images from any registry and place few restrictions on how the container is built.

To deploy your first container, follow the guide Quick: Deploy with vLLM.

Serverless Containers pricing

Pricing is calculated in 10-minute intervals for the currently running replicas of your container. The number of running replicas depends on your Scaling and health-checks settings.

See here for pricing.
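
As a rough illustration of the 10-minute billing interval, the sketch below estimates the cost of a deployment. The hourly rate, the replica count, and the assumption that partial intervals count as full are placeholders made for the example only, not actual DataCrunch prices or billing rules; see the pricing page for real rates.

```python
import math

# Hypothetical illustration of 10-minute interval billing.
# The hourly rate and the round-up rule are assumptions for this example only.
HOURLY_RATE_PER_REPLICA = 2.10   # placeholder price in USD per replica-hour
INTERVAL_HOURS = 10 / 60         # one 10-minute billing interval, in hours

def estimate_cost(minutes_running: float, replicas: int) -> float:
    """Rough cost estimate for `replicas` replicas running for `minutes_running` minutes."""
    intervals = math.ceil(minutes_running / 10)  # assumed: partial intervals count as full
    return intervals * INTERVAL_HOURS * HOURLY_RATE_PER_REPLICA * replicas

# Example: 25 minutes of traffic served by 3 replicas
print(f"${estimate_cost(25, 3):.2f}")  # 3 intervals * 1/6 h * $2.10 * 3 = $3.15
```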

Features

  • Scale to hundreds of GPUs when needed with our battle-tested inference cluster

  • Scale to zero when idle, so you only pay while your container is running

  • Support for any container registry, using either registry-specific authentication methods or plain Docker config.json-style credentials (see the sketch after this list)

  • Both manual and request queue-based autoscaling, with adjustable scaling sensitivity

  • Logging and metrics in the dashboard

  • RESTful API for managing your deployments

  • Python SDK

  • Support for async / polling requests
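
As a minimal sketch of the Docker config.json-style credentials mentioned in the feature list, the snippet below builds the standard Docker auth structure: a base64-encoded username:password pair keyed by the registry URL. The registry address and credentials are placeholders; see the Container Registries page for how credentials are supplied to the platform.

```python
import base64
import json

# Build a plain Docker config.json-style auth entry for a private registry.
# The registry URL, username, and password below are placeholders.
registry = "registry.example.com"
username = "my-user"
password = "my-access-token"

auth_b64 = base64.b64encode(f"{username}:{password}".encode()).decode()

docker_config = {
    "auths": {
        registry: {
            "auth": auth_b64,  # standard Docker format: base64("username:password")
        }
    }
}

print(json.dumps(docker_config, indent=2))
```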

Coming soon

  • Shared storage between the Containers service and Cloud GPU instances

  • Batch jobs
