Instant Clusters

You can now deploy high-performance GPU training clusters with InfiniBand interconnect from your DataCrunch cloud dashboard, the same way you would deploy a single GPU instance.

Available contract lengths are Pay As You Go, 1 day, 1 week, 2 weeks, and 4 weeks. By default, fixed-term contracts convert to Pay As You Go when the initial contract duration ends, so you can keep using the cluster for as long as necessary.

Instant clusters are available with either Nvidia B200 SXM6 GPUs or Nvidia H200 SXM5 GPUs, a 3.2 Tb/s InfiniBand interconnect per node (eight 400 Gb/s links), and a 100 Gb/s Ethernet network. The uplink to the Internet is a symmetric 2 Gb/s.

Instant clusters range from 16 to 128 GPUs. Each cluster has up to 16 worker nodes with 8 GPUs each, plus one jump host. Each worker node has local NVMe storage and access to a configurable shared filesystem with up to 50 TB of storage.
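The sizing figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of any DataCrunch tooling: the function names are hypothetical, and it assumes that cluster sizes are whole multiples of 8 GPUs, which the text implies but does not state outright.

```python
# Figures taken from the text above.
GPUS_PER_NODE = 8        # GPUs per worker node
MAX_WORKER_NODES = 16    # largest cluster
MIN_GPUS = 16            # smallest cluster
IB_LINKS_PER_NODE = 8    # InfiniBand links per node
IB_LINK_GBPS = 400       # speed of each link in Gb/s


def worker_nodes_for(gpus: int) -> int:
    """Return how many worker nodes a cluster with `gpus` GPUs needs.

    Assumption: clusters come in whole nodes, so the GPU count must be
    a multiple of GPUS_PER_NODE.
    """
    max_gpus = GPUS_PER_NODE * MAX_WORKER_NODES
    if not MIN_GPUS <= gpus <= max_gpus:
        raise ValueError(f"GPU count must be between {MIN_GPUS} and {max_gpus}")
    if gpus % GPUS_PER_NODE != 0:
        raise ValueError(f"GPU count must be a multiple of {GPUS_PER_NODE}")
    return gpus // GPUS_PER_NODE


def node_infiniband_tbps() -> float:
    """Per-node InfiniBand bandwidth in Tb/s (links x link speed)."""
    return IB_LINKS_PER_NODE * IB_LINK_GBPS / 1000


if __name__ == "__main__":
    print(worker_nodes_for(16))    # smallest cluster
    print(worker_nodes_for(128))   # largest cluster
    print(node_infiniband_tbps())  # matches the 3.2 Tb/s quoted above
```

Under these assumptions, the smallest cluster corresponds to 2 worker nodes and the largest to 16, and eight 400 Gb/s links add up to the quoted 3.2 Tb/s per node.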

Clusters have Slurm pre-installed for easy job management and a Grafana dashboard for monitoring and alerts. The Nvidia B200 instant clusters are currently available in the FIN-03 location, and the H200 instant clusters in the ICE-01 location.
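Because Slurm is pre-installed, multi-node jobs would typically be submitted as batch scripts. The sketch below only renders such a script as text. It is an illustration under assumptions: the helper name `render_sbatch_script` and the `python train.py` command are hypothetical, and whether `--gpus-per-node` works as shown depends on how GPU resources are configured on your cluster, which this page does not describe.

```python
def render_sbatch_script(
    job_name: str,
    nodes: int,
    command: str,
    gpus_per_node: int = 8,  # 8 GPUs per worker node, per the text above
) -> str:
    """Render a Slurm batch script that runs one task per GPU.

    Assumption: the cluster exposes its GPUs to Slurm so that the
    standard --gpus-per-node option applies.
    """
    if not 1 <= nodes <= 16:
        raise ValueError("instant clusters have at most 16 worker nodes")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={gpus_per_node}",
        f"#SBATCH --gpus-per-node={gpus_per_node}",
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-node training job.
    print(render_sbatch_script("demo-train", nodes=2, command="python train.py"))
```

The rendered text could be saved to a file and submitted with `sbatch` from the jump host. Adjust or drop the GPU directive if your cluster's Slurm configuration differs.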

View more:

Deploying an Instant Cluster

Instant GPU Clusters (datacrunch.io)
