Whisper X

Overview

The DataCrunch Whisper Inference Service provides access to the Whisper v3 large model endpoint. The endpoint includes advanced options such as WhisperX with diarization, phoneme alignment for word-level timestamps, and subtitle generation in SRT format.

Transcribing Audio

To transcribe audio, submit a request with the audio file URL.

cURL:

curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "audio_input": "<AUDIO_FILE_URL>"
}'

Python:

import requests

url = "https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your_api_key>"
}
data = {
    "audio_input": "<AUDIO_FILE_URL>"
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

Node.js:

const axios = require('axios');

const url = 'https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict';
const headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer <your_api_key>'
};
const data = {
  audio_input: '<AUDIO_FILE_URL>'
};

axios.post(url, data, { headers: headers })
  .then((response) => {
    console.log(response.data);
  })
  .catch((error) => {
    console.error('Error:', error);
  });

Translating Audio

For translation of the transcribed output to English:

curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "audio_input": "<AUDIO_FILE_URL>",
    "translate": true
}'
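The same translation request can be issued from Python. A minimal sketch mirroring the cURL call above; the helper function is illustrative, not part of the SDK:

```python
# Illustrative Python version of the translation request above.
API_URL = "https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict"

def translation_body(audio_url):
    """JSON body asking WhisperX to translate the transcript to English."""
    return {"audio_input": audio_url, "translate": True}

# To send the request (requires the third-party requests package):
#   import requests
#   headers = {"Content-Type": "application/json",
#              "Authorization": "Bearer <your_api_key>"}
#   response = requests.post(API_URL, headers=headers,
#                            json=translation_body("<AUDIO_FILE_URL>"))
#   print(response.json())
```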

Generating Subtitles

When creating subtitles, set processing_type="align" to get word-level alignment. Without alignment, subtitle chunks are longer, which can degrade the viewing experience. Setting output="subtitles" returns the result in SRT format.

curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "audio_input": "<AUDIO_FILE_URL>",
    "translate": true,
    "processing_type": "align",
    "output": "subtitles"
}'
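Fetching and saving the subtitles from Python can be sketched as follows. The helper is illustrative; the subtitles field name comes from the response schema in the API specification:

```python
# Illustrative helper: build the body for a subtitle-generation call,
# combining word-level alignment with SRT output as described above.
API_URL = "https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict"

def subtitle_body(audio_url, translate=False):
    """JSON body requesting aligned, SRT-formatted subtitles."""
    return {
        "audio_input": audio_url,
        "translate": translate,
        "processing_type": "align",
        "output": "subtitles",
    }

# To send the request and save the result (requires the requests package):
#   import requests
#   headers = {"Content-Type": "application/json",
#              "Authorization": "Bearer <your_api_key>"}
#   resp = requests.post(API_URL, headers=headers,
#                        json=subtitle_body("<AUDIO_FILE_URL>", translate=True))
#   with open("subtitles.srt", "w") as f:
#       f.write(resp.json()["subtitles"])
```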

Performing Speaker Diarization

For speaker diarization (assigning speaker labels to text segments), set processing_type to diarize:

curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "audio_input": "<AUDIO_FILE_URL>",
    "translate": true,
    "processing_type": "diarize"
}'
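Once the diarized response arrives, the segments can be regrouped by speaker client-side. A minimal sketch using the field names from the 200 response schema in the API specification; the helper itself is hypothetical, not part of the API:

```python
from collections import defaultdict

def group_by_speaker(segments):
    """Map each speaker label to the list of segment texts it spoke.
    The speaker label is read from the first word of each segment."""
    grouped = defaultdict(list)
    for seg in segments:
        words = seg.get("words", [])
        speaker = words[0].get("speaker", "UNKNOWN") if words else "UNKNOWN"
        grouped[speaker].append(seg["text"])
    return dict(grouped)
```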

API Specification

API Parameters

  • audio_input (str, required): URL of the audio file to process.

  • translate (bool, optional): If enabled, provides the English translation of the output. Defaults to false.

  • language (str, optional): Two-letter code specifying the input language; providing it avoids relying on automatic language detection.

  • processing_type (str, optional): Defines the processing action. Supported types: diarize, align.

  • output (str, optional): Determines the output format. Options: subtitles (SRT format) or raw (time-stamped text). Default is raw.
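The parameters above can be validated client-side before sending. A small illustrative helper, not part of the SDK or API; the allowed values are taken from the list above:

```python
def whisperx_payload(audio_input, translate=False, language=None,
                     processing_type=None, output="raw"):
    """Build and sanity-check a request body from the documented parameters."""
    if processing_type not in (None, "diarize", "align"):
        raise ValueError("processing_type must be 'diarize' or 'align'")
    if output not in ("raw", "subtitles"):
        raise ValueError("output must be 'raw' or 'subtitles'")
    if language is not None and len(language) != 2:
        raise ValueError("language must be a two-letter code, e.g. 'en'")
    payload = {"audio_input": audio_input, "translate": translate, "output": output}
    if language is not None:
        payload["language"] = language
    if processing_type is not None:
        payload["processing_type"] = processing_type
    return payload
```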

Copyright notice: WhisperX includes software developed by Max Bain.


Transcribe, Translate, or Diarize Audio

POST /v1/raw/whisperx/predict

Authorization: Bearer token required. Content-Type: application/json.

Body:

  • audio_input (string, required): URL of the audio file for processing. Example: https://example.com/audiofile.mp3
  • translate (boolean, optional): Flag to translate the audio content. Default: false. Example: true
  • language (string, optional): Example: en
  • processing_type (string, optional): Example: diarize
  • output (string, optional): Default: raw. Example: raw

Responses:

  • 200: Successful audio processing response (application/json)
  • 422: Unprocessable Entity (application/json)
  • 500: Internal Server Error (application/json)

Example request:
POST /v1/raw/whisperx/predict HTTP/1.1
Host: fin-02.inference.datacrunch.io
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 127

{
  "audio_input": "https://example.com/audiofile.mp3",
  "translate": true,
  "language": "en",
  "processing_type": "diarize",
  "output": "raw"
}

Example 200 response:

{
  "usage": {
    "input_audio_length": 0,
    "elapsed_time": 0
  },
  "segments": [
    {
      "start": 1,
      "end": 1,
      "text": "text",
      "words": [
        {
          "word": "text",
          "start": 1,
          "end": 1,
          "score": 1,
          "speaker": "text"
        }
      ]
    }
  ],
  "subtitles": "text"
}
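For reference, the raw output's segments can also be rendered as SRT locally. A sketch assuming the start and end fields are in seconds; in practice, output="subtitles" does this server-side:

```python
def fmt_ts(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render a list of {start, end, text} segments as one SRT string."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}\n{seg['text']}"
        )
    return "\n\n".join(blocks)
```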