WhisperX (deprecated)

This model has been deprecated and will cease to function. Please use the Whisper model instead.

Overview

The DataCrunch Whisper Inference Service provides access to the Whisper v3 large model endpoint. The endpoint includes advanced WhisperX options such as speaker diarization, phoneme alignment for word-level timestamps, and subtitle generation in SRT format.

Transcribing Audio

To transcribe audio, submit a request with the audio file URL.

curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "audio_input": "<AUDIO_FILE_URL>"
}'
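
The same call can also be made from Python. The snippet below is a minimal sketch that assumes the requests library and the response schema shown in the API specification further down; the API key and audio URL are placeholders.

import requests

API_URL = "https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict"
API_KEY = "<your_api_key>"  # placeholder: your DataCrunch inference API key

payload = {"audio_input": "<AUDIO_FILE_URL>"}  # placeholder: URL of the audio file

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,   # sets Content-Type: application/json automatically
    timeout=600,    # long recordings can take a while to process
)
response.raise_for_status()

# The default (raw) output is a list of time-stamped segments.
for segment in response.json().get("segments", []):
    print(segment.get("text", ""))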

Translating Audio

For translation of the transcribed output to English:

curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "audio_input": "<AUDIO_FILE_URL>",
    "translate": true
}'
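
In the Python sketch above, translation only requires adding the flag to the same payload; everything else stays unchanged:

payload = {
    "audio_input": "<AUDIO_FILE_URL>",  # placeholder
    "translate": True,                  # return the English translation of the transcript
}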

Generating Subtitles

When creating subtitles, set processing_type="align" to enable word-level alignment. Omitting the alignment results in longer subtitle chunks, which can hurt readability. Setting output="subtitles" returns the result in SRT format.

curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "audio_input": "<AUDIO_FILE_URL>",
    "translate": true,
    "processing_type": "align",
    "output": "subtitles"
}'
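
In Python, the SRT text is returned in the subtitles field of the JSON response and can be written straight to a file. The snippet below is an illustrative sketch using the requests library; the output file name is arbitrary.

import requests

API_URL = "https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict"
API_KEY = "<your_api_key>"  # placeholder

payload = {
    "audio_input": "<AUDIO_FILE_URL>",  # placeholder
    "translate": True,
    "processing_type": "align",   # word-level alignment for tighter subtitle chunks
    "output": "subtitles",        # return SRT instead of raw segments
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=600,
)
response.raise_for_status()

# The SRT content is returned as a single string in the "subtitles" field.
with open("output.srt", "w", encoding="utf-8") as f:
    f.write(response.json()["subtitles"])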

Performing Speaker Diarization

For speaker diarization (assigning speaker labels to text segments), set processing_type to diarize:

curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "audio_input": "<AUDIO_FILE_URL>",
    "translate": true,
    "processing_type": "diarize"
}'
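
The diarized response can then be grouped by speaker. The Python sketch below assumes the response follows the schema in the API specification, where each word may carry a speaker label; the grouping logic is illustrative only.

import requests

API_URL = "https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict"
API_KEY = "<your_api_key>"  # placeholder

payload = {
    "audio_input": "<AUDIO_FILE_URL>",  # placeholder
    "translate": True,
    "processing_type": "diarize",
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=600,
)
response.raise_for_status()

# Group consecutive words by speaker label, falling back to "UNKNOWN"
# when a word has no speaker assigned.
turns = []
for segment in response.json().get("segments", []):
    for word in segment.get("words", []):
        speaker = word.get("speaker", "UNKNOWN")
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word.get("word", ""))
        else:
            turns.append((speaker, [word.get("word", "")]))

for speaker, words in turns:
    print(f"{speaker}: {' '.join(words)}")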

API Specification

Transcribe, Translate, or Diarize Audio

POST /v1/raw/whisperx/predict

Authorization: Bearer token (API key) required.

Body

  • audio_input (string, required): URL of the audio file for processing. Example: https://example.com/audiofile.mp3

  • translate (boolean, optional): Flag to translate the audio content. Default: false. Example: true

  • language (string, optional): Example: en

  • processing_type (string, optional): Example: diarize

  • output (string, optional): Default: raw. Example: raw

Responses

  • 200: Successful audio processing response (application/json)

Example request
POST /v1/raw/whisperx/predict HTTP/1.1
Host: fin-02.inference.datacrunch.io
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 127

{
  "audio_input": "https://example.com/audiofile.mp3",
  "translate": true,
  "language": "en",
  "processing_type": "diarize",
  "output": "raw"
}

Example response

{
  "usage": {
    "input_audio_length": 0,
    "elapsed_time": 0
  },
  "segments": [
    {
      "start": 1,
      "end": 1,
      "text": "text",
      "words": [
        {
          "word": "text",
          "start": 1,
          "end": 1,
          "score": 1,
          "speaker": "text"
        }
      ]
    }
  ],
  "subtitles": "text"
}

API Parameters

  • audio_input (str, required): URL of the audio file to process.

  • translate (bool, optional): If enabled, provides the English translation of the output. Defaults to false.

  • language (str, optional): Two-letter language code specifying the input language; useful when automatic language detection is unreliable.

  • processing_type (str, optional): Defines the processing action. Supported types: diarize, align.

  • output (str, optional): Determines the output format. Options: subtitles (in SRT format), raw (time-stamped text). Default is raw.
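
As a convenience, these parameters can be wrapped in a small helper that builds the request payload and omits unused optional fields. This is an illustrative sketch using the requests library, not part of the service itself.

import requests

API_URL = "https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict"

def whisperx_predict(api_key, audio_input, translate=False, language=None,
                     processing_type=None, output="raw"):
    """Call the WhisperX endpoint, sending only the parameters that are set."""
    payload = {"audio_input": audio_input, "translate": translate, "output": output}
    if language is not None:
        payload["language"] = language
    if processing_type is not None:
        payload["processing_type"] = processing_type
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
        timeout=600,
    )
    response.raise_for_status()
    return response.json()

# Example: diarize an English recording and translate the result.
# result = whisperx_predict("<your_api_key>", "https://example.com/audiofile.mp3",
#                           translate=True, language="en", processing_type="diarize")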

Copyright notice: WhisperX includes software developed by Max Bain.
