Whisper X
Overview
The DataCrunch Whisper Inference Service provides an endpoint for the Whisper large-v3 model. The endpoint supports advanced WhisperX options: speaker diarization, phoneme alignment for word-level timestamps, and subtitle generation in SRT format.
Transcribing Audio
To transcribe audio, submit a request with the audio file URL.
curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <your_api_key>" \
-d \
'{
"audio_input": "<AUDIO_FILE_URL>"
}'
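The request above can be wrapped in a small shell function so the API key is read from the environment instead of being pasted into each command. This is a minimal sketch, not part of the service: the function name whisperx_request and the variable name DATACRUNCH_API_KEY are assumptions made for illustration.

```shell
# Hypothetical helper, not part of the service: posts a JSON body to the
# WhisperX endpoint. DATACRUNCH_API_KEY is an assumed variable name.
WHISPERX_URL="https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict"

whisperx_request() {
  if [ -z "${DATACRUNCH_API_KEY:-}" ]; then
    echo "DATACRUNCH_API_KEY is not set" >&2
    return 1
  fi
  # --fail turns HTTP errors (4xx/5xx) into a non-zero exit status;
  # --silent --show-error hides the progress meter but keeps error messages.
  curl --fail --silent --show-error -X POST "$WHISPERX_URL" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${DATACRUNCH_API_KEY}" \
    -d "$1"
}

# Usage (requires a valid key and a reachable audio URL):
# whisperx_request '{"audio_input": "<AUDIO_FILE_URL>"}'
```

Because of --fail, a script calling this function can check the exit status to detect a rejected request rather than parsing the response body.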
Translating Audio
To translate the transcribed output to English, set translate to true:
curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <your_api_key>" \
-d \
'{
"audio_input": "<AUDIO_FILE_URL>",
"translate": true
}'
Generating Subtitles
When creating subtitles, set processing_type="align" to ensure word-level alignment. Omitting alignment results in longer subtitle chunks, which can make the subtitles harder to follow. Setting output="subtitles" returns the output in SRT format.
curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <your_api_key>" \
-d \
'{
"audio_input": "<AUDIO_FILE_URL>",
"translate": true,
"processing_type": "align",
"output": "subtitles"
}'
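To see how alignment affects chunking, it can help to compare the number of cues and the longest cue duration in two saved SRT files, one produced with processing_type="align" and one without. The following is a rough sketch that assumes the SRT output has already been saved to a local file; the function name srt_stats is hypothetical, and the script relies only on the standard SRT timing line format (HH:MM:SS,mmm --> HH:MM:SS,mmm).

```shell
# Hypothetical helper: summarizes an SRT file by counting timing lines
# (one per cue) and tracking the longest cue duration in seconds.
srt_stats() {
  LC_ALL=C awk '
    / --> / {
      # $1 is the start time and $3 the end time, e.g. 00:00:02,000
      split($1, a, /[:,]/)
      split($3, b, /[:,]/)
      start = a[1] * 3600 + a[2] * 60 + a[3] + a[4] / 1000
      end   = b[1] * 3600 + b[2] * 60 + b[3] + b[4] / 1000
      n++
      if (end - start > max) max = end - start
    }
    END { printf "cues=%d longest=%.3fs\n", n, max }
  ' "$1"
}

# Usage: srt_stats subtitles.srt
```

A file with many short cues generally indicates finer chunking than one with a few long cues.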
Performing Speaker Diarization
For speaker diarization (assigning speaker labels to text segments), set processing_type to diarize:
curl -X POST https://fin-02.inference.datacrunch.io/v1/raw/whisperx/predict \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <your_api_key>" \
-d \
'{
"audio_input": "<AUDIO_FILE_URL>",
"translate": true,
"processing_type": "diarize"
}'
API Specification
API Parameters
audio_input (str, required): URL of the audio file.
translate (bool, optional): If enabled, provides the English translation of the output. Defaults to false.
language (str, optional): Two-letter language code specifying the input language, so that the language is identified accurately.
processing_type (str, optional): Defines the processing action. Supported types: diarize, align.
output (str, optional): Determines the output format. Options: subtitles (SRT format), raw (time-stamped text). Defaults to raw.
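The parameters above can be combined in a single request body. As an illustration, the sketch below assembles the JSON from positional arguments and rejects values that the specification does not list. The function name build_payload and its argument order are hypothetical, and the sketch inserts values verbatim, so it only guards against quotes and backslashes rather than performing full JSON escaping.

```shell
# Hypothetical helper, not part of the service: assembles the JSON request
# body from the documented parameters and rejects undocumented values.
# Usage: build_payload <audio_url> [translate] [processing_type] [output] [language]
build_payload() {
  local url translate ptype output lang body
  url="${1:-}"
  translate="${2:-false}"   # documented default
  ptype="${3:-}"            # optional: align or diarize
  output="${4:-raw}"        # documented default
  lang="${5:-}"             # optional two-letter code

  [ -n "$url" ] || { echo "audio_input is required" >&2; return 1; }
  # Values are inserted verbatim, so refuse characters that would break the JSON.
  case "$url$lang" in
    *\"*|*\\*) echo "quotes and backslashes are not supported" >&2; return 1 ;;
  esac
  case "$translate" in
    true|false) ;;
    *) echo "translate must be true or false" >&2; return 1 ;;
  esac
  case "$ptype" in
    ""|align|diarize) ;;
    *) echo "processing_type must be align or diarize" >&2; return 1 ;;
  esac
  case "$output" in
    raw|subtitles) ;;
    *) echo "output must be raw or subtitles" >&2; return 1 ;;
  esac

  body="\"audio_input\":\"$url\",\"translate\":$translate,\"output\":\"$output\""
  if [ -n "$ptype" ]; then body="$body,\"processing_type\":\"$ptype\""; fi
  if [ -n "$lang" ]; then body="$body,\"language\":\"$lang\""; fi
  printf '{%s}\n' "$body"
}

# Usage: pass the result to curl with -d, for example
# -d "$(build_payload '<AUDIO_FILE_URL>' false align subtitles)"
```

Validating locally catches a mistyped processing_type or output value before a request is sent.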
Copyright notice: WhisperX includes software developed by Max Bain.