Whisper X
Overview
The DataCrunch Whisper Inference Service provides access to the Whisper v3 large model endpoint. The endpoint includes advanced options such as WhisperX with diarization, phoneme alignment for word-level timestamps, and subtitle generation in SRT format.
Transcribing Audio
To transcribe audio, submit a request with the audio file URL.
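A minimal sketch of such a request, using only the Python standard library. The endpoint URL, the bearer-token header, and the JSON request body are assumptions made for illustration; this page does not state them, so substitute the values from your own account. Only the `audio_input` parameter name comes from the specification on this page.

```python
import json
import urllib.request

# Hypothetical placeholders: neither the endpoint URL nor the auth scheme
# is given in this document. Replace them with your real values.
ENDPOINT_URL = "https://example.invalid/whisperx/predict"
API_KEY = "YOUR_API_KEY"


def build_transcription_request(audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the audio file URL."""
    payload = {"audio_input": audio_url}  # the only required parameter
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer-token auth is an assumption, not confirmed by this page.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def transcribe(audio_url: str) -> str:
    """Send the request and return the response body as text."""
    request = build_transcription_request(audio_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `transcribe("https://example.com/interview.wav")` would send the request once the placeholders are replaced. Building the request is kept separate from sending it so the payload can be inspected without network access.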
Translating Audio
To translate the transcribed output to English, set translate to true.
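As a sketch, a translation request body could look like the one below. It assumes the service accepts a JSON body; the helper name `translation_payload` is hypothetical, while `audio_input` and `translate` are the parameter names from the specification on this page.

```python
import json


def translation_payload(audio_url: str) -> str:
    """Serialize a request body that asks for an English translation."""
    body = {
        "audio_input": audio_url,
        "translate": True,  # defaults to false when omitted
    }
    return json.dumps(body)
```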
Generating Subtitles
When creating subtitles, it is best to set processing_type="align" to ensure word-level alignment. Omitting the alignment results in longer subtitle chunks, which can make the subtitles harder to follow. Setting output="subtitles" ensures that the output is in SRT format.
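The two settings can be combined in one request body, sketched below. The helper names are hypothetical, and `save_srt` assumes you have already extracted the SRT text from the response; this page does not describe the response envelope, so that step is left out.

```python
from pathlib import Path


def subtitle_payload(audio_url: str) -> dict:
    """Request body for word-aligned subtitles in SRT format."""
    return {
        "audio_input": audio_url,
        "processing_type": "align",  # word-level alignment, shorter chunks
        "output": "subtitles",       # SRT instead of the default "raw"
    }


def save_srt(srt_text: str, path: str) -> Path:
    """Write SRT text to disk as UTF-8 and return the file path."""
    target = Path(path)
    target.write_text(srt_text, encoding="utf-8")
    return target
```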
Performing Speaker Diarization
For speaker diarization (assigning speaker labels to text segments), set processing_type to diarize.
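A diarization request body might be built as follows. This is a sketch with a hypothetical helper name; it also shows the optional `language` parameter, which is only included when a value is supplied.

```python
from typing import Optional


def diarization_payload(audio_url: str, language: Optional[str] = None) -> dict:
    """Request body that asks for speaker labels on text segments."""
    body = {
        "audio_input": audio_url,
        "processing_type": "diarize",
    }
    if language is not None:
        body["language"] = language  # two-letter code, e.g. "en"
    return body
```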
API Specification
API Parameters
audio_input (str, required): URL of the audio file.
translate (bool, optional): If enabled, provides the English translation of the output. Defaults to false.
language (str, optional): Two-letter language code specifying the input language, for more accurate language detection.
processing_type (str, optional): Defines the processing action. Supported types: diarize, align.
output (str, optional): Determines the output format. Options: subtitles (in SRT format), raw (time-stamped text). Default is raw.
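The parameter rules above can be checked on the client before a request is sent. The following is a sketch only: the function `build_payload` and its error messages are hypothetical, and the service may validate differently. The parameter names, allowed values, and defaults are taken from the list above.

```python
from typing import Optional

PROCESSING_TYPES = {"diarize", "align"}
OUTPUT_FORMATS = {"subtitles", "raw"}


def build_payload(
    audio_input: str,
    translate: bool = False,
    language: Optional[str] = None,
    processing_type: Optional[str] = None,
    output: str = "raw",
) -> dict:
    """Validate the documented parameters and return a request body."""
    if not audio_input:
        raise ValueError("audio_input is required")
    if language is not None and (len(language) != 2 or not language.isalpha()):
        raise ValueError("language must be a two-letter code")
    if processing_type is not None and processing_type not in PROCESSING_TYPES:
        raise ValueError("processing_type must be 'diarize' or 'align'")
    if output not in OUTPUT_FORMATS:
        raise ValueError("output must be 'subtitles' or 'raw'")

    payload = {"audio_input": audio_input, "translate": translate, "output": output}
    # Optional parameters are only sent when the caller supplies them.
    if language is not None:
        payload["language"] = language
    if processing_type is not None:
        payload["processing_type"] = processing_type
    return payload
```

For example, `build_payload("https://example.com/a.wav", processing_type="align", output="subtitles")` returns a body suitable for subtitle generation, while an unsupported `processing_type` raises `ValueError` before any network call is made.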
Copyright notice: WhisperX includes software developed by Max Bain.