Whisper
Overview
The DataCrunch Whisper Inference Service provides access to the Whisper v3 large model endpoint. The endpoint includes advanced options: speaker diarization, phoneme alignment for word-level timestamps, and subtitle generation in SRT format.
Transcribing Audio
To transcribe audio, submit a request with the audio file URL.
cURL:

curl -X POST https://inference.datacrunch.io/whisper/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d '{
    "audio_input": "<AUDIO_FILE_URL>"
  }'

Python:

import requests

url = "https://inference.datacrunch.io/whisper/predict"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your_api_key>"
}
data = {
    "audio_input": "<AUDIO_FILE_URL>"
}
response = requests.post(url, headers=headers, json=data)
print(response.json())

Node.js:

const axios = require('axios');

const url = 'https://inference.datacrunch.io/whisper/predict';
const headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer <your_api_key>'
};
const data = {
  audio_input: '<AUDIO_FILE_URL>'
};
axios.post(url, data, { headers: headers })
  .then((response) => {
    console.log(response.data);
  })
  .catch((error) => {
    console.error('Error:', error);
  });

Translating Audio
To translate the transcribed output to English, set translate to true in the request body.
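In Python, the translation request mirrors the transcription example above, with the translate flag added (a minimal sketch using the requests library; the function names are illustrative, and error handling is added for clarity):

```python
import requests

URL = "https://inference.datacrunch.io/whisper/predict"

def translation_body(audio_url):
    """JSON body for a transcription request that is also translated to English."""
    return {"audio_input": audio_url, "translate": True}

def translate_audio(audio_url, api_key):
    """POST the request and return the parsed JSON response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    resp = requests.post(URL, headers=headers, json=translation_body(audio_url))
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()
```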
curl -X POST https://inference.datacrunch.io/whisper/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d '{
    "audio_input": "<AUDIO_FILE_URL>",
    "translate": true
  }'

Generating Subtitles
When creating subtitles, set processing_type="align" to ensure word-level alignment. Omitting alignment results in longer subtitle chunks, which can make captions harder to follow. Setting output="subtitles" returns the result in SRT format.
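A subtitle request can be scripted the same way (a sketch using requests; note that this guide does not specify how the SRT text appears in the response, so reading it from the response body is an assumption):

```python
import requests

URL = "https://inference.datacrunch.io/whisper/predict"

def subtitle_body(audio_url, translate=True):
    """JSON body for word-aligned subtitles in SRT format."""
    return {
        "audio_input": audio_url,
        "translate": translate,
        "processing_type": "align",  # word-level timestamps
        "output": "subtitles",       # SRT-formatted result
    }

def fetch_subtitles(audio_url, api_key):
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer " + api_key}
    resp = requests.post(URL, headers=headers, json=subtitle_body(audio_url))
    resp.raise_for_status()
    return resp.text  # assumption: the SRT text is returned in the response body
```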
curl -X POST https://inference.datacrunch.io/whisper/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d '{
    "audio_input": "<AUDIO_FILE_URL>",
    "translate": true,
    "processing_type": "align",
    "output": "subtitles"
  }'

Performing Speaker Diarization
For speaker diarization (assigning speaker labels to text segments), set processing_type to diarize.
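In Python, the diarization request differs from the earlier examples only in the processing_type field (a sketch; the endpoint and field names come from the curl examples in this guide):

```python
import requests

URL = "https://inference.datacrunch.io/whisper/predict"

def diarization_body(audio_url, translate=True):
    """JSON body for a diarized transcript (speaker labels per segment)."""
    return {
        "audio_input": audio_url,
        "translate": translate,
        "processing_type": "diarize",
    }

def diarize(audio_url, api_key):
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer " + api_key}
    resp = requests.post(URL, headers=headers, json=diarization_body(audio_url))
    resp.raise_for_status()
    return resp.json()
```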
curl -X POST https://inference.datacrunch.io/whisper/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d '{
    "audio_input": "<AUDIO_FILE_URL>",
    "translate": true,
    "processing_type": "diarize"
  }'

API Parameters
audio_input (str, required): URL of the audio file.
translate (bool, optional): If enabled, provides the English translation of the output. Defaults to false.
language (str, optional): Two-letter language code specifying the input language, for accurate language detection.
processing_type (str, optional): Defines the processing action. Supported types: diarize, align.
output (str, optional): Determines the output format. Options: subtitles (in SRT format), raw (time-stamped text). Defaults to raw.
Copyright notice: WhisperX includes software developed by Max Bain.