Language Models

Overview

DataCrunch's Large Language Model (LLM) inference services, compatible with the Text Generation Inference (TGI) schema, include both streaming and non-streaming endpoints. Requests require the following fields; a minimal example body follows the list:
  • model: A mandatory parameter specifying the language model to use.
  • inputs: The required input text or prompt for the model.
  • parameters: An object containing optional settings to fine-tune the model's response.
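For example, a minimal request body combining these three fields could look like the following (the model, prompt, and parameter values are illustrative; see the full examples below):

{
  "model": "llama-2-13b-chat",
  "inputs": "My name is Olivier and I",
  "parameters": {
    "max_new_tokens": 20
  }
}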

Available Models

Select from the following models using the model parameter:
  • llama-2-13b-chat
  • llama-2-70b-chat
  • mixtral-8x7b

Examples of API Usage

Non-streaming Endpoint

The same request in cURL, Python, and JavaScript:

cURL:
curl -X POST https://inference.datacrunch.io/v1/completions/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d '{
    "model": "llama-2-13b-chat",
    "inputs": "My name is Olivier and I",
    "parameters": {
      "best_of": 1,
      "decoder_input_details": true,
      "details": true,
      "do_sample": false,
      "max_new_tokens": 20,
      "repetition_penalty": 1.03,
      "return_full_text": false,
      "seed": null,
      "stop": ["photographer"],
      "temperature": 0.5,
      "top_k": 10,
      "top_p": 0.95,
      "truncate": null,
      "typical_p": 0.95,
      "watermark": true
    }
  }'
Python:

import requests

url = "https://inference.datacrunch.io/v1/completions/generate"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your_api_key>"
}
data = {
    "model": "llama-2-13b-chat",
    "inputs": "My name is Olivier and I",
    "parameters": {
        "best_of": 1,
        "decoder_input_details": True,
        "details": True,
        "do_sample": False,
        "max_new_tokens": 20,
        "repetition_penalty": 1.03,
        "return_full_text": False,
        "seed": None,
        "stop": ["photographer"],
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95,
        "truncate": None,
        "typical_p": 0.95,
        "watermark": True
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
JavaScript:

const axios = require('axios');

const url = 'https://inference.datacrunch.io/v1/completions/generate';
const headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer <your_api_key>'
};
const data = {
    model: 'llama-2-13b-chat',
    inputs: 'My name is Olivier and I',
    parameters: {
        best_of: 1,
        decoder_input_details: true,
        details: true,
        do_sample: false,
        max_new_tokens: 20,
        repetition_penalty: 1.03,
        return_full_text: false,
        seed: null,
        stop: ['photographer'],
        temperature: 0.5,
        top_k: 10,
        top_p: 0.95,
        truncate: null,
        typical_p: 0.95,
        watermark: true
    }
};

axios.post(url, data, { headers: headers })
    .then((response) => {
        console.log(response.data);
    })
    .catch((error) => {
        console.error('Error:', error);
    });
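For the non-streaming endpoint, a successful response is expected to follow the TGI response schema: a top-level generated_text field, plus a details object when details is true. Assuming that schema, the Python example above can read the result as follows; treat the field names as assumptions to verify against the actual response:

# Sketch assuming a TGI-compatible response body; verify the
# field names against the actual response of the service.
result = response.json()
print(result["generated_text"])

# When "details" is true, token-level information (ids, logprobs,
# finish reason) is expected under "details" in TGI-compatible APIs.
if "details" in result:
    print(result["details"].get("finish_reason"))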

Streaming Endpoint

Note: the decoder_input_details parameter must be set to false for the streaming endpoint.
cURL:
curl -N -X POST https://inference.datacrunch.io/v1/completions/generate_stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d '{
    "model": "llama-2-13b-chat",
    "inputs": "My name is Olivier and I",
    "parameters": {
      "best_of": 1,
      "decoder_input_details": false,
      "details": true,
      "do_sample": false,
      "max_new_tokens": 20,
      "repetition_penalty": 1.03,
      "return_full_text": false,
      "seed": null,
      "stop": ["photographer"],
      "temperature": 0.5,
      "top_k": 10,
      "top_p": 0.95,
      "truncate": null,
      "typical_p": 0.95,
      "watermark": true
    }
  }'
Python:

import requests

url = "https://inference.datacrunch.io/v1/completions/generate_stream"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your_api_key>"
}
data = {
    "model": "llama-2-13b-chat",
    "inputs": "My name is Olivier and I",
    "parameters": {
        "best_of": 1,
        "decoder_input_details": False,
        "details": True,
        "do_sample": False,
        "max_new_tokens": 20,
        "repetition_penalty": 1.03,
        "return_full_text": False,
        "seed": None,
        "stop": ["photographer"],
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95,
        "truncate": None,
        "typical_p": 0.95,
        "watermark": True
    }
}

response = requests.post(url, headers=headers, json=data, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
JavaScript:

const axios = require('axios');

const url = 'https://inference.datacrunch.io/v1/completions/generate_stream';
const headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer <your_api_key>'
};
const data = {
    model: 'llama-2-13b-chat',
    inputs: 'My name is Olivier and I',
    parameters: {
        best_of: 1,
        decoder_input_details: false,
        details: true,
        do_sample: false,
        max_new_tokens: 20,
        repetition_penalty: 1.03,
        return_full_text: false,
        seed: null,
        stop: ['photographer'],
        temperature: 0.5,
        top_k: 10,
        top_p: 0.95,
        truncate: null,
        typical_p: 0.95,
        watermark: true
    }
};

axios.post(url, data, { headers: headers, responseType: 'stream' })
    .then((response) => {
        response.data.on('data', (chunk) => {
            console.log(chunk.toString());
        });
    })
    .catch((error) => {
        console.error('Error:', error);
    });
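The streaming endpoint returns Server-Sent Events. Assuming TGI-style events, each payload line is prefixed with data: followed by a JSON object describing the next token; the token field used below comes from the TGI schema and may differ per deployment. A minimal Python sketch for decoding the stream:

import json
import requests

url = "https://inference.datacrunch.io/v1/completions/generate_stream"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your_api_key>"
}
data = {
    "model": "llama-2-13b-chat",
    "inputs": "My name is Olivier and I",
    # decoder_input_details must be false on the streaming endpoint.
    "parameters": {"decoder_input_details": False, "max_new_tokens": 20}
}

response = requests.post(url, headers=headers, json=data, stream=True)
for line in response.iter_lines():
    if not line:
        continue
    decoded = line.decode("utf-8")
    # SSE payload lines are expected to look like: data: {...}
    if decoded.startswith("data:"):
        event = json.loads(decoded[len("data:"):].strip())
        # Assumption: TGI-style events carry the next token under "token".
        print(event.get("token", {}).get("text", ""), end="", flush=True)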

API Specification

POST https://inference.datacrunch.io/v1/completions/generate
Generate tokens.

POST https://inference.datacrunch.io/v1/completions/generate_stream
Generate a stream of tokens using Server-Sent Events.

API Parameters

List of optional parameters for TGI-based endpoints (an illustrative sampling sketch follows the list):
  • do_sample (bool, optional): Activate logits sampling. Defaults to False.
  • max_new_tokens (int, optional): Maximum number of generated tokens. Defaults to 20.
  • repetition_penalty (float, optional): The parameter for repetition penalty. A value of 1.0 means no penalty; see the CTRL paper (Keskar et al., 2019) for details. Defaults to None.
  • return_full_text (bool, optional): Whether to prepend the prompt to the generated text. Defaults to False.
  • stop (List[str], optional): Stop generating tokens when one of these sequences is produced. Defaults to an empty list.
  • seed (int, optional): Random sampling seed. Defaults to None.
  • temperature (float, optional): The value used to modulate the logits distribution. Defaults to None.
  • top_k (int, optional): The number of highest probability vocabulary tokens to keep for top-k-filtering. Defaults to None.
  • top_p (float, optional): If set to a value less than 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. Defaults to None.
  • truncate (int, optional): Truncate input tokens to the given size. Defaults to None.
  • typical_p (float, optional): Typical Decoding mass. See Typical Decoding for Natural Language Generation for more information. Defaults to None.
  • best_of (int, optional): Generate best_of sequences and return the one with the highest token logprobs. Defaults to None.
  • watermark (bool, optional): Apply the watermarking scheme from A Watermark for Large Language Models (Kirchenbauer et al., 2023). Defaults to False.
  • details (bool, optional): Get generation details. Defaults to False.
  • decoder_input_details (bool, optional): Get decoder input token logprobs and ids. Defaults to False.
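To make the sampling parameters concrete, here is a minimal, self-contained Python sketch of how temperature, top_k, and top_p are conventionally combined when sampling one token from a logits vector. It illustrates the standard technique only; it is not DataCrunch's server-side implementation.

import numpy as np

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    # Illustrative top-k / top-p (nucleus) sampling over one logits vector.
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # top_k: keep only the k highest-scoring tokens (no-op if k covers all).
    if 0 < top_k < len(logits):
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)

    # Softmax over the remaining tokens.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # top_p: keep the smallest set of tokens whose mass reaches top_p.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    return rng.choice(len(probs), p=probs)

# Example: draw from a toy 5-token vocabulary with the doc's defaults.
print(sample_token([2.0, 1.0, 0.5, 0.1, -1.0], temperature=0.5, top_k=10, top_p=0.95))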