Language Models

Overview

DataCrunch's Language Model (LLM) inference services, compatible with the TGI schema, include both streaming and non-streaming endpoints. These services require specific parameters for operation:

model: A mandatory parameter specifying the language model to use.
inputs: The required input text or prompt for the model.
parameters: An object containing optional settings to fine-tune the model's response.

Available Models

Select from the following models using the model parameter:

llama-2-13b-chat
llama-2-70b-chat
mixtral-8x7b

Examples of API Usage

Non-streaming Endpoint

curl -X POST https://inference.datacrunch.io/v1/completions/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "model": "llama-2-13b-chat",
    "inputs": "My name is Olivier and I",
    "parameters": {
      "best_of": 1,
      "decoder_input_details": true,
      "details": true,
      "do_sample": false,
      "max_new_tokens": 20,
      "repetition_penalty": 1.03,
      "return_full_text": false,
      "seed": null,
      "stop": [
        "photographer"
      ],
      "temperature": 0.5,
      "top_k": 10,
      "top_p": 0.95,
      "truncate": null,
      "typical_p": 0.95,
      "watermark": true
   }
 }'

import requests

url = "https://inference.datacrunch.io/v1/completions/generate"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your_api_key>"
}
data = {
    "model": "llama-2-13b-chat",
    "inputs": "My name is Olivier and I",
    "parameters": {
        "best_of": 1,
        "decoder_input_details": True,
        "details": True,
        "do_sample": False,
        "max_new_tokens": 20,
        "repetition_penalty": 1.03,
        "return_full_text": False,
        "seed": None,
        "stop": ["photographer"],
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95,
        "truncate": None,
        "typical_p": 0.95,
        "watermark": True
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

const axios = require('axios');

const url = 'https://inference.datacrunch.io/v1/completions/generate';
const headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer <your_api_key>'
};
const data = {
  model: 'llama-2-13b-chat',
  inputs: 'My name is Olivier and I',
  parameters: {
    best_of: 1,
    decoder_input_details: true,
    details: true,
    do_sample: false,
    max_new_tokens: 20,
    repetition_penalty: 1.03,
    return_full_text: false,
    seed: null,
    stop: ['photographer'],
    temperature: 0.5,
    top_k: 10,
    top_p: 0.95,
    truncate: null,
    typical_p: 0.95,
    watermark: true
  }
};

axios.post(url, data, { headers: headers })
  .then((response) => {
    console.log(response.data);
  })
  .catch((error) => {
    console.error('Error:', error);
  });

Streaming Endpoint

Note: the decoder_input_details parameter must be set to false for the streaming endpoint.

curl -N -X POST https://inference.datacrunch.io/v1/completions/generate_stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your_api_key>" \
  -d \
'{
    "model": "llama-2-13b-chat",
    "inputs": "My name is Olivier and I",
    "parameters": {
      "best_of": 1,
      "decoder_input_details": false,
      "details": true,
      "do_sample": false,
      "max_new_tokens": 20,
      "repetition_penalty": 1.03,
      "return_full_text": false,
      "seed": null,
      "stop": [
        "photographer"
      ],
      "temperature": 0.5,
      "top_k": 10,
      "top_p": 0.95,
      "truncate": null,
      "typical_p": 0.95,
      "watermark": true
   }
 }'

import requests

url = "https://inference.datacrunch.io/v1/completions/generate_stream"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your_api_key>"
}
data = {
    "model": "llama-2-13b-chat",
    "inputs": "My name is Olivier and I",
    "parameters": {
        "best_of": 1,
        "decoder_input_details": False,
        "details": True,
        "do_sample": False,
        "max_new_tokens": 20,
        "repetition_penalty": 1.03,
        "return_full_text": False,
        "seed": None,
        "stop": ["photographer"],
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95,
        "truncate": None,
        "typical_p": 0.95,
        "watermark": True
    }
}

response = requests.post(url, headers=headers, json=data, stream=True)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))

const axios = require('axios');

const url = 'https://inference.datacrunch.io/v1/completions/generate_stream';
const headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer <your_api_key>'
};
const data = {
  model: 'llama-2-13b-chat',
  inputs: 'My name is Olivier and I',
  parameters: {
    best_of: 1,
    decoder_input_details: false,
    details: true,
    do_sample: false,
    max_new_tokens: 20,
    repetition_penalty: 1.03,
    return_full_text: false,
    seed: null,
    stop: ['photographer'],
    temperature: 0.5,
    top_k: 10,
    top_p: 0.95,
    truncate: null,
    typical_p: 0.95,
    watermark: true
  }
};

axios.post(url, data, { headers: headers, responseType: 'stream' })
  .then((response) => {
    response.data.on('data', (chunk) => {
      console.log(chunk.toString());
    });
  })
  .catch((error) => {
    console.error('Error:', error);
  });

API Specification

Generate tokens

POSThttps://inference.datacrunch.io/v1/completions/generate

Body

inputs*string

parametersGenerateParameters (object)

Response

Generated Text

Body

detailsnullable all of

generated_text*string

Request

const response = await fetch('https://inference.datacrunch.io/v1/completions/generate', {
    method: 'POST',
    headers: {
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      "inputs": "My name is Olivier and I"
    }),
});
const data = await response.json();

Response

{
  "details": {
    "best_of_sequences": [
      {
        "finish_reason": "length",
        "generated_text": "test",
        "generated_tokens": 1,
        "prefill": [
          {
            "logprob": -0.34,
            "text": "test"
          }
        ],
        "seed": 42,
        "tokens": [
          {
            "logprob": -0.34,
            "special": "false",
            "text": "test"
          }
        ],
        "top_tokens": [
          [
            {
              "logprob": -0.34,
              "special": "false",
              "text": "test"
            }
          ]
        ]
      }
    ],
    "finish_reason": "length",
    "generated_tokens": 1,
    "prefill": [
      {
        "logprob": -0.34,
        "text": "test"
      }
    ],
    "seed": 42,
    "tokens": [
      {
        "logprob": -0.34,
        "special": "false",
        "text": "test"
      }
    ],
    "top_tokens": [
      [
        {
          "logprob": -0.34,
          "special": "false",
          "text": "test"
        }
      ]
    ]
  },
  "generated_text": "test"
}

Generate a stream of token using Server-Sent Events

POSThttps://inference.datacrunch.io/v1/completions/generate_stream

Body

inputs*string

parametersGenerateParameters (object)

Response

Generated Text

Body

detailsnullable all of

generated_textnullable string

token*Token (object)

top_tokensarray of Token (object)

Request

const response = await fetch('https://inference.datacrunch.io/v1/completions/generate_stream', {
    method: 'POST',
    headers: {
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      "inputs": "My name is Olivier and I"
    }),
});
const data = await response.json();

Response

{
  "details": {
    "finish_reason": "length",
    "generated_tokens": 1,
    "seed": 42
  },
  "generated_text": "test",
  "token": {
    "logprob": -0.34,
    "special": "false",
    "text": "test"
  },
  "top_tokens": [
    {
      "logprob": -0.34,
      "special": "false",
      "text": "test"
    }
  ]
}

API Parameters

List of optional parameters for TGI-based endpoints:

do_sample (bool, optional): Activate logits sampling. Defaults to False.
max_new_tokens (int, optional): Maximum number of generated tokens. Defaults to 20.
repetition_penalty (float, optional): The parameter for repetition penalty. A value of 1.0 means no penalty. See this paper for more details. Defaults to None.
return_full_text (bool, optional): Whether to prepend the prompt to the generated text. Defaults to False.
stop (List[str], optional): Stop generating tokens if a member of stop_sequences is generated. Defaults to an empty list.
seed (int, optional): Random sampling seed. Defaults to None.
temperature (float, optional): The value used to modulate the logits distribution. Defaults to None.
top_k (int, optional): The number of highest probability vocabulary tokens to keep for top-k-filtering. Defaults to None.
top_p (float, optional): If set to a value less than 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. Defaults to None.
truncate (int, optional): Truncate input tokens to the given size. Defaults to None.
typical_p (float, optional): Typical Decoding mass. See Typical Decoding for Natural Language Generation for more information. Defaults to None.
best_of (int, optional): Generate best_of sequences and return the one with the highest token logprobs. Defaults to None.
watermark (bool, optional): Watermarking with A Watermark for Large Language Models. Defaults to False.
details (bool, optional): Get generation details. Defaults to False.
decoder_input_details (bool, optional): Get decoder input token logprobs and ids. Defaults to False.

Last updated 3 months ago