Language Models
Overview
DataCrunch's Large Language Model (LLM) inference services are compatible with the Text Generation Inference (TGI) schema and include both streaming and non-streaming endpoints. Requests require the following parameters:
model: A mandatory parameter specifying the language model to use.
inputs: The required input text or prompt for the model.
parameters: An object containing optional settings to fine-tune the model's response.
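Taken together, a request body combining these fields might look like the sketch below; the model name and prompt are placeholders, not values from this documentation:

```python
# Sketch of a request body; the model name and prompt are placeholders.
payload = {
    "model": "<model-name>",                      # mandatory: which LLM to use
    "inputs": "What is the capital of Finland?",  # required prompt text
    "parameters": {"max_new_tokens": 50},         # optional tuning settings
}
```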
Available Models
Please contact us to set up a private LLM endpoint.
Examples of API Usage
Non-streaming Endpoint
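A minimal sketch of a non-streaming request, assuming a TGI-style /generate route and bearer-token authentication; the base URL, API key, and model name are placeholders for your private endpoint's values:

```python
import requests

# Placeholders for illustration; substitute your private endpoint's values.
BASE_URL = "https://<your-endpoint>.datacrunch.io"  # assumed base URL
API_KEY = "<your-api-key>"                          # assumed bearer token

response = requests.post(
    f"{BASE_URL}/generate",  # TGI-style non-streaming route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "<model-name>",                      # mandatory: which LLM to use
        "inputs": "What is the capital of Finland?",  # required prompt
        "parameters": {"max_new_tokens": 50, "temperature": 0.7},
    },
    timeout=60,
)
response.raise_for_status()
# TGI-style responses carry the completion in "generated_text".
print(response.json()["generated_text"])
```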
Streaming Endpoint
Note: the decoder_input_details parameter must be set to false for the streaming endpoint.
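A minimal streaming sketch under the same assumptions, using the TGI-style /generate_stream route, which returns server-sent events; decoder_input_details is set to false as required:

```python
import json

import requests

BASE_URL = "https://<your-endpoint>.datacrunch.io"  # assumed base URL
API_KEY = "<your-api-key>"                          # assumed bearer token

with requests.post(
    f"{BASE_URL}/generate_stream",  # TGI-style streaming route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "<model-name>",
        "inputs": "Write a haiku about GPUs.",
        "parameters": {
            "max_new_tokens": 50,
            "decoder_input_details": False,  # must be false when streaming
        },
    },
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    # TGI streams server-sent events; each payload line starts with "data:".
    for line in response.iter_lines():
        if line.startswith(b"data:"):
            event = json.loads(line[len(b"data:"):])
            print(event["token"]["text"], end="", flush=True)
    print()
```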
API Specification
API Parameters
List of optional parameters for TGI-based endpoints:
do_sample (bool, optional): Activate logits sampling. Defaults to False.
max_new_tokens (int, optional): Maximum number of generated tokens. Defaults to 20.
repetition_penalty (float, optional): The parameter for repetition penalty. A value of 1.0 means no penalty. See this paper for more details. Defaults to None.
return_full_text (bool, optional): Whether to prepend the prompt to the generated text. Defaults to False.
stop (List[str], optional): Stop generating tokens if a member of stop_sequences is generated. Defaults to an empty list.
seed (int, optional): Random sampling seed. Defaults to None.
temperature (float, optional): The value used to modulate the logits distribution. Defaults to None.
top_k (int, optional): The number of highest probability vocabulary tokens to keep for top-k filtering. Defaults to None.
top_p (float, optional): If set to a value less than 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. Defaults to None.
truncate (int, optional): Truncate input tokens to the given size. Defaults to None.
typical_p (float, optional): Typical Decoding mass. See Typical Decoding for Natural Language Generation for more information. Defaults to None.
best_of (int, optional): Generate best_of sequences and return the one with the highest token logprobs. Defaults to None.
watermark (bool, optional): Watermarking with A Watermark for Large Language Models. Defaults to False.
details (bool, optional): Get generation details. Defaults to False.
decoder_input_details (bool, optional): Get decoder input token logprobs and ids. Defaults to False.
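For illustration, several of these options could be combined into a single parameters object like the sketch below; the values are arbitrary, not recommendations:

```python
# Arbitrary illustrative values; combine only the options you need.
parameters = {
    "do_sample": True,          # sample from the logits instead of greedy decoding
    "max_new_tokens": 100,      # cap the length of the completion
    "temperature": 0.8,         # soften the logits distribution
    "top_k": 50,                # keep only the 50 most probable tokens
    "top_p": 0.95,              # nucleus sampling threshold
    "repetition_penalty": 1.1,  # mildly discourage repeated tokens
    "stop": ["\n\n"],           # stop at the first blank line
    "seed": 42,                 # make sampling reproducible
}
```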