## Endpoint
## Request Examples
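No example request survives in this copy of the docs, so here is a minimal sketch in Python. The endpoint URL, header names, and Bearer-token authentication are assumptions (an OpenAI-compatible `POST` chat-completions endpoint); substitute the real values for your deployment.

```python
import json
import urllib.request

# Assumed endpoint and auth scheme -- replace with the real values.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize SSE in one sentence."},
    ],
    "temperature": 0.2,
    "max_tokens": 256,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)

# To actually send the request:
# with urllib.request.urlopen(request) as resp:
#     body = json.loads(resp.read())
```

The same payload works for streaming by adding `"stream": true` (see the Streaming section below).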
## Response
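The response body is not shown in this copy of the docs. The sketch below assumes an OpenAI-style chat-completions shape (a `choices` array containing a `message`); the field names and values are illustrative, not confirmed by the source.

```python
import json

# Example response body (shape assumed, OpenAI-style; values invented).
raw = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20}
}
"""

response = json.loads(raw)
# In this assumed shape, the generated text lives at choices[0].message.content.
answer = response["choices"][0]["message"]["content"]
print(answer)  # Hello! How can I help?
```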
## Streaming

Enable streaming to receive partial responses in real time via Server-Sent Events (SSE).

### Streaming Response Format
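As a sketch of consuming the stream: the chunk field names (`choices[0].delta.content`) and the `data: [DONE]` terminator below are assumptions borrowed from OpenAI-style SSE APIs, not confirmed by this page.

```python
import json

# Simulated SSE lines, as a client would read them off the wire
# (chunk shape and the [DONE] sentinel are assumed, OpenAI-style).
sse_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = []
for line in sse_lines:
    if not line.startswith("data: "):
        continue  # skip SSE comments and blank keep-alive lines
    body = line[len("data: "):]
    if body == "[DONE]":
        break
    chunk = json.loads(body)
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        text.append(delta["content"])

print("".join(text))  # Hello!
```

Concatenating the `content` deltas in order reconstructs the full assistant message.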
Each SSE event contains a JSON chunk.

## Request Parameters
- `model` (string): The model ID to use. Examples: `gpt-4o`, `claude-3-5-sonnet`, `gemini-2.0-flash`, `deepseek-v3`. See Supported Models for the full list.
- `messages` (array): A list of messages in the conversation. Each message has a `role` (`system`, `user`, or `assistant`) and `content` (string).
- `temperature` (number): Sampling temperature between 0 and 2. Higher values (e.g. 0.8) make output more random; lower values (e.g. 0.2) make it more deterministic. Default: 1.
- `max_tokens` (integer): Maximum number of tokens to generate in the response.
- `stream` (boolean): If `true`, partial responses are sent as Server-Sent Events. Default: `false`.
- `top_p` (number): Nucleus sampling parameter. Only tokens with cumulative probability up to `top_p` are considered. Default: 1.
- `frequency_penalty` (number): Penalizes tokens based on their frequency in the text so far. Range: -2.0 to 2.0. Default: 0.
- `presence_penalty` (number): Penalizes tokens based on whether they have appeared in the text so far. Range: -2.0 to 2.0. Default: 0.
- `stop` (string or array): Up to 4 sequences where the API will stop generating further tokens.
- `n` (integer): Number of chat completion choices to generate for each input message. Default: 1.

## Supported Models
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, o1, o1-mini, gpt-4-turbo |
| Anthropic | claude-4-sonnet, claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus |
| Google | gemini-2.0-flash, gemini-2.0-pro, gemini-1.5-pro |
| DeepSeek | deepseek-v3, deepseek-r1 |
| Meta | llama-3.3-70b, llama-3.1-405b, llama-3.1-70b |
| Mistral | mistral-large, mixtral-8x22b, mistral-small |