## Endpoint
## Request Examples
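No example request survives in this copy of the docs, so here is a minimal sketch in Python. The endpoint URL, header names, and Bearer-token authentication are assumptions (an OpenAI-compatible `POST` chat-completions endpoint); substitute the real values for your deployment.

```python
import json
import urllib.request

# Assumed endpoint and auth scheme -- replace with the real values.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize SSE in one sentence."},
    ],
    "temperature": 0.2,
    "max_tokens": 256,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)

# To actually send the request:
# with urllib.request.urlopen(request) as resp:
#     body = json.loads(resp.read())
```

The same payload works for streaming by adding `"stream": true` (see the Streaming section below).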
## Response
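The response body is not shown in this copy of the docs. The sketch below assumes an OpenAI-style chat-completions shape (a `choices` array containing a `message`); the field names and values are illustrative, not confirmed by the source.

```python
import json

# Example response body (shape assumed, OpenAI-style; values invented).
raw = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20}
}
"""

response = json.loads(raw)
# In this assumed shape, the generated text lives at choices[0].message.content.
answer = response["choices"][0]["message"]["content"]
print(answer)  # Hello! How can I help?
```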
## Streaming

Enable streaming to receive partial responses in real time via Server-Sent Events (SSE).

### Streaming Response Format
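As a sketch of consuming the stream: the chunk field names (`choices[0].delta.content`) and the `data: [DONE]` terminator below are assumptions borrowed from OpenAI-style SSE APIs, not confirmed by this page.

```python
import json

# Simulated SSE lines, as a client would read them off the wire
# (chunk shape and the [DONE] sentinel are assumed, OpenAI-style).
sse_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = []
for line in sse_lines:
    if not line.startswith("data: "):
        continue  # skip SSE comments and blank keep-alive lines
    body = line[len("data: "):]
    if body == "[DONE]":
        break
    chunk = json.loads(body)
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        text.append(delta["content"])

print("".join(text))  # Hello!
```

Concatenating the `content` deltas in order reconstructs the full assistant message.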
Each SSE event contains a JSON chunk.

## Request Parameters
- `model` (string): The model ID to use. Examples: `gpt-4o`, `claude-3-5-sonnet`, `gemini-2.0-flash`, `deepseek-v3`. See Supported Models for the full list.
- `messages` (array): A list of messages in the conversation. Each message has a `role` (`system`, `user`, or `assistant`) and `content` (string).
- `temperature` (number): Sampling temperature between 0 and 2. Higher values (e.g. 0.8) make output more random; lower values (e.g. 0.2) make it more deterministic. Default: 1.
- `max_tokens` (integer): Maximum number of tokens to generate in the response.
- `stream` (boolean): If `true`, partial responses are sent as Server-Sent Events. Default: `false`.
- `top_p` (number): Nucleus sampling parameter. Only tokens with cumulative probability up to `top_p` are considered. Default: 1.
- `frequency_penalty` (number): Penalizes tokens based on their frequency in the text so far. Range: -2.0 to 2.0. Default: 0.
- `presence_penalty` (number): Penalizes tokens based on whether they have appeared in the text so far. Range: -2.0 to 2.0. Default: 0.
- `stop` (string or array): Up to 4 sequences where the API will stop generating further tokens.
- `n` (integer): Number of chat completion choices to generate for each input message. Default: 1.

## Supported Models
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, o1, o1-mini, gpt-4-turbo |
| Anthropic | claude-4-sonnet, claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus |
| Google | gemini-2.0-flash, gemini-2.0-pro, gemini-1.5-pro |
| DeepSeek | deepseek-v3, deepseek-r1 |
| Meta | llama-3.3-70b, llama-3.1-405b, llama-3.1-70b |
| Mistral | mistral-large, mixtral-8x22b, mistral-small |