
Endpoint

POST https://api.omniakey.com/v1/chat/completions
This endpoint is fully compatible with the OpenAI Chat Completions API. Use any OpenAI-compatible SDK or make direct HTTP requests.

Request Examples

from openai import OpenAI

# Point the OpenAI SDK at the OmniaKey base URL.
client = OpenAI(
    api_key="your-omniakey-api-key",
    base_url="https://api.omniakey.com/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=256
)

print(response.choices[0].message.content)
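As noted above, you can also call the endpoint with direct HTTP requests instead of an SDK. A minimal sketch using Python's standard library (the URL and Bearer-token header follow the OpenAI convention this page describes; the API key is a placeholder):

```python
import json
import urllib.request

API_KEY = "your-omniakey-api-key"  # placeholder

# Build the POST request by hand: Bearer auth header plus a JSON body.
req = urllib.request.Request(
    "https://api.omniakey.com/v1/chat/completions",
    method="POST",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    data=json.dumps({
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
    }).encode("utf-8"),
)

# Uncomment to actually send the request (requires a valid key):
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```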

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
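The usage object reports token counts for the request; total_tokens is the sum of prompt_tokens and completion_tokens. A quick check against the example response above:

```python
import json

# The usage block from the example response above.
response_body = '{"usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33}}'
usage = json.loads(response_body)["usage"]

# total_tokens = prompt_tokens + completion_tokens
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
print(usage["total_tokens"])  # 33
```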

Streaming

Enable streaming to receive partial responses in real time via Server-Sent Events (SSE):
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)

for chunk in stream:
    # delta.content is None on role-only and final chunks
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Streaming Response Format

Each SSE event contains a JSON chunk:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
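If you consume the stream without an SDK, each event must be parsed by hand: strip the "data: " prefix, stop at the [DONE] sentinel, JSON-decode the rest, and concatenate the delta contents. A minimal sketch over the example events above:

```python
import json

# The example SSE event lines shown above.
events = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]

text = ""
for line in events:
    data = line.removeprefix("data: ")
    if data == "[DONE]":  # end-of-stream sentinel, not JSON
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    text += delta.get("content", "")  # the final chunk has an empty delta

print(text)  # The capital
```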

Request Parameters

model
string
required
The model ID to use. Examples: gpt-4o, claude-3-5-sonnet, gemini-2.0-flash, deepseek-v3. See Supported Models for the full list.
messages
array
required
A list of messages in the conversation. Each message has a role (system, user, or assistant) and content (string).
temperature
number
Sampling temperature between 0 and 2. Higher values (e.g. 0.8) make output more random, lower values (e.g. 0.2) make it more deterministic. Default: 1.
max_tokens
integer
Maximum number of tokens to generate in the response.
stream
boolean
If true, partial responses are sent as Server-Sent Events. Default: false.
top_p
number
Nucleus sampling parameter. Only tokens with cumulative probability up to top_p are considered. Default: 1.
frequency_penalty
number
Penalizes tokens based on their frequency in the text so far. Range: -2.0 to 2.0. Default: 0.
presence_penalty
number
Penalizes tokens based on whether they have appeared in the text so far. Range: -2.0 to 2.0. Default: 0.
stop
string | array
Up to 4 sequences where the API will stop generating further tokens.
n
integer
Number of chat completion choices to generate for each input message. Default: 1.
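The parameters above combine freely in a single request. For instance, a request body asking for two conservative completions that stop at the first newline (the values here are chosen purely for illustration):

```python
# Parameter values chosen purely for illustration.
params = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Name a prime number."}],
    "temperature": 0.2,       # low temperature -> more deterministic output
    "top_p": 0.9,             # nucleus sampling cutoff
    "n": 2,                   # generate two completion choices
    "stop": ["\n"],           # stop at the first newline
    "presence_penalty": 0.5,  # discourage revisiting earlier tokens
}

# With the OpenAI client from the first example:
# response = client.chat.completions.create(**params)
# for choice in response.choices:  # n=2 -> two choices
#     print(choice.message.content)
```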

Supported Models

Provider    Models
OpenAI      gpt-4o, gpt-4o-mini, o1, o1-mini, gpt-4-turbo
Anthropic   claude-4-sonnet, claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus
Google      gemini-2.0-flash, gemini-2.0-pro, gemini-1.5-pro
DeepSeek    deepseek-v3, deepseek-r1
Meta        llama-3.3-70b, llama-3.1-405b, llama-3.1-70b
Mistral     mistral-large, mixtral-8x22b, mistral-small
