API Support Matrix

Valar gives you four endpoints. Three are inference surfaces shaped after APIs you already know, and one batches work asynchronously:

API	Endpoint	Maturity
OpenAI Responses	`POST /v1/responses`	Stable
OpenAI Chat Completions	`POST /v1/chat/completions`	Stable
Anthropic Messages	`POST /v1/messages`	Beta
Batch	`POST /v1/batches`	Stable

The same models and completion windows work across all three inference surfaces. The Batch API layers on top: it wraps a large set of Responses API calls into one asynchronous job.

Behavior shared across every API

Before the per-API detail, a few rules hold no matter which surface you call:

Streaming is Chat Completions only. Pass stream: true to Chat Completions and you get Server-Sent Events (chat.completion.chunk); add stream_options.include_usage for a closing usage chunk. The Responses and Messages APIs reject stream: true. For long jobs, use background: true on the Responses API and poll or wait on webhooks.
Completion windows steer scheduling and price. Set metadata.completion_window to "asap" (the Now tier) or "standard". See Completion windows and Pricing.
Webhooks fire on completion. Set metadata.completion_webhook to receive a POST when processing finishes. See Webhooks.
Responses are always stored. store: false is unsupported.

Inference APIs

Each accordion below lists what the API accepts and what it rejects. Open the one that matches the SDK you’re using.

Responses API — POST /v1/responses

RecommendedOpenAI Responses formatOpenAI SDK compatibleAPI reference

This is the surface we recommend reaching for first.Supported

Feature	Details
Core parameters	`model`, `input` (string or message array), `max_output_tokens`, `temperature`, `top_p`, `user`, `prompt_cache_key`
Structured outputs	`text.format` with `type: "text"` or `type: "json_schema"`
Reasoning	`reasoning.effort` (`none` / `minimal` / `low` / `medium` / `high` / `xhigh`), `reasoning.generate_summary` (`auto` / `concise` / `detailed`)
Function tools	`tools` with `type: "function"` — client-side function calling with `name`, `description`, `parameters`, `strict`
Custom tools	`tools` with `type: "custom"`
Tool choice	`tool_choice`: `"none"`, `"auto"`, `"required"`, or a specific function/custom tool
Background mode	`background: true` returns `202` immediately; poll with `GET /v1/responses/{id}`
Prompt cache routing	`prompt_cache_key` is an optional routing hint for requests that share a large prompt prefix
Image input	`input_image` content blocks on multimodal models. Non-multimodal models accept text only.
Output logprobs	`include: ["message.output_text.logprobs"]` returns one logprob per output token (best effort; omitted for models served via a proxy that does not return logprobs).

Not yet supported

Feature	Notes
Streaming	`stream: true` is rejected. Every response comes back as one JSON object.
Instructions	`instructions` is unsupported. Put system messages straight into `input`.
Conversation chaining	`previous_response_id` and `conversation` are unsupported. Resend the full input on each call.
Prompt templates	The `prompt` parameter is unsupported.
Server-side tools	`web_search`, `file_search`, `code_interpreter`, `computer_use`, `mcp`, `image_generation`, `shell`, `apply_patch` are unsupported.
Multimodal input	Audio and file input blocks are unsupported. Image input works on multimodal models (see above).
Include	Accepted for compatibility when passed as an array of strings. A request is rejected if it includes `reasoning.encrypted_content`, `web_search_call.action.sources`, `code_interpreter_call.outputs`, `computer_call_output.output.image_url`, or `file_search_call.results`.
Truncation	`"disabled"` is the only accepted value; custom truncation strategies are unsupported.
Parallel tool calls	`parallel_tool_calls` is unsupported.
json_object format	`text.format.type: "json_object"` is unsupported. Reach for `"json_schema"` instead.
Service tier	`"auto"` is the only accepted value. Use `metadata.completion_window` to govern response timing instead.
Delete / cancel	`DELETE /v1/responses/{id}` and the cancel endpoints are not implemented.

Chat Completions API — POST /v1/chat/completions

OpenAI Chat Completions formatOpenAI SDK compatibleAPI reference

This is the only inference surface that streams.Supported

Feature	Details
Core parameters	`model`, `messages`, `max_completion_tokens`, `temperature`, `top_p`, `user`
Message roles	`system`, `user`, `assistant`, `tool`, `function` (deprecated), `developer`
Structured outputs	`response_format` with `type: "text"`, `"json_object"`, or `"json_schema"`
Reasoning	`reasoning_effort` (`none` / `minimal` / `low` / `medium` / `high` / `xhigh`)
Function tools	`tools` with `type: "function"` — standard `{type, function: {name, description, parameters, strict}}` format
Custom tools	`tools` with `type: "custom"`
Tool choice	`tool_choice`: `"none"`, `"auto"`, `"required"`, or a specific function/custom tool
Parallel tool calls	`parallel_tool_calls` is passed through
Metadata	`metadata` with string key-value pairs, including

	`completion_window` and `completion_webhook`
Streaming	`stream: true` returns Server-Sent Events (`chat.completion.chunk`); `stream_options.include_usage` adds a final usage chunk. Reasoning is streamed as `reasoning_content` deltas and tool calls are emitted atomically.
Image input	`image_url` content parts on multimodal models. Non-multimodal models accept text only.

Not yet supported

Feature	Notes
Multiple choices	`n` is required to be `1`.
Multimodal content	Audio (`input_audio`) content parts are unsupported. Image (`image_url`) input works on multimodal models (see above).
Sampling controls	`frequency_penalty`, `presence_penalty`, `logit_bias`, `stop`, `seed`, `top_logprobs`, `logprobs`, `verbosity` are unsupported.
Audio modality	Neither `audio` nor `modalities: ["audio"]` is supported.
Predicted output	`prediction` is unsupported.
Web search	`web_search_options` is unsupported.
Service tier	`"auto"` is the only accepted value.
CRUD endpoints	The `GET`, `POST`, and `DELETE` operations on stored completions are not implemented.
Deprecated fields	`max_tokens`, `functions`, and `function_call` are rejected; switch to their modern replacements.

What comes back

A response always carries exactly one choice (n=1).
finish_reason is either "stop" or "tool_calls" — values such as "length" and "content_filter" are never returned.
Neither system_fingerprint nor service_tier appears in responses.
logprobs is always null.

Messages API — POST /v1/messages

Anthropic Messages formatAnthropic SDK compatibleAPI reference

This API is in beta. It covers core text generation and image input today; several Anthropic features — system prompts, extended thinking, tools, and stop sequences — aren’t supported yet (see below).

Supported

Feature	Details
Core parameters	`model`, `max_tokens`, `messages`
Sampling	`temperature` (0–1), `top_p` (0–1)
Structured outputs	`output_config.format` with `type: "json_schema"`
Metadata	`metadata` with string key-value pairs, including `completion_window` and `completion_webhook`

Image input	`image` content blocks on multimodal models. Non-multimodal models accept text only.

Not yet supported

Feature	Notes
Streaming	`stream: true` is rejected.
System prompt	The `system` parameter is unsupported.
Extended thinking	`thinking` is unsupported.
Tools	`tools` and `tool_choice` are taken for compatibility but do nothing — the model never sees them and can’t call tools.
Stop sequences	`stop_sequences` is unsupported.
Top-K sampling	`top_k` is unsupported.
Multimodal content	Document and tool-result content blocks are unsupported. Image input works on multimodal models (see above).
Service tier	`service_tier` is unsupported.
Inference geo	`inference_geo` is unsupported.
Count tokens	`POST /v1/messages/count_tokens` is not implemented.
Batches	`POST /v1/messages/batches` and its related endpoints are not implemented.

What comes back

stop_reason is always "end_turn"; values like "max_tokens", "tool_use", and "stop_sequence" are never returned.
The response content is always a single text block — thinking blocks and tool-use blocks are not returned.
Cache-related usage fields (cache_creation_input_tokens, cache_read_input_tokens) are omitted.

Authenticating with the Anthropic SDKAuthentication uses Authorization: Bearer <key>; Valar does not accept the Anthropic x-api-key header. With the Anthropic SDK, supply your key through auth_token rather than api_key:

from anthropic import Anthropic

client = Anthropic(
    auth_token="YOUR_VALAR_API_KEY",
    base_url="https://api.valarhq.ai",
)

The anthropic-version header is neither required nor inspected, and errors come back in the OpenAI-style error envelope format.

Batch API

The Batch API runs large volumes of Responses API requests asynchronously. Every entry targets /v1/responses — you can’t batch /v1/chat/completions or /v1/messages today. One POST /v1/batches call takes up to 100,000 requests; you then poll GET /v1/batches/{id} for status and fetch each result by its custom_id. For the end-to-end workflow, see Sending Requests at Scale; for the request and response schemas, see the Batch API reference.

​Behavior shared across every API

​Inference APIs

​Batch API

Behavior shared across every API

Inference APIs

Batch API