| API | Endpoint | Maturity |
|---|---|---|
| OpenAI Responses | POST /v1/responses | Stable |
| OpenAI Chat Completions | POST /v1/chat/completions | Stable |
| Anthropic Messages | POST /v1/messages | Beta |
| Batch | POST /v1/batches | Stable |
Behavior shared across every API
Before the per-API detail, a few rules hold no matter which surface you call:- Streaming is Chat Completions only. Pass
stream: trueto Chat Completions and you get Server-Sent Events (chat.completion.chunk); addstream_options.include_usagefor a closing usage chunk. The Responses and Messages APIs rejectstream: true. For long jobs, usebackground: trueon the Responses API and poll or wait on webhooks. - Completion windows steer scheduling and price. Set
metadata.completion_windowto"asap"(the Now tier) or"standard". See Completion windows and Pricing. - Webhooks fire on completion. Set
metadata.completion_webhookto receive a POST when processing finishes. See Webhooks. - Responses are always stored.
store: falseis unsupported.
Inference APIs
Each accordion below lists what the API accepts and what it rejects. Open the one that matches the SDK you’re using.Responses API — POST /v1/responses
Responses API — POST /v1/responses
This is the surface we recommend reaching for first.Supported
Not yet supported
| Feature | Details |
|---|---|
| Core parameters | model, input (string or message array), max_output_tokens, temperature, top_p, user, prompt_cache_key |
| Structured outputs | text.format with type: "text" or type: "json_schema" |
| Reasoning | reasoning.effort (none / minimal / low / medium / high / xhigh), reasoning.generate_summary (auto / concise / detailed) |
| Function tools | tools with type: "function" — client-side function calling with name, description, parameters, strict |
| Custom tools | tools with type: "custom" |
| Tool choice | tool_choice: "none", "auto", "required", or a specific function/custom tool |
| Background mode | background: true returns 202 immediately; poll with GET /v1/responses/{id} |
| Prompt cache routing | prompt_cache_key is an optional routing hint for requests that share a large prompt prefix |
| Image input | input_image content blocks on multimodal models. Non-multimodal models accept text only. |
| Output logprobs | include: ["message.output_text.logprobs"] returns one logprob per output token (best effort; omitted for models served via a proxy that does not return logprobs). |
| Feature | Notes |
|---|---|
| Streaming | stream: true is rejected. Every response comes back as one JSON object. |
| Instructions | instructions is unsupported. Put system messages straight into input. |
| Conversation chaining | previous_response_id and conversation are unsupported. Resend the full input on each call. |
| Prompt templates | The prompt parameter is unsupported. |
| Server-side tools | web_search, file_search, code_interpreter, computer_use, mcp, image_generation, shell, apply_patch are unsupported. |
| Multimodal input | Audio and file input blocks are unsupported. Image input works on multimodal models (see above). |
| Include | Accepted for compatibility when passed as an array of strings. A request is rejected if it includes reasoning.encrypted_content, web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, or file_search_call.results. |
| Truncation | "disabled" is the only accepted value; custom truncation strategies are unsupported. |
| Parallel tool calls | parallel_tool_calls is unsupported. |
| json_object format | text.format.type: "json_object" is unsupported. Reach for "json_schema" instead. |
| Service tier | "auto" is the only accepted value. Use metadata.completion_window to govern response timing instead. |
| Delete / cancel | DELETE /v1/responses/{id} and the cancel endpoints are not implemented. |
Chat Completions API — POST /v1/chat/completions
Chat Completions API — POST /v1/chat/completions
This is the only inference surface that streams.Supported
Not yet supported
What comes back
Not yet supported
What comes backThe
| Feature | Details |
|---|---|
| Core parameters | model, messages, max_completion_tokens, temperature, top_p, user |
| Message roles | system, user, assistant, tool, function (deprecated), developer |
| Structured outputs | response_format with type: "text", "json_object", or "json_schema" |
| Reasoning | reasoning_effort (none / minimal / low / medium / high / xhigh) |
| Function tools | tools with type: "function" — standard {type, function: {name, description, parameters, strict}} format |
| Custom tools | tools with type: "custom" |
| Tool choice | tool_choice: "none", "auto", "required", or a specific function/custom tool |
| Parallel tool calls | parallel_tool_calls is passed through |
| Metadata | metadata with string key-value pairs, including |
completion_window and completion_webhook | |
|---|---|
| Streaming | stream: true returns Server-Sent Events (chat.completion.chunk); stream_options.include_usage adds a final usage chunk. Reasoning is streamed as reasoning_content deltas and tool calls are emitted atomically. |
| Image input | image_url content parts on multimodal models. Non-multimodal models accept text only. |
| Feature | Notes |
|---|---|
| Multiple choices | n is required to be 1. |
| Multimodal content | Audio (input_audio) content parts are unsupported. Image (image_url) input works on multimodal models (see above). |
| Sampling controls | frequency_penalty, presence_penalty, logit_bias, stop, seed, top_logprobs, logprobs, verbosity are unsupported. |
| Audio modality | Neither audio nor modalities: ["audio"] is supported. |
| Predicted output | prediction is unsupported. |
| Web search | web_search_options is unsupported. |
| Service tier | "auto" is the only accepted value. |
| CRUD endpoints | The GET, POST, and DELETE operations on stored completions are not implemented. |
| Deprecated fields | max_tokens, functions, and function_call are rejected; switch to their modern replacements. |
- A response always carries exactly one choice (
n=1). finish_reasonis either"stop"or"tool_calls"— values such as"length"and"content_filter"are never returned.- Neither
system_fingerprintnorservice_tierappears in responses. logprobsis alwaysnull.
Messages API — POST /v1/messages
Messages API — POST /v1/messages
Supported
| Feature | Details |
|---|---|
| Core parameters | model, max_tokens, messages |
| Sampling | temperature (0–1), top_p (0–1) |
| Structured outputs | output_config.format with type: "json_schema" |
| Metadata | metadata with string key-value pairs, including completion_window and completion_webhook |
| Image input | image content blocks on multimodal models. Non-multimodal models accept text only. |
|---|
| Feature | Notes |
|---|---|
| Streaming | stream: true is rejected. |
| System prompt | The system parameter is unsupported. |
| Extended thinking | thinking is unsupported. |
| Tools | tools and tool_choice are taken for compatibility but do nothing — the model never sees them and can’t call tools. |
| Stop sequences | stop_sequences is unsupported. |
| Top-K sampling | top_k is unsupported. |
| Multimodal content | Document and tool-result content blocks are unsupported. Image input works on multimodal models (see above). |
| Service tier | service_tier is unsupported. |
| Inference geo | inference_geo is unsupported. |
| Count tokens | POST /v1/messages/count_tokens is not implemented. |
| Batches | POST /v1/messages/batches and its related endpoints are not implemented. |
stop_reasonis always"end_turn"; values like"max_tokens","tool_use", and"stop_sequence"are never returned.- The response content is always a single
textblock — thinking blocks and tool-use blocks are not returned. - Cache-related usage fields (
cache_creation_input_tokens,cache_read_input_tokens) are omitted.
Authorization: Bearer <key>; Valar does not accept the Anthropic x-api-key header. With the Anthropic SDK, supply your key through auth_token rather than api_key:anthropic-version header is neither required nor inspected, and errors come back in the OpenAI-style error envelope format.Batch API
The Batch API runs large volumes of Responses API requests asynchronously. Every entry targets/v1/responses — you can’t batch /v1/chat/completions or /v1/messages today. One POST /v1/batches call takes up to 100,000 requests; you then poll GET /v1/batches/{id} for status and fetch each result by its custom_id.
For the end-to-end workflow, see Sending Requests at Scale; for the request and response schemas, see the Batch API reference.