Skip to main content
Valar gives you four endpoints. Three are inference surfaces shaped after APIs you already know, and one batches work asynchronously:
APIEndpointMaturity
OpenAI ResponsesPOST /v1/responsesStable
OpenAI Chat CompletionsPOST /v1/chat/completionsStable
Anthropic MessagesPOST /v1/messagesBeta
BatchPOST /v1/batchesStable
The same models and completion windows work across all three inference surfaces. The Batch API layers on top: it wraps a large set of Responses API calls into one asynchronous job.

Behavior shared across every API

Before the per-API detail, a few rules hold no matter which surface you call:
  • Streaming is Chat Completions only. Pass stream: true to Chat Completions and you get Server-Sent Events (chat.completion.chunk); add stream_options.include_usage for a closing usage chunk. The Responses and Messages APIs reject stream: true. For long jobs, use background: true on the Responses API and poll or wait on webhooks.
  • Completion windows steer scheduling and price. Set metadata.completion_window to "asap" (the Now tier) or "standard". See Completion windows and Pricing.
  • Webhooks fire on completion. Set metadata.completion_webhook to receive a POST when processing finishes. See Webhooks.
  • Responses are always stored. store: false is unsupported.

Inference APIs

Each accordion below lists what the API accepts and what it rejects. Open the one that matches the SDK you’re using.

Responses API — POST /v1/responses

RecommendedOpenAI Responses formatOpenAI SDK compatibleAPI reference
This is the surface we recommend reaching for first.Supported
FeatureDetails
Core parametersmodel, input (string or message array), max_output_tokens, temperature, top_p, user, prompt_cache_key
Structured outputstext.format with type: "text" or type: "json_schema"
Reasoningreasoning.effort (none / minimal / low / medium / high / xhigh), reasoning.generate_summary (auto / concise / detailed)
Function toolstools with type: "function" — client-side function calling with name, description, parameters, strict
Custom toolstools with type: "custom"
Tool choicetool_choice: "none", "auto", "required", or a specific function/custom tool
Background modebackground: true returns 202 immediately; poll with GET /v1/responses/{id}
Prompt cache routingprompt_cache_key is an optional routing hint for requests that share a large prompt prefix
Image inputinput_image content blocks on multimodal models. Non-multimodal models accept text only.
Output logprobsinclude: ["message.output_text.logprobs"] returns one logprob per output token (best effort; omitted for models served via a proxy that does not return logprobs).
Not yet supported
FeatureNotes
Streamingstream: true is rejected. Every response comes back as one JSON object.
Instructionsinstructions is unsupported. Put system messages straight into input.
Conversation chainingprevious_response_id and conversation are unsupported. Resend the full input on each call.
Prompt templatesThe prompt parameter is unsupported.
Server-side toolsweb_search, file_search, code_interpreter, computer_use, mcp, image_generation, shell, apply_patch are unsupported.
Multimodal inputAudio and file input blocks are unsupported. Image input works on multimodal models (see above).
IncludeAccepted for compatibility when passed as an array of strings. A request is rejected if it includes reasoning.encrypted_content, web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, or file_search_call.results.
Truncation"disabled" is the only accepted value; custom truncation strategies are unsupported.
Parallel tool callsparallel_tool_calls is unsupported.
json_object formattext.format.type: "json_object" is unsupported. Reach for "json_schema" instead.
Service tier"auto" is the only accepted value. Use metadata.completion_window to govern response timing instead.
Delete / cancelDELETE /v1/responses/{id} and the cancel endpoints are not implemented.
OpenAI Chat Completions formatOpenAI SDK compatibleAPI reference
This is the only inference surface that streams.Supported
FeatureDetails
Core parametersmodel, messages, max_completion_tokens, temperature, top_p, user
Message rolessystem, user, assistant, tool, function (deprecated), developer
Structured outputsresponse_format with type: "text", "json_object", or "json_schema"
Reasoningreasoning_effort (none / minimal / low / medium / high / xhigh)
Function toolstools with type: "function" — standard {type, function: {name, description, parameters, strict}} format
Custom toolstools with type: "custom"
Tool choicetool_choice: "none", "auto", "required", or a specific function/custom tool
Parallel tool callsparallel_tool_calls is passed through
Metadatametadata with string key-value pairs, including
completion_window and completion_webhook
Streamingstream: true returns Server-Sent Events (chat.completion.chunk); stream_options.include_usage adds a final usage chunk. Reasoning is streamed as reasoning_content deltas and tool calls are emitted atomically.
Image inputimage_url content parts on multimodal models. Non-multimodal models accept text only.
Not yet supported
FeatureNotes
Multiple choicesn is required to be 1.
Multimodal contentAudio (input_audio) content parts are unsupported. Image (image_url) input works on multimodal models (see above).
Sampling controlsfrequency_penalty, presence_penalty, logit_bias, stop, seed, top_logprobs, logprobs, verbosity are unsupported.
Audio modalityNeither audio nor modalities: ["audio"] is supported.
Predicted outputprediction is unsupported.
Web searchweb_search_options is unsupported.
Service tier"auto" is the only accepted value.
CRUD endpointsThe GET, POST, and DELETE operations on stored completions are not implemented.
Deprecated fieldsmax_tokens, functions, and function_call are rejected; switch to their modern replacements.
What comes back
  • A response always carries exactly one choice (n=1).
  • finish_reason is either "stop" or "tool_calls" — values such as "length" and "content_filter" are never returned.
  • Neither system_fingerprint nor service_tier appears in responses.
  • logprobs is always null.
Anthropic Messages formatAnthropic SDK compatibleAPI reference
This API is in beta. It covers core text generation and image input today; several Anthropic features — system prompts, extended thinking, tools, and stop sequences — aren’t supported yet (see below).
Supported
FeatureDetails
Core parametersmodel, max_tokens, messages
Samplingtemperature (0–1), top_p (0–1)
Structured outputsoutput_config.format with type: "json_schema"
Metadatametadata with string key-value pairs, including completion_window and completion_webhook
Image inputimage content blocks on multimodal models. Non-multimodal models accept text only.
Not yet supported
FeatureNotes
Streamingstream: true is rejected.
System promptThe system parameter is unsupported.
Extended thinkingthinking is unsupported.
Toolstools and tool_choice are taken for compatibility but do nothing — the model never sees them and can’t call tools.
Stop sequencesstop_sequences is unsupported.
Top-K samplingtop_k is unsupported.
Multimodal contentDocument and tool-result content blocks are unsupported. Image input works on multimodal models (see above).
Service tierservice_tier is unsupported.
Inference geoinference_geo is unsupported.
Count tokensPOST /v1/messages/count_tokens is not implemented.
BatchesPOST /v1/messages/batches and its related endpoints are not implemented.
What comes back
  • stop_reason is always "end_turn"; values like "max_tokens", "tool_use", and "stop_sequence" are never returned.
  • The response content is always a single text block — thinking blocks and tool-use blocks are not returned.
  • Cache-related usage fields (cache_creation_input_tokens, cache_read_input_tokens) are omitted.
Authenticating with the Anthropic SDKAuthentication uses Authorization: Bearer <key>; Valar does not accept the Anthropic x-api-key header. With the Anthropic SDK, supply your key through auth_token rather than api_key:
from anthropic import Anthropic

client = Anthropic(
    auth_token="YOUR_VALAR_API_KEY",
    base_url="https://api.valarhq.ai",
)
The anthropic-version header is neither required nor inspected, and errors come back in the OpenAI-style error envelope format.

Batch API

The Batch API runs large volumes of Responses API requests asynchronously. Every entry targets /v1/responses — you can’t batch /v1/chat/completions or /v1/messages today. One POST /v1/batches call takes up to 100,000 requests; you then poll GET /v1/batches/{id} for status and fetch each result by its custom_id. For the end-to-end workflow, see Sending Requests at Scale; for the request and response schemas, see the Batch API reference.