Create a chat completion

curl --request POST \
  --url https://api.valarhq.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "zai-org/GLM-5.1-FP8",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise assistant."
    },
    {
      "role": "user",
      "content": "Summarize retrieval-augmented generation in 3 bullets."
    }
  ],
  "max_completion_tokens": 300
}
'

{
  "id": "<string>",
  "created": 123,
  "model": "<string>",
  "choices": [
    {
      "index": 123,
      "message": {
        "content": "<string>",
        "reasoning_content": "<string>",
        "refusal": "<string>"
      },
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123,
    "prompt_tokens_details": {},
    "completion_tokens_details": {}
  }
}
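For readers calling the endpoint from code, the request and response above might be handled roughly as follows. This is a minimal sketch using only the Python standard library; the helper names `build_request` and `first_message_text` are illustrative and not part of any Valar SDK, and the network call is left commented out.

```python
import json
import urllib.request

# URL taken from the curl example above.
API_URL = "https://api.valarhq.ai/v1/chat/completions"


def build_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_message_text(response: dict) -> str:
    """Return the message content of the first choice in a response object."""
    return response["choices"][0]["message"]["content"]


body = {
    "model": "zai-org/GLM-5.1-FP8",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize retrieval-augmented generation in 3 bullets."},
    ],
    "max_completion_tokens": 300,
}
request = build_request("<token>", body)

# To actually send the request (requires a valid token):
# with urllib.request.urlopen(request) as resp:
#     print(first_message_text(json.load(resp)))
```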

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

Idempotency-Key

string

Lets you retry safely. Valar records a reservation under the combination of organization, API key, and Idempotency-Key, so sending the same value again hands back the stored response rather than running inference a second time. Values can be up to 255 characters. See Idempotent Requests for the complete rules.

Maximum string length: 255
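A safe-retry loop built on this header could look roughly like the sketch below. It assumes the caller supplies a `send(headers, body)` function that performs the HTTP call and raises `ConnectionError` on a dropped connection; that function, the UUID-based key, and the helper names are all hypothetical choices for illustration. The point it shows is that the key is generated once and reused on every attempt.

```python
import uuid


def new_idempotency_key() -> str:
    """Generate a fresh key; a UUID4 string is 36 characters, well under 255."""
    return str(uuid.uuid4())


def idempotent_headers(token: str, key: str) -> dict:
    """Build request headers, enforcing the documented 255-character limit."""
    if len(key) > 255:
        raise ValueError("Idempotency-Key must be at most 255 characters")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Idempotency-Key": key,
    }


def send_with_retries(send, headers: dict, body: dict, attempts: int = 3):
    """Call send(headers, body) up to `attempts` times with the SAME headers.

    `send` is a caller-supplied (hypothetical) transport function. Reusing the
    headers means every retry carries the same Idempotency-Key.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return send(headers, body)
        except ConnectionError as exc:
            last_error = exc  # transient failure: retry with the same key
    raise last_error
```

Generating a new key inside the retry loop would defeat the purpose, since each attempt would then look like a distinct request.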

Body

application/json

model

string

required

messages

object[]

required

Minimum array length: 1

temperature

number | null

Required range: 0 <= x <= 2

top_p

number | null

Required range: 0 <= x <= 1

max_completion_tokens

integer | null

Required range: x >= 1

response_format

object

Option 1
Option 2

reasoning_effort

enum<string> | null

Available options:

none,

minimal,

low,

medium,

high,

xhigh

n

enum<integer>

Must be 1; requesting multiple choices is not supported yet.

Available options:

1

modalities

enum<string>[]

Required array length: 1 element

Available options:

text

stream

boolean

Set this to true to receive the answer as a Server-Sent Events stream of chat.completion.chunk objects rather than one JSON payload.

stream_options

object

Settings that only take effect while streaming (stream is true).

store

enum<boolean>

Must be true; no other value is accepted.

Available options:

true

user

string

Maximum string length: 256

metadata

object

Optional string-valued metadata. Use completion_window to influence scheduling, and completion_webhook together with webhook_token to wire up a webhook that fires on completion.
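The constraints listed for the body fields can be checked on the client before a request is sent. The sketch below is one possible way to do that, not Valar's own validation logic: the function name `validate_body` and the error wording are invented for illustration, and only the ranges, lengths, and allowed values stated in this section are enforced.

```python
REASONING_EFFORTS = {"none", "minimal", "low", "medium", "high", "xhigh"}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_body(body: dict) -> list:
    """Return a list of violations of the documented body constraints."""
    errors = []
    if not isinstance(body.get("model"), str):
        errors.append("model: required string")
    messages = body.get("messages")
    if not isinstance(messages, list) or len(messages) < 1:
        errors.append("messages: required array with at least 1 item")
    # temperature must lie in [0, 2] and top_p in [0, 1]; null is allowed.
    for name, upper in (("temperature", 2), ("top_p", 1)):
        value = body.get(name)
        if value is not None and not (_is_number(value) and 0 <= value <= upper):
            errors.append(f"{name}: must be between 0 and {upper}")
    max_tokens = body.get("max_completion_tokens")
    if max_tokens is not None and not (
        isinstance(max_tokens, int)
        and not isinstance(max_tokens, bool)
        and max_tokens >= 1
    ):
        errors.append("max_completion_tokens: must be an integer >= 1")
    effort = body.get("reasoning_effort")
    if effort is not None and not (
        isinstance(effort, str) and effort in REASONING_EFFORTS
    ):
        errors.append("reasoning_effort: not an available option")
    if "modalities" in body and body["modalities"] != ["text"]:
        errors.append('modalities: must be exactly ["text"]')
    if "store" in body and body["store"] is not True:
        errors.append("store: must be true")
    user = body.get("user")
    if user is not None and not (isinstance(user, str) and len(user) <= 256):
        errors.append("user: must be a string of at most 256 characters")
    metadata = body.get("metadata", {})
    if not isinstance(metadata, dict) or not all(
        isinstance(v, str) for v in metadata.values()
    ):
        errors.append("metadata: all values must be strings")
    return errors
```

An empty list means the body passes these local checks; the server remains the authority on what it accepts.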

Response

The chat completion. By default this is one JSON object; with stream: true it becomes a Server-Sent Events stream of chat.completion.chunk objects closed by a final data: [DONE] line.
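When streaming, a client has to split the event stream into individual chunk objects and stop at the terminator. The sketch below shows one way to do that over already-decoded text lines. It assumes each `data:` line carries one complete JSON object, which is common for this style of API but is not stated on this page; the function name `iter_chunks` is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed chat.completion.chunk objects from SSE text lines.

    Assumes one JSON object per `data:` line (an assumption, see above).
    Stops as soon as the `data: [DONE]` terminator is seen.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```

Any iterable of strings works as input, for example the decoded lines of an HTTP response body read incrementally.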

id

string

required

object

enum<string>

required

Available options:

chat.completion

created

integer

required

model

string

required

choices

object[]

required

Minimum array length: 1

usage

object
