Skip to main content
When a request can take a while to finish, you don’t have to poll GET /v1/responses/{response_id} in a loop. Attach a callback URL to the request and Valar delivers the finished result to your server as an HTTP POST. The feature works on POST /v1/responses, POST /v1/chat/completions, and POST /v1/messages, and is configured entirely through two metadata keys.

Parameters

Both keys live inside the metadata object on the create request. The same keys apply to Chat Completions and Anthropic Messages requests.
metadata.completion_webhook
string
The destination URL for the callback. Must be an http or https URL. If you omit it or pass a value that can’t be used, no callback is sent and the request itself is unaffected.
metadata.webhook_token
string
An optional shared secret. Valar sends its value as a Bearer token in the callback’s Authorization header so your endpoint can confirm the request is genuine.

Set up an endpoint

1

Expose a route that accepts POST

Your endpoint receives a JSON body and should respond as soon as it has accepted the payload. The body matches exactly what GET /v1/responses/{response_id} returns for the same request.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/hooks/inference-done")
async def inference_done(request: Request):
    payload = await request.json()
    # payload is the same object GET /v1/responses/{id} returns
    enqueue_for_processing(payload["id"], payload)
    return {"ok": True}
2

Verify the Authorization header

If you set webhook_token, reject any incoming call whose header doesn’t match. Compare against Authorization: Bearer <your token> and return a 4xx for anything else.
from fastapi import Header, HTTPException

WEBHOOK_TOKEN = "whk_3f0a9c2e7b14"

def assert_authorized(authorization: str = Header(default="")):
    if authorization != f"Bearer {WEBHOOK_TOKEN}":
        raise HTTPException(status_code=401)
3

Submit a request that points at your endpoint

Set background=True and add the callback URL (and token, if you use one) to metadata.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_VALAR_API_KEY",
    base_url="https://api.valarhq.ai/v1",
)

response = client.responses.create(
    model="zai-org/GLM-5.1-FP8",
    input="Summarize this document.",
    background=True,
    metadata={
        "completion_webhook": "https://app.example.com/hooks/inference-done",
        "webhook_token": "whk_3f0a9c2e7b14",
    },
)

What Valar sends

The callback is a single POST request.
method
POST
Always a POST to the URL in completion_webhook.
Content-Type
header
application/json.
Authorization
header
Present only when webhook_token was set. Carries Bearer <webhook_token>.
body
object
The full response object, byte-for-byte identical to a GET /v1/responses/{response_id} call. The response id lives here and is your key for deduplication.

Delivery semantics

The same callback can be delivered more than once. Treat the response id in the body as an idempotency key and ignore any id you have already processed.
  • Retries. A delivery that returns a non-2xx status or hits a network error is retried up to 3 times. Return a 2xx as soon as you accept the payload to stop the retries.
  • Timeout. Each attempt has a 30 second ceiling. A timeout counts as a failed attempt and triggers the next retry.
  • Best-effort. Callbacks are best-effort. A failure is logged but never touches the response record or the API, and the result stays available through GET /v1/responses/{response_id} regardless of whether delivery ever succeeded.

Test it locally with ngrok

You can point a real request at a listener on your own machine.
1

Run a listener that prints the body and returns 200

python -c "
from http.server import HTTPServer, BaseHTTPRequestHandler; import json
class H(BaseHTTPRequestHandler):
 def do_POST(self):
  print(json.dumps(json.loads(self.rfile.read(int(self.headers['Content-Length']))), indent=2))
  self.send_response(200); self.end_headers()
HTTPServer(('127.0.0.1', 8765), H).serve_forever()
"
2

Tunnel to it

ngrok http 8765
Copy the https://xxxx.ngrok-free.app forwarding URL from the output.
3

Submit a request against the tunnel

curl -X POST https://api.valarhq.ai/v1/responses \
  -H "Authorization: Bearer YOUR_VALAR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-5.1-FP8",
    "input": "What is 2+2? Reply with just the number.",
    "background": true,
    "metadata": {
      "completion_webhook": "https://xxxx.ngrok-free.app"
    }
  }'
When the response finishes, Valar POSTs the full payload to your listener.