Quickstart - Valar

The endpoint is the OpenAI Responses API at /v1/responses, so any OpenAI-compatible client works after you change two settings: the base URL and the key. This walkthrough runs one realistic task end to end: classifying an inbound support ticket and drafting a reply. You send it as a background job, then retrieve the result once Valar finishes. The same pattern scales from this single call to the thousands of concurrent requests an agent fans out at runtime.

Create an API key

Point at Valar

Install the OpenAI SDK and point it at Valar. Set the base URL to https://api.valarhq.ai/v1 and pass your key as a bearer token. Nothing else about the OpenAI client changes.

Dispatch the task in the background

Setting background returns a response id immediately rather than holding the connection open. For one ticket this is convenient; across a queue of them it is what lets the work run concurrently. Use a model from the Models page — here, zai-org/GLM-5.1-FP8.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.valarhq.ai/v1",
    api_key="YOUR_VALAR_API_KEY",  # or read VALAR_API_KEY from the environment
)

ticket = (
    "Subject: Charged twice this month\n"
    "I see two identical $49 charges on the 3rd. Can you refund one and "
    "tell me why it happened?"
)

started = client.responses.create(
    model="zai-org/GLM-5.1-FP8",
    instructions=(
        "You are a support triage agent. Classify the ticket as one of "
        "billing, technical, or account, then draft a short reply."
    ),
    input=ticket,
    background=True,  # returns a response id right away
)

print("Queued:", started.id)

Retrieve the request

The create call hands back a response id and a status of queued or in_progress. Retrieve that id until it reaches completed, then read output_text. In production you would replace this poll loop with a webhook so you aren’t holding a thread per job.

import time

response = started
while response.status in {"queued", "in_progress"}:
    time.sleep(2)
    response = client.responses.retrieve(response.id)

if response.status != "completed":
    raise RuntimeError(f"Task ended as {response.status}")

print(response.output_text)

Retrieve the result

import time

response = started
while response.status in {"queued", "in_progress"}:
    time.sleep(2)
    response = client.responses.retrieve(response.id)

if response.status != "completed":
    raise RuntimeError(f"Task ended as {response.status}")

print(response.output_text)

Going further

A single triaged ticket is the unit; an agent is many of them in a loop. From here:

See Models for the full list of supported models, and Pricing for per-token rates.
Turn this into a tool-using agent that looks up the customer’s billing record before replying - see Building a tool-calling agent.
Run the same task over a backlog of tickets at once with Requests at scale, choosing a completion window per the latency you can tolerate.

Questions about a specific workload can go to [email protected].

​Going further

Going further