The endpoint is the OpenAI Responses API at /v1/responses, so any OpenAI-compatible client works after you change two settings: the base URL and the key. This walkthrough runs one realistic task end to end: classifying an inbound support ticket and drafting a reply. You send it as a background job, then retrieve the result once Valar finishes. The same pattern scales from this single call to the thousands of concurrent requests an agent fans out at runtime.
1

Create an API key

Sign in at app.valarhq.ai and create a key from the dashboard.
2

Point at Valar

Install the OpenAI SDK and point it at Valar. Set the base URL to https://api.valarhq.ai/v1 and pass your key as a bearer token. Nothing else about the OpenAI client changes.
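To make the two settings concrete, here is a minimal sketch of the request an OpenAI-compatible client would assemble, built with the Python standard library and deliberately never sent. The `build_request` helper and the placeholder payload are illustrative assumptions, not part of Valar's SDK or docs; the `/responses` path under the base URL and the bearer header follow from the description above.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.valarhq.ai/v1"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the Responses endpoint.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # key travels as a bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Read the key from the environment, falling back to a placeholder.
api_key = os.environ.get("VALAR_API_KEY", "YOUR_VALAR_API_KEY")
req = build_request(api_key, {"model": "zai-org/GLM-5.1-FP8", "input": "ping"})
print(req.get_method(), req.full_url)
```

With the OpenAI SDK you never build this request by hand; the sketch only shows that the base URL and the key are the two pieces of client configuration that differ.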
3

Dispatch the task in the background

Setting background=True returns a response id immediately rather than holding the connection open. For one ticket this is convenient; across a queue of them it is what lets the work run concurrently. Use a model from the Models page; this example uses zai-org/GLM-5.1-FP8.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.valarhq.ai/v1",
    api_key="YOUR_VALAR_API_KEY",  # or read VALAR_API_KEY from the environment
)

ticket = (
    "Subject: Charged twice this month\n"
    "I see two identical $49 charges on the 3rd. Can you refund one and "
    "tell me why it happened?"
)

started = client.responses.create(
    model="zai-org/GLM-5.1-FP8",
    instructions=(
        "You are a support triage agent. Classify the ticket as one of "
        "billing, technical, or account, then draft a short reply."
    ),
    input=ticket,
    background=True,  # returns a response id right away
)

print("Queued:", started.id)
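The same dispatch extends to a queue of tickets. Because a background create call returns as soon as the job is queued, a plain loop may be enough to submit a batch. The sketch below assumes a hypothetical `dispatch_all` helper and takes the create call as an injected function, so it runs offline with a stand-in; none of these names come from the Valar docs.

```python
import itertools
from typing import Callable, Dict, Iterable


def dispatch_all(create: Callable[[str], str], tickets: Iterable[str]) -> Dict[str, str]:
    """Queue every ticket and map each response id back to its ticket text."""
    queued: Dict[str, str] = {}
    for ticket in tickets:
        response_id = create(ticket)  # expected to return once the job is queued
        queued[response_id] = ticket
    return queued


# Offline stand-in for the real create call (assumption: ids are opaque strings).
_fake_ids = itertools.count(1)


def fake_create(ticket: str) -> str:
    return f"resp_{next(_fake_ids)}"


tickets = [
    "Charged twice this month",
    "Cannot reset my password",
    "App crashes on login",
]
queued = dispatch_all(fake_create, tickets)
print(len(queued), "jobs queued")
```

Against the real client, `create` could be a small function that calls `client.responses.create(...)` with `background=True` and returns the `.id` of the result.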
4

Retrieve the result

The create call hands back a response id and a status of queued or in_progress. Poll that id until the status leaves those two states, confirm it ended as completed, then read output_text. In production you would replace this poll loop with a webhook so you aren’t holding a thread per job.
import time

response = started
while response.status in {"queued", "in_progress"}:
    time.sleep(2)
    response = client.responses.retrieve(response.id)

if response.status != "completed":
    raise RuntimeError(f"Task ended as {response.status}")

print(response.output_text)
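If you do keep polling for a while, a deadline and a growing delay stop the loop from spinning forever on a stuck job. The following is a sketch under assumptions: `wait_for_response`, its default timings, and the fake retrieve function are hypothetical names chosen for illustration, and the demo uses a stand-in so it runs without network access.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

PENDING = {"queued", "in_progress"}


def wait_for_response(
    retrieve: Callable[[str], Any],
    response_id: str,
    timeout: float = 300.0,
    interval: float = 2.0,
    max_interval: float = 30.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> Any:
    """Poll until the status is no longer pending, doubling the delay each round."""
    deadline = time.monotonic() + timeout
    while True:
        response = retrieve(response_id)
        if response.status not in PENDING:
            return response  # completed, failed, or another terminal state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{response_id} still {response.status} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)


# Offline demo: a fake retrieve that walks through three statuses.
_statuses = iter(["queued", "in_progress", "completed"])


def fake_retrieve(response_id: str) -> SimpleNamespace:
    return SimpleNamespace(id=response_id, status=next(_statuses), output_text="billing")


delays = []  # record the requested delays instead of actually sleeping
final = wait_for_response(fake_retrieve, "resp_1", sleep=delays.append)
print(final.status, delays)
```

With the real client the call would look like `wait_for_response(client.responses.retrieve, started.id)`; the caller still needs to check that the returned status is completed before reading output_text.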

Going further

A single triaged ticket is the unit; an agent is many of them in a loop. Questions about a specific workload can go to [email protected].