/v1/responses exposes exactly the primitives you need for this — tool definitions on the way in, function_call items on the way out, and function_call_output items to feed results back in.
This guide builds a small agent that books meeting rooms, walks through what each field does, and closes with the operational settings that matter once it’s running for real.
What you provide and what you get back
You hand the model a list oftools. Each tool is a JSON Schema description of a function it may call. When the model decides a tool is needed, its response.output contains one or more function_call items instead of (or alongside) plain text. You execute those calls and report back.
The key property to internalize: the API is stateless across requests, so you carry the conversation yourself. Every item in response.output is already a valid input item. You append those items to your running list verbatim — no reshaping — then append your tool results as function_call_output items and send the whole thing again.
Because you replay the full history on each request, the model always sees its own earlier tool calls and their outputs. There is nothing to serialize or translate; output items go back in as-is.
The conversation cycle
Send the user turn plus your tools
Append the user message to your conversation list and
POST it to /v1/responses along with the tools array.Inspect the output for tool calls
Scan
response.output for items of type function_call. If there are none, the turn is finished and output_text holds the answer.Run each requested function
Parse the
arguments JSON on each call, execute the matching function on your side, and capture its return value.function_call items.
Example: a room-booking assistant
The agent below answers scheduling questions with two tools: one that checks a room’s availability and one that reserves it. The model ismoonshotai/Kimi-K2.6. On the first user turn it looks up availability; once you return the result it reserves the room and writes a confirmation. A second user turn then reuses everything already in context to book a follow-up.
How the two turns play out item by item
How the two turns play out item by item
On turn one the model emits a
check_room call. You return free: true, resend the conversation, and the model follows up with a reserve_room call. After you return the confirmation number, the next response is plain text — the loop exits.On turn two you resend the entire history, including turn one’s calls and their outputs. The model already knows June 8 from context, so it only needs to call check_room and reserve_room for Cedar before confirming.Running it in production
- Prefer
background=Truefor anything long-running. Valar optimizes for throughput, so an individual request can run longer than a latency-first API would. Background mode avoids HTTP timeouts and lets you poll for completion instead. - Match the completion window to your loop. The default
standardwindow balances cost against trajectory time for most agents. Switch to the Now (asap) window when a single turn is latency-sensitive and a person or system is waiting on it. See Completion windows for response-time and pricing details. - Replay the complete conversation every request. Every prior message, all
response.outputitems, and everyfunction_call_outputmust be present. Output items drop back in without conversion. - Handle parallel calls. A single response can contain several
function_callitems at once; execute them all and return onefunction_call_outputpercall_id.