Building a Tool-Calling Agent

An agent is a model that can reach outside the conversation: it asks to run a function you control, reads what comes back, and decides what to do next. The Responses API at /v1/responses exposes exactly the primitives you need for this — tool definitions on the way in, function_call items on the way out, and function_call_output items to feed results back in. This guide builds a small agent that books meeting rooms, walks through what each field does, and closes with the operational settings that matter once it’s running for real.

What you provide and what you get back

You hand the model a list of tools. Each tool is a JSON Schema description of a function it may call. When the model decides a tool is needed, its response.output contains one or more function_call items instead of (or alongside) plain text. You execute those calls and report back. The key property to internalize: the API is stateless across requests, so you carry the conversation yourself. Every item in response.output is already a valid input item. You append those items to your running list verbatim — no reshaping — then append your tool results as function_call_output items and send the whole thing again.

Because you replay the full history on each request, the model always sees its own earlier tool calls and their outputs. There is nothing to serialize or translate; output items go back in as-is.
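To make the item shapes concrete, here is a minimal sketch that uses plain dicts. The field names (type, call_id, name, arguments, output) are the ones this guide uses; the ID values and the helper name tool_output_for are made up for illustration, and real items may carry additional fields.

```python
import json

# A function_call item as it might appear in response.output (values are made up).
function_call = {
    "type": "function_call",
    "call_id": "call_abc123",
    "name": "check_room",
    # The arguments field is a JSON string, not a parsed object.
    "arguments": '{"room": "Birch", "slot": "2026-06-08 14:00"}',
}


def tool_output_for(call, result):
    """Build the function_call_output item that answers one function_call."""
    return {
        "type": "function_call_output",
        "call_id": call["call_id"],  # ties the result to the call it answers
        "output": json.dumps(result),
    }


# The running conversation: user message, the model's call, then your result.
conversation = [{"role": "user", "content": "Is Birch free at 2pm on June 8?"}]
conversation.append(function_call)  # output item goes back in unchanged
conversation.append(tool_output_for(function_call, {"free": True}))
```

The list at the end is what the next request would send as input: nothing in the model's item was rewritten, and the only item you authored is the function_call_output.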

The conversation cycle

Send the user turn plus your tools

Append the user message to your conversation list and POST it to /v1/responses along with the tools array.

Inspect the output for tool calls

Scan response.output for items of type function_call. If there are none, the turn is finished and output_text holds the answer.

Run each requested function

Parse the arguments JSON on each call, execute the matching function on your side, and capture its return value.

Append results and resend

Push the model’s response.output items onto your list, then append one function_call_output per call, matched by call_id. Send the updated conversation back and return to the inspection step (Inspect the output for tool calls).

The loop ends on the first response that carries text with no function_call items.
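The control flow of the cycle can be exercised offline before any network call is involved. The sketch below swaps the API for a scripted stand-in, fake_model, which is purely hypothetical and only mimics the shape of response.output as a list of dicts. The max_rounds cap is also an addition, included as a guard against a model that never stops calling tools.

```python
import json


def fake_model(conversation):
    """Hypothetical stand-in for POST /v1/responses; returns output items."""
    answered = [i for i in conversation if i.get("type") == "function_call_output"]
    if not answered:
        # First pass: ask for a tool call.
        return [{
            "type": "function_call",
            "call_id": "call_1",
            "name": "check_room",
            "arguments": json.dumps({"room": "Birch", "slot": "2026-06-08 14:00"}),
        }]
    # Second pass: a tool result is present, so answer in text.
    return [{
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "Birch is free."}],
    }]


def check_room(room, slot):
    return json.dumps({"room": room, "slot": slot, "free": True})


def run_cycle(conversation, user_message, model=fake_model, max_rounds=8):
    conversation.append({"role": "user", "content": user_message})
    for _ in range(max_rounds):
        output = model(conversation)
        conversation.extend(output)  # output items are appended unchanged
        calls = [i for i in output if i.get("type") == "function_call"]
        if not calls:
            return output  # text with no function_call items ends the loop
        for call in calls:
            args = json.loads(call["arguments"])
            conversation.append({
                "type": "function_call_output",
                "call_id": call["call_id"],
                "output": check_room(**args),
            })
    raise RuntimeError(f"tool loop did not finish within {max_rounds} rounds")


history = []
final = run_cycle(history, "Is Birch free at 2pm on June 8?")
```

Replacing fake_model with a function that calls the real endpoint leaves the rest of the loop unchanged.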

Example: a room-booking assistant

The agent below answers scheduling questions with two tools: one that checks a room’s availability and one that reserves it. The model is moonshotai/Kimi-K2.6. On the first user turn it looks up availability; once you return the result it reserves the room and writes a confirmation. A second user turn then reuses everything already in context to book a follow-up.

import json
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.valarhq.ai/v1",
    api_key="YOUR_VALAR_API_KEY",
)

MODEL = "moonshotai/Kimi-K2.6"

TOOLS = [
    {
        "type": "function",
        "name": "check_room",
        "description": "Check whether a meeting room is free for a time slot.",
        "parameters": {
            "type": "object",
            "properties": {
                "room": {"type": "string", "description": "Room name, e.g. Birch"},
                "slot": {"type": "string", "description": "Time slot, e.g. 2026-06-08 14:00"},
            },
            "required": ["room", "slot"],
            "additionalProperties": False,
        },
        "strict": True,
    },
    {
        "type": "function",
        "name": "reserve_room",
        "description": "Reserve a meeting room for a time slot.",
        "parameters": {
            "type": "object",
            "properties": {
                "room": {"type": "string", "description": "Room name, e.g. Birch"},
                "slot": {"type": "string", "description": "Time slot, e.g. 2026-06-08 14:00"},
            },
            "required": ["room", "slot"],
            "additionalProperties": False,
        },
        "strict": True,
    },
]

TOOL_DISPATCH = {
    "check_room": lambda room, slot: json.dumps({"room": room, "slot": slot, "free": True}),
    "reserve_room": lambda room, slot: json.dumps({"room": room, "slot": slot, "confirmation": "RSV-4417"}),
}


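def await_completion(response, timeout=300):
    """Poll a background response until it reaches a terminal status."""
    # "incomplete" is treated as terminal on the assumption that the endpoint
    # reports it the way the OpenAI Responses API does (for example when
    # max_output_tokens is reached); otherwise the loop would poll until timeout.
    terminal = ("completed", "failed", "cancelled", "incomplete")
    # A monotonic clock keeps the deadline immune to system clock adjustments.
    deadline = time.monotonic() + timeout
    while response.status not in terminal:
        if time.monotonic() > deadline:
            raise TimeoutError(f"{response.id} did not complete within {timeout}s")
        time.sleep(2)
        response = client.responses.retrieve(response.id)
    if response.status != "completed":
        raise RuntimeError(f"{response.id} status: {response.status}")
    return response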


def run_turn(conversation, user_message):
    """Append a user message and drive the tool loop to a final text answer."""
    conversation.append({"role": "user", "content": user_message})

    while True:
        response = client.responses.create(
            model=MODEL,
            input=conversation,
            tools=TOOLS,
            max_output_tokens=4096,
            background=True,
        )
        response = await_completion(response)

        # Output items are valid input items: append them unchanged.
        # Guard against a missing output list, as the scan below does.
        conversation.extend(response.output or [])

        tool_calls = [
            item for item in (response.output or [])
            if getattr(item, "type", None) == "function_call"
        ]
        if not tool_calls:
            return response

        for call in tool_calls:
            args = json.loads(call.arguments)
            result = TOOL_DISPATCH[call.name](**args)
            conversation.append(
                {"type": "function_call_output", "call_id": call.call_id, "output": result}
            )


conversation = []

# First turn: the model checks availability, then reserves and confirms.
first = run_turn(conversation, "Is the Birch room free at 2pm on June 8? If so, book it.")
print("Turn 1:", first.output_text)

# Second turn: the model reuses the booking already in context.
second = run_turn(conversation, "Great — also grab the Cedar room right after, at 3pm the same day.")
print("Turn 2:", second.output_text)

How the two turns play out item by item

On turn one the model emits a check_room call. You return free: true, resend the conversation, and the model follows up with a reserve_room call. After you return the confirmation number, the next response is plain text, so the loop exits.

On turn two you resend the entire history, including turn one’s calls and their outputs. The model already knows June 8 from context, so it only needs to call check_room and reserve_room for Cedar before confirming.
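When debugging a run, it can help to print the conversation as a sequence of short labels and compare it with the walkthrough above. The helper below is an illustrative sketch, not part of any SDK; item_label and the sample turn_one list are hypothetical, with the list hand-written to mirror turn one. A real run may include other item types, which this helper labels by their type.

```python
def _field(item, key):
    """Read a field from either a plain dict or an SDK object."""
    if isinstance(item, dict):
        return item.get(key)
    return getattr(item, key, None)


def item_label(item):
    """Return a short label for one conversation item."""
    kind = _field(item, "type")
    if kind == "function_call":
        return f"function_call:{_field(item, 'name')}"
    if kind == "function_call_output":
        return f"function_call_output:{_field(item, 'call_id')}"
    # Messages are labeled by role; anything else falls back to its type.
    return _field(item, "role") or kind or "unknown"


# Hand-written stand-in for the conversation after turn one.
turn_one = [
    {"role": "user", "content": "Is the Birch room free at 2pm on June 8? If so, book it."},
    {"type": "function_call", "call_id": "call_1", "name": "check_room", "arguments": "{}"},
    {"type": "function_call_output", "call_id": "call_1", "output": "{}"},
    {"type": "function_call", "call_id": "call_2", "name": "reserve_room", "arguments": "{}"},
    {"type": "function_call_output", "call_id": "call_2", "output": "{}"},
    {"type": "message", "role": "assistant", "content": []},
]

trace = [item_label(item) for item in turn_one]
for label in trace:
    print(label)
```

Every function_call label should be followed, before the next request, by a function_call_output carrying the same call_id; a gap in that pairing is usually the first thing to look for when a turn misbehaves.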

Running it in production

Set strict: true on each tool definition, next to its parameters schema as in the example above. This enables structured-output guarantees, so the arguments JSON the model returns always conforms to your schema and json.loads never trips over a malformed payload.

Prefer background=True for anything long-running. Valar optimizes for throughput, so an individual request can run longer than a latency-first API would. Background mode avoids HTTP timeouts and lets you poll for completion instead.
Match the completion window to your loop. The default standard window balances cost against trajectory time for most agents. Switch to the Now (asap) window when a single turn is latency-sensitive and a person or system is waiting on it. See Completion windows for response-time and pricing details.
Replay the complete conversation every request. Every prior message, all response.output items, and every function_call_output must be present. Output items drop back in without conversion.
Handle parallel calls. A single response can contain several function_call items at once; execute them all and return one function_call_output per call_id.
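For the parallel-call case, one possible shape is a helper that runs every call in a thread pool and returns the outputs in the same order. This is a sketch under assumptions: run_tool_calls is a hypothetical helper, the calls are modeled with SimpleNamespace to mimic the attribute access used in the example loop, and reporting a failure back to the model as an error payload (rather than raising) is a design choice, not something the API requires.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def run_tool_calls(calls, dispatch, max_workers=4):
    """Execute every function_call; return one function_call_output per call_id."""

    def run_one(call):
        try:
            args = json.loads(call.arguments)
            output = dispatch[call.name](**args)
        except Exception as exc:
            # Assumption: surface the failure to the model instead of aborting the turn.
            output = json.dumps({"error": f"{type(exc).__name__}: {exc}"})
        return {"type": "function_call_output", "call_id": call.call_id, "output": output}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so outputs line up with calls.
        return list(pool.map(run_one, calls))


dispatch = {
    "check_room": lambda room, slot: json.dumps({"room": room, "slot": slot, "free": True}),
}

# Two calls from a single response; the second names a tool that does not exist.
calls = [
    SimpleNamespace(call_id="call_1", name="check_room",
                    arguments='{"room": "Birch", "slot": "2026-06-08 14:00"}'),
    SimpleNamespace(call_id="call_2", name="missing_tool", arguments="{}"),
]

outputs = run_tool_calls(calls, dispatch)
```

In the example loop, the per-call for loop could be replaced with conversation.extend(run_tool_calls(tool_calls, TOOL_DISPATCH)). Threads suit I/O-bound tools; tools with side effects that must run in a fixed order are better executed sequentially.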
