
The problem retries create

A POST that times out or returns a 5xx leaves you guessing. The server may have received the request and run the inference, or it may not have. Retrying blindly risks running — and billing — the same job twice. Not retrying risks dropping a job that never actually completed. An idempotency key removes the guesswork. You attach a key to the first attempt, Valar stores the response under that key, and every later attempt with the same key returns that stored response instead of running inference again. Retrying becomes safe across flaky networks, timeouts, and ambiguous 5xx responses.

How it works

Send the key in the Idempotency-Key request header. Use one key per logical request — a UUID or any unique string up to 255 characters — and send the same key on the first attempt and every retry.
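Two kinds of key fit that rule. One is a random UUID generated once per logical request. The other is a key derived from an identifier you already persist, such as an order ID, which survives a client restart because the same input always yields the same key. The sketch below assumes as much: `random_key`, `derived_key`, and `check_key` are hypothetical helpers, and rejecting the empty string is an assumption beyond the documented 255-character limit.

```python
import hashlib
import uuid

MAX_KEY_LENGTH = 255  # documented upper bound for an Idempotency-Key value


def random_key() -> str:
    """A fresh random key. Generate it once per logical request, never per attempt."""
    return str(uuid.uuid4())


def derived_key(prefix: str, business_id: str) -> str:
    """Hypothetical helper: derive a stable key from an ID you already store,
    so a restarted client rebuilds the same key for the same logical request."""
    key = f"{prefix}-{business_id}"
    if len(key) > MAX_KEY_LENGTH:
        # Fall back to a fixed-length digest when the natural key is too long.
        key = f"{prefix}-{hashlib.sha256(business_id.encode('utf-8')).hexdigest()}"
    return key


def check_key(key: str) -> str:
    """Reject keys outside 1..255 characters (non-empty is an assumption)."""
    if not key or len(key) > MAX_KEY_LENGTH:
        raise ValueError(f"idempotency key must be 1..{MAX_KEY_LENGTH} characters")
    return key


headers = {"Idempotency-Key": check_key(derived_key("order", "9f8e7d6c"))}
print(headers)  # {'Idempotency-Key': 'order-9f8e7d6c'}
```

The SDK call below passes a hard-coded key of the derived form.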
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_VALAR_API_KEY",
    base_url="https://api.valarhq.ai/v1",
)

response = client.responses.create(
    model="zai-org/GLM-5.1-FP8",
    input="Summarize this document.",
    extra_headers={"Idempotency-Key": "order-9f8e7d6c"},
)
The first call reserves the key and runs the job. Each subsequent call with that key lands on the same reservation and returns the stored response without re-running inference. The key only earns its keep once you actually retry.
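A toy in-memory model makes the replay behavior concrete. It illustrates the documented behavior and is not Valar's implementation: `ToyServer` and its `create` method are hypothetical, and the model ignores concurrency (a retry that arrives while the first call is still running).

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy model of reserve-then-replay. Not Valar's implementation."""
    reservations: dict = field(default_factory=dict)  # idempotency key -> stored response
    inference_runs: int = 0

    def create(self, idempotency_key: str, prompt: str) -> dict:
        stored = self.reservations.get(idempotency_key)
        if stored is not None:
            return stored  # replay: no new inference
        # First call: run the job once and store its response under the key.
        self.inference_runs += 1
        response = {"id": f"resp_{uuid.uuid4().hex[:8]}", "output": f"summary of {prompt!r}"}
        self.reservations[idempotency_key] = response
        return response


server = ToyServer()
first = server.create("order-9f8e7d6c", "Summarize this document.")
retry = server.create("order-9f8e7d6c", "Summarize this document.")
print(first["id"] == retry["id"], server.inference_runs)  # True 1
```

The retry gets back the same response ID, and the inference counter stays at one.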

Rules that govern a reservation

  • Reservation scope: (organization, API key, idempotency key). A reservation is identified by all three together, so the same idempotency key under a different API key is a separate reservation and will not dedupe against the first. Keep the API key fixed for the lifetime of a logical request. If you rotate API keys between attempts, the retry starts a new reservation instead of replaying the stored response.
  • Body fingerprint: SHA-256. Valar fingerprints the request body. Reusing a key with a meaningfully different body returns 400 idempotency_error rather than the earlier response, which prevents a retry from silently returning the wrong answer after the client changed the request.
  • Validation failures: key not consumed. Any 4xx raised before the reservation is written leaves the key unreserved, so you can fix the body and retry under the same key.
  • Replays reflect live state, not a frozen copy. A replay returns the current state of the underlying response record. For background requests, the status tracks the task's latest transition, for example queued → in_progress → completed.
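A client can mirror the scope and fingerprint rules to catch its own mistakes before the server answers 400 idempotency_error. This is a minimal sketch under assumptions: `KeyLedger` and `body_fingerprint` are hypothetical, and the canonical JSON encoding (sorted keys, compact separators) is a guess. The server's exact fingerprint input is not documented here, so a pass on the client is advisory only.

```python
import hashlib
import json


def body_fingerprint(body: dict) -> str:
    """SHA-256 of a canonical JSON encoding of the body.
    Assumption: sorted keys and compact separators; Valar's canonicalization may differ."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class KeyLedger:
    """Hypothetical client-side record of the body each key was first sent with,
    scoped like a reservation: (organization, API key, idempotency key)."""

    def __init__(self) -> None:
        self._seen = {}  # (org, api_key, idem_key) -> body fingerprint

    def check(self, org: str, api_key: str, idem_key: str, body: dict) -> None:
        """Record the first body for a scope; raise if a later attempt changes it."""
        scope = (org, api_key, idem_key)
        fingerprint = body_fingerprint(body)
        first = self._seen.setdefault(scope, fingerprint)
        if first != fingerprint:
            # The server would reply 400 idempotency_error here; fail fast instead.
            raise ValueError(f"key {idem_key!r} was already sent with a different body")

    def forget(self, org: str, api_key: str, idem_key: str) -> None:
        """Call after a validation 4xx: the key was never reserved, so a fixed body may reuse it."""
        self._seen.pop((org, api_key, idem_key), None)


ledger = KeyLedger()
body = {"model": "zai-org/GLM-5.1-FP8", "input": "Summarize this document."}
ledger.check("org_1", "key_a", "order-9f8e7d6c", body)
# Same content in a different key order: same fingerprint, so this passes.
ledger.check("org_1", "key_a", "order-9f8e7d6c", dict(reversed(list(body.items()))))
# Different API key: a separate scope, so a different body is not a conflict.
ledger.check("org_1", "key_b", "order-9f8e7d6c", {**body, "input": "Translate this."})
```

The `forget` method covers the validation-failure rule. Because a 4xx raised before reservation leaves the key unreserved, the client drops its record and lets the corrected body go out under the same key.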

Deciding when to retry

Reach for the same idempotency key whenever the outcome is ambiguous. Don’t retry when the work genuinely failed.
| Signal | What it means | What to do |
|---|---|---|
| Network error or timeout | Ambiguous: the server may or may not have received the request | Retry with the same idempotency key |
| 5xx on a POST | Transient server-side failure | Retry with the same idempotency key |
| 429 with Retry-After | Rate limit | Wait for the Retry-After interval, then retry |
| body.status: "failed" | Inference genuinely failed | Investigate the cause; do not blind-retry |
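The table reduces to a small decision function. This sketch rests on a few assumptions: `next_action` and its action names are hypothetical, `status_code=None` stands in for a network error or timeout, and the 4xx branch comes from the idempotency_error and validation-failure rules rather than from a row in the table.

```python
from typing import Optional


def next_action(status_code: Optional[int], body: Optional[dict] = None) -> str:
    """Map the outcome of one POST to the recommended action.
    status_code=None stands for a network error or timeout (no response received)."""
    if status_code is None:
        return "retry"            # ambiguous: resend with the same idempotency key
    if status_code >= 500:
        return "retry"            # transient server-side failure: same key
    if status_code == 429:
        return "wait_then_retry"  # sleep for the Retry-After interval first
    if 400 <= status_code < 500:
        return "fix_request"      # e.g. 400 idempotency_error or a validation error
    if body is not None and body.get("status") == "failed":
        return "investigate"      # inference genuinely failed: do not blind-retry
    return "done"
```

The order of the checks matters. A 2xx with `status: "failed"` must not fall into a retry branch, because the work actually ran and failed.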

A retry loop with backoff

Generate the key once, outside the loop, so every attempt shares it.
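import random
import time
import uuid
import requests

API_KEY = "YOUR_VALAR_API_KEY"
BASE_URL = "https://api.valarhq.ai/v1"

# One key, reused across every retry.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "Idempotency-Key": str(uuid.uuid4()),
}
body = {"model": "zai-org/GLM-5.1-FP8", "input": "Summarize this document."}

for attempt in range(5):
    try:
        resp = requests.post(f"{BASE_URL}/responses", headers=headers, json=body, timeout=30)
    except (requests.ConnectionError, requests.Timeout):
        # Ambiguous outcome: safe to retry, the key dedupes server-side.
        delay = random.uniform(0, 2**attempt)
    else:
        if resp.status_code == 429:
            # Rate limited: honor Retry-After (assumes the delta-seconds form).
            retry_after = resp.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else random.uniform(0, 2**attempt)
        elif resp.status_code >= 500:
            # Transient server-side failure: retry with the same key.
            delay = random.uniform(0, 2**attempt)
        else:
            # Success, or a 4xx that a retry will not fix (e.g. 400 idempotency_error).
            resp.raise_for_status()
            break
    time.sleep(delay)
else:
    raise RuntimeError("all retries exhausted")

print(resp.json()["id"])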
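A retry loop that honors 429 has to read Retry-After, and HTTP allows that header to carry either a number of seconds or an HTTP-date. The sketch below accepts both forms and falls back to capped full-jitter exponential backoff otherwise. `retry_delay` and its 30-second default cap are assumptions, not part of Valar's API.

```python
import email.utils
import random
from datetime import datetime, timezone
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                cap: float = 30.0, now: Optional[datetime] = None) -> float:
    """Seconds to sleep before retry number `attempt` (0-based).

    Honors Retry-After in either HTTP form (delay-seconds or HTTP-date);
    otherwise returns full-jitter exponential backoff capped at `cap`.
    """
    if retry_after:
        value = retry_after.strip()
        if value.isascii() and value.isdigit():
            return float(value)
        try:
            when = email.utils.parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # unparseable header: fall back to backoff below
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    return random.uniform(0, min(cap, 2 ** attempt))
```

Inside a retry loop, `time.sleep(retry_delay(attempt, resp.headers.get("Retry-After")))` can stand in for a hand-rolled sleep. Capping the backoff keeps late attempts from waiting for minutes, and the jitter spreads out clients that failed at the same moment.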

See also

  • Sending Requests at Scale — the batch and background workflows where idempotent retries pay off most.
  • Webhooks — pair idempotent retries with webhook delivery so a client can re-drive submission without triggering another inference run.