The problem retries create
A POST that times out or returns a 5xx leaves you guessing. The server may have received the request and run the inference, or it may not have. Retrying blindly risks running, and billing, the same job twice. Not retrying risks dropping a job that never actually completed.
An idempotency key removes the guesswork. You attach a key to the first attempt, Valar stores the response under that key, and every later attempt with the same key returns that stored response instead of running inference again. Retrying becomes safe across flaky networks, timeouts, and ambiguous 5xx responses.
How it works
Send the key in the Idempotency-Key request header. Use one key per logical request (a UUID or any unique string up to 255 characters) and send the same key on the first attempt and every retry.
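As a minimal sketch of the header described above, the helper below generates a UUID key and attaches it to a set of request headers. The function names are hypothetical, and the bearer-style Authorization header is an assumption about how the API key is sent; only the Idempotency-Key header name and the 255-character limit come from this page.

```python
import uuid
from typing import Dict

MAX_KEY_LENGTH = 255  # upper bound on key length stated in this page


def new_idempotency_key() -> str:
    """Create one key per logical request; reuse it on every retry."""
    return str(uuid.uuid4())


def build_headers(api_key: str, idempotency_key: str) -> Dict[str, str]:
    """Return request headers carrying the idempotency key."""
    if not idempotency_key or len(idempotency_key) > MAX_KEY_LENGTH:
        raise ValueError("idempotency key must be 1 to 255 characters long")
    return {
        # Assumption: bearer-token auth; check the authentication docs.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key,
    }
```

Store the key alongside the pending request on the client side, so that a retry after a process restart can still send the same value.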
Rules that govern a reservation
A reservation is identified by all three together. The same key under a different API key is a separate reservation. Rotating API keys therefore won’t break replay, but the same idempotency key won’t dedupe across two different API keys.
Valar fingerprints the request body. Reusing a key with a meaningfully different body returns 400 idempotency_error rather than the earlier response, which prevents a retry from silently returning the wrong answer after the client changed the request.
Any 4xx raised before the reservation is written leaves the key unreserved. You can fix the body and retry under the same key.
A replay returns the current state of the underlying response record. For background requests the status tracks the task’s latest transition, such as queued → in_progress → completed.
Deciding when to retry
Reach for the same idempotency key whenever the outcome is ambiguous. Don’t retry when the work genuinely failed.

| Signal | What it means | What to do |
|---|---|---|
| Network error / timeout | Ambiguous: the server may or may not have received the request | Retry with the same idempotency key |
| 5xx on a POST | Transient server-side failure | Retry with the same idempotency key |
| 429 + Retry-After | Rate limit | Wait the Retry-After value, then retry |
| body.status: "failed" | Inference genuinely failed | Investigate the cause; do not blind-retry |

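The table above can be read as a small decision function. The sketch below is one possible encoding; the `Action` enum and `next_action` function are hypothetical names, and the final `NO_RETRY` branch covers outcomes the table does not list (for example a successful response), which is an assumption rather than documented behavior.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RETRY_SAME_KEY = "retry with the same idempotency key"
    WAIT_THEN_RETRY = "wait for Retry-After, then retry"
    INVESTIGATE = "investigate the cause; do not blind-retry"
    NO_RETRY = "not a retry signal from the table"


def next_action(
    *,
    network_error: bool = False,
    status_code: Optional[int] = None,
    body_status: Optional[str] = None,
) -> Action:
    """Map an attempt's outcome to the action suggested by the table."""
    if network_error or status_code is None:
        # Ambiguous: the server may or may not have received the request.
        return Action.RETRY_SAME_KEY
    if status_code == 429:
        return Action.WAIT_THEN_RETRY
    if 500 <= status_code < 600:
        return Action.RETRY_SAME_KEY
    if body_status == "failed":
        return Action.INVESTIGATE
    # Assumption: anything else needs no retry and is handled normally.
    return Action.NO_RETRY
```

For instance, `next_action(status_code=503)` yields `Action.RETRY_SAME_KEY`, while a 200 response whose body reports `"failed"` yields `Action.INVESTIGATE`.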
A retry loop with backoff
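The following is a minimal sketch of such a loop, not a reference client. It assumes a hypothetical `send` callable that performs the POST with the given headers, returns `(status_code, response_headers, parsed_body)`, and raises `OSError` on network errors and timeouts. It also assumes that Retry-After carries a number of seconds; the backoff constants are illustrative.

```python
import random
import time
import uuid
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical transport contract: takes request headers, returns
# (status_code, response_headers, parsed_body), and raises OSError
# (which covers socket timeouts and urllib's URLError) on network failure.
Response = Tuple[int, Mapping[str, str], dict]
Send = Callable[[Mapping[str, str]], Response]


def post_with_retries(
    send: Send,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # One key for the whole logical request, created before the loop.
    headers = {"Idempotency-Key": str(uuid.uuid4())}

    for attempt in range(max_attempts):
        is_last = attempt == max_attempts - 1
        delay: Optional[float] = None  # None means "use exponential backoff"
        try:
            status, resp_headers, body = send(headers)
        except OSError:
            if is_last:
                raise  # outcome still ambiguous, but the retry budget is spent
        else:
            if status == 429:
                # Assumption: Retry-After is given in seconds.
                try:
                    delay = float(resp_headers.get("Retry-After", ""))
                except ValueError:
                    delay = None
            elif not 500 <= status < 600:
                return status, resp_headers, body  # nothing left to retry
            if is_last:
                return status, resp_headers, body
        if delay is None:
            # Exponential backoff with jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt) * random.uniform(0.5, 1.0)
        sleep(delay)

    raise RuntimeError("unreachable")
```

The caller still needs to inspect the returned body: a response whose status field reports a failed inference is returned as-is rather than retried.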
Generate the key once, outside the loop, so every attempt shares it.
See also
- Sending Requests at Scale — the batch and background workflows where idempotent retries pay off most.
- Webhooks — pair idempotent retries with webhook delivery so a client can re-drive submission without triggering another inference run.