Stop hoping.
Start knowing.

Ship AI changes to real traffic. Auto-promote what works. Roll back what doesn't.

No credit card · Setup with Claude Code

Live Traffic

1,315 req/s

Stable v1.2 · 95%

Healthy

230ms avg · 0.4% errors · 4.2 judge

Test v1.3 · 5%

2 errors detected

340ms avg · 12.3% errors · 2.1 judge

Auto Rollback Triggered

Error rate 12.3% exceeded 5% threshold · Traffic → 0%

You can't eval what you can't predict.

Without Convoy

Ship to production

Hope nothing breaks

Wait for users to report issues

Panic rollback at 3am

With Convoy

Ship to 5% of traffic

AI judges evaluate quality

Auto-rollback if bad

Auto-promote if good

Three steps. Five minutes.

Convoy is an HTTP proxy. Point your traffic through it, configure your rollout, and let it handle the rest.

Point traffic through Convoy

Send requests to your Convoy proxy URL with a Session-ID header. Works with any HTTP-based agent.

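// POST the input through the Convoy proxy; Session-ID keeps routing sticky.
await fetch(CONVOY_PROXY_URL, {
  method: "POST",
  headers: {
    "Session-ID": sessionId,
    "Authorization": `Bearer ${CONVOY_SECRET}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ input }),
});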

Deploy a test version

Set traffic split, quality thresholds, and judge criteria. Convoy routes 5% of new sessions to the test version.

Traffic Configuration

Test traffic: 5%

Judge threshold: 3.5

Error rate: <5%

Convoy handles the rest

AI judges evaluate responses. Quality holds, auto-promote to 100%. Quality drops, instant rollback.

Latency · Error Rate · Judge · Cost · Traffic %: 10%
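The promote-or-rollback step can be pictured as a simple threshold check. This is a minimal sketch, not Convoy's actual logic: `WindowMetrics` and `decide` are hypothetical names, and the defaults mirror the example configuration above (error rate under 5%, judge score of at least 3.5).

```python
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    """Aggregated metrics for the test version (hypothetical structure)."""
    error_rate: float   # fraction of failed requests, e.g. 0.123 for 12.3%
    judge_score: float  # mean judge score on the 1-5 scale


def decide(metrics: WindowMetrics,
           max_error_rate: float = 0.05,
           min_judge_score: float = 3.5) -> str:
    """Return "rollback" if any guardrail is breached, else "promote"."""
    if metrics.error_rate >= max_error_rate:
        return "rollback"
    if metrics.judge_score < min_judge_score:
        return "rollback"
    return "promote"


print(decide(WindowMetrics(error_rate=0.123, judge_score=2.1)))  # rollback
print(decide(WindowMetrics(error_rate=0.003, judge_score=4.5)))  # promote
```

The first call uses the numbers from the rollback example on this page; the second uses the healthy test version from the comparison panel.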

Traffic Splitting

Start at 5%. End at 100%. Automatically.

Configurable rollout plans

Gradually increase traffic: 5% to 20% to 50% to 100%.

Session-sticky routing

The same session always hits the same version, for a consistent experience.

Manual override

Pause, adjust, or force-promote at any time.

Traffic Distribution

Stable: 95%

Test: 5%

Plan: 5→20→50→100

Sticky: Session ID

Override: Anytime
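One common way to get session-sticky splitting is to hash the session ID into a fixed bucket. The sketch below assumes that approach; how Convoy implements stickiness internally is not stated here, and `bucket`, `assign_version`, and `ROLLOUT_PLAN` are illustrative names.

```python
import hashlib

ROLLOUT_PLAN = [5, 20, 50, 100]  # percent of sessions on the test version


def bucket(session_id: str) -> int:
    """Map a session ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def assign_version(session_id: str, test_percent: int) -> str:
    """Route a session to "test" if its bucket falls below the current split."""
    return "test" if bucket(session_id) < test_percent else "stable"


# The same session gets the same answer on every call at a given split.
sid = "session-42"
assert assign_version(sid, 5) == assign_version(sid, 5)
```

Because the bucket never changes, a session that lands on the test version at 5% stays on it as the plan advances to 20%, 50%, and 100%.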

LLM Judge

AI evaluates your agent's output quality.

Plain English criteria

Define what "good" means in your own words.

Every response scored

Convoy runs a judge LLM on each response, scoring 1-5.

Real-time comparison

Compare judge scores between stable and test versions.

// Judge criteria (plain English)
criteria: "Is the response helpful, accurate, and free of hallucinations?"
scale: 1-5
threshold: 3.5
model: gpt-4o-mini
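To make the judging step concrete, here is a rough sketch of how plain-English criteria could become a judge prompt and a numeric score. The prompt wording, the function names, and the parsing rule are all assumptions for illustration; the call to the judge model itself is left out.

```python
import re

CRITERIA = "Is the response helpful, accurate, and free of hallucinations?"
THRESHOLD = 3.5


def build_judge_prompt(criteria: str, response_text: str) -> str:
    """Assemble the text sent to the judge model (wording is hypothetical)."""
    return (
        f"Criteria: {criteria}\n"
        f"Response to evaluate:\n{response_text}\n"
        "Reply with a single integer score from 1 to 5."
    )


def parse_score(judge_reply: str) -> int:
    """Extract the first standalone digit 1-5 from the judge's reply."""
    match = re.search(r"\b([1-5])\b", judge_reply)
    if match is None:
        raise ValueError(f"no score in judge reply: {judge_reply!r}")
    return int(match.group(1))


def passes(scores: list[int], threshold: float = THRESHOLD) -> bool:
    """A version passes when its mean judge score meets the threshold."""
    if not scores:
        raise ValueError("no scores to evaluate")
    return sum(scores) / len(scores) >= threshold
```

For example, `passes([4, 5, 4])` is true against the 3.5 threshold, while `passes([2, 2, 3])` is not.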

Auto Rollback

Quality drops? Traffic stops. Instantly.

Threshold-based guardrails

Set limits for latency, error rate, judge score, and cost.

Evaluation windows

Convoy evaluates in time windows, not single requests.

Cooldown protection

Prevents flapping between promote and rollback.

Rollback Event

Latency: +340ms

Error Rate: 12.3%

Judge Score: 2.1 / 5

Cost: $0.003/req

Traffic rolled back to 0% at 14:23 UTC
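Evaluation windows and cooldowns can be sketched as a small guard that looks at recent requests rather than any single one. This is an illustrative model only: `ErrorRateGuard`, its parameters, and its defaults are assumptions, and it covers error rate alone rather than all four guardrails.

```python
from collections import deque


class ErrorRateGuard:
    """Rolls back when the windowed error rate reaches a threshold.

    All parameter names and defaults are illustrative assumptions.
    """

    def __init__(self, window_s=60.0, max_error_rate=0.05,
                 min_requests=20, cooldown_s=300.0):
        self.window_s = window_s
        self.max_error_rate = max_error_rate
        self.min_requests = min_requests
        self.cooldown_s = cooldown_s
        self.events = deque()      # (timestamp, is_error) pairs
        self.last_rollback = None  # time of the most recent rollback

    def record(self, now: float, is_error: bool) -> str:
        """Record one request; return "rollback", "cooldown", or "ok"."""
        if self.last_rollback is not None and now - self.last_rollback < self.cooldown_s:
            return "cooldown"  # ignore traffic until the cooldown expires
        self.events.append((now, is_error))
        # Drop events that have aged out of the evaluation window.
        while self.events and self.events[0][0] <= now - self.window_s:
            self.events.popleft()
        if len(self.events) < self.min_requests:
            return "ok"  # too few samples in the window to decide
        error_rate = sum(e for _, e in self.events) / len(self.events)
        if error_rate >= self.max_error_rate:
            self.last_rollback = now
            self.events.clear()
            return "rollback"
        return "ok"
```

A single failed request does not trip the guard, because the window needs `min_requests` samples first. After a rollback, the cooldown keeps the guard from reacting again straight away.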

Real-Time Metrics

Latency, errors, cost, quality. All in one place.

Side-by-side comparison

Stable vs test metrics, live.

Per-session drill-down

Full request/response history for any session.

Token cost tracking

Cost per interaction, per version.

Real-Time Comparison

Stable

Latency: 230ms

Errors: 0.4%

Judge: 4.2

Cost/req: $0.002

Test

Latency: 215ms

Errors: 0.3%

Judge: 4.5

Cost/req: $0.003
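A side-by-side view like this boils down to grouping request records by version and averaging each metric. The sketch below assumes a hypothetical record shape (`version`, `latency_ms`, `error`, `judge`, `cost_usd`); Convoy's actual data model is not documented here.

```python
from collections import defaultdict
from statistics import mean


def summarize(requests: list[dict]) -> dict[str, dict[str, float]]:
    """Group request records by version and compute per-version averages."""
    by_version = defaultdict(list)
    for r in requests:
        by_version[r["version"]].append(r)
    return {
        version: {
            "latency_ms": mean(r["latency_ms"] for r in rows),
            "error_rate": mean(1.0 if r["error"] else 0.0 for r in rows),
            "judge": mean(r["judge"] for r in rows),
            "cost_per_req": mean(r["cost_usd"] for r in rows),
        }
        for version, rows in by_version.items()
    }


# Made-up sample records, for illustration only.
sample = [
    {"version": "stable", "latency_ms": 220, "error": False, "judge": 4, "cost_usd": 0.002},
    {"version": "stable", "latency_ms": 240, "error": False, "judge": 5, "cost_usd": 0.002},
    {"version": "test", "latency_ms": 215, "error": True, "judge": 2, "cost_usd": 0.003},
]
summary = summarize(sample)
```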

Integrate in minutes.
Works with any agent.

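import os

import httpx


async def send_message(session_id: str, user_message: str) -> httpx.Response:
    # Route the request through the Convoy proxy; Session-ID keeps routing sticky.
    async with httpx.AsyncClient() as client:
        return await client.post(
            os.environ["CONVOY_PROXY_URL"],
            headers={
                "Session-ID": session_id,
                "Authorization": f"Bearer {os.environ['CONVOY_SECRET']}",
            },
            json={"input": user_message},
        )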

Works with any HTTP-based agent or LLM wrapper
Session-sticky routing via Session-ID header
No SDK required. Convoy is just an HTTP proxy.