
Stop hoping.
Start knowing.

Ship AI changes to real traffic. Auto-promote what works. Roll back what doesn't.

No credit card · Setup with Claude Code

Live Traffic · 1,315 req/s

Stable v1.2 · 95% of traffic · Healthy
230ms avg · 0.4% errors · 4.2 judge

Test v1.3 · 5% of traffic · 2 errors detected
340ms avg · 12.3% errors · 2.1 judge

Auto Rollback Triggered
Error rate 12.3% exceeded 5% threshold · Traffic → 0%

You can't eval what you can't predict.

Without Convoy

Ship to production
Hope nothing breaks
Wait for users to report issues
Panic rollback at 3am

With Convoy

Ship to 5% of traffic
AI judges evaluate quality
Auto-rollback if bad
Auto-promote if good

Three steps. Five minutes.

Convoy is an HTTP proxy. Point your traffic through it, configure your rollout, and let it handle the rest.

1. Point traffic through Convoy

Send requests to your Convoy proxy URL with a Session-ID header. Works with any HTTP-based agent.

await fetch(CONVOY_PROXY_URL, {
  method: "POST",
  headers: {
    "Session-ID": sessionId,
    "Authorization": `Bearer ${CONVOY_SECRET}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ input }),
});

2. Deploy a test version

Set the traffic split, quality thresholds, and judge criteria (sketched below). Convoy routes 5% of new sessions to the test version.

Traffic Configuration
Test traffic: 5%
Judge threshold: 3.5
Error rate: <5%
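
For a concrete picture, here is what that deployment could look like expressed as code. The shape and field names (trafficPercent, judgeThreshold, maxErrorRate) are illustrative, not Convoy's exact schema:

// Hypothetical rollout config, mirroring the settings above.
const rollout = {
  version: "v1.3",
  trafficPercent: 5,        // route 5% of new sessions to the test version
  guardrails: {
    judgeThreshold: 3.5,    // minimum acceptable judge score (1-5 scale)
    maxErrorRate: 0.05,     // roll back above 5% errors
  },
};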

3. Convoy handles the rest

AI judges evaluate responses. If quality holds, Convoy auto-promotes to 100%. If quality drops, it rolls back instantly.

[Live chart: test traffic percentage across a 16-step timeline, stepping from 5% toward 10% and falling to 0% on rollback, with tabs for latency, error rate, judge score, and cost]

Traffic Splitting

Start at 5%. End at 100%. Automatically.

Configurable rollout plans

Gradually increase traffic: 5% to 20% to 50% to 100%.

Session-sticky routing

The same user always hits the same version, so the experience stays consistent (see the sketch below).

Manual override

Pause, adjust, or force-promote at any time.

Traffic Distribution
Stable: 95% · Test: 5%
Plan: 5 → 20 → 50 → 100
Sticky: Session ID
Override: Anytime
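
Sticky routing boils down to deterministic bucketing on the Session-ID: hash the ID into a stable bucket, then send the session to the test version only when that bucket falls inside the current test-traffic slice. A minimal sketch of the idea (not Convoy's internal implementation):

import { createHash } from "node:crypto";

// Map a Session-ID to a stable bucket in [0, 100).
// The same ID always yields the same bucket, so routing is sticky.
function bucketFor(sessionId: string): number {
  const digest = createHash("sha256").update(sessionId).digest();
  return digest.readUInt32BE(0) % 100;
}

// Route to the test version only while the session's bucket falls
// inside the test-traffic slice (e.g. buckets 0-4 for a 5% rollout).
function routeFor(sessionId: string, testPercent: number): "test" | "stable" {
  return bucketFor(sessionId) < testPercent ? "test" : "stable";
}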

LLM Judge

AI evaluates your agent's output quality.

Plain English criteria

Define what "good" means in your own words.

Every response scored

Convoy runs a judge LLM on each response, scoring it from 1 to 5 (sketched below).

Real-time comparison

Compare judge scores between stable and test versions.

// Judge criteria (plain English)
criteria: "Is the response helpful, accurate,
           and free of hallucinations?"
scale: 1-5
threshold: 3.5
model: gpt-4o-mini
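
Conceptually, each judge call asks a small model to grade one response against your criteria and parses out the score. A rough sketch using the OpenAI Node SDK (the prompt wording, parsing, and function name are illustrative, not Convoy's actual judge):

import OpenAI from "openai";

const openai = new OpenAI();

// Hypothetical helper: grade one agent response against
// plain-English criteria, returning a 1-5 score.
async function judgeScore(criteria: string, output: string): Promise<number> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: `Rate the response from 1 to 5 against these criteria: ${criteria}. Reply with only the number.`,
      },
      { role: "user", content: output },
    ],
  });
  return Number(completion.choices[0].message.content);
}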

Auto Rollback

Quality drops? Traffic stops. Instantly.

Threshold-based guardrails

Set limits for latency, error rate, judge score, and cost.

Evaluation windows

Convoy evaluates metrics over time windows, not single requests.

Cooldown protection

Prevents flapping between promote and rollback (see the sketch below).

Rollback Event
Latency: +340ms
Error Rate: 12.3%
Judge Score: 2.1 / 5
Cost: $0.003/req
Traffic rolled back to 0% at 14:23 UTC
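
In pseudocode terms, each guardrail check runs over a window's aggregated stats, and a cooldown after every traffic change keeps one noisy window from flip-flopping the rollout. An illustrative sketch (thresholds mirror the examples above; this is not Convoy's exact logic):

type WindowStats = { errorRate: number; avgJudgeScore: number };

// Illustrative guardrails, not Convoy's actual defaults.
const MAX_ERROR_RATE = 0.05;   // roll back above 5% errors
const MIN_JUDGE_SCORE = 3.5;   // roll back below a 3.5 judge score
const COOLDOWN_MS = 5 * 60 * 1000;

let lastActionAt = 0;

// Evaluate one window of test-version metrics. Returns "hold" while
// inside the cooldown, so promote/rollback decisions cannot flap.
function evaluateWindow(stats: WindowStats, now: number): "promote" | "rollback" | "hold" {
  if (now - lastActionAt < COOLDOWN_MS) return "hold";
  lastActionAt = now;
  if (stats.errorRate > MAX_ERROR_RATE || stats.avgJudgeScore < MIN_JUDGE_SCORE) {
    return "rollback"; // cut test traffic to 0%
  }
  return "promote";    // advance to the next traffic step
}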

Real-Time Metrics

Latency, errors, cost, quality. All in one place.

Side-by-side comparison

Stable vs test metrics, live.

Per-session drill-down

Full request/response history for any session.

Token cost tracking

Cost per interaction, per version.

Real-Time Comparison
Stable: 230ms latency · 0.4% errors · 4.2 judge · $0.002/req
Test: 215ms latency · 0.3% errors · 4.5 judge · $0.003/req

Integrate in minutes.
Works with any agent.

import os

import httpx

# The context manager closes the client (and its connection pool) cleanly.
async with httpx.AsyncClient() as client:
    response = await client.post(
        os.environ["CONVOY_PROXY_URL"],
        headers={
            "Session-ID": session_id,
            "Authorization": f"Bearer {os.environ['CONVOY_SECRET']}",
        },
        json={"input": user_message},
    )
  • Works with any HTTP-based agent or LLM wrapper
  • Session-sticky routing via Session-ID header
  • No SDK required. Convoy is just an HTTP proxy.

Stop hoping. Start knowing.

Start routing traffic through Convoy today.

Free to start · No credit card