Drop-in proxy for OpenAI & Anthropic

Lower your LLM bill.
Prove it against your invoice.

Optera sits in front of your model provider and cuts the cost of every request. It runs in read-only shadow mode first, measures what it would save on your own traffic, and reconciles every dollar to your real OpenAI and Anthropic invoice. You pay a slice of what it saves, or nothing.

Start in shadow mode → Read the docs

✓ One constructor change — swap the base URL, add your Optera key in one header. Your provider key and your code stay put.

✓ We don't change your model's answers. Output rewriting is off by default.

✓ Fails open: p99 added latency under ~15ms, or the request passes through untouched.

Saved this month reconciled to invoice

0% off your raw spend · checked against your provider bill

Optera-reportedProvider invoicematched within ~1%

/ 01 — the problem

The bill runs away where the agents are.

If you build coding or agent products, your token spend isn't a rounding line anymore. It's a number the CFO knows by name, and most of it is the same waste repeated millions of times.

400K–2Mtokens per agent task. A single autonomous run re-sends the system prompt, the tool definitions, and the whole history on every step.

$1,500per-developer monthly cap one large org imposed after its 2026 AI-coding budget ran dry four months into the year.

~40%of a typical agent bill is repeated prefixes, frontier models doing cheap work, and output you never read — all recoverable without touching quality.

Monthly spend, left alone

/ 02 — shadow mode

See the number before you change a thing.

Connect your SDK to Optera — one constructor change — and leave it in shadow mode. Every request goes to the provider untouched — no caching served, no prompt rewritten, no output altered.
Optera measures what each lever would have saved on your actual traffic, and prices it from the provider's own usage object — not a chars-over-four estimate.
You get a dollar figure produced on your own requests, before you've flipped a single switch or signed anything.
Convince yourself, then go live. The same numbers keep reconciling against your invoice after you do.

/ 03 — integration

Three steps, then it runs itself.

The whole integration is one constructor change: a new base URL plus your Optera key in one header. Your SDK, your provider key, and your code stay exactly as they are.

Repoint your SDK

Swap one base_url and add your Optera key in the x-proxy-key header; your existing OpenAI or Anthropic key passes through untouched. No vendor lock-in, no rewrite.

Watch in shadow

Optera measures would-have-saved on live traffic and reconciles it to your invoice. Nothing about your requests changes yet.

Flip to live

Turn on the levers you trust. Lossless ones are on already; anything that could touch quality stays opt-in and eval-gated.

python · openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-provider-key",
    # base_url="https://api.openai.com/v1"
    base_url="https://proxy.optera.dev/v1",   # ← new base URL
    default_headers={"x-proxy-key": "YOUR_OPTERA_KEY"},  # ← your key, from the dashboard
)
# Response headers now report the split:
#   X-Savings-Measured-USD   (cache, routing, provider cache — invoice-checked)
#   X-Savings-Estimated-USD  (compression — labeled as an estimate)

/ 04 — what runs by default

Lossless wins first. Your answers stay your answers.

Most "optimizers" quietly rewrite your prompts and your outputs. Optera draws a hard line: the safe, measurable wins are on by default, and anything that could change a result is opt-in, quality-gated, and never touches code, JSON, or tool calls.

On by default · lossless

Exact-match cache

Identical requests return the stored response. Nothing about the prompt or the answer changes — it just doesn't get billed twice.

On by default · lossless

Provider prompt-cache injection

We place the cache breakpoints most teams get wrong, so your provider's free 90%-off prefix cache actually fires. Provable from the cached-token counts.

Opt-in · eval-gated

Model routing

Send the easy calls to a cheaper model — only when it clears a quality bar on your own traffic, with automatic fallback if it ever degrades.

Opt-in · advanced

Compression & response shaping

Trim repeated context and output waste. Off by default, never run on structured or code responses, and it won't bust the free provider cache.

/ 05 — where the money goes

From raw spend to what's left.

Each lever takes a measured bite out of the bill. You only pay for the tokens that remain.

Illustrative split. Your real waterfall is built from your reconciled traffic in shadow mode.

/ 06 — the number is real

Every dollar checks against your invoice.

Optera computes cost from the provider's own usage object — the exact tokens they bill you for, with cached reads at the discounted rate. Your dashboard total reconciles to your provider bill within about a percent. If it doesn't, that's a bug, not a rounding story. Savings are split into measured (cache, routing, provider cache) and estimated (compression), and we never dress one up as the other.

Optera-reportedProvider invoiceevery month, within ~1%

/ 07 — pricing

Pay for outcomes, not requests.

The headline plan bills on the savings we prove against your invoice. No savings, no fee.

Most teams Performance

25% of proven savings

A quarter of what Optera saves, measured against a locked baseline and reconciled to your invoice, plus a $200/mo platform floor.

You only pay when the bill goes down
Free shadow-mode proof first
Spend caps, budgets, and cost attribution
All levers, eval-gated routing

Start free in shadow mode

Self-serve

$0.50 / 1M tokens optimized

Metered on the value lever, not a request quota. For teams that want to flip it on themselves.

$0.50 per million tokens optimized — cache hits, routed calls, trimmed context
Shadow mode + full dashboard
Caching, routing, leak detection

Open dashboard

Enterprise

Self-host / BYO-VPC

Run Optera inside your own perimeter. Prompts and provider keys never leave your network.

Docker / Helm, single-tenant
SSO, audit, metadata-only mode
Platform fee + support

Talk to us

/ 08 — questions

The honest FAQ.

Those show you the bill or route your calls. Optera lowers the bill and proves it. We're not an observability dashboard or a gateway you configure — we're the savings, measured on your own traffic and billed on the result. The thing none of them ship is honest, invoice-reconciled, paid-on-savings.

It helps one case, and most teams place the cache breakpoints wrong or not at all. Optera makes your provider caching actually fire, then adds the levers it can't reach: a cross-request semantic cache, eval-gated model routing, and compression that's careful not to bust the free prefix cache.

Not by default. Response-side rewriting is off, and even when you turn it on it never runs on JSON, tool calls, or code. The default promise is simple: we don't change your model's answers.

No. Any optimization step that errors or runs over its time budget forwards your request unchanged. The proxy fails open by design, with a published latency budget — p99 added latency under ~15ms on a miss, or it gets out of the way.

Your provider keys pass through in memory and are never stored or logged. Prompts are stored only as much as caching and analytics require, with a metadata-only mode that keeps even hashes out of the database — and self-hosting keeps everything in your own VPC.

OpenAI and Anthropic today, including automatic translation between their API shapes, so existing clients work unchanged. Gemini and Bedrock are next.

Nothing. Shadow mode is free and read-only — you see the savings figure produced on your own traffic before you pay anything or change a line of behavior.

From the provider's reported usage, priced with a maintained price book (cached reads at the discounted rate). Cache and routing savings are exact; compression is a labeled estimate. The split shows up per request in the X-Savings-Measured-USD and X-Savings-Estimated-USD headers.

Lower your LLM bill.Prove it against your invoice.