Drop-in proxy for OpenAI & Anthropic

Cut your LLM bill.
Change one line of code.

Optera sits in front of your model provider and trims the waste out of every request — semantic caching, model routing, prompt compression, and leak detection — before you ever pay for it.

No SDK rewrite. Swap your base URL. Keep your code.
Saved this month, by layer (live):
Semantic cache: 41%
Model routing: 28%
Prompt compression: 21%
Output stripping: 10%
Your app (unchanged code) → Optera (optimizes in-flight) → OpenAI · Anthropic (fewer, cheaper tokens)

Requests go in raw. They come out cached, routed, and compressed — you only pay for what's left.

// integration

The whole integration is the base URL.

Point your existing SDK at the proxy and pass your provider key through. Nothing else changes.

Python

from openai import OpenAI

client = OpenAI(
    # base_url="https://api.openai.com/v1"
    base_url="https://proxy.optera.dev/v1",  # <- the only change
    api_key="sk-your-provider-key",
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
# response headers now include: X-Tokens-Saved, X-Cost-Saved

Node

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://proxy.optera.dev/v1", // <- the only change
  apiKey: process.env.PROVIDER_KEY,
});

const resp = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

cURL

# Same request, just a different host.
curl https://proxy.optera.dev/v1/chat/completions \
  -H "Authorization: Bearer $PROVIDER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role":"user","content":"Hello"}]
  }'

# Works with the Anthropic Messages API shape too;
# the proxy translates between formats automatically.
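
The Python sample mentions the X-Tokens-Saved and X-Cost-Saved response headers but does not show how to read them. Below is a minimal sketch of one way to do that. The helper name `read_savings` is hypothetical, and the assumption that both headers carry plain numeric strings is ours, not something the page states. With the openai Python SDK, raw headers are reachable through the `with_raw_response` accessor, as noted in the comments.

```python
from typing import Mapping, Optional


def read_savings(headers: Mapping[str, str]) -> dict:
    """Extract the savings headers; missing or non-numeric values become None.

    Assumption: the header values are plain numeric strings.
    """
    def as_number(name: str) -> Optional[float]:
        raw = headers.get(name)
        if raw is None:
            return None
        try:
            return float(raw)
        except ValueError:
            return None

    return {
        "tokens_saved": as_number("X-Tokens-Saved"),
        "cost_saved": as_number("X-Cost-Saved"),
    }


# With the openai SDK, the raw HTTP response (and its headers) is available via:
#   raw = client.chat.completions.with_raw_response.create(model=..., messages=...)
#   savings = read_savings(raw.headers)
#   resp = raw.parse()  # the usual ChatCompletion object
# A plain dict stands in for the headers here so the sketch runs offline.
print(read_savings({"X-Tokens-Saved": "120", "X-Cost-Saved": "0.0018"}))
```

Note that a plain dict lookup is case-sensitive, while the header object returned by the SDK is case-insensitive.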
1 line to integrate: just the base URL
<5 ms added latency on a cache miss
7 optimization layers per request
2+ providers (OpenAI, Anthropic)

// 01 — how it works

Three steps. Then it just runs.

Set it up once. Every request after that is optimized automatically, and the savings stream into your dashboard.

01

Repoint your SDK

Change one base URL and pass your existing OpenAI or Anthropic key through. No vendor lock-in, no rewrite.

02

We optimize in-flight

Each request is checked against the cache, routed to the right model, compressed, and stripped of waste before it hits the provider.

03

Watch the savings

Sign in to the dashboard to see dollars and tokens saved per feature, per model, and per request — in real time.

// 02 — what runs on every request

Seven layers of waste, removed.

Each layer is independent and toggleable per workspace. Turn them all on, or only the ones you trust.

cache

Semantic caching

Near-identical prompts are matched by embedding similarity and served from cache — no second call to the provider.
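
To make "matched by embedding similarity" concrete, here is a minimal sketch of a similarity-threshold cache. It is an illustration under stated assumptions, not Optera's implementation: the `SemanticCache` class, the 0.95 threshold, and the three-dimensional toy vectors are all hypothetical, and a real system would get embeddings from an embedding model and use an indexed vector store rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy cache: returns a stored response when a prompt embedding is close enough."""

    def __init__(self, threshold=0.95):  # threshold is an assumed value
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))

    def get(self, embedding):
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        # Serve from cache only when the best match clears the threshold.
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0, 0.2], "Paris")
print(cache.get([0.98, 0.05, 0.21]))  # near-identical embedding: cache hit
print(cache.get([0.0, 1.0, 0.0]))     # unrelated embedding: cache miss (None)
```

The threshold is the main design choice: set it too low and different questions share an answer, set it too high and the cache rarely hits.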

routing

Smart model routing

Simple prompts get routed to cheaper models automatically, while hard ones stay on your flagship. You set the rules.
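
As a rough illustration of rule-based routing, the sketch below downgrades a request to a cheaper model when the prompt is short. The rule format, the `route_model` function, and the character-count heuristic are all assumptions made for this example; the page does not describe how rules are actually expressed or how prompt difficulty is judged.

```python
def prompt_chars(messages):
    """Total length of plain-text message contents (non-string contents are skipped)."""
    return sum(len(m["content"]) for m in messages if isinstance(m.get("content"), str))


def route_model(request, rules):
    """Return the model to use: the first matching rule wins, else the requested model."""
    size = prompt_chars(request["messages"])
    for rule in rules:
        if request["model"] == rule["from_model"] and size <= rule["max_prompt_chars"]:
            return rule["to_model"]
    return request["model"]


# Hypothetical rule: short gpt-4o prompts go to gpt-4o-mini.
RULES = [{"from_model": "gpt-4o", "max_prompt_chars": 200, "to_model": "gpt-4o-mini"}]

short = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}
long_ = {"model": "gpt-4o", "messages": [{"role": "user", "content": "x" * 300}]}
print(route_model(short, RULES))  # routed to the cheaper model
print(route_model(long_, RULES))  # stays on the requested model
```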

compression

Request compression

Bloated system prompts and redundant context are trimmed before they're tokenized — without touching the meaning.
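
One simple form of prompt trimming can be sketched as follows: collapse runs of whitespace and drop paragraphs that repeat verbatim. This is only an example of the general idea; the function name `compress_prompt` is hypothetical and the page does not say which techniques the compression layer actually uses.

```python
import re


def compress_prompt(text):
    """Collapse whitespace and drop paragraphs that repeat verbatim (first copy is kept)."""
    seen, kept = set(), []
    for para in re.split(r"\n\s*\n", text):
        normalized = re.sub(r"\s+", " ", para).strip()
        if normalized and normalized not in seen:
            seen.add(normalized)
            kept.append(normalized)
    return "\n\n".join(kept)


bloated = "You are helpful.\n\n\nYou   are helpful.\n\nAnswer briefly."
print(compress_prompt(bloated))
```

Collapsing whitespace is not safe for every prompt: it would damage embedded code or whitespace-sensitive formats, so a real compressor would need to leave those spans alone.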

output

Markdown & preamble stripping

Cut the "Sure! Here's…" preambles and decorative markdown that inflate output token counts on every response.
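
The kind of text this layer targets can be shown with a small post-processing sketch. The filler-word list, the regular expressions, and the `strip_output` name are assumptions for illustration. The page does not say how the proxy keeps these tokens from being generated and billed in the first place, so this only demonstrates what gets removed.

```python
import re

# Assumed filler openers; a first line starting with one of these is dropped
# when more content follows on later lines.
PREAMBLE = re.compile(r"^\s*(?:sure|certainly|of course|absolutely)\b[^\n]*\n+", re.IGNORECASE)


def strip_output(text):
    """Remove a leading filler line, bold markers, and heading markers."""
    text = PREAMBLE.sub("", text, count=1)
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                # **bold** -> bold
    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)  # "## Title" -> "Title"
    return text.strip()


reply = "Sure! Here's the summary:\n\n## Result\n**42** is the answer."
print(strip_output(reply))
```

A single-line reply such as "Sure, the answer is 42." is left untouched, because the preamble pattern requires a line break after the filler line.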

context

Summarization

Long, repetitive context windows are condensed so you're not re-sending the same history at full price every turn.
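
A common way to avoid re-sending the full history is to keep the most recent turns and replace older ones with a single summary message. The sketch below assumes OpenAI-style message dicts; `condense_history`, `keep_last`, and the truncating `naive_summary` placeholder are hypothetical. In practice the summary would presumably come from a model call, which the page does not detail.

```python
def naive_summary(turns, max_chars=40):
    """Placeholder summarizer: truncates each older turn. A real one would call a model."""
    return " | ".join(f'{m["role"]}: {m["content"][:max_chars]}' for m in turns)


def condense_history(messages, keep_last=4, summarize=naive_summary):
    """Keep system messages and the last `keep_last` turns; summarize everything older."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    if len(turns) <= keep_last:
        return list(messages)  # nothing old enough to condense
    older, recent = turns[:-keep_last], turns[-keep_last:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return system + [summary] + recent


history = [{"role": "system", "content": "Be concise."}]
for i in range(3):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

for message in condense_history(history, keep_last=2):
    print(message)
```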

detection

Leak detection

Background detectors scan your traffic hourly for wasteful patterns — runaway prompts, retries, dead context — and flag them.
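
One of the patterns named above, retries, can be detected with a simple pass over a request log. The sketch below flags any prompt that was sent several times within a short window. The log format, the `find_retry_storms` name, and the default thresholds are assumptions for illustration, not a description of Optera's detectors.

```python
from collections import defaultdict


def find_retry_storms(log, window_seconds=60, min_repeats=3):
    """Return prompts sent at least `min_repeats` times within `window_seconds`."""
    by_prompt = defaultdict(list)
    for entry in log:
        by_prompt[entry["prompt"]].append(entry["timestamp"])

    flagged = []
    for prompt, stamps in by_prompt.items():
        stamps.sort()
        # Slide a window of `min_repeats` consecutive sends over the sorted timestamps.
        for i in range(len(stamps) - min_repeats + 1):
            if stamps[i + min_repeats - 1] - stamps[i] <= window_seconds:
                flagged.append(prompt)
                break
    return flagged


# Hypothetical log entries: timestamps are seconds since some epoch.
log = [
    {"prompt": "retry me", "timestamp": 0},
    {"prompt": "retry me", "timestamp": 10},
    {"prompt": "something else", "timestamp": 15},
    {"prompt": "retry me", "timestamp": 20},
]
print(find_retry_storms(log))
```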

analytics

Real-time analytics

Every saved dollar and token is logged and attributed to the exact feature that saved it, broken down by model and time.

compat

Drop-in compatible

Speaks both the OpenAI and Anthropic API shapes and translates between them, so existing clients work as-is.
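
Translating between the two API shapes mostly comes down to a few structural differences. The sketch below converts an OpenAI-style chat request body into an Anthropic Messages-style body: system messages move to the top-level `system` field, and `max_tokens`, which the Messages API requires, gets a default. The function name and the default of 1024 are assumptions, and the sketch covers plain-text messages only (no tools, images, or streaming).

```python
def openai_to_anthropic(body, default_max_tokens=1024):
    """Convert an OpenAI chat request body into an Anthropic Messages-style body."""
    system_parts = [m["content"] for m in body["messages"] if m["role"] == "system"]
    messages = [
        {"role": m["role"], "content": m["content"]}
        for m in body["messages"]
        if m["role"] != "system"
    ]
    out = {
        "model": body["model"],
        "messages": messages,
        # The Messages API requires max_tokens; the fallback value is an assumption.
        "max_tokens": body.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    return out


request = {
    "model": "claude-sonnet-4",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hello"},
    ],
}
print(openai_to_anthropic(request))
```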

roadmap

More providers, soon

Gemini and Bedrock support are on the way. The proxy layer stays the same — only your model strings change.

// 03 — the dashboard

Sign in and watch the meter run backwards.

Every workspace gets a live view of what it's saving — and exactly which layer earned it.

Dashboard preview: app.optera.dev/dashboard
Sections: Overview · Requests · Features · Routing · Leaks · API keys · Settings

Overview (workspace: production · last 30 days)
Cost saved: $1,284.40 (↓ 38% vs. raw)
Tokens saved: 9.6M (across 142k reqs)
Cache hit rate: 41% (semantic + exact)
Chart: Spent vs. saved, by week (W1 to W4)

Dashboard figures shown are illustrative placeholders — your real numbers replace them on first sign-in.

Works with the providers you already use
OpenAI · Anthropic · Gemini (soon) · Bedrock (soon)

// 04 — questions

The honest FAQ.

Is it really just one line?

Yes. You change the base_url your existing SDK points at and pass your provider key through. Your model strings, messages, and response handling stay exactly the same.

Do you store my prompts or responses?

Caching requires storing request and response bodies, and analytics logs token counts. You control retention per workspace, and self-hosted deployments keep all data inside your own infrastructure.

How much latency does the proxy add?

On a cache hit, requests return without ever touching the provider, which is faster than going direct. On a miss, the optimization layers add under 5 ms before forwarding. For workloads with a meaningful cache hit rate, the net effect is faster, not slower.

Which providers and models work?

OpenAI and Anthropic today, including automatic translation between their API shapes. Gemini and Bedrock are on the roadmap. Any model your provider exposes works through the proxy.

How are the savings calculated?

For each request we measure the tokens and cost you would have paid sending it raw, subtract what was actually billed after optimization, and attribute the difference to the specific layer responsible. It's surfaced per request in the X-Tokens-Saved and X-Cost-Saved response headers.
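
The arithmetic described above can be sketched as raw cost minus billed cost. In the example below the token counts and per-million-token prices are made-up illustrative numbers, and `request_savings` is a hypothetical helper. The two prices are separate parameters because routing can change the model, and therefore the price, between the raw and the optimized request.

```python
def request_savings(raw_tokens, raw_price_per_million, billed_tokens, billed_price_per_million):
    """Savings for one request: what it would have cost raw minus what was billed."""
    raw_cost = raw_tokens * raw_price_per_million / 1_000_000
    billed_cost = billed_tokens * billed_price_per_million / 1_000_000
    return {
        "tokens_saved": raw_tokens - billed_tokens,
        "cost_saved": round(raw_cost - billed_cost, 6),
    }


# Illustrative numbers only: 1,200 raw tokens at $2.50 per million,
# reduced to 800 billed tokens on a cheaper model at $0.15 per million.
print(request_savings(1200, 2.50, 800, 0.15))
```

With these inputs the raw cost is $0.003 and the billed cost is $0.00012, so the request saves 400 tokens and $0.00288.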

Can I self-host it?

Yes — the Scale tier supports self-hosted and VPC deployments so prompt data never leaves your environment. Get in touch and we'll walk through it.

Stop paying for tokens
you don't need.

Repoint one base URL, then sign in and watch the savings land.