Caveman

Save 65% of your AI costs.

≈35% spent · 65% saved

Why use many token when few token do trick.

The stack

Five layers. Every token earns its place.

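support-agent.ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "…/gateway/openai/v1",
  // Custom headers are passed through the SDK's defaultHeaders option.
  defaultHeaders: { "x-cave-agent": "support" },
});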
caveman://gateway · 12,480 metered · live
POST /openai/v1/chat/completions · byte-safe
prompt tokens: 4,210
cached prefix reused: 2,890
billed to you: 1,320
model-visible bytes unchanged · 69% billed

01 · coming soon

Caveman Gateway

Compress any LLM traffic.

  • Swap one base URL — your agents never change
  • Truthful spend, metered token-for-token
  • Byte-safe: record mode never touches what the model sees
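The metering panel above implies a simple relationship: billed tokens are the prompt tokens minus the cached prefix that was reused. A minimal sketch of that arithmetic follows. The `MeteredRequest` type and both function names are hypothetical, chosen for illustration; they are not the Gateway's actual API.

```typescript
// Hypothetical shape of one metered request; field names are assumptions.
interface MeteredRequest {
  promptTokens: number;
  cachedPrefixTokens: number;
}

// Billed tokens = prompt tokens minus the cached prefix that was reused.
function billedTokens(req: MeteredRequest): number {
  if (req.cachedPrefixTokens > req.promptTokens) {
    throw new RangeError("cached prefix cannot exceed prompt tokens");
  }
  return req.promptTokens - req.cachedPrefixTokens;
}

// Sum billed tokens across a batch of requests.
function totalBilled(reqs: MeteredRequest[]): number {
  return reqs.reduce((sum, r) => sum + billedTokens(r), 0);
}

// The figures shown in the panel: 4,210 prompt tokens, 2,890 reused.
const sample: MeteredRequest = { promptTokens: 4210, cachedPrefixTokens: 2890 };
console.log(billedTokens(sample)); // 1320
```

Keeping the calculation per request, then summing, is one way to keep spend auditable token-for-token.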
agent.ts
await cavemem.recall({
  query: "deploy api-gateway",
  k: 3,
});
cavemem · mcp · local · live
recall: deploy api-gateway · hybrid
  • deploy/api-gateway.md · 0.94
  • runbooks/rollback.md · 0.81
  • threads/incident-204 · 0.74
18.4k → 2.3k · 87% history

fts5 + vector · sqlite, nothing leaves the box

02

Cavemem

Agents that stop forgetting.

  • One persistent recall layer, served over MCP
  • Local SQLite with FTS5 and a vector index
  • Pull back what matters instead of re-sending it
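Hybrid recall combines a keyword signal with a vector-similarity signal and returns the top `k` matches. The sketch below shows the idea in plain TypeScript with no database. The term-overlap score is a stand-in for an FTS5 rank, and the `alpha` weight, the `Memory` type, and the function names are all assumptions; the page does not describe how Cavemem actually ranks results.

```typescript
// A stored memory: text for keyword matching plus a precomputed embedding.
interface Memory {
  path: string;
  text: string;
  embedding: number[];
}

// Fraction of query terms found in the text (a stand-in for an FTS5 rank).
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const haystack = text.toLowerCase();
  return terms.filter((t) => haystack.includes(t)).length / terms.length;
}

// Cosine similarity between two vectors; 0 if either has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Blend both signals and keep the k best. alpha is a hypothetical weight.
function hybridRecall(
  query: string,
  queryEmbedding: number[],
  memories: Memory[],
  k: number,
  alpha = 0.5,
): { path: string; score: number }[] {
  return memories
    .map((m) => ({
      path: m.path,
      score:
        alpha * keywordScore(query, m.text) +
        (1 - alpha) * cosine(queryEmbedding, m.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data with two-dimensional embeddings, for illustration only.
const memories: Memory[] = [
  { path: "notes/lunch.md", text: "lunch menu", embedding: [0, 1] },
  { path: "runbooks/rollback.md", text: "rollback api-gateway", embedding: [0.6, 0.8] },
  { path: "deploy/api-gateway.md", text: "deploy api-gateway steps", embedding: [1, 0] },
];
const hits = hybridRecall("deploy api-gateway", [1, 0], memories, 2);
console.log(hits.map((h) => h.path));
```

Returning only the top few matches is what lets an agent pull back relevant history instead of re-sending the whole transcript.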
caveman-code · v0.19.1 · claude-opus-4-8
autopilot · msgs 1.1k/30k · layers 0/4

03

Caveman Code

A terminal agent on a token budget.

  • Four compression layers across 20+ providers
  • Plan first, then ship — one autonomous loop
  • Same models, roughly half the tokens
cave plan · inferred · 12,480 traces
ranked moves · per day · +$0/day
    S0 · zero app change
    S1 · SDK cooperation
    S2 · eval-gated routing

    inferred rate · verified stays $0 until live

    04

    Cave Architect

    Telemetry becomes a ranked plan.

    • Measured spend turns into ordered moves
    • Each move carries its own dollars-per-day
    • Split by how much app change it costs you
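One way to read the bullets above is as a sort: group moves by tier (S0, S1, S2), then order each tier by dollars-per-day. The sketch below does that. The `Move` type, the move names, and the dollar figures are illustrative assumptions, not Cave Architect output, and the tier-first ordering is a guess at how the plan is laid out.

```typescript
// Tiers from the plan view: how much app change a move requires.
type Tier = "S0" | "S1" | "S2";

interface Move {
  name: string;          // hypothetical label
  tier: Tier;
  dollarsPerDay: number; // estimated saving per day
}

// Least app change first; within a tier, largest saving first.
function rankMoves(moves: Move[]): Move[] {
  const tierOrder: Record<Tier, number> = { S0: 0, S1: 1, S2: 2 };
  return [...moves].sort(
    (a, b) =>
      tierOrder[a.tier] - tierOrder[b.tier] ||
      b.dollarsPerDay - a.dollarsPerDay,
  );
}

// Illustrative values only.
const plan = rankMoves([
  { name: "route-small-model", tier: "S2", dollarsPerDay: 20 },
  { name: "trim-history", tier: "S0", dollarsPerDay: 3 },
  { name: "cache-prefix", tier: "S0", dollarsPerDay: 8 },
  { name: "sdk-hints", tier: "S1", dollarsPerDay: 12 },
]);
console.log(plan.map((m) => `${m.tier} ${m.name}`));
```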
    rollout · resolve-ticket · live
    1 replay
    2 shadow
    3 canary
    4 active
    eval gate · 0.997 ≥ 0.98 · pass

    auto-rollback armed · nothing counts until the gate passes

    05

    Eval-Gated Rollout

    Savings you actually earned.

    • Clears replay, shadow and canary before live
    • Gated on evals — nothing counts until it passes
    • Auto-reverts the moment quality slips
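The stages and gate above can be sketched as a small state machine: a change advances one stage each time its eval score clears the gate, and is rolled back as soon as a score falls below it. The 0.98 threshold comes from the panel. The type names, the `step` function, and the choice to flag a rollback rather than reset the stage are assumptions made for illustration.

```typescript
const STAGES = ["replay", "shadow", "canary", "active"] as const;
type Stage = (typeof STAGES)[number];

interface RolloutState {
  stage: Stage;
  rolledBack: boolean;
}

// Advance one stage when the eval score clears the gate; otherwise roll back.
function step(state: RolloutState, evalScore: number, gate = 0.98): RolloutState {
  if (state.rolledBack) return state; // a reverted rollout stays reverted
  if (evalScore < gate) return { ...state, rolledBack: true };
  const next = Math.min(STAGES.indexOf(state.stage) + 1, STAGES.length - 1);
  return { stage: STAGES[next], rolledBack: false };
}

// Savings count only once the rollout is active and has not been reverted.
function countsAsSavings(state: RolloutState): boolean {
  return state.stage === "active" && !state.rolledBack;
}

// Three passing evals move replay -> shadow -> canary -> active.
let state: RolloutState = { stage: "replay", rolledBack: false };
for (const score of [0.997, 0.997, 0.997]) {
  state = step(state, score);
}
console.log(state.stage, countsAsSavings(state)); // active true

// A later score below the gate triggers the rollback.
const reverted = step(state, 0.95);
console.log(countsAsSavings(reverted)); // false
```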

    Truthful spend, down to the byte. See exactly where every token goes, priced to the cent.

    Browser extension

    Don't code? Caveman help anyway.

    Lives in your chat box: ChatGPT, Claude, Gemini.

    ChatGPT with the Caveman extension — a compressed answer and the control panel docked beside it.

    Install browser extension · soon

    Why this matters

    Tokens are a resource. Most stacks spend them like tap water.

    If tokens were free, bloat would be free. They aren't. Treat each token as a unit of intent and most systems reveal themselves as ~65% noise. Compression isn't just a cost story — it's a control story: precision, speed, and a model that thinks inside a tighter frame.

    Fewer words. Same work.

    The open stack is public today — read the source, send a patch. The managed Cloud is in private development; join the waitlist and we'll reach out when a spot opens.

    Browse the repos
    or email contact@caveman.so