Caveman — the token-efficient stack for agent-native development

01The skill & the engine

Caveman for developers.

01open source · MIT

available

Caveman Skill

The skill your agent already speaks. Install once and Claude Code, Codex, Cursor and 30+ agents answer in caveman — 65% fewer output tokens on average, with code, commands, and errors byte-for-byte exact.

Compression workbench local demo★ 72.8k

intensity

Input · prose156 est. tok

Output · caveman49 est. tok

Auth: validate email pre-insert; reject disposable domains; keep `oauth_callback`; log `organization_id` + `request_id`; omit query/secrets; test valid/malformed/duplicate; return `invalid_email`.

illustrative rewrite · not production engine output

Token map · first 48 kept removed code

01voiceactive

02structuralavailable

03CCRunused

04recoverynot needed

02commercial · account-gated

in development

Caveman Engine

The compression core underneath the skill. One command wraps your agent in a byte-safe local proxy that compresses context before it costs you tokens — and your prompts never leave the machine.

first run · observe onlyillustrative session

$ cave wrap claude
→ proxying claude → api.anthropic.com  (byte-safe; your bytes are untouched)
→ compression: OFF — no account yet
14 requests · 512k tokens sent
compression would have cut ~310k of those (61%) — measured locally (inferred)
turn it on:  caveman login   (free · 1 seat · no card)

where it runs

your machine

caveman wrap claude — compression turns on with a free account, one real seat.

our cloud

The managed gateway and the savings dashboard, when you want it hosted.

your datacenter

On-prem for Enterprise — prompts never leave your network, zero data retention enforced at write time. chapter 02 ↓

02The full stack

Caveman for enterprise.

Measurement you can audit, optimization that can't ship ungated, and a data boundary your security team can read. Every number states its own basis.

Caveman Cloud · the managed planein private development · waitlist open

Compression, every method

Watch it work. Nine compressors for the traffic agents actually send — JSON, logs, code, tables, bulk context — and you can always get the original back.

est. tokens the model sees

1,842 → 1,842−74%

{"files": [
  {"name": "proxy.go", "size": 48213, "sha": "9c1f2ea7",
    "mode": "0644", "uid": 501, "gid": 20, "mtime": "2026-07-22T09:14:02Z",
    "atime": "2026-07-22T11:02:41Z", "ctime": "2026-07-22T09:14:02Z" },
  {"name": "plan.go", "size": 9911, "sha": "e07ab112", "mode": "0644", … },
  {"name": "cache.go", "size": 18402, "sha": "77b09e0c", "mode": "0644", … },
  … 45 more objects, identical shape …
  · 47 rows kept as {name, size, sha}
]}
‹caveman: shrank · 1,842→486 est. tok · repeated shape elided · recover: ccr_9f2c…›

+ schemas · diffs · search · html

illustrative samples · est. tokens (o200k) · not production engine output · originals recoverable via ccr

Every dollar explains itself

Who spent it, through which key, on which model — and why it burned.

uncached prefixoversized toolspremium on easy workretriesrest

$341.70

Cave Plan

Ranked moves with a dollar figure on each, read from your own traffic.

Cache the repeated system prefixzero app change

$0/day

Defer 14 unused tool schemasSDK

$0/day

Route eval-runs to a cheaper modeleval-gated

$0/day

found today, on your traffic$0/day · inferred

Model routing

The cheapest model in your pool that passes your evals — or it stays put.

summarize tool output

sonnet-5 → haiku-4-5 · gate ✓

write the migration

stays — nothing cheaper passed

extract invoice fields

gpt-5.5 → gemini-3-flash · gate ✓

Eval-gated rollout

Nothing touches live traffic until it passes its quality gate.

recordreplayshadowcanaryactive

gate ✓ at every stage → active

Caching, done for you

Provider-native cache hints, added upstream only.

req 1 · prefix 41k → sent + cache_control

req 2 · same prefix → cache hit

req 3 · same prefix → cache hit

model-visible bytes untouched · today's only verified dollars

Runs where your prompts live

One engine, three rooms — on Enterprise our cloud sees nothing at all.

hosted

Managed gateway, BYOK — never resold.

your cloud

Helm-deployed in your VPC.

on-prem

Your datacenter · ZDR — the uplink refuses.

sso/saml · five-role rbac · row-level org isolation · audit log · ed25519-signed receipts (manual export today)

spend is priced from provider usage × the public catalog — unknown models stay unpriced · verified savings start at $0 and move only on provider-causal evidence

the full room →

03Models & research

Caveman for research.

Caveman Labs · Research division4 papers

CaveGemma

fine-tune of google/gemma-4-31B-it · QLoRA r16 · MIT · weights inherit Gemma terms

Get the weights →

27%

fewer output tokens · 193 pairs

96–100%

code-fence exactness

0.91–0.98

semantic cosine

534MB

LoRA adapter

Specimen · logs as pixelsFig. 00

You are Claude Code, Anthropic's official CLI for Claude. You are

an interactive agent that helps users with software engineering

tasks. Use the instructions below and the tools available to you

to assist the user. IMPORTANT: Assist with defensive security

tasks only. Refuse to create code that may be used maliciously.

# Tool usage policy

- When doing file search, prefer to use the Task tool in order to

reduce context usage. If you intend to call multiple tools and

there are no dependencies between the calls, make all of the

independent calls in the same function_calls block.

# Doing tasks: The user will primarily request you perform

software engineering tasks. Never introduce code that exposes or

logs secrets and keys. You MUST answer concisely with fewer than

4 lines of text, unless the user asks for detail. IMPORTANT: You

should minimize output tokens as much as possible. One word

answers are best. Answer the user's question directly.

You are Claude Code, Anthropic's official CLI for Claude. You are

an interactive agent that helps users with software engineering

tasks. Use the instructions below and the tools available to you

to assist the user. IMPORTANT: Assist with defensive security

tasks only. Refuse to create code that may be used maliciously.

# Tool usage policy

- When doing file search, prefer to use the Task tool in order to

reduce context usage. If you intend to call multiple tools and

there are no dependencies between the calls, make all of the

independent calls in the same function_calls block.

# Doing tasks: The user will primarily request you perform

software engineering tasks. Never introduce code that exposes or

logs secrets and keys. You MUST answer concisely with fewer than

4 lines of text, unless the user asks for detail. IMPORTANT: You

should minimize output tokens as much as possible. One word

answers are best. Answer the user's question directly.

Input · agent system promptOutput · −65%

Fig. 01

Logs as pixels

Text into PNG. When it wins, when it loses.

Read now →

Fig. 02

The caveman phenomenon

73k stars for a prompt that drops articles.

Read now →

Fig. 03

CaveBench methodology

Savings at held quality. Cost per correct task.

Read now →

Fig. 04

The zero-dollar dashboard

Why verified savings starts at nothing.

Read now →

04News

What we think, in full.

Open the archive

A letterJul 2026

Efficiency and the token economy

The price of a token collapsed and the bill went up. What follows from reading that correctly.

Read

The proof is public.

Every number here is live and cited — held to the same honesty we hold our own metering to.

74kGitHub stars, live #1Hacker News & GitHub Trending

Top 220of every public repo on GitHub

Top 50of every skill on skills.sh

As seen on

Hacker News Product Hunt TrendshiftBNR NieuwsradioLeiden University HyperAI daily.dev

Fewer words. Same work.

The skill and toolkit are public today — read the source, send a patch. The wrap and the managed Cloud are in private development; leave your email and we'll reach out the moment a spot opens.

Browse the reposemail contact@caveman.so

Cut 65% of your tokens.

Caveman Skill

Caveman Engine

CaveGemma

Logs as pixels

The caveman phenomenon

CaveBench methodology

The zero-dollar dashboard

What we think, in full.

Efficiency and the token economy

The proof is public.

Fewer words. Same work.