Compression
CaveGemma
Why use many token when few do trick — now baked in weights
A fine-tune of google/gemma-4-31B-it that speaks caveman natively, dropping articles, filler, and pleasantries while preserving code exactly. No system prompt or skill file is required: the compression lives in the weights.
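The page does not say how "preserving code exactly" is scored. A minimal sketch, assuming it means the bodies of the fenced code blocks in the base reply and the compressed reply match byte for byte; `code_fences` and `fences_byte_identical` are hypothetical helpers, not part of the model release.

```python
import re

# Assumption: a fence is three backticks, an optional info string, a newline,
# the body, then three closing backticks. Only the body is compared.
FENCE = re.compile(r"`{3}[^\n]*\n(.*?)`{3}", re.DOTALL)


def code_fences(reply: str) -> list[str]:
    """Return the body of every fenced code block in a reply, in order."""
    return FENCE.findall(reply)


def fences_byte_identical(base_reply: str, compressed_reply: str) -> bool:
    """True if both replies contain the same fenced bodies, byte for byte."""
    base = [body.encode("utf-8") for body in code_fences(base_reply)]
    compressed = [body.encode("utf-8") for body in code_fences(compressed_reply)]
    return base == compressed
```

The prose around the fences can differ freely; only the code bodies have to survive compression unchanged for the check to pass.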
One prompt, two answers
prompt · no system prompt
Reverse a singly linked list in Python.
Flip each node next-pointer to prev. O(n) time, O(1) space.
```python
def reverse(head):
    prev = None
    while head:
        head.next, prev, head = prev, head, head.next
    return prev
```
tokens: 49 (CaveGemma) vs. 144 (base) · saved 66%
illustrative estimate · ~4 chars/token
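The demo's token figures are estimates from the ~4 chars/token rule, not tokenizer counts. A minimal sketch of that arithmetic, assuming the two counts are 49 (CaveGemma) and 144 (base); the helper names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from the ~4 characters-per-token rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def percent_saved(compressed_tokens: int, base_tokens: int) -> int:
    """Percentage of the base reply's tokens that the compressed reply avoids."""
    return round(100 * (1 - compressed_tokens / base_tokens))


print(percent_saved(49, 144))  # 66
```

A real tokenizer would give different absolute counts, so the savings figure is only as good as the chars-per-token assumption.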
code-fence accuracy: 99%
semantic similarity: 94%
compression vs. base: 65%
published eval figures · the code fence is byte-identical in both replies
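The reply's one-line loop relies on Python's tuple-assignment order: the right-hand tuple is built in full before any target is assigned, and targets are then assigned left to right. The harness below runs the same `reverse` with a hypothetical `Node` class and list helpers, which are not part of either reply.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Minimal singly linked list node (hypothetical helper for this sketch)."""
    val: int
    next: Optional[Node] = None


def reverse(head: Optional[Node]) -> Optional[Node]:
    prev = None
    while head:
        # (prev, head, head.next) is evaluated first, capturing the original
        # successor; then head.next, prev, head are assigned in that order.
        head.next, prev, head = prev, head, head.next
    return prev


def from_list(values: list[int]) -> Optional[Node]:
    """Build a linked list whose nodes hold `values` in order."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_list(head: Optional[Node]) -> list[int]:
    """Collect node values from head to tail."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out


print(to_list(reverse(from_list([1, 2, 3, 4]))))  # [4, 3, 2, 1]
```

The target order matters: writing `head, head.next, prev = head.next, prev, head` rebinds `head` first, so `.next` is set on the successor instead of the current node (and fails outright at the tail, where the successor is `None`).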
What it can do
- 65% compression vs. the base model, no prompt needed
- 99% code-fence (byte-exact) accuracy; 94% semantic similarity
- Article density driven down to ~1% (vs. ~8% baseline English)
- Ships as merged weights (62.5 GB) or a lightweight LoRA adapter (534 MB)
- Fine-tuned with QLoRA (rank 16); loads through Hugging Face transformers

language: Python
license: MIT; weights inherit Gemma terms
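One simple way to compute an article-density figure like the ~1% vs. ~8% quoted above is to count "a", "an", and "the" as a share of all words. A minimal sketch; the word regex and the choice to strip fenced code before counting are assumptions, since the page does not define the metric.

```python
import re

# Assumption: fenced code is removed first, since code is kept verbatim and
# identifiers such as `the_list` should not count as prose articles.
FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)
WORD = re.compile(r"[A-Za-z']+")
ARTICLES = {"a", "an", "the"}


def article_density(text: str) -> float:
    """Fraction of prose words that are English articles (a, an, the)."""
    prose = FENCE.sub(" ", text)
    words = [w.lower() for w in WORD.findall(prose)]
    if not words:
        return 0.0
    return sum(w in ARTICLES for w in words) / len(words)


print(article_density("The cat sat on a mat."))  # 0.3333333333333333
print(article_density("Flip each node next-pointer to prev."))  # 0.0
```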