all products
Compression

CaveGemma

Why use many token when few do trick — now baked in weights

A fine-tune of google/gemma-4-31B-it that speaks caveman natively — dropping articles, filler and pleasantries while preserving code exactly. No system prompt or skill file required: the compression lives in the weights.

One prompt, two answers

prompt · no system prompt

Reverse a singly linked list in Python.

Flip each node next-pointer to prev. O(n) time, O(1) space.

```python
def reverse(head):
    prev = None
    while head:
        head.next, prev, head = prev, head, head.next
    return prev
```

tokens

49144

saved

66%

illustrative estimate · ~4 chars/token

code-fence accuracy99%
semantic similarity94%
compression vs. base65%

published eval figures · the code fence is byte-identical in both replies

What it can do

  • 65% compression vs. the base model, no prompt needed
  • 99% code-fence (byte-exact) accuracy; 94% semantic similarity
  • Article density driven down to ~1% (vs. ~8% baseline English)
  • Ships as merged weights (62.5GB) or a lightweight LoRA adapter (534MB)
  • Loads through Hugging Face transformers; QLoRA (rank 16)
language
Python
license
MIT · weights inherit Gemma terms