gemma-4-31B-caveman-lora
LoRA adapter that makes google/gemma-4-31B-it speak caveman-mode natively.
Drops articles, filler, pleasantries, hedging. Allows fragments. Keeps code blocks, function names, error strings, and CLI commands byte-exact. Pattern: [thing] [action] [reason]. [next step].
For the convenience-bundled merged variant see JBrussee/gemma-4-31B-caveman (62 GB).
Use
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31B-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("google/gemma-4-31B-it")
model = PeftModel.from_pretrained(base, "JBrussee/gemma-4-31B-caveman-lora")
msgs = [{"role": "user", "content": "Explain database connection pooling."}]
ids = tok.apply_chat_template(msgs, return_tensors="pt", add_generation_prompt=True).to(model.device)
out = model.generate(ids, max_new_tokens=300, do_sample=False)
print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))
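If a standalone checkpoint is preferred over loading base plus adapter each time, the adapter can be folded into the base weights locally. A minimal sketch, assuming enough memory to hold the full bf16 model; the helper name `merge_and_save` and the output directory are hypothetical, and this is not necessarily how the published merged variant was built:

```python
def merge_and_save(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Fold LoRA weights into the base model and write a standalone checkpoint."""
    # Heavy imports live inside the function so defining it stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, adapter_id)

    # merge_and_unload() adds the low-rank deltas to the base weights
    # and returns a plain transformers model without PEFT wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir)

    # Save the tokenizer alongside so out_dir loads on its own.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```

Usage would look like `merge_and_save("google/gemma-4-31B-it", "JBrussee/gemma-4-31B-caveman-lora", "./caveman-merged")`, where the output path is a placeholder.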
Specifics
- Adapter file: adapter_model.safetensors (~534 MB)
- Targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Rank 16, α 32, dropout 0
- Trained with Unsloth + TRL 0.17 SFTTrainer, completion_only_loss=True
- 3 epochs over 1750 train / 193 eval pairs
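The hyperparameters listed above can be written down as a plain dict for reuse. This is only a sketch: the key names follow PEFT's `LoraConfig` keyword arguments (so `LoraConfig(**lora_kwargs)` should accept it), the helper functions are hypothetical, and the scaling formula assumes standard rather than rank-stabilized LoRA. The authoritative config is the one in the linked repository.

```python
# Values copied from the list above; the dict layout is an assumption
# modeled on PEFT's LoraConfig keyword arguments.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.0,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA multiplies the low-rank update B @ A by alpha / r."""
    return lora_alpha / r


def lora_params_per_layer(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one d_in -> d_out linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


scaling = lora_scaling(lora_kwargs["r"], lora_kwargs["lora_alpha"])
print(scaling)  # 2.0
```

With rank 16 and α 32, each adapter update is applied at twice its raw magnitude. `lora_params_per_layer` needs the real layer dimensions of the base model, which are not stated here.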
Eval (n=193 holdout)
| Category | n | compression | article density | code_fence_match | semantic_sim |
|---|---|---|---|---|---|
| dialogue | 28 | 0.59 | 0.020 | 1.000 | 0.91 |
| debug | 34 | 0.92 | 0.009 | 0.995 | 0.98 |
| refactor | 27 | 0.92 | 0.005 | 0.963 | 0.98 |
| qa | 104 | 0.65 | 0.007 | 1.000 | 0.92 |
Code preservation 96-100%, article density 0.5-2%, semantic preservation 91-98%. Compression ratio 0.59-0.92, i.e. outputs ~10-40% shorter (weaker than the gold references' 50-70%).
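For intuition about what the table columns measure, here is a rough sketch of how such metrics could be computed. The definitions below are assumptions for illustration (articles counted as a/an/the outside code fences, fence match as byte-exact containment, length ratio over whitespace tokens); the actual eval code is in the linked repository and may differ. Semantic similarity is omitted because it depends on an embedding model not named here.

```python
import re

ARTICLES = {"a", "an", "the"}
FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this block
FENCE_RE = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)


def article_density(text: str) -> float:
    """Fraction of prose words (outside code fences) that are articles."""
    prose = FENCE_RE.sub(" ", text)
    words = re.findall(r"[A-Za-z']+", prose)
    if not words:
        return 0.0
    return sum(w.lower() in ARTICLES for w in words) / len(words)


def code_fence_match(reference: str, output: str) -> float:
    """Share of fenced blocks in the reference found byte-exact in the output."""
    ref_blocks = FENCE_RE.findall(reference)
    if not ref_blocks:
        return 1.0
    return sum(block in output for block in ref_blocks) / len(ref_blocks)


def length_ratio(reference: str, output: str) -> float:
    """Output length divided by reference length, in whitespace tokens."""
    ref_len = len(reference.split())
    return len(output.split()) / ref_len if ref_len else 0.0


# Tiny hypothetical example pair, not taken from the eval set.
code = FENCE + "\npool.get()\n" + FENCE
reference = "The pool keeps connections open.\n" + code
output = "Pool keep connections open.\n" + code
```

Under these assumed definitions, a lower length ratio means stronger compression, and a fence match of 1.0 means every reference code block survived unchanged.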
Reproduce
Full code, data pipeline, and configs: https://github.com/JuliusBrussee/finetune-caveman
License
Inherits the Gemma Prohibited Use Policy. Apache 2.0 base + Gemma terms apply to all outputs. Repository code is MIT. The caveman style ruleset is MIT (https://github.com/JuliusBrussee/caveman).