pearl-ai/Gemma-4-31B-it-pearl

Pearl Gemma 4 instruction-tuned checkpoint for Pearl inference and mining. Like our other Pearl-certified models, it is intended to run with the Pearl vLLM mining plugin so inference can participate in Pearl mining (Proof-of-Useful-Work alongside useful compute). Layout and runtime fields are in config.json.

Benchmarks

Results from our evaluation runs (lmms-eval + vLLM, full test sets). Original is google/gemma-4-31B-it; Pearl is this checkpoint.

Model GPQA MMLU HumanEval (pass@1) MGSM3 MMMU-Pro Vision* Video-MME (short)
Original 77.27% 90.93% 94.70% 88.62% 54.57% 79.0%
Pearl 77.37% 90.56% 94.15% 89.09% 54.45% 78.2%

* MMMU-Pro Vision (Pearl / Original): lmms-eval mmmu_pro_vision, direct-answer prompting, max_new_tokens=256, full test set (1730 samples). Google reports 76.9% on MMMU Pro in the Gemma 4 model card; they do not publish the eval recipe for that figure (prompting, subset, or aggregation).

Pearl mining (vLLM plugin)

Pearl mining means serving through the Pearl miner stack: a pearld node (RPC), pearl-gateway, and the vLLM miner build that loads the Pearl plugin (NoisyGEMM / gateway integration). Details are in the miner README.

Typical flow:

  1. Run pearld with RPC enabled.
  2. Start the Pearl miner / vLLM image or workspace (plugin-enabled vLLM).
  3. Point the server at this model; gateway + miner components handle mining-side integration.

High-level prerequisites:

  • Python 3.12, uv, CUDA + NVIDIA GPU (see miner docs for supported architectures)
  • Rust toolchain (for Pearl miner build paths)
  • A running pearld with RPC credentials for the gateway

Docker (recommended for mining)

From the Pearl repository root:

docker buildx build -t vllm_miner . -f miner/vllm-miner/Dockerfile
docker run --rm -it --gpus all \
  -p 8000:8000 -p 8337:8337 -p 8339:8339 \
  -e PEARLD_RPC_URL=<PEARLD_URL> \
  -e PEARLD_RPC_USER=<RPC_USER> \
  -e PEARLD_RPC_PASSWORD=<RPC_PASSWORD> \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --shm-size 8g \
  vllm_miner:latest \
  pearl-ai/Gemma-4-31B-it-pearl \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager

Inference with vLLM

Serve from the Hugging Face Hub id (or from a local directory containing this snapshot):

uv run vllm serve pearl-ai/Gemma-4-31B-it-pearl \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager

Flags (same as above, one line):

uv run vllm serve pearl-ai/Gemma-4-31B-it-pearl --host 0.0.0.0 --port 8000 --max-model-len 8192 --gpu-memory-utilization 0.9 --enforce-eager

These commands load the full model (text + vision). Append --language-model-only if you only need text — it disables the vision tower and uses less GPU memory.

Model details

  • Architecture: Gemma4ForConditionalGeneration (model_type: gemma4)
  • Modalities: Text, Image

License

Use and redistribution are subject to the Gemma license terms from Google; this repository is a Pearl distribution of weights derived from that ecosystem.

Limitations

Models can produce incorrect or unsafe outputs. Validate in your environment before production use.

Downloads last month
15,873
Safetensors
Model size
31B params
Tensor type
BF16
·
F8_E4M3
·
I8
·
Inference Providers NEW
Input a message to start chatting with pearl-ai/Gemma-4-31B-it-pearl.

Model tree for pearl-ai/Gemma-4-31B-it-pearl

Quantized
(254)
this model