pearl-ai/Gemma-4-31B-it-pearl

Pearl Gemma 4 instruction-tuned checkpoint for Pearl inference and mining. Like our other Pearl-certified models, it is intended to run with the Pearl vLLM mining plugin so inference can participate in Pearl mining (Proof-of-Useful-Work alongside useful compute). Layout and runtime fields are in config.json.

Project website: https://pearlresearch.ai
Pearl repository: https://github.com/pearl-research-labs/pearl
Miner / vLLM plugin docs: https://github.com/pearl-research-labs/pearl/tree/master/miner

Benchmarks

Results from our evaluation runs (lmms-eval + vLLM, full test sets). Original is google/gemma-4-31B-it; Pearl is this checkpoint.

Model	GPQA	MMLU	HumanEval (pass@1)	MGSM3	MMMU-Pro Vision*	Video-MME (short)
Original	77.27%	90.93%	94.70%	88.62%	54.57%	79.0%
Pearl	77.37%	90.56%	94.15%	89.09%	54.45%	78.2%

* MMMU-Pro Vision (Pearl / Original): lmms-eval mmmu_pro_vision, direct-answer prompting, max_new_tokens=256, full test set (1730 samples). Google reports 76.9% on MMMU Pro in the Gemma 4 model card; they do not publish the eval recipe for that figure (prompting, subset, or aggregation).

Pearl mining (vLLM plugin)

Pearl mining means serving through the Pearl miner stack: a pearld node (RPC), pearl-gateway, and the vLLM miner build that loads the Pearl plugin (NoisyGEMM / gateway integration). Details are in the miner README.

Typical flow:

Run pearld with RPC enabled.
Start the Pearl miner / vLLM image or workspace (plugin-enabled vLLM).
Point the server at this model; gateway + miner components handle mining-side integration.

High-level prerequisites:

Python 3.12, uv, CUDA + NVIDIA GPU (see miner docs for supported architectures)
Rust toolchain (for Pearl miner build paths)
A running pearld with RPC credentials for the gateway

Docker (recommended for mining)

From the Pearl repository root:

docker buildx build -t vllm_miner . -f miner/vllm-miner/Dockerfile

docker run --rm -it --gpus all \
  -p 8000:8000 -p 8337:8337 -p 8339:8339 \
  -e PEARLD_RPC_URL=<PEARLD_URL> \
  -e PEARLD_RPC_USER=<RPC_USER> \
  -e PEARLD_RPC_PASSWORD=<RPC_PASSWORD> \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --shm-size 8g \
  vllm_miner:latest \
  pearl-ai/Gemma-4-31B-it-pearl \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager

Inference with vLLM

Serve from the Hugging Face Hub id (or from a local directory containing this snapshot):

uv run vllm serve pearl-ai/Gemma-4-31B-it-pearl \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager

Flags (same as above, one line):

uv run vllm serve pearl-ai/Gemma-4-31B-it-pearl --host 0.0.0.0 --port 8000 --max-model-len 8192 --gpu-memory-utilization 0.9 --enforce-eager

These commands load the full model (text + vision). Append --language-model-only if you only need text — it disables the vision tower and uses less GPU memory.

Model details

Architecture: Gemma4ForConditionalGeneration (model_type: gemma4)
Modalities: Text, Image

License

Use and redistribution are subject to the Gemma license terms from Google; this repository is a Pearl distribution of weights derived from that ecosystem.

Limitations

Models can produce incorrect or unsafe outputs. Validate in your environment before production use.

Downloads last month: 15,873

Safetensors

Model size

31B params

Tensor type

BF16

F8_E4M3

Model tree for pearl-ai/Gemma-4-31B-it-pearl

Base model

google/gemma-4-31B

Finetuned

google/gemma-4-31B-it

Quantized

(254)

this model