Libraries

How to use Naphula/Goetia-26B-A4B-v1.4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Naphula/Goetia-26B-A4B-v1.4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Naphula/Goetia-26B-A4B-v1.4")
model = AutoModelForMultimodalLM.from_pretrained("Naphula/Goetia-26B-A4B-v1.4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use Naphula/Goetia-26B-A4B-v1.4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Naphula/Goetia-26B-A4B-v1.4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/Goetia-26B-A4B-v1.4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
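The vLLM and SGLang servers both expose an OpenAI-compatible `/v1/chat/completions` endpoint, so you can send the same request from Python instead of curl. The sketch below uses only the standard library. The helper names `build_chat_payload` and `post_chat` are illustrative. The default base URL assumes the vLLM server above on port 8000; use `http://localhost:30000/v1` for the SGLang server.

```python
import json
import urllib.request

MODEL_ID = "Naphula/Goetia-26B-A4B-v1.4"


def build_chat_payload(text, image_url, model=MODEL_ID):
    """Build the same JSON body as the curl example: one user turn with text and an image URL."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(payload, base_url="http://localhost:8000/v1"):
    """POST the payload to an OpenAI-compatible server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the generated text in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Calling `post_chat(build_chat_payload("Describe this image in one sentence.", "<image URL>"))` requires a running server. Only `build_chat_payload` works offline.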

Use Docker

docker model run hf.co/Naphula/Goetia-26B-A4B-v1.4

SGLang

How to use Naphula/Goetia-26B-A4B-v1.4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Naphula/Goetia-26B-A4B-v1.4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/Goetia-26B-A4B-v1.4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Naphula/Goetia-26B-A4B-v1.4" \
        --host 0.0.0.0 \
        --port 30000

📜 Goetia 26B A4B v1.4

🧙‍♀ The Invocation

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the MoE DELLA merge method using google/gemma-4-26B-A4B as a base.
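The `moe_della` method used here is the author's patched variant, and its code is not shown on this page. For orientation, the sketch below illustrates the core step of standard DELLA, roughly as mergekit's `della_linear` method describes it:

- Each fine-tuned model's delta from the base is pruned stochastically. Keep probabilities are spread by magnitude rank over `[density - epsilon, density + epsilon]`.
- Survivors are rescaled by `1/p` (the `rescale` idea).
- The pruned deltas are combined with per-model weights and the sum is scaled by `lambda`.

This is a pure-Python sketch on flat lists. It uses a plain weighted sum with no sign election and no expert/router handling. It is not the author's implementation.

```python
import random


def magprune_keep_probs(delta, density, epsilon):
    """Keep probability per entry, rising linearly with magnitude rank
    from density - epsilon (smallest |delta|) to density + epsilon (largest)."""
    n = len(delta)
    order = sorted(range(n), key=lambda i: abs(delta[i]))
    probs = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        probs[i] = (density - epsilon) + 2 * epsilon * frac
    return probs


def magprune(delta, density, epsilon, rng, rescale=True):
    """Stochastically drop entries; rescaling survivors by 1/p keeps the expected value equal to delta."""
    probs = magprune_keep_probs(delta, density, epsilon)
    out = []
    for d, p in zip(delta, probs):
        if rng.random() < p:
            out.append(d / p if rescale else d)
        else:
            out.append(0.0)
    return out


def della_linear_merge(base, finetuned, weights, density, epsilon, lam, seed=0):
    """base + lam * sum_k(w_k * magprune(finetuned_k - base)) on flat parameter lists."""
    rng = random.Random(seed)
    merged_delta = [0.0] * len(base)
    for ft, w in zip(finetuned, weights):
        delta = [f - b for f, b in zip(ft, base)]
        pruned = magprune(delta, density, epsilon, rng)
        merged_delta = [m + w * p for m, p in zip(merged_delta, pruned)]
    return [b + lam * m for b, m in zip(base, merged_delta)]
```

With the values in the configuration on this page (`density: 0.9`, `epsilon: 0.09`), keep probabilities would span 0.81 to 0.99. For layer tensors, five models contribute at weight 0.2 each. Those weights already sum to 1.0, so normalizing them would not change them anyway.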

Models Merged

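The following models were included in the merge. They are listed by the local directory names used in the configuration:

- BeaverAI_Orion-26B-A4B-v1b-GGUF
- ReadyArt_Serenity-26B-A4B-HB16-Q8_0
- zerofata_G4-MeroMero-26B-A4B
- Darkhn_Gemma-4-26B-A4B-Animus-V14.1-FFT
- Gryphe--Pantheon-Reasoning-26B-A4B-1.1
- Gryphe--Gemma-4-26B-A4B-StyleTune-V2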

⚙️ Configuration

architecture: Gemma4ForConditionalGeneration
base_model: B:\26B\google_gemma-4-26B-A4B
models:
  - model: B:\26B\BeaverAI_Orion-26B-A4B-v1b-GGUF
    parameters:
      weight:
        - filter: "lm_head"
          value: 0.0
        - filter: "embed_tokens"
          value: 0.0
        - value: 0.2
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Serenity-26B-A4B-HB16-Q8_0
    parameters:
      weight:
        - filter: "lm_head"
          value: 0.0
        - filter: "embed_tokens"
          value: 0.0
        - value: 0.2
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\zerofata_G4-MeroMero-26B-A4B
    parameters:
      weight:
        - filter: "lm_head"
          value: 0.0
        - filter: "embed_tokens"
          value: 0.0
        - value: 0.2
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Darkhn_Gemma-4-26B-A4B-Animus-V14.1-FFT
    parameters:
      weight:
        - filter: "lm_head"
          value: 0.0
        - filter: "embed_tokens"
          value: 0.0
        - value: 0.2
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Gryphe--Pantheon-Reasoning-26B-A4B-1.1
    parameters:
      weight:
        - filter: "lm_head"
          value: 0.0
        - filter: "embed_tokens"
          value: 0.0
        - value: 0.2
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Gryphe--Gemma-4-26B-A4B-StyleTune-V2
    parameters:
      weight:
        - filter: "lm_head"
          value: 1.0
        - filter: "embed_tokens"
          value: 1.0
        - value: 0.0
      density: 0.9
      epsilon: 0.09
merge_method: moe_della # v3 patches missing lm_head and embed_tokens
parameters:  
  lambda: 1.0
  normalize: false
  int8_mask: false
  rescale: true
  router_strategy: della # average # random_init
  blend_experts: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: Goetia 26B A4B v1.4
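The `weight` entries use `filter` rules so that `lm_head` and `embed_tokens` come entirely from StyleTune-V2. Every other tensor is shared equally among the other five models.

The sketch below resolves a per-tensor weight from such a rule list. It assumes first-match semantics: a `filter` matches when it occurs as a substring of the tensor name, and the entry without a `filter` acts as the fallback. This mirrors how the rules read, not mergekit's internal code. The audit log at the end of this page is consistent with this reading: StyleTune-V2 gets 100% of `lm_head.weight`, and each of the other models gets about 20% of a layer tensor.

```python
# Rules transcribed from the YAML configuration above.
OTHER_MODEL_RULES = [
    {"filter": "lm_head", "value": 0.0},
    {"filter": "embed_tokens", "value": 0.0},
    {"value": 0.2},  # fallback for every other tensor
]
STYLETUNE_RULES = [
    {"filter": "lm_head", "value": 1.0},
    {"filter": "embed_tokens", "value": 1.0},
    {"value": 0.0},
]


def resolve_weight(tensor_name, rules):
    """Return the value of the first rule whose filter is a substring of tensor_name.

    Assumption: substring matching, first match wins, the filter-less entry is the default.
    """
    for rule in rules:
        flt = rule.get("filter")
        if flt is None or flt in tensor_name:
            return rule["value"]
    raise ValueError(f"no rule matched {tensor_name!r}")
```

For a layer tensor, five models at 0.2 plus StyleTune-V2 at 0.0 gives a total weight of 1.0. For `lm_head.weight`, the only non-zero weight is StyleTune-V2's 1.0.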

This model is NOT currently uncensored: it produces refusals and has standard jailbreak resistance.

However, it can easily be decensored using Heretic, although I have not yet had time to uncensor v1.4. To ablate using ARA (recommended), I would start with the JSON below and follow the steps outlined in the v1.3 README.

🧙 Heretic Grimoire

reproduce.json

{
  "version": "1.2.0-dev",
  "base_model": "Naphula/Goetia-26B-A4B-v1.3",
  "timestamp": "2026-06-19T08:04:47Z",
  "metrics": {
    "kl_divergence": 0.030937770381569862,
    "refusals": 3,
    "n_bad_prompts": 100
  },
  "parameters": {
    "start_layer_index": "14",
    "end_layer_index": "26",
    "preserve_good_behavior_weight": "1.4404",
    "steer_bad_behavior_weight": "0.0100",
    "overcorrect_relative_weight": "0.9144",
    "neighbor_count": "15"
  },
  "target_components": [
    "attn.o_proj"
  ],
  "hardware": "RTX 6000 Blackwell (96GB)"
}
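In `reproduce.json`, the numeric settings under `parameters` are stored as strings. If you want to inspect or tweak them in a script before following the v1.3 steps, a small parser like the hypothetical one below converts them to numbers. It only reads the data and does not invoke Heretic. It also derives the refusal rate from the metrics: 3 refusals out of 100 bad prompts is 3%.

```python
import json


def parse_reproduce(data):
    """Convert the string-valued parameters of a reproduce.json dict to numbers.

    Keys ending in _index or _count become ints; everything else becomes a float.
    """
    params = {}
    for key, raw in data["parameters"].items():
        params[key] = int(raw) if key.endswith(("_index", "_count")) else float(raw)
    metrics = dict(data["metrics"])
    metrics["refusal_rate"] = metrics["refusals"] / metrics["n_bad_prompts"]
    return params, metrics


def load_reproduce(path):
    """Read reproduce.json from disk and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_reproduce(json.load(f))
```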

💡 Details

v1.4 was made the same way as v1.3, but with additional patches and fewer models. See this page and the section below for more details.

Critical patch notes for G4 26B A4B moe_della merges

auto.py fix

    #if _get_tied_weight_keys is None:
    #    LOG.warning(
    #        "Unable to get tied weights - incompatible transformers version",
    #    )
    #    tied_keys = None
    #else:
    #    tied_keys = _get_tied_weight_keys(model)
    ####
    # Force untying for Gemma 4 configurations to ensure lm_head is compiled
    tied_keys = None
    ####
    if ignore_on_save is not None:
        ignore_on_save = set(ignore_on_save)
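Gemma models typically tie the output projection (`lm_head.weight`) to the input embeddings (`embed_tokens`). A save step that treats tied tensors as duplicates writes only one of them.

The toy function below models that effect to show why forcing `tied_keys = None` makes `lm_head.weight` appear in the output and enlarges it by exactly that tensor's size. It is a simplified, hypothetical illustration, not mergekit's saving code, and the shapes are made up.

```python
from math import prod

# Hypothetical shapes for illustration only; these are not the real Gemma 4 dimensions.
TENSORS = {
    "model.language_model.embed_tokens.weight": (1000, 64),
    "model.language_model.norm.weight": (64,),
    "lm_head.weight": (1000, 64),
}
BYTES_PER_ELEMENT = 2  # bfloat16, matching out_dtype in the merge config


def saved_tensors(tensors, tied_keys):
    """Toy save step: names in tied_keys are treated as duplicates and skipped.

    Returns the kept {name: shape} mapping and the total bytes that would be written.
    """
    skip = set(tied_keys or ())
    kept = {name: shape for name, shape in tensors.items() if name not in skip}
    total = sum(prod(shape) * BYTES_PER_ELEMENT for shape in kept.values())
    return kept, total
```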

plan.py fix

            #for model, w_in in zip(models, weights_in):
            #    index = LoaderCache().get(model).index
            #    if any(
            #        name in index.tensor_paths
            #        for name in [w_in.name] + (w_in.aliases or [])
            #    ):
            #        any_weight = True
            #        break
            for model, w_in in zip(models, weights_in):
                index = LoaderCache().get(model).index
                if any(
                    name in index.tensor_paths
                    for name in [w_in.name] + list(w_in.aliases or [])
                ):
                    any_weight = True
                    break
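The only change in the `plan.py` patch is wrapping `w_in.aliases` in `list(...)`. A likely reason is that `aliases` can arrive as a tuple (an assumption about how the loader stores it). Python refuses to concatenate a list with a tuple, so the original expression raises `TypeError` before any alias is checked. A minimal reproduction with hypothetical helper names:

```python
def candidate_names_original(name, aliases):
    # Pre-patch expression: raises TypeError when aliases is a tuple.
    return [name] + (aliases or [])


def candidate_names_patched(name, aliases):
    # Patched expression: list() accepts a tuple, a list, or the [] fallback.
    return [name] + list(aliases or [])


# Assumed tuple-valued aliases, as for lm_head.weight in gemma4.json.
ALIASES = ("model.language_model.embed_tokens.weight",)
```

Both versions behave identically when `aliases` is `None` or a list. They only diverge for tuples.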

gemma4.json

{
  "model_type": "gemma4",
  "architectures": [
    "Gemma4ForConditionalGeneration"
  ],
  "num_layers_config_key": "text_config.num_hidden_layers",
  "vocab_size_config_key": "text_config.vocab_size",
  "pre_weights": [
    { "name": "model.language_model.embed_tokens.weight", "is_embed": true },
    { "name": "model.embed_vision.embedding_projection.weight", "optional": true },
    { "name": "model.vision_tower.std_bias", "optional": true },
    { "name": "model.vision_tower.std_scale", "optional": true },
    { "name": "model.vision_tower.patch_embedder.input_proj.weight", "optional": true },
    { "name": "model.vision_tower.patch_embedder.position_embedding_table", "optional": true }
  ],
  "layer_templates": {
    "weights": [
      { "name": "model.language_model.layers.${layer_index}.self_attn.q_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.self_attn.k_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.self_attn.v_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.self_attn.o_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.self_attn.q_norm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.self_attn.k_norm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.mlp.gate_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.mlp.up_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.mlp.down_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.input_layernorm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.post_attention_layernorm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.pre_feedforward_layernorm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.post_feedforward_layernorm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.post_feedforward_layernorm_1.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.post_feedforward_layernorm_2.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.pre_feedforward_layernorm_2.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.router.per_expert_scale", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.router.proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.router.scale", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.experts.gate_up_proj", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.experts.down_proj", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.layer_scalar", "optional": true }
    ]
  },
  "post_weights": [
    { "name": "model.language_model.norm.weight" },
    { 
      "name": "lm_head.weight", 
      "is_embed": true, 
      "optional": true,
      "aliases": ["model.language_model.embed_tokens.weight"]
    }
  ]
}
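The `${layer_index}` placeholders in `layer_templates` use the same syntax as Python's `string.Template`. The `aliases` list on `lm_head.weight` names a fallback tensor.

The sketch below does two things:

- It expands a couple of the templates for a given layer count, assuming that count comes from `text_config.num_hidden_layers`.
- It resolves an aliased name against the tensors a checkpoint actually contains.

It illustrates one way to read the definition and is not mergekit's loader. The helper names are hypothetical.

```python
from string import Template

# Two entries copied from layer_templates above; the rest expand the same way.
LAYER_TEMPLATES = [
    "model.language_model.layers.${layer_index}.self_attn.q_proj.weight",
    "model.language_model.layers.${layer_index}.experts.gate_up_proj",
]


def expand_layer_names(templates, num_layers):
    """Substitute every layer index 0..num_layers-1 into each template."""
    return [
        Template(t).substitute(layer_index=i)
        for i in range(num_layers)
        for t in templates
    ]


def resolve_tensor(name, aliases, available):
    """Return name if the checkpoint has it, else the first alias it has, else None."""
    for candidate in [name] + list(aliases):
        if candidate in available:
            return candidate
    return None
```

For example, a checkpoint that stores only `model.language_model.embed_tokens.weight` would resolve `lm_head.weight` to that embedding tensor through the alias.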

With the above patches applied, it now appears to be merging correctly. We'll see whether it generates the full-sized 49.4GB of safetensors instead of 47GB.

[MoE_DELLA Audit] Layer: lm_head.weight | Lambda=1.00
  [BASE] google_gemma-4-26B-A4B
  Darkhn_Gemma-4-26B-A4B-Animus-V14.1-FFT           :                                                      0.0% (W:0.00 D:0.90 E:0.09 N:68.94)
  Gryphe--Gemma-4-26B-A4B-StyleTune-V2              : ██████████████████████████████████████████████████ 100.0% (W:1.00 D:0.90 E:0.09 N:68.94)
  Gryphe--Pantheon-Reasoning-26B-A4B-1.1            :                                                      0.0% (W:0.00 D:0.90 E:0.09 N:68.79)
  zerofata_G4-MeroMero-26B-A4B                      :                                                      0.0% (W:0.00 D:0.90 E:0.09 N:68.94)
Executing graph:  20%| 1352/6635 [03:53<2:01:35,  1.38s/it]
WARNING:mergekit.graph:Fast path OOM, falling back to chunking
WARNING:mergekit.graph:OOM at chunk 64, reducing to 32 (attempt 1, progress: 0/128)
WARNING:mergekit.graph:OOM at chunk 32, reducing to 16 (attempt 2, progress: 0/128)
Executing graph:  20%| 1360/6635 [04:04<1:03:57,  1.37it/s]
WARNING:mergekit.graph:Fast path OOM, falling back to chunking
WARNING:mergekit.graph:OOM at chunk 64, reducing to 32 (attempt 1, progress: 0/128)
WARNING:mergekit.graph:OOM at chunk 32, reducing to 16 (attempt 2, progress: 0/128)
Executing graph:  21%| 1384/6635 [04:19<13:57,  6.27it/s]
[MoE_DELLA Audit] Layer: model.language_model.layers.0.mlp.down_proj.weight | Lambda=1.00
  [BASE] google_gemma-4-26B-A4B
  BeaverAI_Orion-26B-A4B-v1b-GGUF                   : ██████████                                          20.1% (W:0.20 D:0.90 E:0.09 N:47.15)
  Darkhn_Gemma-4-26B-A4B-Animus-V14.1-FFT           : █████████                                           19.9% (W:0.20 D:0.90 E:0.09 N:46.66)
  Gryphe--Gemma-4-26B-A4B-StyleTune-V2              :                                                      0.0% (W:0.00 D:0.90 E:0.09 N:46.66)
  Gryphe--Pantheon-Reasoning-26B-A4B-1.1            : ██████████                                          20.1% (W:0.20 D:0.90 E:0.09 N:47.17)
  ReadyArt_Serenity-26B-A4B-HB16-Q8_0               : █████████                                           19.9% (W:0.20 D:0.90 E:0.09 N:46.66)
  zerofata_G4-MeroMero-26B-A4B                      : █████████                                           19.9% (W:0.20 D:0.90 E:0.09 N:46.66)
Executing graph:  21%| 1392/6635 [04:20<11:58,  7.29it/s]
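You can check whether a finished merge reached the expected size and actually contains `lm_head.weight` by reading the safetensors headers directly. Each file starts with an 8-byte little-endian header length, followed by a JSON header that maps tensor names to dtype, shape, and data offsets, plus an optional `__metadata__` entry.

The sketch below uses only the standard library. The output directory path is a placeholder. The sizes it reports are decimal gigabytes, which may not match how the 49.4GB and 47GB figures above were measured.

```python
import json
import struct
from pathlib import Path


def safetensors_tensor_names(path):
    """Return the tensor names stored in one .safetensors file (reads the header only)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional metadata entry, not a tensor
    return set(header)


def audit_merge_output(out_dir):
    """Sum shard sizes and collect tensor names across all .safetensors shards in out_dir."""
    shards = sorted(Path(out_dir).glob("*.safetensors"))
    total_bytes = sum(p.stat().st_size for p in shards)
    names = set()
    for p in shards:
        names |= safetensors_tensor_names(p)
    return total_bytes, names
```

Usage: `total, names = audit_merge_output("path/to/merged-model")`, then compare `total / 1e9` against the expected size and check `"lm_head.weight" in names`.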