How to use from llama.cpp
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf MaralGPT/MaralGPT-Mythos-9B-2606-GGUF:
# Run inference directly in the terminal:
llama cli -hf MaralGPT/MaralGPT-Mythos-9B-2606-GGUF:
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf MaralGPT/MaralGPT-Mythos-9B-2606-GGUF:
# Run inference directly in the terminal:
llama cli -hf MaralGPT/MaralGPT-Mythos-9B-2606-GGUF:
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MaralGPT/MaralGPT-Mythos-9B-2606-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf MaralGPT/MaralGPT-Mythos-9B-2606-GGUF:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MaralGPT/MaralGPT-Mythos-9B-2606-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf MaralGPT/MaralGPT-Mythos-9B-2606-GGUF:
Use Docker
docker model run hf.co/MaralGPT/MaralGPT-Mythos-9B-2606-GGUF:
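Once `llama serve` or `llama-server` from the commands above is running, it can be queried over its OpenAI-compatible HTTP API. The following is a minimal sketch using only the Python standard library. It assumes the server is listening on `http://localhost:8080` and serves `/v1/chat/completions`. The helper names `build_chat_request` and `ask` are made up for this example, and the `max_tokens` value is an arbitrary choice.

```python
import json
import urllib.request

# Assumption: the server listens on this address; change it if you passed
# a different host or port when starting the server.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt, temperature=0.6, top_p=0.95, max_tokens=512):
    """Build (but do not send) a chat completion request for one user prompt."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("Write a simple snake game in python.")` should return the generated reply as a string. Building the request separately from sending it makes the payload easy to inspect without a server.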

MaralGPT Mythos 9B 2606 Edition

Quantization/GGUF Files

Quantization   Notes
bf16           Original 16-bit weights, not quantized
Q8_0           8-bit; well suited to gaming systems
Q4_K_M         4-bit; good, but output can be unreliable
Q2_K           2-bit; does not work properly
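To get a rough feel for how much space each variant needs, the nominal bit widths in the table can be multiplied by the parameter count. This is only a back-of-the-envelope sketch: it assumes exactly 9 billion parameters and uses the nominal widths (16, 8, 4, 2), so it understates real GGUF file sizes, which also store per-block scales and metadata. The function name `nominal_size_gb` is made up for this example.

```python
# Assumption: exactly 9 billion parameters (the model card says "9B").
PARAMS = 9e9

# Nominal bits per weight, taken from the table above. Real GGUF
# quantization formats use somewhat more bits per weight than this.
NOMINAL_BITS = {"bf16": 16, "Q8_0": 8, "Q4_K_M": 4, "Q2_K": 2}


def nominal_size_gb(params, bits_per_weight):
    """Lower-bound size in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


sizes = {name: nominal_size_gb(PARAMS, bits) for name, bits in NOMINAL_BITS.items()}
for name, gb in sizes.items():
    print(f"{name}: at least about {gb:.2f} GB")
```

Treat these numbers as a floor when deciding which file fits in your RAM or VRAM; the context cache needs additional memory on top.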

How to run (Ollama)

For example, to run the 8-bit version:

ollama run hf.co/MaralGPT/MaralGPT-Mythos-9B-2606-GGUF:Q8_0 --verbose

Ollama downloads the model and runs it on your computer.

What is this model?

This model is an uncensored fine-tuned version of Qwen 3.5 with nine billion parameters, and it runs on most gaming systems. It was trained on over 500 million tokens of synthetic data generated by state-of-the-art models such as GPT 5.5, Claude 4.8 Opus and, for as long as we had access, Claude 5 Fable.

All so-called ethical barriers were removed from the model using the Heretic LLM library to make it a suitable tool for cybersecurity, biology and chemistry. You can ask this model anything you want and it will answer without any censorship.

Key Features

  • 📝 Context window of over one million tokens.
  • 🔞 Uncensored answers
  • ♾️ Good at math, physics, chemistry, etc.
  • 💻 Can be executed on a gaming laptop

How to run (Transformers)

First, install the required libraries:

pip install torch transformers accelerate

Then:

import torch
from transformers import AutoModelForImageTextToText, AutoTokenizer

model_id = "MaralGPT/MaralGPT-Mythos-9B-2606"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bfloat16 on the GPU.
model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="cuda"
)

messages = [
    {"role": "user",
     "content": "Write a simple snake game in python."}
]

# Render the conversation with the model's chat template, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a reply using the same settings as the benchmarks.
out = model.generate(
    **inputs, max_new_tokens=16384, do_sample=True,
    temperature=0.6, top_p=0.95, top_k=20, repetition_penalty=1.05,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Benchmarks

Generic Benchmark

The benchmark above was run with the following sampling parameters:

temperature=0.6 top_p=0.95 top_k=20

Changing these values may change the results.
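To illustrate why these settings matter, here is a simplified sketch of what `temperature`, `top_k` and `top_p` do to a next-token distribution. It is a plain-Python illustration under assumed toy logits, not the code used by Transformers or llama.cpp, and the function name `filter_distribution` is made up for this example.

```python
import math


def filter_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k, then top-p."""
    # Temperature: dividing by a value below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]

    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_total = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability,
    # measured within the top-k set, reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_total
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


# Toy logits for a four-token vocabulary (made up for illustration).
example = filter_distribution([2.0, 1.0, 0.0, -1.0])
```

Lower temperatures and smaller `top_k` or `top_p` values concentrate probability on fewer tokens, which makes output more deterministic and can shift benchmark scores in either direction.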

Detailed Benchmark

Format: GGUF
Model size: 9B params
Architecture: qwen35

Model tree for MaralGPT/MaralGPT-Mythos-9B-2606-GGUF

Finetuned
Qwen/Qwen3.5-9B
Quantized
(1)
this model