Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16
This repository contains a bfloat16 MLX conversion of empero-ai/Qwythos-9B-Claude-Mythos-5-1M for Apple Silicon inference with MLX, MLX-LM, MLX-VLM, and local apps that use MLX backends such as LM Studio.
No additional fine-tuning was performed for this repository. The weights were converted from the upstream checkpoint to MLX-compatible safetensors while preserving the upstream Apache-2.0 license and model behavior.
v3 Update Notice
This MLX conversion has been refreshed with the upstream v3 files. If you downloaded this model before the v3 refresh, please redownload or update this repository.
v3 is a hotfix for the embedded chat template. The updated files:
- update the embedded chat template for preserved reasoning and adaptive thinking;
- fix looping during long generation traces;
- fix agentic use in harnesses such as OpenCode, Abacus, Hermes, and Claude Code.
Users with older local copies should update this MLX model before using it in LM Studio, MLX-LM, MLX-VLM, OpenCode, Abacus, Hermes, Claude Code, or other agentic harnesses.
Model Summary
- Format: MLX safetensors
- Precision: bfloat16
- Parameters: about 9B
- Context length: 1,048,576 tokens in the model config
- Architecture: Qwen3.5-style hybrid attention text model
- Primary use: local text generation and reasoning on Apple Silicon
- Upstream model: empero-ai/Qwythos-9B-Claude-Mythos-5-1M
Qwythos-9B is a reasoning-focused model derived from Qwen3.5-9B and post-trained by Empero AI on Claude Mythos and Claude Fable reasoning traces. The upstream model card describes strong emphasis on long-context reasoning, native function calling, tool-augmented workflows, cybersecurity, biomedical reasoning, math, and agentic tasks.
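As a rough sizing check for the summary above: bfloat16 stores each parameter in 2 bytes, so the weight footprint follows directly from the parameter count. The sketch below assumes exactly 9e9 parameters (the summary says only "about 9B") and counts weights only. KV cache, activations, and runtime overhead come on top, so treat the figure as a lower bound.

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format


def weight_memory_gib(num_params: float, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


approx_params = 9e9  # assumption: "about 9B" taken as exactly 9e9
print(f"{weight_memory_gib(approx_params):.1f} GiB")  # prints "16.8 GiB"
```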
Install
uv tool install mlx-lm
or:
pip install -U mlx-lm
For mlx-vlm usage:
pip install -U mlx-vlm
Quick Start with MLX-LM
mlx_lm.chat --model xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16
Python example:
from mlx_lm import load, generate

model_id = "xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16"
model, tokenizer = load(model_id)

messages = [
    {
        "role": "user",
        "content": "Explain how YaRN rope scaling enables long-context inference.",
    }
]
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
)
text = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=2048,
    verbose=True,
)
print(text)
OpenAI-Compatible Local Server
mlx_lm.server --model xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16 --port 8080
Then call the local server:
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16",
    "messages": [
      {
        "role": "user",
        "content": "Give a concise explanation of Gated DeltaNet attention."
      }
    ],
    "max_tokens": 1024
  }'
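The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the server is running on port 8080 as started above and that the response follows the usual OpenAI chat-completions shape. The helper names build_chat_request and chat are illustrative, not part of mlx-lm.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # matches --port 8080 in the server command
MODEL_ID = "xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16"


def build_chat_request(prompt: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, max_tokens: int = 1024) -> str:
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    # Assumption: standard OpenAI-style response with a "choices" list.
    return body["choices"][0]["message"]["content"]
```

With the server running, chat("Give a concise explanation of Gated DeltaNet attention.") returns the reply as a string.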
MLX-VLM Usage
This conversion was produced with mlx_vlm.convert (see Conversion Notes). If
your installed mlx-vlm supports this Qwen3.5 model, you can load it with:
from mlx_vlm import load, generate

model_id = "xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16"
model, processor = load(model_id)

result = generate(
    model=model,
    processor=processor,
    prompt="What is the capital of France?",
    max_tokens=128,
    temperature=0.6,
)
print(result.text)
Recommended Sampling
The upstream model is a reasoning model. It is usually best to allow enough generation budget for a reasoning trace and final answer.
Suggested defaults:
generation_kwargs = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repetition_penalty": 1.05,
    "max_tokens": 4096,
}
Increase max_tokens for difficult reasoning, tool-use, code, or long-context
tasks.
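One way to apply these defaults with mlx-lm is sketched below. It assumes a recent mlx-lm release in which sampling settings are passed to generate through a sampler built by make_sampler and processors built by make_logits_processors (both in mlx_lm.sample_utils), with make_sampler accepting temp, top_p, and top_k. Check this against your installed version. The helper names split_generation_kwargs and generate_with_defaults are illustrative.

```python
# Maps the keys used in the suggested defaults to make_sampler argument names.
SAMPLER_KEYS = {"temperature": "temp", "top_p": "top_p", "top_k": "top_k"}
# Keys that belong to make_logits_processors instead of the sampler.
PROCESSOR_KEYS = {"repetition_penalty"}


def split_generation_kwargs(kwargs: dict) -> tuple[dict, dict, dict]:
    """Split a flat settings dict into (sampler args, processor args, other args)."""
    sampler, processors, rest = {}, {}, {}
    for key, value in kwargs.items():
        if key in SAMPLER_KEYS:
            sampler[SAMPLER_KEYS[key]] = value
        elif key in PROCESSOR_KEYS:
            processors[key] = value
        else:
            rest[key] = value  # e.g. max_tokens goes straight to generate
    return sampler, processors, rest


def generate_with_defaults(model, tokenizer, prompt, generation_kwargs: dict) -> str:
    """Call mlx_lm.generate with the settings routed to the right arguments."""
    # Imported lazily so split_generation_kwargs works without MLX installed.
    from mlx_lm import generate
    from mlx_lm.sample_utils import make_logits_processors, make_sampler

    sampler_args, processor_args, rest = split_generation_kwargs(generation_kwargs)
    return generate(
        model,
        tokenizer,
        prompt=prompt,
        sampler=make_sampler(**sampler_args),
        logits_processors=make_logits_processors(**processor_args),
        **rest,
    )
```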
Conversion Notes
How to Convert
uv run --with mlx-vlm mlx_vlm.convert \
--model empero-ai/Qwythos-9B-Claude-Mythos-5-1M \
--mlx-path ./Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16 \
--dtype bfloat16 \
--trust-remote-code
Compatibility Issues Encountered
Converting Qwythos-9B with mlx-vlm revealed two upstream compatibility bugs
that required patches. These affect mlx-vlm ≤ 0.6.3 and any LM Studio build
using the bundled MLX backends.
Issue 1: partial_rotary_factor Location Mismatch
mlx-vlm's qwen3_5.TextConfig.__post_init__ expected
partial_rotary_factor to be inside the rope_parameters dictionary, but the
upstream HuggingFace config places it at the text_config top level:
// Upstream config (partial_rotary_factor is OUTSIDE rope_parameters)
{
  "text_config": {
    "partial_rotary_factor": 0.25,
    "rope_parameters": {
      "type": "yarn",
      "mrope_section": [11, 11, 10],
      "rope_theta": 10000000
      // missing partial_rotary_factor → ValueError
    }
  }
}
This caused the error:
ValueError: rope_parameters must contain keys {'partial_rotary_factor', 'rope_theta', 'type', 'mrope_section'}
Fix applied to mlx_vlm/models/qwen3_5/config.py:
- Moved partial_rotary_factor from the rope_parameters default factory to a standalone TextConfig field with default 0.25.
- Removed "partial_rotary_factor" from the required-keys set in __post_init__.
- Added a sync step: if partial_rotary_factor is found at the text_config top level, copy it into rope_parameters so downstream code that accesses args.rope_parameters["partial_rotary_factor"] (e.g. qwen3_5/language.py line 1390) continues to work.
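The shape of this fix can be illustrated with a small standalone dataclass. This is a simplified sketch, not the actual mlx-vlm source: the class name TextConfigSketch, the default rope_parameters values (copied from the config excerpt above), and the choice to let a value already present in rope_parameters take precedence are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# After the fix, partial_rotary_factor is no longer a required key.
REQUIRED_ROPE_KEYS = {"type", "mrope_section", "rope_theta"}


def _default_rope_parameters() -> dict:
    # Illustrative defaults only, taken from the config excerpt above.
    return {"type": "yarn", "mrope_section": [11, 11, 10], "rope_theta": 10000000}


@dataclass
class TextConfigSketch:
    rope_parameters: dict = field(default_factory=_default_rope_parameters)
    partial_rotary_factor: float = 0.25  # standalone top-level field

    def __post_init__(self):
        self.rope_parameters = dict(self.rope_parameters)  # avoid mutating the caller's dict
        missing = REQUIRED_ROPE_KEYS - set(self.rope_parameters)
        if missing:
            raise ValueError(f"rope_parameters is missing keys {sorted(missing)}")
        # Sync step: copy the top-level value in unless rope_parameters already has one.
        self.rope_parameters.setdefault("partial_rotary_factor", self.partial_rotary_factor)
        self.partial_rotary_factor = self.rope_parameters["partial_rotary_factor"]
```

With this layout, code that reads rope_parameters["partial_rotary_factor"] sees a value whether the config supplied it at the top level or inside rope_parameters.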
Issue 2: vision_config.model_type Value
The upstream config sets vision_config.model_type to qwen3_5_vision, but
mlx_vlm.models.qwen3_vl.vision.VisionModel.__init__ only whitelists
qwen3_vl, qwen3_5, and qwen3_5_moe.
This caused the error:
ValueError: Unsupported model type: qwen3_5_vision
Fix applied to config.json:
Changed vision_config.model_type from qwen3_5_vision to qwen3_5 before
uploading.
Post-Conversion Config Fix for LM Studio
If you convert this model yourself and plan to use it in LM Studio (which ships
its own bundled mlx-vlm), you may need to work around both issues in the
generated config.json itself, because the Issue 1 fix above was made in the
mlx-vlm source rather than in the config:
import json

config_path = "path/to/config.json"
with open(config_path) as f:
    cfg = json.load(f)

# Fix 1: move partial_rotary_factor into rope_parameters
tc = cfg["text_config"]
pr = tc["rope_parameters"]
if "partial_rotary_factor" not in pr:
    pr["partial_rotary_factor"] = tc.pop("partial_rotary_factor", 0.25)

# Fix 2: correct vision model type
vc = cfg.get("vision_config", {})
if vc.get("model_type") == "qwen3_5_vision":
    vc["model_type"] = "qwen3_5"

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
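To confirm whether a given config.json still needs these fixes, a small check like the following can be run on the loaded dictionary. It is a sketch that only looks for the two conditions described above. The function name find_config_issues is illustrative.

```python
def find_config_issues(cfg: dict) -> list[str]:
    """Return a description of each known compatibility issue found in cfg."""
    issues = []
    rope = cfg.get("text_config", {}).get("rope_parameters", {})
    if "partial_rotary_factor" not in rope:
        issues.append("rope_parameters lacks partial_rotary_factor")
    if cfg.get("vision_config", {}).get("model_type") == "qwen3_5_vision":
        issues.append("vision_config.model_type is qwen3_5_vision")
    return issues
```

Load the file with json.load and pass the result in. An empty list means neither issue was detected.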
Function Calling
The upstream Qwythos model is designed to follow Qwen3.5-style tool-calling
templates. Use the tokenizer chat template with a tools argument when your
runtime supports it, then parse emitted tool call blocks in your application.
Tool execution, argument validation, and safety policy should be handled by your application, not by the model.
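Parsing the emitted blocks might look like the sketch below. The block format is an assumption: it supposes each call is wrapped in <tool_call> tags containing a <function=NAME> element with <parameter=KEY> children. Some Qwen-family templates instead place a JSON object inside <tool_call>, so inspect the embedded chat template before relying on this. The get_weather tool in the sample is hypothetical.

```python
import re

# Assumed block layout; verify against the model's embedded chat template.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool calls as {"name": ..., "arguments": {...}} dictionaries."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        for name, body in FUNCTION_RE.findall(block):
            # Argument values are kept as raw strings; type checks are left to the caller.
            arguments = {key: value.strip() for key, value in PARAMETER_RE.findall(body)}
            calls.append({"name": name, "arguments": arguments})
    return calls


# Hypothetical model output used only to exercise the parser.
sample_output = (
    "Let me look that up.\n"
    "<tool_call>\n<function=get_weather>\n"
    "<parameter=city>\nParis\n</parameter>\n"
    "</function>\n</tool_call>"
)
print(parse_tool_calls(sample_output))
```

The parser only extracts names and raw argument strings. Validate them against your own tool schema before executing anything.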
Limitations and Safety
- This is a format conversion, not a new training run. See the upstream model card for training data, benchmark details, and original limitations.
- The model can produce incorrect details, especially for identifiers, citations, medical facts, security facts, and fast-changing topics. Use retrieval, tools, or human review when exactness matters.
- The upstream model is intentionally less refusal-oriented than many assistant models. Add application-level safety controls for public or end-user-facing deployments.
- Do not rely on this model as a sole source for medical, legal, financial, or security-critical decisions.
License
This MLX conversion is released under the same license as the upstream model: Apache-2.0.
Acknowledgements
- Original model: empero-ai/Qwythos-9B-Claude-Mythos-5-1M
- Original developers: Empero AI
- Base family: Qwen3.5
- MLX ecosystem: Apple MLX, MLX-LM, and MLX-VLM