Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16

This repository contains a bfloat16 MLX conversion of empero-ai/Qwythos-9B-Claude-Mythos-5-1M for Apple Silicon inference with MLX, MLX-LM, MLX-VLM, and local apps that use MLX backends such as LM Studio.

No additional fine-tuning was performed for this repository. The weights were converted from the upstream checkpoint to MLX-compatible safetensors while preserving the upstream Apache-2.0 license and model behavior.

v3 Update Notice

This MLX conversion has been refreshed with the upstream v3 files. If you downloaded this model before the v3 refresh, please redownload or update this repository.

v3 is a hotfix for the embedded chat template. The updated files:

  • update the embedded chat template for preserved reasoning and adaptive thinking;
  • fix looping during long generation traces;
  • fix agentic use in harnesses such as OpenCode, Abacus, Hermes, and Claude Code.

Users with older local copies should update this MLX model before using it in LM Studio, MLX-LM, MLX-VLM, OpenCode, Abacus, Hermes, Claude Code, or other agentic harnesses.

Model Summary

  • Format: MLX safetensors
  • Precision: bfloat16
  • Parameters: about 9B
  • Context length: 1,048,576 tokens in the model config
  • Architecture: Qwen3.5-style hybrid attention text model
  • Primary use: local text generation and reasoning on Apple Silicon
  • Upstream model: empero-ai/Qwythos-9B-Claude-Mythos-5-1M

Qwythos-9B is a reasoning-focused model derived from Qwen3.5-9B and post-trained by Empero AI on Claude Mythos and Claude Fable reasoning traces. The upstream model card describes a strong emphasis on long-context reasoning, native function calling, tool-augmented workflows, cybersecurity, biomedical reasoning, math, and agentic tasks.
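
As a rough guide to memory needs, the weight footprint follows from two figures in the summary: about 9B parameters, stored at 2 bytes each in bfloat16. The sketch below is back-of-the-envelope arithmetic only. It ignores the KV cache, activations, and runtime overhead, and the helper name is illustrative.

```python
def weight_footprint_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the raw weights; bfloat16 uses 2 bytes per parameter."""
    return num_params * bytes_per_param


approx_params = 9e9  # "about 9B" from the summary; the exact count may differ
size_bytes = weight_footprint_bytes(approx_params)
size_gb = size_bytes / 1e9      # decimal gigabytes
size_gib = size_bytes / 2**30   # binary gibibytes
print(f"~{size_gb:.0f} GB (~{size_gib:.1f} GiB) of weights")
```

Long prompts add KV-cache memory on top of this, so leave headroom well beyond the weight size when working near the configured context length.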

Install

uv tool install mlx-lm

or:

pip install -U mlx-lm

For mlx-vlm usage:

pip install -U mlx-vlm

Quick Start with MLX-LM

mlx_lm.chat --model xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16

Python example:

from mlx_lm import load, generate

model_id = "xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16"

model, tokenizer = load(model_id)

messages = [
    {
        "role": "user",
        "content": "Explain how YaRN rope scaling enables long-context inference.",
    }
]

prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
)

text = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=2048,
    verbose=True,
)

print(text)

OpenAI-Compatible Local Server

mlx_lm.server --model xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16 --port 8080

Then call the local server:

curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16",
    "messages": [
      {
        "role": "user",
        "content": "Give a concise explanation of Gated DeltaNet attention."
      }
    ],
    "max_tokens": 1024
  }'
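
The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the server started above is listening on localhost:8080 and returns the usual OpenAI-style choices[0].message.content field. build_chat_request and chat are illustrative helper names, not part of MLX-LM.

```python
import json
import urllib.request

MODEL_ID = "xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16"
URL = "http://localhost:8080/v1/chat/completions"  # assumes the port used above


def build_chat_request(prompt: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) a POST request with an OpenAI-style chat payload."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]


# Usage, once the server is running:
# print(chat("Give a concise explanation of Gated DeltaNet attention."))
```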

MLX-VLM Usage

This conversion was produced with mlx_vlm.convert (see Conversion Notes). If your runtime supports this Qwen3.5 model through mlx-vlm, you can load it with:

from mlx_vlm import load, generate

model_id = "xunkutech-ai/Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16"

model, processor = load(model_id)

result = generate(
    model=model,
    processor=processor,
    prompt="What is the capital of France?",
    max_tokens=128,
    temperature=0.6,
)

print(result.text)

Recommended Sampling

The upstream model is a reasoning model. It is usually best to allow enough generation budget for a reasoning trace and final answer.

Suggested defaults:

generation_kwargs = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repetition_penalty": 1.05,
    "max_tokens": 4096,
}

Increase max_tokens for difficult reasoning, tool-use, code, or long-context tasks.
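
With MLX-LM, these settings are not all passed to generate as plain keyword arguments. The sketch below assumes a recent mlx-lm release in which mlx_lm.sample_utils exposes make_sampler (accepting temp, top_p, and top_k) and make_logits_processors (accepting repetition_penalty). Older releases may differ, so check your installed version. split_generation_kwargs and generate_with_defaults are illustrative helper names.

```python
generation_kwargs = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repetition_penalty": 1.05,
    "max_tokens": 4096,
}


def split_generation_kwargs(kwargs):
    """Split the flat dict into sampler, logits-processor, and generate() arguments."""
    sampler_args = {
        "temp": kwargs["temperature"],  # mlx-lm names this argument `temp`
        "top_p": kwargs["top_p"],
        "top_k": kwargs["top_k"],
    }
    processor_args = {"repetition_penalty": kwargs["repetition_penalty"]}
    generate_args = {"max_tokens": kwargs["max_tokens"]}
    return sampler_args, processor_args, generate_args


def generate_with_defaults(model, tokenizer, prompt, kwargs=generation_kwargs):
    """Run generation with the suggested defaults (requires mlx-lm on Apple Silicon)."""
    # Imported lazily so the helper above can be used without MLX installed.
    from mlx_lm import generate
    from mlx_lm.sample_utils import make_logits_processors, make_sampler

    sampler_args, processor_args, generate_args = split_generation_kwargs(kwargs)
    return generate(
        model,
        tokenizer,
        prompt=prompt,
        sampler=make_sampler(**sampler_args),
        logits_processors=make_logits_processors(**processor_args),
        **generate_args,
    )
```

The model, tokenizer, and prompt are the ones built in the Quick Start example.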

Conversion Notes

How to Convert

uv run --with mlx-vlm mlx_vlm.convert \
  --model empero-ai/Qwythos-9B-Claude-Mythos-5-1M \
  --mlx-path ./Qwythos-9B-Claude-Mythos-5-1M-MLX-bf16 \
  --dtype bfloat16 \
  --trust-remote-code

Compatibility Issues Encountered

Converting Qwythos-9B with mlx-vlm exposed two incompatibilities between the upstream config and mlx-vlm, each of which required a patch. They affect mlx-vlm ≤ 0.6.3 and any LM Studio build that uses the bundled MLX backends.

Issue 1: partial_rotary_factor Location Mismatch

mlx-vlm's qwen3_5.TextConfig.__post_init__ expected partial_rotary_factor to be inside the rope_parameters dictionary, but the upstream Hugging Face config places it at the top level of text_config:

// Upstream config (partial_rotary_factor is OUTSIDE rope_parameters)
{
  "text_config": {
    "partial_rotary_factor": 0.25,
    "rope_parameters": {
      "type": "yarn",
      "mrope_section": [11, 11, 10],
      "rope_theta": 10000000
      // missing partial_rotary_factor → ValueError
    }
  }
}

This caused the error:

ValueError: rope_parameters must contain keys {'partial_rotary_factor', 'rope_theta', 'type', 'mrope_section'}

Fix applied to mlx_vlm/models/qwen3_5/config.py:

  1. Moved partial_rotary_factor from the rope_parameters default factory to a standalone TextConfig field with default 0.25.
  2. Removed "partial_rotary_factor" from the required-keys set in __post_init__.
  3. Added a sync step: if partial_rotary_factor is found at the text_config top level, copy it into rope_parameters so downstream code that accesses args.rope_parameters["partial_rotary_factor"] (e.g. qwen3_5/language.py line 1390) continues to work.
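
The three steps can be sketched with a simplified stand-in for the config class. This is not the actual mlx-vlm source: TextConfigSketch and REQUIRED_ROPE_KEYS are hypothetical names, the default rope values are copied from the upstream config excerpt above for illustration, and the real class has many more fields. Copying only when the key is absent is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Step 2: partial_rotary_factor is no longer a required rope_parameters key.
REQUIRED_ROPE_KEYS = {"type", "mrope_section", "rope_theta"}


@dataclass
class TextConfigSketch:
    # Step 1: standalone field with a default of 0.25.
    partial_rotary_factor: float = 0.25
    rope_parameters: Dict[str, Any] = field(
        default_factory=lambda: {
            "type": "yarn",
            "mrope_section": [11, 11, 10],
            "rope_theta": 10000000,
        }
    )

    def __post_init__(self):
        missing = REQUIRED_ROPE_KEYS - set(self.rope_parameters)
        if missing:
            raise ValueError(f"rope_parameters is missing keys {sorted(missing)}")
        # Step 3: mirror the top-level value into rope_parameters so code that
        # reads rope_parameters["partial_rotary_factor"] keeps working.
        self.rope_parameters.setdefault(
            "partial_rotary_factor", self.partial_rotary_factor
        )


# An upstream-style config, with the factor outside rope_parameters, now loads.
cfg = TextConfigSketch(
    partial_rotary_factor=0.25,
    rope_parameters={"type": "yarn", "mrope_section": [11, 11, 10], "rope_theta": 10000000},
)
print(cfg.rope_parameters["partial_rotary_factor"])
```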

Issue 2: vision_config.model_type Value

The upstream config sets vision_config.model_type to qwen3_5_vision, but mlx_vlm.models.qwen3_vl.vision.VisionModel.__init__ only whitelists qwen3_vl, qwen3_5, and qwen3_5_moe.

This caused the error:

ValueError: Unsupported model type: qwen3_5_vision

Fix applied to config.json:

Changed vision_config.model_type from qwen3_5_vision to qwen3_5 before uploading.

Post-Conversion Config Fix for LM Studio

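If you convert this model yourself and plan to use it in LM Studio (which ships its own bundled mlx-vlm), you may need to apply both fixes to the generated config.json. Here Issue 1 is handled in the config itself, by placing partial_rotary_factor inside rope_parameters, rather than by patching config.py: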

import json

config_path = "path/to/config.json"
with open(config_path) as f:
    cfg = json.load(f)

# Fix 1: move partial_rotary_factor into rope_parameters
tc = cfg["text_config"]
pr = tc["rope_parameters"]
if "partial_rotary_factor" not in pr:
    pr["partial_rotary_factor"] = tc.pop("partial_rotary_factor", 0.25)

# Fix 2: correct vision model type
vc = cfg.get("vision_config", {})
if vc.get("model_type") == "qwen3_5_vision":
    vc["model_type"] = "qwen3_5"

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
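
After patching, a quick check can confirm that both conditions hold before you load the model. This is a minimal sketch: config_problems is a hypothetical helper, and it only tests the two conditions described in this section.

```python
def config_problems(cfg: dict) -> list:
    """Return human-readable descriptions of the two known config issues, if present."""
    problems = []
    rope = cfg.get("text_config", {}).get("rope_parameters", {})
    if "partial_rotary_factor" not in rope:
        problems.append("rope_parameters is missing partial_rotary_factor")
    if cfg.get("vision_config", {}).get("model_type") == "qwen3_5_vision":
        problems.append("vision_config.model_type is still qwen3_5_vision")
    return problems


# Example with an in-memory config shaped like the unpatched upstream file.
unpatched = {
    "text_config": {"partial_rotary_factor": 0.25, "rope_parameters": {"type": "yarn"}},
    "vision_config": {"model_type": "qwen3_5_vision"},
}
for problem in config_problems(unpatched):
    print(problem)
```

An empty list means neither issue was detected. It does not guarantee that the config loads in every runtime.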

Function Calling

The upstream Qwythos model is designed to follow Qwen3.5-style tool-calling templates. Use the tokenizer chat template with a tools argument when your runtime supports it, then parse emitted tool call blocks in your application.

Exact tool execution, validation, and safety policy should be handled outside the model.
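
As an illustration of keeping validation outside the model, the sketch below extracts tool-call blocks from generated text and checks them against an allow-list before anything is executed. It assumes Hermes-style output, in which each call is a JSON object wrapped in <tool_call>...</tool_call> tags, as in earlier Qwen chat templates. The template embedded in this model may emit a different format, so inspect it before relying on this parser. The tool name get_weather and all helper names are hypothetical.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

# Hypothetical allow-list: tool name -> required argument names.
ALLOWED_TOOLS = {"get_weather": {"city"}}


def parse_tool_calls(text: str) -> list:
    """Return well-formed tool calls found in text; skip blocks with malformed JSON."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(call, dict) and isinstance(call.get("name"), str):
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


def validate_call(call: dict) -> bool:
    """Accept only allow-listed tools whose required arguments are all present."""
    required = ALLOWED_TOOLS.get(call["name"])
    args = call["arguments"]
    return required is not None and isinstance(args, dict) and required.issubset(args)


sample_output = (
    "Let me check the weather.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
)
for call in parse_tool_calls(sample_output):
    print(call["name"], "accepted" if validate_call(call) else "rejected")
```

Only calls that pass validation should reach your tool executor. Rejected or malformed calls can be reported back to the model as a tool error message.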

Limitations and Safety

  • This is a format conversion, not a new training run. See the upstream model card for training data, benchmark details, and original limitations.
  • The model can produce incorrect details, especially for identifiers, citations, medical facts, security facts, and fast-changing topics. Use retrieval, tools, or human review when exactness matters.
  • The upstream model is intentionally less refusal-oriented than many assistant models. Add application-level safety controls for public or end-user-facing deployments.
  • Do not rely on this model as a sole source for medical, legal, financial, or security-critical decisions.

License

This MLX conversion is released under the same license as the upstream model: Apache-2.0.

Acknowledgements
