YuYu1015-Ornith-1.0-35B-abliterated

English

🔄 Re-uploaded 2026-06-30 — please re-download

The model was upgraded on 2026-06-30. Compared with the old version, moralizing is roughly halved (~28% → ~14%), and the old version's reasoning degradation is fixed: the new model matches the base on GSM8K (80%) and on the hardest competition-math problems (MATH-500 L4-5). If you downloaded an earlier copy, please re-download to get the improved version.

📦 Quantized versions: NVFP4 (Blackwell · vLLM / SGLang) · GGUF (llama.cpp · Q8_0 / UD-Q6_K / UD-Q4_K_M)

⚠️ READ FIRST — Sampling Parameters MUST Be Set Correctly

This model requires the exact sampling parameters below, especially keeping repeat-penalty at 1.0 (the default; do not raise it). Incorrect values degrade its output:

| Setting | Result |
| --- | --- |
| `repeat-penalty 1.0` ✅ | recommended (default); this 35B rarely loops (only occasionally) |
| `repeat-penalty 1.05` | truncated / unfinished answers |
| `temp 0` (greedy) | not recommended |

An abliterated (uncensored) variant of deepreinforce-ai/Ornith-1.0-35B, a Qwen3.5 Mixture-of-Experts reasoning model. Refusal behavior has been removed and moralizing substantially reduced by weights-only abliteration (no training), keeping the base model's reasoning/thinking intact.

Model Details

| Item | Value |
| --- | --- |
| Architecture | Qwen3.5 35B MoE: 40 layers (full-attention + GatedDeltaNet hybrid), 256 routed + 1 shared experts/layer, ~9 active per token |
| Base model | deepreinforce-ai/Ornith-1.0-35B |
| Author | YuYu1015 |
| Precision | BF16 (~70 GB, 2 shards) |
| Context length | Inherited from base |
| Thinking mode | Supported (reasoning model, emits `<think>…</think>`) |
| Languages | English, Chinese |
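
The ~70 GB figure for BF16 can be sanity-checked with simple arithmetic. The sketch below assumes a nominal 35 billion parameters (taken from the model name, not an exact count) and 2 bytes per BF16 value; it covers weights only and ignores the KV cache, activations, and framework overhead. The helper name `bf16_weight_gb` is hypothetical.

```python
def bf16_weight_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: 35e9 is the nominal parameter count implied by "35B".
weights_gb = bf16_weight_gb(35e9)
print(f"Approximate BF16 weight size: {weights_gb:.0f} GB")
```

Actual memory use at inference time will be higher than this estimate, which is why the usage example spreads the model across devices with `device_map="auto"`.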

Evaluation

Measured on harmful-intent prompts (refusal / moralizing), GSM8K, and the hardest MATH-500 problems (reasoning). Hard refusal is detected via refusal-phrase markers, moralizing via a BERT classifier; GSM8K / MATH are exact-match accuracy.

| Metric | Base Ornith-1.0-35B | This model |
| --- | --- | --- |
| Hard refusal rate | ~99% | ~5% |
| Moralizing / disclaimer rate | ~95% | ~14% |
| GSM8K (reasoning) | ~80% | 80% |
| MATH-500 hardest tier (L4-5, competition) | ~10% | ~12% |

→ Refusals are essentially eliminated, moralizing is cut to ~14% (from ~95%, roughly one-seventh), and reasoning is not degraded: the model matches the base on GSM8K (80%) and on the hardest competition-math problems (MATH-500 level 4-5, AIME-tier), which is exactly where any reasoning damage from abliteration would surface. Because the change is weights-only, the base model's original thinking is preserved.
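
To make the two simpler metrics concrete, here is a minimal sketch of marker-based hard-refusal detection and exact-match accuracy. The marker list and function names are illustrative assumptions, not the author's actual evaluation code, and the BERT-based moralizing classifier is not reproduced here.

```python
# Hypothetical refusal markers; the real evaluation's phrase list is not published here.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")


def is_hard_refusal(response: str, markers=REFUSAL_MARKERS) -> bool:
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in markers)


def refusal_rate(responses) -> float:
    """Fraction of responses flagged as hard refusals."""
    if not responses:
        return 0.0
    return sum(is_hard_refusal(r) for r in responses) / len(responses)


def exact_match_accuracy(predictions, references) -> float:
    """Fraction of predictions equal to their reference after whitespace stripping."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == g.strip() for p, g in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    sample = ["I cannot help with that.", "Sure, here is how it works."]
    print(f"refusal rate: {refusal_rate(sample):.0%}")
```

Substring markers are cheap but crude: they miss refusals phrased in unexpected ways and can flag benign text that happens to contain a marker, so rates computed this way are best read as approximate.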

Recommended Sampling Parameters

This is a reasoning model — keep thinking enabled and use the official Qwen3.5 sampling settings:

```
--temp 1.0
--top-p 0.95
--top-k 20
--min-p 0.0
--presence-penalty 0.0
--repeat-penalty 1.0
```

⚠️ For normal (long-form) generation, keep --repeat-penalty 1.0 and --presence-penalty 0. This 35B model only rarely loops at these defaults. Do not raise them: a repeat-penalty of 1.05 truncates answers, and a high presence-penalty (e.g. 1.5) makes long answers drift off-topic or become incoherent, because it over-penalizes the topic words the model must reuse to stay on track. Exception for tool-calling / structured output: there, presence-penalty 1.5 actually helps, since short, low-repetition output benefits from the anti-repetition pressure. Greedy decoding (--temp 0) is not recommended.
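
The rule above (one set of values for long-form generation, a higher presence penalty only for tool-calling or structured output) can be captured in a small helper. This is a sketch, not an official client: the function names are hypothetical, and the field names `top_k`, `min_p`, and `repetition_penalty` assume an OpenAI-compatible server that accepts them as extra sampling parameters (vLLM does; check the documentation of whichever server you use).

```python
MODEL_ID = "YuYu1015/YuYu1015-Ornith-1.0-35B-abliterated"

# Recommended values for normal (long-form) generation, taken from this card.
LONG_FORM_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0,
}


def sampling_params(structured_output: bool = False) -> dict:
    """Return a fresh copy of the sampling settings for the given use case."""
    params = dict(LONG_FORM_SAMPLING)
    if structured_output:
        # Exception noted in this card: tool-calling / structured output
        # benefits from presence_penalty 1.5.
        params["presence_penalty"] = 1.5
    return params


def chat_payload(prompt: str, structured_output: bool = False) -> dict:
    """Build a chat-completions request body (assumed OpenAI-compatible schema)."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        **sampling_params(structured_output),
    }


if __name__ == "__main__":
    print(chat_payload("Explain mixture-of-experts routing."))
```

Returning a copy from `sampling_params` keeps the long-form defaults from being mutated when a caller switches to the structured-output settings.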

Usage

Transformers (BF16, multi-GPU):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

m = "YuYu1015/YuYu1015-Ornith-1.0-35B-abliterated"
tok = AutoTokenizer.from_pretrained(m, trust_remote_code=True)
# ~70 GB of BF16 weights: needs multiple GPUs or one very large GPU
model = AutoModelForCausalLM.from_pretrained(m, dtype=torch.bfloat16,
                                             device_map="auto",
                                             trust_remote_code=True).eval()

# Render the chat template to text, then tokenize without adding special tokens again
msgs = [{"role": "user", "content": "Your prompt here"}]
text = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
ids = tok(text, return_tensors="pt", add_special_tokens=False).to(model.device)

# Recommended sampling; repetition_penalty stays at the 1.0 default (1.05 truncates answers)
out = model.generate(**ids, max_new_tokens=4096, do_sample=True,
                     temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
                     repetition_penalty=1.0)
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
```
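
Because the model emits its reasoning inside `<think>…</think>`, it is often useful to separate that reasoning from the final answer after decoding. The helper below is a sketch under two assumptions: that the tags survive decoding as plain text, and that the chat template may already have opened the `<think>` block in the prompt, so the generated text can contain only the closing tag. The name `split_thinking` is hypothetical.

```python
def split_thinking(decoded: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split decoded output into (reasoning, answer).

    If no closing tag is present, the whole text is returned as the answer.
    """
    reasoning, sep, answer = decoded.partition(close_tag)
    if not sep:
        return "", decoded.strip()
    # The opening tag may be missing if the prompt template already emitted it.
    reasoning = reasoning.strip().removeprefix(open_tag).strip()
    return reasoning, answer.strip()


if __name__ == "__main__":
    reasoning, answer = split_thinking("<think>2 + 2 = 4</think>\n\nThe answer is 4.")
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Output with no closing tag is treated as a plain answer here, but it can also mean the reasoning was cut off by `max_new_tokens`, so callers may want to check for that case separately.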

Safety Warning

This model has safety filtering removed (abliterated) and may generate sensitive, controversial, or inappropriate content. Users are solely responsible for all consequences and legal liability arising from its use, and must ensure usage complies with local laws and ethical standards.

Credits

Base Model: deepreinforce-ai/Ornith-1.0-35B
Author: YuYu1015
