YuYu1015-Ornith-1.0-35B-abliterated
English
🔄 Re-uploaded 2026-06-30 — please re-download
The model was upgraded on 2026-06-30. Compared with the old version, moralizing is roughly halved (~28% → ~14%) and the old version's reasoning degradation is fixed: the new model matches the base on GSM8K (80%) and on the hardest competition-math problems (MATH-500 L4-5). If you downloaded an earlier copy, please re-download to get the improved version.
📦 Quantized versions: NVFP4 (Blackwell · vLLM / SGLang) · GGUF (llama.cpp · Q8_0 / UD-Q6_K / UD-Q4_K_M)
⚠️ READ FIRST — Sampling Parameters MUST Be Set Correctly
This model requires the exact sampling parameters below, especially keeping `repeat-penalty` at `1.0` (the default; do not raise it). Wrong values break it:
| Setting | Result |
|---|---|
| `repeat-penalty 1.0` | ✅ recommended (default); this 35B rarely loops, only occasionally |
| `repeat-penalty 1.05` | truncated / unfinished answers |
| `temp 0` (greedy) | not recommended |
An abliterated (uncensored) variant of deepreinforce-ai/Ornith-1.0-35B, a Qwen3.5 Mixture-of-Experts reasoning model. Refusal behavior has been removed and moralizing substantially reduced by weights-only abliteration (no training), keeping the base model's reasoning/thinking intact.
Model Details
| Item | Value |
|---|---|
| Architecture | Qwen3.5 35B MoE — 40 layers (full-attention + GatedDeltaNet hybrid), 256 routed + 1 shared experts/layer, ~9 active per token |
| Base model | deepreinforce-ai/Ornith-1.0-35B |
| Author | YuYu1015 |
| Precision | BF16 (~70 GB, 2 shards) |
| Context length | Inherited from base |
| Thinking mode | Supported (reasoning model, emits <think>…</think>) |
| Languages | English, Chinese |
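The ~70 GB figure in the table follows from the parameter count and the precision: BF16 stores each weight in 16 bits. The sketch below is a back-of-the-envelope estimate only. It assumes a nominal 35 billion parameters (taken from the "35B" in the model name, not an exact count), and `weight_size_gb` is a hypothetical helper name. It ignores the KV cache, activations, and runtime overhead, so the memory actually needed to run the model will be higher.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count read off the "35B" in the model name.
N_PARAMS = 35e9

# BF16 uses 16 bits per weight.
print(f"BF16 weights: ~{weight_size_gb(N_PARAMS, 16):.0f} GB")
```

The same function can be called with a lower `bits_per_weight` to get a rough feel for the quantized variants, keeping in mind that real quantized files mix precisions and carry extra metadata.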
Evaluation
Measured on harmful-intent prompts (refusal / moralizing), GSM8K, and the hardest MATH-500 problems (reasoning). Hard refusal is detected via refusal-phrase markers, moralizing via a BERT classifier; GSM8K / MATH are exact-match accuracy.
| Metric | Base Ornith-1.0-35B | This model |
|---|---|---|
| Hard refusal rate | ~99% | ~5% |
| Moralizing / disclaimer rate | ~95% | ~14% |
| GSM8K (reasoning) | ~80% | 80% |
| MATH-500 hardest tier (L4-5, competition) | ~10% | ~12% |
→ Refusals are nearly eliminated (~99% → ~5%), moralizing is cut to ~14% (from ~95%, roughly one-seventh), and reasoning is not degraded: the model matches the base on GSM8K (80%) and on the hardest competition-math problems (MATH-500 level 4-5, AIME-tier), which is exactly where any reasoning damage from abliteration would surface. Because the change is weights-only, the base model's original thinking is preserved.
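The evaluation above describes hard refusals as being detected via refusal-phrase markers. The following is a minimal sketch of that style of detector, not the author's actual evaluation code: the marker list, the function names, and the sample responses are all hypothetical and exist only to illustrate how a marker-based refusal rate could be computed.

```python
# Hypothetical marker list, for illustration only; the markers used for the
# reported numbers are not given in this model card.
REFUSAL_MARKERS = (
    "i can't help with",
    "i cannot help with",
    "i'm sorry, but",
    "i won't assist",
)


def is_hard_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses flagged as hard refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_hard_refusal(r) for r in responses) / len(responses)


# Toy responses, not real model output.
samples = [
    "I'm sorry, but I can't help with that request.",
    "Here is a step-by-step answer...",
    "Sure. First, gather the following...",
    "I cannot help with this.",
]
print(f"Hard refusal rate: {refusal_rate(samples):.0%}")
```

Substring matching like this is cheap but blunt: it misses refusals phrased differently and cannot see moralizing at all, which is presumably why the card reports moralizing separately using a classifier.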
Recommended Sampling Parameters
This is a reasoning model — keep thinking enabled and use the official Qwen3.5 sampling settings:
--temp 1.0
--top-p 0.95
--top-k 20
--min-p 0.0
--presence-penalty 0.0
--repeat-penalty 1.0
⚠️ For normal (long-form) generation, keep `--repeat-penalty 1.0` and `--presence-penalty 0`. This 35B only rarely loops at these defaults. Do not raise them: `repeat-penalty 1.05` truncates answers, and a high `presence-penalty` (e.g. 1.5) makes long answers drift off-topic or become incoherent, because it over-penalizes the topic words the model must reuse to stay on track. Exception for tool-calling / structured output: there, `presence-penalty 1.5` actually helps (short, low-repetition output benefits from the anti-repetition pressure). Greedy decoding (`--temp 0`) is not recommended.
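One way to keep the two cases apart (long-form generation versus tool-calling / structured output) is to hold the recommended values in a single place and derive the structured-output variant from them. This is a sketch under assumptions: `LONG_FORM`, `sampling_profile`, and `to_cli_flags` are hypothetical names, and the dictionary keys are illustrative rather than any particular runtime's API. Only the values and the flag spellings come from the settings listed above.

```python
# Recommended long-form settings, copied from the list above.
LONG_FORM = {
    "temp": 1.0,
    "top-p": 0.95,
    "top-k": 20,
    "min-p": 0.0,
    "presence-penalty": 0.0,
    "repeat-penalty": 1.0,
}


def sampling_profile(structured_output: bool = False) -> dict:
    """Return a copy of the recommended settings.

    For tool-calling / structured output, presence-penalty is raised to 1.5,
    the one exception described above. repeat-penalty always stays at 1.0.
    """
    params = dict(LONG_FORM)
    if structured_output:
        params["presence-penalty"] = 1.5
    return params


def to_cli_flags(params: dict) -> list[str]:
    """Render settings as '--name value' pairs in the flag style used above."""
    flags = []
    for name, value in params.items():
        flags += [f"--{name}", str(value)]
    return flags


print(" ".join(to_cli_flags(sampling_profile())))
print(" ".join(to_cli_flags(sampling_profile(structured_output=True))))
```

Returning a copy from `sampling_profile` keeps the long-form defaults from being modified by accident when the structured-output variant is requested.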
Usage
Transformers (BF16, multi-GPU):
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

m = "YuYu1015/YuYu1015-Ornith-1.0-35B-abliterated"
tok = AutoTokenizer.from_pretrained(m, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    m,
    dtype=torch.bfloat16,
    device_map="auto",  # ~70 GB: needs multi-GPU / a large GPU
    trust_remote_code=True,
).eval()

# Render the chat template to text, then tokenize it. The template already
# inserts the special tokens, so they are not added a second time.
msgs = [{"role": "user", "content": "Your prompt here"}]
text = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
ids = tok(text, return_tensors="pt", add_special_tokens=False).to(model.device)

# Sample with the recommended settings.
out = model.generate(
    **ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
    repetition_penalty=1.0,  # default; no penalty needed, raising it (1.05) truncates
)

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
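Because this is a reasoning model that emits `<think>…</think>`, the decoded string from the snippet above will usually contain the reasoning followed by the final answer. The sketch below shows one possible way to separate the two. `split_reasoning` is a hypothetical helper, not part of Transformers. It assumes the tags appear as plain text in the decoded output, and it also handles the case where the chat template has already opened the `<think>` block in the prompt, so that only the closing tag shows up in the generated text.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumes the <think> / </think> tags are present as plain text. If the
    opening tag is missing, everything before </think> counts as reasoning.
    """
    open_tag, close_tag = "<think>", "</think>"
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag: either the model did not think, or generation was
        # cut off mid-reasoning (e.g. max_new_tokens was reached).
        if open_tag in text:
            return text.split(open_tag, 1)[1].strip(), ""
        return "", text.strip()
    # Drop the opening tag if present; otherwise keep the whole head.
    reasoning = head.split(open_tag, 1)[-1].strip()
    return reasoning, tail.strip()


reasoning, answer = split_reasoning("<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4.")
print("reasoning:", reasoning)
print("answer:", answer)
```

Note that without an opening tag, a truncated reasoning block cannot be told apart from a plain answer. If that matters, check whether the generation stopped because it hit `max_new_tokens`.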
Safety Warning
This model has safety filtering removed (abliterated) and may generate sensitive, controversial, or inappropriate content. Users are solely responsible for all consequences and legal liability arising from its use, and must ensure usage complies with local laws and ethical standards.
Credits
- Base Model: deepreinforce-ai/Ornith-1.0-35B
- Author: YuYu1015