Jinki Jeong PRO

Anserwise


AI & ML interests

None yet

Recent Activity

upvoted a collection about 19 hours ago

VKAE Accelerated

upvoted an article 1 day ago

Adding a GPU Without Building One

reacted to SeaWolf-AI's post with 👀 1 day ago

🚀 Adding a GPU without building one

AI is usually framed as "how smart is the model / how many GPUs did you buy." The real bottleneck is elsewhere: how efficiently you use the GPUs you already have. Training happens once; inference runs the entire time users use your product. So a service's economics come down to cost per token.

Inference acceleration uses software to pull several times more out of the same GPU, the effect of plugging in one more "virtual GPU."

VIDRAFT's VKAE, measured (B200, same-harness, no quality loss):
Qwen3.5-35B-A3B (MoE): 25.7 → 601 tok/s (23.4×)
Darwin-36B-Opus (in-house MoE): 25.0 → 280.8 tok/s (11.2×)
10,000+ tok/s peak aggregate under concurrency

The key: it's reproducible. Model + serving shipped as one container.

docker pull vidraft/qwen35-vkae:601

Don't take our word for it; run it yourself. The mechanism will be released as a paper.

🏆 Leaderboard & demo 👉 https://huggingface.co/spaces/VIDraft/vkae
Articles 👉 https://huggingface.co/blog/FINAL-Bench/vkae-leaderboard
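The arithmetic behind the post's cost-per-token argument can be checked with a short sketch. The throughput figures are the ones quoted in the post; the GPU hourly price is a made-up placeholder, and the helper names `speedup` and `cost_per_million_tokens` are hypothetical, introduced only for illustration. This is not VKAE's method, just the ratio math under the assumption that a single stream fully occupies one GPU.

```python
# Throughput figures quoted in the post: (baseline tok/s, accelerated tok/s).
MEASUREMENTS = {
    "Qwen3.5-35B-A3B": (25.7, 601.0),
    "Darwin-36B-Opus": (25.0, 280.8),
}

# ASSUMPTION: placeholder price, not taken from the post or any vendor.
HYPOTHETICAL_GPU_USD_PER_HOUR = 5.0


def speedup(before_tok_s: float, after_tok_s: float) -> float:
    """Ratio of accelerated to baseline throughput."""
    return after_tok_s / before_tok_s


def cost_per_million_tokens(tok_s: float, gpu_usd_per_hour: float) -> float:
    """GPU cost of generating one million tokens at a steady throughput."""
    tokens_per_hour = tok_s * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    for name, (before, after) in MEASUREMENTS.items():
        ratio = speedup(before, after)
        cost_before = cost_per_million_tokens(before, HYPOTHETICAL_GPU_USD_PER_HOUR)
        cost_after = cost_per_million_tokens(after, HYPOTHETICAL_GPU_USD_PER_HOUR)
        print(f"{name}: {ratio:.1f}x, "
              f"${cost_before:.2f} -> ${cost_after:.2f} per 1M tokens")
```

At a fixed hourly price, cost per token falls by exactly the speedup factor, which is the sense in which the post likens acceleration to adding GPUs.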


Organizations

None yet

models 5

Anserwise/AWAXIS-Hybrid-28B

Text Generation • 28B • Updated 27 days ago • 75 • 2

Anserwise/AWAXIS-KR-31B

Image-Text-to-Text • 52B • Updated 27 days ago • 98 • 4

Anserwise/AWAXIS-Think-31B

Text Generation • 31B • Updated about 1 month ago • 68 • 2

Anserwise/AWAXIS-Think-28B

Text Generation • 28B • Updated Apr 24 • 31 • 17

Anserwise/AWAXIS-Think-27b

Text Generation • 27B • Updated Apr 23 • 5 • 1

datasets 0

None public yet