Hugging Face – Posts

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

All HF Hub posts

posted an update 2 days ago

Post

10479

A new model is coming!
Its going to take a long time on my 5070 Ti so expect a release in ~1 month.
We think this model is going to be SOTA For its size.
Our Mini Version will be 25M Parameters and Pro with 140M.
The Pro version has a 3072 Context Window (Extensible to up to 6K with RoPE) And the Mini version has a context window of 4096 (Up to 8K with RoPE)
Meanwhile we are currently working on a Instruct Version of our BananaMind 1.5 Base.

The training will start this weekend

We are very exited to release it when its done!

10 replies

SeaWolf-AI

posted an update about 11 hours ago

Post

1075

🚀 Adding a GPU without building one

AI is usually framed as "how smart is the model / how many GPUs did you buy." The real bottleneck is elsewhere — how efficiently you use the GPUs you already have.

Training happens once; inference runs the entire time users use your product. So a service's economics come down to cost per token. Inference acceleration uses software to pull several times more out of the same GPU — the effect of plugging in one more "virtual GPU."

VIDRAFT's VKAE, measured (B200, same-harness, no quality loss):

Qwen3.5-35B-A3B (MoE): 25.7 → 601 tok/s (23.4×)
Darwin-36B-Opus (in-house MoE): 25.0 → 280.8 (11.2×)
10,000+ tok/s peak aggregate under concurrency
The key: it's reproducible — model + serving shipped as one container.

docker pull vidraft/qwen35-vkae:601
Don't take our word for it — run it yourself. The mechanism will be released as a paper.

🏆 Leaderboard & demo 👉 VIDraft/vkae
Articles 👉 https://huggingface.co/blog/FINAL-Bench/vkae-leaderboard

ginigen-ai

posted an update 3 days ago

Post

10282

🧠 Does your LLM know when it's about to be wrong?

Most leaderboards measure accuracy. We measure metacognition — whether a model catches its own errors. Benchmark + leaderboard + adapters, all open. 🎉

The surprise: even a K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005 — ~2 misses in 400) yet blind to its own free-form mistakes (self-confidence AUROC = 0.5, pure random). A tiny base-frozen adapter recovers that signal.

Two independent axes (never compared across a row): ① trap_rate — does it fall for tempting trap options? (lower = stronger) ② adapter gain Δ — how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)

What's open: 📊 300+100 trap problems (each with a hidden trap + TICOS type) 🏆 24-model leaderboard 🧩 11 per-model adapters — adapters, NOT fine-tunes (base stays frozen; the adapter just reads the hidden state → P(wrong))

Submit any HF model → auto-scored daily at 09:00 KST and added to the board.

🏆 Leaderboard → ginigen-ai/Metacognition-Leaderboard-Space

📊 Benchmark → ginigen-ai/Metacognition-Bench

🧩 Adapters → FINAL-Bench/metacognition-adapters-6a42c032e6beb803dd032961

📊 Article → https://huggingface.co/blog/ginigen-ai/metacognition

Benchmark by ginigen-ai · Adapters by FINAL-Bench (Darwin/Chimera platform + AETHER metacognition tech).

11 replies

Quazim0t0

posted an update about 10 hours ago

Post

428

Created research language model whose channel-mixing block is not an MLP. It is a differentiable Neighbour-Sensing fungal-colony-growth model: each token is expanded into a colony of hyphal tips that grow in a bounded latent region, sense a shared density field, and steer their own growth — the "MLP" is replaced by a few differentiable steps of colony growth, read back out into the hidden state.

Quazim0t0/Mycel-LM-79M

Also the original SpikeWhale project — the one that sparked all the other SpikeWhale related projects. Every spiking primitive here is hand-written in plain PyTorch: the leaky integrate-and-fire (LIF) neuron dynamics, the fast-sigmoid surrogate gradient, and the backprop-through-time training loop. No snntorch, no spikingjelly, no norse, no bindsnet — the network is a genuine from-scratch SNN.

Quazim0t0/SpikeWhale-SNN-216M

stas

posted an update 1 day ago

Post

595

I present to you a new experimental open book.

https://github.com/stas00/python-cookbook

I took my dense Python cheatsheet that I have been honing for many years and use a lot daily and turned it into a book of recipes.

Is this useful?

This is, of course, free, like other open books.

ProCreations

posted an update 1 day ago

Post

294

want model think like grug?token efficient think! Grug dataset here. Grug model now ProCreations/grug-think

breitburg

posted an update 1 day ago

Post

529

I've been experimenting with "pure" model alignment.

The core idea is to only train a verifiable version of a capacity until the model generalizes it to the non-verifiable version. For example, training the model on factual self-knowledge, like the model's scale, architecture, runtime situation, and being able to predict its own behavior, betting this generalizes to real introspection about states that do not.

The same principle applies to general instruction following -- no training on subjective judgement, only verifiable claims and inferences, betting the skill generalizes to instructions where correctness is a matter of judgment.

The primary alignment claim is that an identity and taste that will emerge this way will be much more robust and honest than hand-scripted ones (e.g.
"As an AI language model...").

During the training, we should never teach it to make any subjective claims or invent experiences that we assume it has, like "I don't have taste" or "I'm not self-aware in the way you think", as well as no narration of internal states like "I'm curious now".

The main threat, of course, is that we'll simply inherit the training distribution of all the things like "taste", and we'll get an average. However, with the recent research about the models' introspection abilities, it might be as well the case that we'll get something that's more honest than something that tries to adhere to a specific spec file.

I'm posting new experimental models trained that way in this collection: https://huggingface.co/collections/breitburg/neue

3 replies

kanaria007

posted an update 1 day ago

Post

✅ Article highlight: *Mega-Parse Bridge: Large Context Compression Without Losing Governance Semantics* (art-60-190, v0.1)

TL;DR:
This article argues that summarizing a huge input is not the same as parsing it.

Large documents, evidence bundles, long histories, multimodal case packets, and world-state slices cannot be treated as one vague “context.” 190 turns large-input handling into a governed mega-parse: shard, parse, retain semantics, declare loss, preserve re-expandability, and decide what the compressed artifact can honestly support.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• prevents “I read the whole thing” from becoming an overclaim
• keeps shard-level provenance instead of trusting a summary blob
• makes compression loss explicit and reviewable
• protects contradictions, authority-sensitive clauses, and protected-subject distinctions
• lets reviewers re-expand compressed claims back to source structure

What’s inside:
• mega-parse intake envelopes for large text, multimodal batches, and long-running packets
• shard-parse receipts for local grounded structure
• semantic-retention policies for what must survive compression
• compression artifacts with declared retention and bounded loss
• loss-declaration receipts for dropped, blurred, or unavailable surfaces
• re-expandability maps linking compressed claims back to recoverable shards
• admissibility and reentry artifacts for deciding where compressed outputs may be used

Key idea:
Do not say:

*“the system summarized the context.”*

Say:

*“this large input was sharded, locally parsed, compressed under this retention policy, loss-declared, re-expandable through these refs, and admitted only for these effect surfaces.”*

Compression is allowed.

Unreceipted semantic loss is not.

ginigen-ai

posted an update 4 days ago

Post

5161

🍳 The RoboCasa Kitchen Leaderboard
What does it take for a robot to handle kitchen chores the way a person does? It has to see (Vision), understand instructions (Language), and actually act (Action) — and VLA (Vision-Language-Action) models are emerging as the answer. They're the bridge between large multimodal models and real-world embodied control.

RoboCasa Kitchen is a leading robot-learning benchmark in which a single-arm robot (Franka Panda) performs 24 atomic manipulation tasks — picking up cups and bowls, opening drawers and doors, turning faucets, pressing buttons, and more — inside a photorealistic simulated kitchen. Because the layout and object placement are randomized every episode, it tests genuine generalization rather than memorized motions. The score (success rate, SR) is the average fraction of the 24 tasks completed as instructed, measured over multiple seeds so results aren't down to luck.

The catch: this benchmark has no official leaderboard, and protocols (number of demonstrations, evaluation setup) differ from paper to paper, leaving scores scattered. Lining the numbers up naively quickly turns into an apples-to-oranges comparison.

This leaderboard fixes that by collecting published scores with their sources and comparing only what is genuinely comparable. It's split into three tables:

🏆 Kitchen 24-task (matched) — head-to-head under identical conditions (per the RLDX-1 Technical Report). This is the core ranking you can actually trust.
➕ Other protocols — self-reported under different setups (e.g. fewer demos). Not directly comparable, so kept separate.
🤖 GR1-Tabletop — a different, humanoid-based variant suite, separated to avoid confusion.

Any researcher can submit their own model's score directly, and submissions are reviewed before they appear on the board. Every number links to its source paper, so you can verify it yourself.

👉 ginigen-ai/robocasa-kitchen-leaderboard

fffiloni

posted an update about 22 hours ago

Post

397

I made a Hugging Face Space for SCAIL-2 🤗

Reference character + driving motion → animated result.

A simple demo to explore the paper’s core workflow with curated examples.

👉 fffiloni/SCAIL-2

Recently active users