You are viewing the main version, which requires installation from source. If you'd like
a regular pip install, check out the latest stable version (v0.16.0).
Install agent skills
Use kernel-builder skills add to install the skills for AI coding assistants like Claude, Codex, and OpenCode.
Supported skills include:
- cuda-kernels (default)
- rocm-kernels
- xpu-kernels
- cpu-kernels
Skill files are downloaded from the huggingface/kernels directory in this repository.
Skills instruct agents how to deal with hardware-specific optimizations, integrate with libraries like diffusers and transformers, and benchmark kernel performance in consistent ways.
When are CPU kernels actually helpful? Two main cases:
- Better performance on Intel Xeon: custom AVX2/AVX512 kernels (and AMX via brgemm for quantized GEMM) outperform generic PyTorch ops for element-wise and quantized workloads, especially in CPU-only or latency-sensitive serving.
- Enabling functionality that otherwise can’t run: some kernels are a hard requirement, e.g. megablocks MoE on CPU, where you simply cannot run MXFP4 without the kernel.
Example CPU kernels built with this skill (available on the Hub under kernels-community):
- kernels-community/megablocks: MoE kernels with a CPU backend that enable running MXFP4 MoE models on CPU.
- kernels-community/quantization-gptq: INT4 quantized GEMM using AVX512.
- kernels-community/rmsnorm: RMSNorm with AVX2/AVX512 element-wise paths.
Examples:
# install for Claude in the current project
kernel-builder skills add --claude
# install ROCm kernels skill for Codex
kernel-builder skills add --skill rocm-kernels --codex
# install globally for Codex
kernel-builder skills add --codex --global
# install for multiple assistants
kernel-builder skills add --claude --codex --opencode
# install to a custom destination and overwrite if already present
kernel-builder skills add --dest ~/my-skills --force
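To install several skills in one go, the commands above can be generated in a loop. The following is a minimal dry-run sketch, not an official workflow: it only prints one command per supported skill so you can review them first. The skill names and flags come from this page; the helper name skill_cmd is hypothetical, and combining --skill with --claude is an assumption by analogy with the --codex example.

```shell
#!/bin/sh
# Dry-run sketch: print one `kernel-builder skills add` command per skill.
# Nothing is installed; the commands are only written to stdout.

# Skill names as listed under "Supported skills include".
skills="cuda-kernels rocm-kernels xpu-kernels cpu-kernels"

# Hypothetical helper: prints the install command for a single skill.
# Assumption: --skill can be combined with --claude, as it is with --codex.
skill_cmd() {
    printf 'kernel-builder skills add --skill %s --claude\n' "$1"
}

# Emit one command per supported skill.
for skill in $skills; do
    skill_cmd "$skill"
done
```

Once the printed commands look right, they can be run by piping the script's output to sh.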