Papers
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Datasets (15)
benchflow/env0-qwen35-9b-mobile300-prime-sft
Viewer • Updated • 300 • 56
benchflow/env0-qwen35-9b-full2003-prime-sft
Preview • Updated • 56
benchflow/env0-qwen35-9b-full1703-prime-sft
Viewer • Updated • 1.7k • 59
benchflow/env0-prime-sft-smoke10-arrow
Viewer • Updated • 10 • 43
benchflow/env0-prime-sft-smoke10
Viewer • Updated • 10 • 38
benchflow/skillsbench
Benchmark • Updated • 4.44k • 6
benchflow/skillsbench-leaderboard
Updated • 12.6k • 1
benchflow/benchmarks
Updated • 44
benchflow/skillsbench-research-artifacts
Updated • 34
benchflow/skillsbench-trajectories-apr2026
Updated • 127 • 1