Hugging Face
Datasets: 38 (active filter: official; sort: Trending; page 1 of 2)
Benchmark datasets
Live leaderboards rank Hub models on evals like SWE-bench, AIME 2026 and HLE.
LiquidAI/ifstruct-v1.0 • Benchmark • Updated 3 days ago • 2k rows • 114 downloads • 35 likes
openai/gsm8k • Benchmark • Updated Mar 23 • 17.6k rows • 934k downloads • 1.42k likes
allenai/olmOCR-bench • Benchmark • Updated Feb 19 • 7.83k downloads • 258 likes
cais/hle • Benchmark • Updated Jan 20 • 2.5k rows • 28.3k downloads • 850 likes
SWE-bench/SWE-bench_Verified • Benchmark • Updated Feb 27 • 500 rows • 71.3k downloads • 101 likes
ScaleAI/SWE-bench_Pro • Benchmark • Updated Feb 23 • 731 rows • 68.3k downloads • 145 likes
llamaindex/ParseBench • Benchmark • Updated Apr 19 • 169k rows • 13.1k downloads • 101 likes
datacurve/deep-swe • Benchmark • Updated Jun 2 • 113 rows • 677 downloads • 13 likes
TIGER-Lab/MMLU-Pro • Benchmark • Updated May 2 • 12.1k rows • 158k downloads • 489 likes
harborframework/terminal-bench-2.0 • Benchmark • Updated Apr 24 • 19.1k downloads • 44 likes
actava/chi-bench • Benchmark • Updated Jun 2 • 101 rows • 4.1k downloads • 57 likes
meituan-longcat/WBench • Benchmark • Updated May 29 • 867 rows • 2.33k downloads • 21 likes
ARTPARK-IISc/Vaani-Benchmark-V1.0 • Benchmark • Updated 5 days ago • 8.91k rows • 835 downloads • 5 likes
Idavidrein/gpqa • Benchmark • Updated Mar 5 • 1.25k rows • 92.8k downloads • 471 likes
nvidia/compute-eval • Benchmark • Updated Apr 27 • 2.46k rows • 1.1k downloads • 26 likes
LEXam-Benchmark/LEXam • Benchmark • Updated May 21 • 7.54k rows • 1.99k downloads • 45 likes
mercor/apex-agents • Benchmark • Updated 22 days ago • 480 rows • 67.1k downloads • 133 likes
MathArena/hmmt_feb_2026 • Benchmark • Updated May 15 • 33 rows • 5.18k downloads • 5 likes
ChrisHayduk/nanofold-public • Benchmark • Updated 17 days ago • 11k rows • 940 downloads • 16 likes
mteb/arguana • Benchmark • Updated Apr 17 • 11.5k rows • 11.3k downloads • 5 likes
hf-audio/open-asr-leaderboard • Benchmark • Updated 7 days ago • 126k rows • 19.7k downloads • 42 likes
MMMU/MMMU_Pro • Benchmark • Updated 25 days ago • 5.19k rows • 18.3k downloads • 60 likes
likaixin/ScreenSpot-Pro • Benchmark • Updated Mar 18 • 10.4k downloads • 67 likes
mercor/ACE • Benchmark • Updated Apr 13 • 592 rows • 6.06k downloads • 5 likes
mercor/APEX-v1-extended • Benchmark • Updated Apr 22 • 100 rows • 3.46k downloads • 16 likes
VLABench/vlabench_primitive_ft_lerobot_video • Benchmark • Updated Apr 23 • 575k rows • 7.81k downloads • 1 like
FutureMa/EvasionBench • Benchmark • Updated Feb 19 • 16.7k rows • 435 downloads • 110 likes
MathArena/aime_2026 • Benchmark • Updated May 15 • 30 rows • 14.4k downloads • 44 likes
tiiuae/PBench • Benchmark • Updated May 11 • 6.34k rows • 1.09k downloads • 15 likes
mteb/BRIGHT • Benchmark • Updated Apr 2 • 1.35M rows • 3.43k downloads • 3 likes