Sample rows from `benchmark_scores.csv`:

| model_id (string) | model_name (string) | organization (string) | release_date (string) | benchmark (string) | benchmark_type (string) | score (float64) | max_score (int64) | score_pct (float64) |
|---|---|---|---|---|---|---|---|---|
M001 | GPT-3 (175B) | OpenAI | 2020-06-11 | MMLU | knowledge | 32.83 | 100 | 32.83 |
M001 | GPT-3 (175B) | OpenAI | 2020-06-11 | HellaSwag | commonsense | 55.08 | 100 | 55.08 |
M001 | GPT-3 (175B) | OpenAI | 2020-06-11 | ARC-Challenge | reasoning | 49.43 | 100 | 49.43 |
M002 | T5-XXL | Google | 2020-02-24 | MMLU | knowledge | 24.38 | 100 | 24.38 |
M002 | T5-XXL | Google | 2020-02-24 | HellaSwag | commonsense | 49.3 | 100 | 49.3 |
M002 | T5-XXL | Google | 2020-02-24 | ARC-Challenge | reasoning | 47.24 | 100 | 47.24 |
M003 | Turing-NLG | Microsoft | 2020-02-13 | MMLU | knowledge | 30.64 | 100 | 30.64 |
M003 | Turing-NLG | Microsoft | 2020-02-13 | HellaSwag | commonsense | 52.13 | 100 | 52.13 |
M003 | Turing-NLG | Microsoft | 2020-02-13 | ARC-Challenge | reasoning | 52.07 | 100 | 52.07 |
M004 | GShard | Google | 2020-06-30 | MMLU | knowledge | 35.61 | 100 | 35.61 |
M004 | GShard | Google | 2020-06-30 | HellaSwag | commonsense | 54.48 | 100 | 54.48 |
M004 | GShard | Google | 2020-06-30 | ARC-Challenge | reasoning | 51.88 | 100 | 51.88 |
M005 | CTRL | Salesforce | 2020-09-11 | MMLU | knowledge | 38.09 | 100 | 38.09 |
M005 | CTRL | Salesforce | 2020-09-11 | HellaSwag | commonsense | 76.22 | 100 | 76.22 |
M005 | CTRL | Salesforce | 2020-09-11 | ARC-Challenge | reasoning | 63.01 | 100 | 63.01 |
M006 | BlenderBot | Meta | 2020-04-30 | MMLU | knowledge | 24.07 | 100 | 24.07 |
M006 | BlenderBot | Meta | 2020-04-30 | HellaSwag | commonsense | 49.88 | 100 | 49.88 |
M006 | BlenderBot | Meta | 2020-04-30 | ARC-Challenge | reasoning | 46.41 | 100 | 46.41 |
M007 | Switch Transformer | Google | 2021-01-11 | MMLU | knowledge | 64.46 | 100 | 64.46 |
M007 | Switch Transformer | Google | 2021-01-11 | HumanEval | coding | 28.12 | 100 | 28.12 |
M007 | Switch Transformer | Google | 2021-01-11 | GSM8K | math_grade | 20.04 | 100 | 20.04 |
M007 | Switch Transformer | Google | 2021-01-11 | MATH | math_competition | 7.92 | 100 | 7.92 |
M007 | Switch Transformer | Google | 2021-01-11 | HellaSwag | commonsense | 87.7 | 100 | 87.7 |
M007 | Switch Transformer | Google | 2021-01-11 | ARC-Challenge | reasoning | 67 | 100 | 67 |
M008 | Codex | OpenAI | 2021-08-10 | MMLU | knowledge | 53.1 | 100 | 53.1 |
M008 | Codex | OpenAI | 2021-08-10 | HumanEval | coding | 46.31 | 100 | 46.31 |
M008 | Codex | OpenAI | 2021-08-10 | GSM8K | math_grade | 34.76 | 100 | 34.76 |
M008 | Codex | OpenAI | 2021-08-10 | MATH | math_competition | 3.25 | 100 | 3.25 |
M008 | Codex | OpenAI | 2021-08-10 | HellaSwag | commonsense | 80.72 | 100 | 80.72 |
M008 | Codex | OpenAI | 2021-08-10 | ARC-Challenge | reasoning | 61.03 | 100 | 61.03 |
M009 | Jurassic-1 Jumbo | AI21 | 2021-08-11 | MMLU | knowledge | 56.26 | 100 | 56.26 |
M009 | Jurassic-1 Jumbo | AI21 | 2021-08-11 | HumanEval | coding | 41.6 | 100 | 41.6 |
M009 | Jurassic-1 Jumbo | AI21 | 2021-08-11 | GSM8K | math_grade | 32.28 | 100 | 32.28 |
M009 | Jurassic-1 Jumbo | AI21 | 2021-08-11 | MATH | math_competition | 8.49 | 100 | 8.49 |
M009 | Jurassic-1 Jumbo | AI21 | 2021-08-11 | HellaSwag | commonsense | 87.97 | 100 | 87.97 |
M009 | Jurassic-1 Jumbo | AI21 | 2021-08-11 | ARC-Challenge | reasoning | 63.93 | 100 | 63.93 |
M010 | Megatron-Turing NLG | Microsoft+NVIDIA | 2021-10-11 | MMLU | knowledge | 63.83 | 100 | 63.83 |
M010 | Megatron-Turing NLG | Microsoft+NVIDIA | 2021-10-11 | HumanEval | coding | 36.27 | 100 | 36.27 |
M010 | Megatron-Turing NLG | Microsoft+NVIDIA | 2021-10-11 | GSM8K | math_grade | 34.38 | 100 | 34.38 |
M010 | Megatron-Turing NLG | Microsoft+NVIDIA | 2021-10-11 | MATH | math_competition | 6.2 | 100 | 6.2 |
M010 | Megatron-Turing NLG | Microsoft+NVIDIA | 2021-10-11 | HellaSwag | commonsense | 85.41 | 100 | 85.41 |
M010 | Megatron-Turing NLG | Microsoft+NVIDIA | 2021-10-11 | ARC-Challenge | reasoning | 67.52 | 100 | 67.52 |
M011 | Gopher | DeepMind | 2021-12-08 | MMLU | knowledge | 55.05 | 100 | 55.05 |
M011 | Gopher | DeepMind | 2021-12-08 | HumanEval | coding | 41.74 | 100 | 41.74 |
M011 | Gopher | DeepMind | 2021-12-08 | GSM8K | math_grade | 35.71 | 100 | 35.71 |
M011 | Gopher | DeepMind | 2021-12-08 | MATH | math_competition | 7.62 | 100 | 7.62 |
M011 | Gopher | DeepMind | 2021-12-08 | HellaSwag | commonsense | 82.78 | 100 | 82.78 |
M011 | Gopher | DeepMind | 2021-12-08 | ARC-Challenge | reasoning | 65.22 | 100 | 65.22 |
M012 | GLaM | Google | 2021-12-09 | MMLU | knowledge | 61.41 | 100 | 61.41 |
M012 | GLaM | Google | 2021-12-09 | HumanEval | coding | 40.93 | 100 | 40.93 |
M012 | GLaM | Google | 2021-12-09 | GSM8K | math_grade | 35.07 | 100 | 35.07 |
M012 | GLaM | Google | 2021-12-09 | MATH | math_competition | 16.41 | 100 | 16.41 |
M012 | GLaM | Google | 2021-12-09 | HellaSwag | commonsense | 86.14 | 100 | 86.14 |
M012 | GLaM | Google | 2021-12-09 | ARC-Challenge | reasoning | 61.93 | 100 | 61.93 |
M013 | WuDao 2.0 | BAAI | 2021-06-01 | MMLU | knowledge | 63.54 | 100 | 63.54 |
M013 | WuDao 2.0 | BAAI | 2021-06-01 | HumanEval | coding | 30.88 | 100 | 30.88 |
M013 | WuDao 2.0 | BAAI | 2021-06-01 | GSM8K | math_grade | 24.08 | 100 | 24.08 |
M013 | WuDao 2.0 | BAAI | 2021-06-01 | MATH | math_competition | 9.34 | 100 | 9.34 |
M013 | WuDao 2.0 | BAAI | 2021-06-01 | HellaSwag | commonsense | 87.58 | 100 | 87.58 |
M013 | WuDao 2.0 | BAAI | 2021-06-01 | ARC-Challenge | reasoning | 69.39 | 100 | 69.39 |
M014 | ERNIE 3.0 | Baidu | 2021-07-05 | MMLU | knowledge | 49.43 | 100 | 49.43 |
M014 | ERNIE 3.0 | Baidu | 2021-07-05 | HumanEval | coding | 37.44 | 100 | 37.44 |
M014 | ERNIE 3.0 | Baidu | 2021-07-05 | GSM8K | math_grade | 32.22 | 100 | 32.22 |
M014 | ERNIE 3.0 | Baidu | 2021-07-05 | MATH | math_competition | 8.44 | 100 | 8.44 |
M014 | ERNIE 3.0 | Baidu | 2021-07-05 | HellaSwag | commonsense | 80.75 | 100 | 80.75 |
M014 | ERNIE 3.0 | Baidu | 2021-07-05 | ARC-Challenge | reasoning | 66.91 | 100 | 66.91 |
M015 | HyperCLOVA | Naver | 2021-05-25 | MMLU | knowledge | 58.88 | 100 | 58.88 |
M015 | HyperCLOVA | Naver | 2021-05-25 | HumanEval | coding | 24.12 | 100 | 24.12 |
M015 | HyperCLOVA | Naver | 2021-05-25 | GSM8K | math_grade | 16.95 | 100 | 16.95 |
M015 | HyperCLOVA | Naver | 2021-05-25 | MATH | math_competition | 7.25 | 100 | 7.25 |
M015 | HyperCLOVA | Naver | 2021-05-25 | HellaSwag | commonsense | 87.14 | 100 | 87.14 |
M015 | HyperCLOVA | Naver | 2021-05-25 | ARC-Challenge | reasoning | 70.99 | 100 | 70.99 |
M016 | GPT-3.5 (text-davinci-002) | OpenAI | 2022-03-15 | MMLU | knowledge | 77.96 | 100 | 77.96 |
M016 | GPT-3.5 (text-davinci-002) | OpenAI | 2022-03-15 | HumanEval | coding | 61.66 | 100 | 61.66 |
M016 | GPT-3.5 (text-davinci-002) | OpenAI | 2022-03-15 | GSM8K | math_grade | 52.03 | 100 | 52.03 |
M016 | GPT-3.5 (text-davinci-002) | OpenAI | 2022-03-15 | MATH | math_competition | 13.47 | 100 | 13.47 |
M016 | GPT-3.5 (text-davinci-002) | OpenAI | 2022-03-15 | HellaSwag | commonsense | 89.9 | 100 | 89.9 |
M016 | GPT-3.5 (text-davinci-002) | OpenAI | 2022-03-15 | ARC-Challenge | reasoning | 77.56 | 100 | 77.56 |
M016 | GPT-3.5 (text-davinci-002) | OpenAI | 2022-03-15 | TruthfulQA | truthfulness | 27.92 | 100 | 27.92 |
M017 | InstructGPT | OpenAI | 2022-01-27 | MMLU | knowledge | 67.38 | 100 | 67.38 |
M017 | InstructGPT | OpenAI | 2022-01-27 | HumanEval | coding | 59.44 | 100 | 59.44 |
M017 | InstructGPT | OpenAI | 2022-01-27 | GSM8K | math_grade | 58.2 | 100 | 58.2 |
M017 | InstructGPT | OpenAI | 2022-01-27 | MATH | math_competition | 22.63 | 100 | 22.63 |
M017 | InstructGPT | OpenAI | 2022-01-27 | HellaSwag | commonsense | 89.05 | 100 | 89.05 |
M017 | InstructGPT | OpenAI | 2022-01-27 | ARC-Challenge | reasoning | 76.89 | 100 | 76.89 |
M017 | InstructGPT | OpenAI | 2022-01-27 | TruthfulQA | truthfulness | 21.63 | 100 | 21.63 |
M018 | Chinchilla | DeepMind | 2022-03-29 | MMLU | knowledge | 65.77 | 100 | 65.77 |
M018 | Chinchilla | DeepMind | 2022-03-29 | HumanEval | coding | 59.78 | 100 | 59.78 |
M018 | Chinchilla | DeepMind | 2022-03-29 | GSM8K | math_grade | 60.3 | 100 | 60.3 |
M018 | Chinchilla | DeepMind | 2022-03-29 | MATH | math_competition | 22.52 | 100 | 22.52 |
M018 | Chinchilla | DeepMind | 2022-03-29 | HellaSwag | commonsense | 90.73 | 100 | 90.73 |
M018 | Chinchilla | DeepMind | 2022-03-29 | ARC-Challenge | reasoning | 78.5 | 100 | 78.5 |
M018 | Chinchilla | DeepMind | 2022-03-29 | TruthfulQA | truthfulness | 24.08 | 100 | 24.08 |
M019 | PaLM | Google | 2022-04-04 | MMLU | knowledge | 68.77 | 100 | 68.77 |
M019 | PaLM | Google | 2022-04-04 | HumanEval | coding | 57.1 | 100 | 57.1 |
M019 | PaLM | Google | 2022-04-04 | GSM8K | math_grade | 61.39 | 100 | 61.39 |
M019 | PaLM | Google | 2022-04-04 | MATH | math_competition | 18.71 | 100 | 18.71 |
M019 | PaLM | Google | 2022-04-04 | HellaSwag | commonsense | 91.78 | 100 | 91.78 |
M019 | PaLM | Google | 2022-04-04 | ARC-Challenge | reasoning | 77.29 | 100 | 77.29 |
M019 | PaLM | Google | 2022-04-04 | TruthfulQA | truthfulness | 34.66 | 100 | 34.66 |
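The rows above are in long format, with one row per (model, benchmark) pair. The sketch below reshapes them into a wide model × benchmark matrix with pandas. The six rows are copied from the preview, and the same `pivot` call should apply to the full file once it is loaded into a DataFrame.

```python
import pandas as pd

# Six rows copied from the benchmark_scores.csv preview above
scores = pd.DataFrame(
    {
        "model_name": ["GPT-3 (175B)"] * 3 + ["T5-XXL"] * 3,
        "benchmark": ["MMLU", "HellaSwag", "ARC-Challenge"] * 2,
        "score_pct": [32.83, 55.08, 49.43, 24.38, 49.3, 47.24],
    }
)

# One row per model, one column per benchmark. pivot() raises ValueError if a
# (model_name, benchmark) pair occurs twice, so it doubles as a duplicate check.
# On the full file, models without a given benchmark get NaN in that column.
# Index by model_id instead if model names turn out not to be unique.
wide = scores.pivot(index="model_name", columns="benchmark", values="score_pct")
print(wide)
```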
LLM Benchmarks & Capabilities 2020–2026
A comprehensive open dataset tracking the evolution of large language models (LLMs), from GPT-3 to GPT-5.5, Claude Opus 4.7, Gemini 3.5, and beyond.
Overview
This dataset covers the LLM landscape from 2020 to 2026 along five dimensions, one per CSV file:
- 113 models from 25+ organizations
- 17 benchmarks tracking capability growth over time
- Monthly API pricing showing 100x+ cost reductions
- Training compute estimates for testing scaling laws
- 57 capability milestones marking key inflection points
Designed for ML researchers, AI practitioners, policy analysts, and data scientists working on LLM-related problems such as trend analysis, capability forecasting, cost-performance modeling, and competitive intelligence.
Dataset Files

| File | Rows | Description |
|---|---|---|
| models_catalog.csv | 113 | Model metadata: org, release date, params, type, access |
| benchmark_scores.csv | 1,276 | Long format: model × benchmark × score |
| pricing_history.csv | 1,187 | Monthly API pricing per model (USD per 1M tokens) |
| compute_estimates.csv | 113 | Training FLOPs, GPU hours, cost, energy, CO2 |
| capability_milestones.csv | 57 | Major AI events with significance scores |

Total: ~2,750 rows · ~250 KB

The five files have different columns. Load each one as its own table rather than as a single combined split.
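`compute_estimates.csv` supplies training FLOPs, which the scaling-law use case needs. One simple check is to fit a power law of benchmark error rate against compute, error ≈ a * C^(-alpha), by linear regression in log-log space. This is a sketch under our own assumptions, not the dataset authors' method. Accuracy saturates near 100%, so treat the fitted exponent as a rough trend indicator only. The helper name `fit_error_power_law` is hypothetical.

```python
import numpy as np

def fit_error_power_law(flops, score_pct):
    """Fit error = a * flops**(-alpha), with error = 100 - score_pct.

    Least squares on log10(error) = log10(a) - alpha * log10(flops).
    Requires flops > 0 and every score_pct < 100 (zero error has no log).
    Returns (a, alpha); a positive alpha means error falls as compute grows.
    """
    flops = np.asarray(flops, dtype=float)
    error = 100.0 - np.asarray(score_pct, dtype=float)
    if np.any(error <= 0) or np.any(flops <= 0):
        raise ValueError("need flops > 0 and score_pct < 100")
    # polyfit with deg=1 returns [slope, intercept]
    slope, intercept = np.polyfit(np.log10(flops), np.log10(error), deg=1)
    return 10.0 ** intercept, -slope
```

With the real files, you would join the FLOPs column of `compute_estimates.csv` to one benchmark's rows of `benchmark_scores.csv`, presumably on `model_id`. Check `compute.columns` for the actual FLOPs column name, since it is not listed in this card.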
Organizations Covered

| Category | Organizations / Models |
|---|---|
| Closed Frontier | OpenAI · Anthropic · Google DeepMind · xAI · Microsoft |
| Open Weights | Meta · DeepSeek · Mistral · Alibaba Qwen · 01.AI · TII |
| Early Era | Chinchilla · PaLM · BLOOM · OPT · GLaM · Switch Transformer |
| Chinese & Other Non-US | WuDao 2.0 (BAAI) · ERNIE 3.0 (Baidu) · HyperCLOVA (Naver) · YaLM 100B (Yandex) |
Benchmarks Tracked
| Benchmark | Type | Max Score |
|---|---|---|
| MMLU | Knowledge | 100 |
| MMLU-Pro | Knowledge Hard | 100 |
| HumanEval / HumanEval+ | Coding | 100 |
| MBPP | Coding | 100 |
| GSM8K | Math Grade | 100 |
| MATH | Math Competition | 100 |
| AIME 2024 | Math Olympiad | 100 |
| GPQA Diamond | Science PhD | 100 |
| HellaSwag | Commonsense | 100 |
| ARC-Challenge | Reasoning | 100 |
| TruthfulQA | Truthfulness | 100 |
| BBH (BIG-Bench Hard) | Reasoning Hard | 100 |
| SWE-Bench Verified | Agentic Coding | 100 |
| LiveCodeBench | Live Coding | 100 |
| MMMU | Multimodal | 100 |
| Chatbot Arena ELO | Human Eval | 1500 |
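Scores come on different scales: 0 to 100 for most benchmarks and up to 1500 for Chatbot Arena ELO. `benchmark_scores.csv` carries both `max_score` and `score_pct`. Every preview row is consistent with score_pct = 100 * score / max_score. The sketch below assumes that formula holds for the whole file and lets you check it. Elo ratings are relative and have no hard ceiling, so an Arena percentage of 1500 is a rescaling, not an accuracy. The helper names are hypothetical.

```python
import pandas as pd

def recompute_score_pct(df: pd.DataFrame) -> pd.Series:
    """score_pct as 100 * score / max_score (ASSUMED formula, unverified for Arena)."""
    return 100.0 * df["score"] / df["max_score"]

def max_pct_mismatch(df: pd.DataFrame) -> float:
    """Largest absolute gap between the shipped score_pct and the assumed formula."""
    return float((recompute_score_pct(df) - df["score_pct"]).abs().max())

# Two rows copied from the preview (both have max_score = 100)
sample = pd.DataFrame(
    {"score": [32.83, 55.08], "max_score": [100, 100], "score_pct": [32.83, 55.08]}
)
print(max_pct_mismatch(sample))  # near zero (float rounding) if the formula holds
```

On the full file, `max_pct_mismatch(scores)` returning a large value would indicate that some benchmark, for example Arena, is normalized differently.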
Quick Start

```python
# hf:// paths need the huggingface_hub package (pip install huggingface_hub),
# which registers the Hugging Face filesystem with pandas/fsspec.
import pandas as pd

BASE = "hf://datasets/hmnshudhmn24/llm-benchmarks-capabilities-2020-2026"

# The five files have different columns, so each is loaded as its own DataFrame
models = pd.read_csv(f"{BASE}/models_catalog.csv")             # model metadata
scores = pd.read_csv(f"{BASE}/benchmark_scores.csv")           # long format: model x benchmark
pricing = pd.read_csv(f"{BASE}/pricing_history.csv")           # monthly API pricing
compute = pd.read_csv(f"{BASE}/compute_estimates.csv")         # training compute estimates
milestones = pd.read_csv(f"{BASE}/capability_milestones.csv")  # capability milestones

print(models.head())
```
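Once `scores` is loaded as above, a per-benchmark trend is just a filter plus a group-by. The sketch below reports, for one benchmark, the best `score_pct` among models released in each year. It uses only columns visible in the preview (`benchmark`, `release_date`, `score_pct`). The helper name `frontier_by_year` is hypothetical.

```python
import pandas as pd

def frontier_by_year(scores: pd.DataFrame, benchmark: str) -> pd.Series:
    """Best score_pct on `benchmark` among models released in each year."""
    subset = scores[scores["benchmark"] == benchmark]
    years = pd.to_datetime(subset["release_date"]).dt.year.rename("year")
    return subset.groupby(years)["score_pct"].max()

# Five rows copied from the preview (the HellaSwag row is filtered out)
sample = pd.DataFrame(
    {
        "benchmark": ["MMLU", "MMLU", "MMLU", "MMLU", "HellaSwag"],
        "release_date": ["2020-06-11", "2020-02-24", "2021-01-11", "2021-08-10", "2020-06-11"],
        "score_pct": [32.83, 24.38, 64.46, 53.1, 55.08],
    }
)
print(frontier_by_year(sample, "MMLU"))
```

Years with no releases on that benchmark are absent from the result. Appending `.cummax()` turns the per-year best into a best-so-far curve.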
Suggested Use Cases
- Benchmark Saturation Forecasting: MMLU went from 32% (2020) to 95% (2025). Predict when GPQA Diamond and SWE-Bench Verified will saturate.
- Price vs Capability Analysis: Plot Arena ELO against blended API price and track the Pareto-optimal models per quarter.
- Open vs Closed Model Gap: Measure the capability gap between open-weights and closed models over time.
- Scaling Law Validation: Plot training FLOPs against benchmark scores and test empirical scaling exponents.
- Reasoning Model Premium: Measure the score lift of reasoning models over chat models on math and coding benchmarks.
- Cost Reduction Trajectory: Track price per Arena ELO point over time. How fast is intelligence per dollar growing?
- Competitive Organization Analysis: Build a head-to-head benchmark matrix per organization per quarter.
- Chinese vs Western Capability Race: Compare DeepSeek, Qwen, and ERNIE against OpenAI, Anthropic, and Google.
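For the price vs capability use case, a model is Pareto-optimal when no other model is both at least as cheap and at least as capable, and strictly better on one of the two. The sketch below is plain Python and assumes each model has already been reduced to a (name, price, score) tuple. For example, price could be a blended USD-per-1M-token figure from `pricing_history.csv` and score an Arena ELO from `benchmark_scores.csv`. How input and output prices are blended is your choice and is not specified here. The function name and the example tuples are hypothetical.

```python
def pareto_frontier(models: list[tuple[str, float, float]]) -> list[tuple[str, float, float]]:
    """Return the (name, price, score) tuples not dominated by any other.

    Lower price and higher score are both better. Exact ties on both price
    and score keep only the first tuple encountered.
    """
    # Cheapest first; among equal prices, highest score first
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    frontier = []
    best_score = float("-inf")
    for name, price, score in ordered:
        # Every earlier model is at least as cheap, so this one survives only
        # if it scores strictly higher than all of them
        if score > best_score:
            frontier.append((name, price, score))
            best_score = score
    return frontier

# Hypothetical (name, blended USD per 1M tokens, Arena ELO) tuples
example = [("A", 10.0, 1250), ("B", 2.0, 1200), ("C", 5.0, 1150), ("D", 0.5, 1100)]
print(pareto_frontier(example))
# [('D', 0.5, 1100), ('B', 2.0, 1200), ('A', 10.0, 1250)]
```

Sorting once makes this O(n log n). To track the frontier per quarter, group the rows by quarter first and call the function on each group.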