No. 01 · Instrument
Evaluation
Benchmarks and Evaluation
We work on the science of LLM evaluation: what benchmarks measure, where they mislead, and how to build ones that hold up.
What this partition defines
The science of how a benchmark is built, what it claims to measure, and what it actually captures.