Reasoning with Machines Lab · University of Oxford · Issue No. 18 · MMXXVI

A public-interest lab studying how machines reason.

The Reasoning with Machines Lab is a research group at the Oxford Internet Institute. We work on the science of evaluating language models and agentic systems, on AI safety, and on how these systems are used in healthcare, law, and public life.


Photograph: Radcliffe Camera, Bodleian Libraries, University of Oxford · Public Domain

In residence: 15 researchers
Most recent venue: ICML (Spotlight); May 2026
Published works: 10+ this cycle
Open positions: DPhil, 2026 entry; Rolling applications

Index · In press

ICML (Spotlight) · May 2026
ICML · May 2026
ICML · May 2025
Nature Medicine · February 2026
NeurIPS Datasets and Benchmarks · November 2025
NeurIPS LLM Lifecycle Workshop · November 2025
EMNLP · November 2025
EMNLP · September 2025
Information Fusion · February 2025
ICLR · April 2026

Programmes

Four long-running questions

We work on four questions. Each is published in the open.

We treat language models the way a public laboratory treats any new instrument: measure first, theorise second, publish the full method so others can check the work.

Open methods
Open code & data
Open critique

§ 1.1
Evaluation
Benchmarks and Evaluation
We work on the science of LLM evaluation: what benchmarks measure, where they mislead, and how to build ones that hold up.
Recent: Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
§ 1.2
Safety
AI Safety and Security
We work on bias, toxicity, and agentic misalignment, and on the technical and governance tools that address them.
Recent: A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior
§ 1.3
Agentic
Agentic AI for Science
Agentic systems for scientific work. We focus on keeping them reliable, transparent, and grounded in the domain.
Recent: Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
§ 1.4
Human-AI
Human–AI Interaction
Empirical studies of how people use AI in high-stakes settings: healthcare, law, and policy.

The same systems now answering medical questions, drafting policy, and mediating public information should be measured with the same care as any other tool that affects public life.

– Lab statement of purpose

In focus

Featured publication, this cycle

ICML (Spotlight) · May 2026

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

A benchmark that tells real navigation apart from stochastic search when agents work over document collections.

Ł Borchmann, J Van Landeghem, M Turski, S Padarha, RO Kearns, A Mahdi, et al.

Benchmarks and Evaluation · Agentic AI


Recent publications

10 works · 2025–2026

What we have been publishing.

Peer-reviewed papers at ICML, NeurIPS, ICLR, EMNLP, and Nature Medicine. The lab keeps no private benchmarks. Read the work.

A complete bibliography, including pre-prints and unreviewed working papers, is published on the lab's GitHub.


People of the lab

15 researchers in residence

A small group of researchers, working in the open.

Principal Investigator

Prof. Adam Mahdi

Adam leads OxRML. The group studies how language models reason, how people work with them, and how agentic systems behave on real scientific and decision-making tasks. He won the Oxford Teaching Excellence Award in 2025.

Oxford Internet Institute, University of Oxford

Director · 2026

Researchers, DPhil students, & affiliates

Alphabetical, by focus

Felix Krones · DPhil Student · Multimodal AI, digital health
Djavan De Clercq · DPhil Student · AI and food security, LLMs
Andrew M. Bean · DPhil Student · LLM evaluations, human–LLM interaction
Yushi Yang · DPhil Student · LLM & agentic post-training, AI alignment
Harry Mayne · DPhil Student · LLM interpretability, AI safety, LLM evaluations
Jessica Rodrigues · DPhil Student · Knowledge graphs, metascience
Guy Parsons · DPhil Student · Healthcare AI, digital health
Karolina Korgul · DPhil Student · AI safety, agentic AI
Ryan Othniel Kearns · DPhil Student · Science of evals, reasoning in LLMs
Shreyansh Padarha · DPhil Student · AI for science, AI safety, LLM evaluations
Mia Kussman · MSc Student · Human–LLM interaction, LLM evaluations
Caleb Tan · MSc Student · LLM evaluations, reasoning
Sebastian Petric · Visiting Policy Fellow · LLMs and financial time series
Tristan Naidoo · Research Affiliate · Public health AI, LLM evaluations

How to work with the lab

Open to partners worldwide

We collaborate with people who care about getting AI right.

Three ways in. Each begins with a conversation, and each ends with published outputs the public can read. We do not run NDAs over findings; we work on questions that benefit from being in the open.

Workshops for industry teams
On-site sessions for product and ML teams on evaluation, safety, and agent reliability.
Half-day to multi-week formats. For teams shipping LLM products in healthcare, finance, retail, and government.

Tools co-built with engineering partners
We work with engineering partners to turn lab work into tools other teams can run.
Evaluation harnesses, safety dashboards, agentic-research platforms. We build them with partners we trust, carrying the research methods through to the code.

Research partnerships
Applied research collaborations with foundations, governments, and large companies.
Multi-year programmes: shared roadmaps, sponsored DPhil studentships, named labs.

On the public record

University of Oxford · Host institution
Oxford Internet Institute · Affiliated department
Nature Medicine · Published 2026
ICML · Spotlight & papers, 2026
NeurIPS · Datasets & Benchmarks, 2025
ICLR · Accepted, 2026
EMNLP · Multiple, 2025

