Reasoning · with · Machines

We study evaluation, safety & reasoning in machines.

An empirical research group at the Oxford Internet Institute. We study the evaluation, safety, and reasoning of LLMs, and the agentic systems built from them.

Prof. Adam Mahdi · Principal Investigator

Oxford Internet Institute, University of Oxford

Adam leads OxRML. The group studies how language models reason, how people work with them, and how agentic systems behave on real scientific and decision-making tasks. He won the Oxford Teaching Excellence Award in 2025.

Radcliffe Camera · Oxford
51.7548° N, 1.2544° W

§ 01 Four research areas

Our research.

Four themes the lab works across. Every paper, collaboration, and DPhil project sits somewhere in this frame.

E · Dawn, first principles
01/04
Evaluation

Benchmarks and Evaluation

We work on the science of LLM evaluation: what benchmarks measure, where they mislead, and how to build ones that hold up.

S · Sky, open horizon
02/04
Safety

AI Safety and Security

We work on bias, toxicity, and agentic misalignment, and on the technical and governance tools that address them.

W · Evening, sustained inquiry
03/04
Agentic

Agentic AI for Science

Agentic systems for scientific work. We focus on keeping them reliable, transparent, and grounded in the domain.

N · Night, the long watch
04/04
Human-AI

Human–AI Interaction

Empirical studies of how people use AI in high-stakes settings: healthcare, law, and policy.

§ 02 Selected works

Recent publications.

See all papers

§ 03 The team

The people behind the work.

DPhils, MSc students, and visiting researchers. Each focuses on one of the four areas above.

Meet the full lab

01 Felix Krones

DPhil Student

Multimodal AI, digital health

02 Djavan De Clercq

DPhil Student

AI and food security, LLMs

03 Andrew M. Bean

DPhil Student

LLM evaluations, human–LLM interaction

04 Yushi Yang

DPhil Student

LLM & agentic post-training, AI alignment

05 Harry Mayne

DPhil Student

LLM interpretability, AI safety, LLM evaluations

06 Jessica Rodrigues

DPhil Student

Knowledge graphs, metascience

07 Guy Parsons

DPhil Student

Healthcare AI, digital health

08 Karolina Korgul

DPhil Student

AI safety, agentic AI

§ 04 Work with us

Three ways to work with the lab.

We work with foundations, governments, and enterprises that take AI seriously and have patience for empirical work.

Pillar 01

Workshops for industry teams

On-site sessions for product and ML teams on evaluation, safety, and agent reliability.

Half-day to multi-week formats. For teams shipping LLM products in healthcare, finance, retail, and government.

Book a workshop
Pillar 02

Tools co-built with engineering partners

We work with engineering partners to turn lab work into tools other teams can run.

Evaluation harnesses, safety dashboards, agentic-research platforms. We build them with partners we trust, carrying the research methods through to the code.

See our builds
Pillar 03

Research partnerships

Applied research collaborations with foundations, governments, and large companies.

Multi-year programmes: shared roadmaps, sponsored DPhil studentships, named labs.

Start a conversation

Where the work has been received

  • University of Oxford · Host institution
  • Oxford Internet Institute · Affiliated department
  • Nature Medicine · Published 2026
  • ICML · Spotlight & papers, 2026
  • NeurIPS · Datasets & Benchmarks, 2025
  • ICLR · Accepted, 2026
  • EMNLP · Multiple, 2025

The lab newsletter

A quarterly note from the lab. Nothing else.

New papers, open positions, partnership opportunities, and what we have been reading.

Unsubscribe in one click. We never share your email.