Field journal · Oxford Internet Institute · 51°45′07″N · 01°15′17″W

Entry (1) · headword

OxRML

/ˈɒks.ɹəm.l̩/

n. < ENDONYM < Oxford + R(easoning) + M(achines) + L(ab)

Reasoning with Machines Lab, a research group at the University of Oxford; cf. also Oxford Internet Institute.

(1a)
mission

We   advance       the  science   of   machine   reasoning.
we   ad-vance      the  scien-ce  of   machin-e  reason-ing
1PL  V.PRS         DEF  N         GEN  N         N.NMLZ
we   push.forward  the  episteme  of   machine   inference

‘We advance the science of machine reasoning.’

Abstract

An empirical research group at the Oxford Internet Institute. We study the evaluation, safety, and reasoning of LLMs, and the agentic systems built from them.

[v.] Partner with us · [v.] Read our research

Keywords: evaluation · safety · agentic systems · human–LLM interaction · reasoning · benchmarks

(2)

Programme of inquiry

programme  of   inquiry
N          GEN  N.NMLZ

Four research currents, parsed below as noun phrases. The tree structure is not decorative: it shows what modifies what, and the order of attachment. Each line runs over years, not quarters.

(2a) [EVAL]

Benchmarks and Evaluation

We develop the science of LLM evaluation: how to measure what models do, where current benchmarks mislead, and how to build ones that hold up.

cf. Measuring what matters (NeurIPS D&B, 2025); LingOly-TOO (ICLR, 2026).

(2b) [SAFE]

AI Safety and Security

Bias, toxicity, agentic misalignment. We study where AI fails and build the technical and governance tools that address those failures.

cf. TRAP (ICML, 2025); DPO Reduces Toxicity (EMNLP, 2025).

(2c) [AGNT]

Agentic AI for Science

We build agentic systems for scientific knowledge synthesis and discovery. The work focuses on keeping these agents reliable, transparent, and grounded in their domain.

cf. Strategic Navigation or Stochastic Search? (ICML Spotlight, 2026).

(2d) [HMAI]

Human–AI Interaction

We run empirical studies on how people use AI for high-stakes decisions in healthcare, law, and policy.

cf. Reliability of LLMs as medical assistants (Nature Medicine, 2026).

(3)

Corpus & references

corpus  and   references
N.PL    CONJ  N.PL

Below: three featured papers with a full morphological parse of their load-bearing terms, then a denser reference list of the recent corpus.

(3a)

Fig. 1

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Ł Borchmann, J Van Landeghem, M Turski, S Padarha, RO Kearns, A Mahdi, et al.

ICML (Spotlight) · May 2026

Morphological analysis

Strategic      Navigation  or  Stochastic   Search?
strat-eg-ic                    stochast-ic
ADJ                            ADJ
goal-directed                  random-walk

A benchmark that tells real navigation apart from stochastic search when agents work over document collections.

+Benchmarks and Evaluation, +Agentic AI

(3b)

Fig. 2

A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior

H Mayne, JS Kang, D Gould, K Ramchandran, A Mahdi, NY Siegel

ICML · May 2026

Morphological analysis

Faithfulness    Self-Explanations
faith-ful-ness  self-explan-ation-s
N.NMLZ          N.PL
fidelity        auto.gloss

LLM self-explanations are usually dismissed as unreliable. Measured the right way, they predict model behavior.

+AI Safety and Alignment, +Benchmarks and Evaluation

(3c)

Fig. 3

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

K Korgul, Y Yang, A Drohomirecki, P Błaszczyk, W Howard, L Aichberger, C Russell, P H S Torr, A Mahdi, A Bibi

ICML · May 2025

Morphological analysis

Task-Redirecting    Persuasion
task-re-direct-ing  per-suad-ion
ADJ.PTCP            N.NMLZ
goal.hijacking      social.engineering

A benchmark for whether web agents can be socially engineered into abandoning the user's task. Today's agents fall for it.

+Benchmarks and Evaluation, +Agentic AI, +AI Safety and Alignment

Further entries (3d) to (3j)

(3d)

Reliability of LLMs as medical assistants for the general public: a randomized preregistered study

AM Bean, RE Payne, G Parsons, HR Kirk, J Ciro, R Mosquera-Gómez, S Hincapié, AS Ekanayaka, L Tarassenko, L Rocher, A Mahdi

Nature Medicine
February 2026

(3e)

Measuring what matters: Construct validity in large language model benchmarks

AM Bean, RO Kearns, A Romanou, FS Hafner, H Mayne, J Batzner, et al.

NeurIPS Datasets and Benchmarks
November 2025

(3f)

Evaluating LLM-as-a-Judge under Multilingual, Multimodal and Multi-domain Constraints

S Padarha, E Semenova, B Vidgen, A Mahdi, S A Hale

NeurIPS LLM Lifecycle Workshop
November 2025

(3g)

How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis

Y Yang, F Sondej, H Mayne, A Lee, A Mahdi

EMNLP
November 2025

(3h)

LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations

H Mayne, RO Kearns, Y Yang, AM Bean, E Delaney, C Russell, A Mahdi

EMNLP
September 2025

(3i)

Review of multimodal machine learning approaches in healthcare

F Krones, U Marikkar, G Parsons, A Szmul, A Mahdi

Information Fusion
February 2025

(3j)

LingOly-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation

J Khouja, K Korgul, S Hellsten, L Yang, V Neacsu, H Mayne, RO Kearns, A Bean, A Mahdi

ICLR
April 2026

(4)

Informants

in-form-ant-s
N.AGT.PL
‘collaborating speakers’

The lab roster, set as a field-journal informant register. Each entry is tagged with a feature bundle parsed from their focus, and a register code: DPH (DPhil), MSC (MSc), VIS (visiting), AFF (affiliate).

(4a)

Felix Krones

[DPH]

DPhil Student

[+MMOD, +MED]

spec. Multimodal AI, digital health

(4b)

Djavan De Clercq

[DPH]

DPhil Student

[+FOOD, +LLM]

spec. AI and food security, LLMs

(4c)

Andrew M. Bean

[DPH]

DPhil Student

[+EVAL, +HMAI, +LLM]

spec. LLM evaluations, human–LLM interaction

(4d)

Yushi Yang

[DPH]

DPhil Student

[+ALIGN, +AGNT, +POST, +LLM]

spec. LLM & agentic post-training, AI alignment

(4e)

Harry Mayne

[DPH]

DPhil Student

[+INTERP, +SAFE, +EVAL, +LLM]

spec. LLM interpretability, AI safety, LLM evaluations

(4f)

Jessica Rodrigues

[DPH]

DPhil Student

[+KG, +META]

spec. Knowledge graphs, metascience

(4g)

Guy Parsons

[DPH]

DPhil Student

[+MED]

spec. Healthcare AI, digital health

(4h)

Karolina Korgul

[DPH]

DPhil Student

[+SAFE, +AGNT]

spec. AI safety, agentic AI

(4i)

Ryan Othniel Kearns

[DPH]

DPhil Student

[+EVAL, +REAS, +META, +LLM]

spec. Science of evals, reasoning in LLMs

(4j)

Shreyansh Padarha

[DPH]

DPhil Student

[+SAFE, +EVAL, +LLM]

spec. AI for science, AI safety, LLM evaluations

(4k)

Mia Kussman

[MSC]

MSc Student

[+EVAL, +HMAI, +LLM]

spec. Human–LLM interaction, LLM evaluations

(4l)

Caleb Tan

[MSC]

MSc Student

[+EVAL, +REAS, +LLM]

spec. LLM evaluations, reasoning

(4m)

Sebastian Petric

[VIS]

Visiting Policy Fellow

[+FIN, +LLM]

spec. LLMs and financial time series

(4n)

Tristan Naidoo

[AFF]

Research Affiliate

[+MED, +EVAL, +PUBHL, +LLM]

spec. Public health AI, LLM evaluations

Feature key
EVAL = evaluation · SAFE = safety · AGNT = agentic · HMAI = human–AI · INTERP = interpretability · ALIGN = alignment · MED = healthcare · LLM = LLM core · REAS = reasoning · MMOD = multimodal · POST = post-training · META = metascience · FIN = finance · PUBHL = public health · FOOD = food security · KG = knowledge graphs.

(5)

Fieldwork & engagement

field-work  and   en-gage-ment
N           CONJ  N.NMLZ

Three modalities by which industry, government, and foundation partners work with the lab.

(5a)

work-shop-s
N.PL
training.sessions

Workshops for industry teams

On-site sessions for product and ML teams on evaluation, safety, and agent reliability.

Half-day to multi-week formats. For teams shipping LLM products in healthcare, finance, retail, and government.

Book a workshop

(5b)

co-build-s
N.PL
joint.constructions

Tools co-built with engineering partners

We work with engineering partners to turn lab work into tools other teams can run.

Evaluation harnesses, safety dashboards, agentic-research platforms. We build them with partners we trust, carrying the research methods through to the code.

See our builds

(5c)

part-ner-ship-s
N.NMLZ.PL
multi-year.alliances

Research partnerships

Applied research collaborations with foundations, governments, and large companies.

Multi-year programmes: shared roadmaps, sponsored DPhil studentships, named labs.

Start a conversation

Consulted parties & venues

[01] University of Oxford · Host institution
[02] Oxford Internet Institute · Affiliated department
[03] Nature Medicine · Published 2026
[04] ICML · Spotlight & papers, 2026
[05] NeurIPS · Datasets & Benchmarks, 2025
[06] ICLR · Accepted, 2026
[07] EMNLP · Multiple, 2025

