person

Hjalmar Wijk

METR researcher; AI R&D evaluations

Researcher at METR; co-author of evaluation studies on whether frontier models can autonomously do AI R&D tasks at the level of professional researchers.

current Researcher, METR

Strategy positions

Evals-drivenendorses

Capability/risk evals gate deployment; evals are the load-bearing artefact

Argues standardized AI-R&D benchmarks, where models are evaluated on the very work that would fuel recursive self-improvement, are an important safety signal we currently lack.

We measure how well frontier models can perform AI R&D tasks compared to human researchers. The gap is closing in some specific dimensions and that is what an early-warning system should be tracking.

¶ articleRE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts· METR· 2024· faithful paraphrase

Closest strategy neighbours

by jaccard overlap

Other people whose strategy tags overlap with Hjalmar Wijk's. Overlap is on tag identity, not stance; opposites can show up if they reference the same tags.

Aleksander Mądry
shared 1 · J=1.00
MIT; ex-OpenAI head of preparedness
Alex Meinke
shared 1 · J=1.00
Apollo Research; deceptive alignment evaluations
Ali Rahimi
shared 1 · J=1.00
Google Brain ML researcher; 'Alchemy' speech
Anna Rogers
shared 1 · J=1.00
IT University of Copenhagen; LLM benchmarking critique
Arati Prabhakar
shared 1 · J=1.00
White House OSTP director (2022–2025)
Beth Barnes
shared 1 · J=1.00
Founder of METR; dangerous capability evaluations

Record last updated 2026-04-25.