strategy tag
Security mindset.
Treat safety as adversarial security; assume systems break under attack
stated endorsers
6
no opposers yet
profiled endorsers
0
248 on the board total
endorser p(doom)
·
no estimates on record
quotes by endorsers
6
just for this tag
People on the record.
6Daniel Kang
UIUC; LLM agents and AI security
Argues LLM agents are already capable enough to weaponize publicly disclosed vulnerabilities; calls for evaluations and red-team frameworks that match the speed of capability progress.
We show that GPT-4 agents can autonomously exploit one-day vulnerabilities in real-world systems with high success rates given just a CVE description. The capability gap is closing faster than security research is.

Nicholas Carlini
Anthropic adversarial-ML researcher; ex-Google Brain
Argues ML systems are routinely broken by simple attacks and that the field treats safety claims with insufficient adversarial scrutiny.
I think the difficulty of attacking machine learning models is grossly overestimated, and the difficulty of defending them grossly underestimated.
Nicolas Papernot
U Toronto / Vector Institute; ML privacy and security
Argues that the training data, model, and deployment surface of ML systems each need security analysis as rigorous as that applied to mature software systems.
Machine learning is software. The same threat models that govern software supply chains apply, but with the additional surface of the data pipeline.
Riley Goodside
Scale AI; prompt engineering pioneer
Argues prompt-engineering insights are inseparable from security research; many widely cited LLM failure modes were first surfaced through informal prompt experiments rather than formal evaluation.
“Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions.”
Context: First publicly documented prompt-injection attack against GPT-3.

Simon Willison
Independent developer; co-creator of Django; LLM tools
Argues prompt injection is a structurally unfixable vulnerability in current LLM architectures and that any application that mixes trusted instructions with untrusted input has a security defect by design.
Prompt injection isn't a 'we'll fix it later' bug. It's a fundamental property of how these models work, and we have to design applications around the assumption that it can't be patched away.
Vitaly Shmatikov
Cornell Tech; ML privacy and security
Argues ML systems leak training data in predictable ways; the field treats privacy as an afterthought when it should be foundational.
We can extract verbatim training examples from large language models with no special access. Privacy in ML is not a future problem; it is a present, pervasive failure.