AGI Strategies

strategy tag

Security mindset.

Treat safety as adversarial security; assume systems break under attack

stated endorsers

6

no opposers yet

profiled endorsers

0

248 on the board total

endorser p(doom)

·

no estimates on record

quotes by endorsers

6

just for this tag

People on the record.

6

Daniel Kang

UIUC; LLM agents and AI security

endorses

Argues LLM agents are already capable enough to weaponize publicly disclosed vulnerabilities; calls for evaluations and red-team frameworks that match the speed of capability progress.

We show that GPT-4 agents can autonomously exploit one-day vulnerabilities in real-world systems with high success rates given just a CVE description. The capability gap is closing faster than security research is.
§ paperLLM Agents can Autonomously Exploit One-day Vulnerabilities· arXiv· 2024-04· faithful paraphrase
Nicholas Carlini

Nicholas Carlini

Anthropic adversarial-ML researcher; ex-Google Brain

endorses

Argues ML systems are routinely broken by simple attacks and that the field treats safety claims with insufficient adversarial scrutiny.

I think the difficulty of attacking machine learning models is grossly overestimated, and the difficulty of defending them grossly underestimated.
blogWhy I attack neural networks· nicholas.carlini.com· 2024· faithful paraphrase

Nicolas Papernot

U Toronto / Vector Institute; ML privacy and security

endorses

Argues that the training data, model, and deployment surface of ML systems each need security analysis as rigorous as that applied to mature software systems.

Machine learning is software. The same threat models that govern software supply chains apply, but with the additional surface of the data pipeline.
articleNicolas Papernot, research page· papernot.fr· 2024· faithful paraphrase

Riley Goodside

Scale AI; prompt engineering pioneer

endorses

Argues prompt-engineering insights are inseparable from security research; many widely cited LLM failure modes were first surfaced through informal prompt experiments rather than formal evaluation.

“Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions.”

Context: First publicly documented prompt-injection attack against GPT-3.

tweetPrompt injection on GPT-3· X· 2022-09-12· direct quote
Simon Willison

Simon Willison

Independent developer; co-creator of Django; LLM tools

endorses

Argues prompt injection is a structurally unfixable vulnerability in current LLM architectures and that any application that mixes trusted instructions with untrusted input has a security defect by design.

Prompt injection isn't a 'we'll fix it later' bug. It's a fundamental property of how these models work, and we have to design applications around the assumption that it can't be patched away.
blogPrompt injection: What's the worst that can happen?· simonwillison.net· 2023· faithful paraphrase

Vitaly Shmatikov

Cornell Tech; ML privacy and security

endorses

Argues ML systems leak training data in predictable ways; the field treats privacy as an afterthought when it should be foundational.

We can extract verbatim training examples from large language models with no special access. Privacy in ML is not a future problem; it is a present, pervasive failure.
§ paperExtracting Training Data from Large Language Models· arXiv / USENIX Security· 2021· faithful paraphrase