AI x-risk threat model overview

The alignment problem from a deep learning perspective - Ngo et al.

Eli Tyre on AI risk

Argument for AI x-risk from competent malign agents - Katja Grace @ AI Impacts

Nate Soares talk @ Google

Leopold Aschenbrenner (I)

Leopold Aschenbrenner (II)

Superintelligence as a service

AI risk overview: lit review

Nathan Barnard: “Why I’ve become much less convinced of AI risk”

Nanda overview of threat models

Modelling Transformative AI project

Natural Selection Favors AIs over Humans - Dan Hendrycks

Daniel Eth AI risk overview

AI risk interactive argument

Risks from Learned Optimization - Evan Hubinger et al.

My notes on risks from learned optimisation

The basic reasons I expect AGI ruin - Bensinger

Tweet thread replies

Michael Plant Twitter question and replies

Set Sail For Fail? On AI risk - Nintil

Chalmers Twitter question and replies

Steve Spalding P(doom)

Eliezer’s 5-step argument on Twitter that alignment is hard

Survey on AI existential risk scenarios - Sam Clarke

Response to Katja Grace's AI x-risk counterarguments - Johannes Treutlein

[Canonical] Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover - Ajeya Cotra

[Canonical] Why AI alignment could be hard with modern deep learning - Ajeya Cotra

[Canonical] Is Power-Seeking AI an Existential Risk? - Joe Carlsmith

[Canonical] AGI Ruin: A List of Lethalities - Yudkowsky

Collection of arguments to expect (outer and inner) alignment failure? - Sam Clarke

Classifying sources of AI x-risk - Sam Clarke

Quintin Pope against Carlsmith

Plans for success

A Playbook for AI Risk Reduction (focused on misaligned AI) - Holden Karnofsky

Arguments against AI risk

Nora Belrose

Why I am Not An AI Doomer - Sarah Constantin

My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" - Quintin Pope

Scott Alexander

Against Carlsmith - Thorstad

Sammy Martin, Ben Garfinkel on “Scrutinising AI risk arguments”

[Canonical] Counterarguments to the basic AI x-risk case - Katja Grace

Matthew Barnett low P(Doom)

Diminishing Returns in Machine Learning Part 1 - Brian Chau

[Rohit Krishnan]

Scrutinising classic AI risk arguments - Ben Garfinkel

Artificial intelligence is a familiar-looking monster, say Henry Farrell and Cosma Shalizi

To Imagine AI, Imagine No AI - Robin Hanson

Why transformative AI is really, really hard to achieve

[Robin Hanson]

[Tyler Cowen]

Object-Level AI Risk Skepticism - LessWrong tag (https://www.lesswrong.com/tag/object-level-ai-risk-skepticism)

A Contra AI FOOM Reading List

AI Forecasting

Katja Grace algorithmic progress

Davidson report on explosive economic growth

Roodman on bioanchors

Transformative AGI by 2043 is <1% likely - Ted Sanders

Will AI cause explosive economic growth? Dietrich Vollrath

Matthew Barnett

Evaluating the historical value misspecification argument

ARC/OP Threat Model

Is the Universal Prior Malign? - Christiano

Worst-case guarantees - Christiano

The Solomonoff Prior is Malign - Mark Xu

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover - Ajeya Cotra

Takeoff speeds - Paul Christiano

Takeoff Speeds and Discontinuities - Sammy Martin and Daniel Eth

DeepMind Threat Models

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms - DeepMind

DeepMind team - threat models and plans

Threat Model Literature Review - DeepMind

Clarifying AI X-risk - DeepMind [higher level]

Value Learning Sequence - Rohin Shah

Things I am personally confused about

Literature Review on Goal-Directedness - Adam Shimi

Deconfusing goal-directedness - Adam Shimi

What is most confusing to you about AI stuff? - Sam Clarke

AI strategy

AI strategy nearcasting - Karnofsky

How might we align transformative AI if it’s developed very soon? - Karnofsky

Important, actionable research questions for the most important century - Karnofsky

Personal thoughts on careers in AI strategy and policy - Carrick Flynn

Alignment or threat model misconceptions

Reward is not the optimisation target - Alex Turner

Talking about inner and outer alignment - Rohin Shah

AI risk specifics

The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables - John Wentworth

Evolution provides no evidence for the sharp left turn - Quintin Pope

Yudkowsky contra Pope

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) - Andrew Critch

Advanced artificial agents intervene in the provision of reward - [Michael K. Cohen](https://onlinelibrary.wiley.com/authored-by/Cohen/Michael+K.), Marcus Hutter, [Michael A. Osborne](https://onlinelibrary.wiley.com/authored-by/Osborne/Michael+A.)

AGI safety career advice - Richard Ngo

Think carefully before calling RL policies "agents" - Alex Turner

Infosec

Schneier on security mindset

Technical alignment field overview

Buck Shlegeris talk on the state of alignment

Eli Lifland overview of alignment

How do we become confident in the safety of a machine learning system? - Evan Hubinger

An overview of 11 proposals for building safe advanced AI - Hubinger

Scalable oversight - @anthrupad

Mech interp - @anthrupad

Alignment theory - @anthrupad

Deceptive alignment

Gradient hacking - Richard Ngo

Distillation of “How likely is deceptive alignment?”

How likely is deceptive alignment? - Hubinger

Monitoring for deceptive alignment - Hubinger

Steinhardt Blog posts

Measurement, Optimization, and Take-off Speed

MIRI Threat Models

Why alignment is hard, and where to start - Yudkowsky talk

The Rocket Alignment Problem - Yudkowsky

Arbital summary - Yudkowsky

The Sequences - Yudkowsky