AI x-risk threat model overview

The alignment problem from a deep learning perspective - Ngo et al.

Eli Tyre on AI risk

Argument for AI x-risk from competent malign agents - Katja Grace @ AI Impacts

Nate Soares talk @ Google

Leopold Aschenbrenner (I)

Leopold Aschenbrenner (II)

Superintelligence as a service

AI risk overview: lit review

Nathan Barnard: “Why I’ve become much less convinced of AI risk”

Nanda overview of threat models

Modelling Transformative AI project

Natural Selection Favors AIs over Humans - Dan Hendrycks

Daniel Eth AI risk overview

AI risk interactive argument

Risks from Learned Optimization - Evan Hubinger et al.

My notes on risks from learned optimisation

The basic reasons I expect AGI ruin - Bensinger

Tweet thread replies

Michael Plant Twitter question and replies

Set Sail For Fail? On AI risk - Nintil

Chalmers Twitter question and replies

Steve Spalding P(doom)

Eliezer’s 5-step argument on Twitter that alignment is hard

Survey on AI existential risk scenarios - Sam Clarke

Response to Katja Grace's AI x-risk counterarguments - Johannes Treutlein

[Canonical] Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover - Ajeya Cotra

[Canonical] Why AI alignment could be hard with modern deep learning - Ajeya Cotra

[Canonical] Is Power-Seeking AI an Existential Risk? - Joe Carlsmith

[Canonical] AGI Ruin: A List of Lethalities - Yudkowsky

Collection of arguments to expect (outer and inner) alignment failure? - Sam Clarke

Classifying sources of AI x-risk - Sam Clarke

Quintin Pope against Carlsmith

Plans for success

A Playbook for AI Risk Reduction (focused on misaligned AI) - Holden Karnofsky

Arguments against AI risk

Nora Belrose

Why I am Not An AI Doomer - Sarah Constantin

My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" - Quintin Pope

Scott Alexander

Against Carlsmith - Thorstad

Sammy Martin, Ben Garfinkel on “Scrutinising AI risk arguments”

[Canonical] Counterarguments to the basic AI x-risk case - Katja Grace

Matthew Barnett low P(Doom)

Diminishing Returns in Machine Learning Part 1 - Brian Chau

[Rohit Krishnan]

Scrutinising classic AI risk arguments - Ben Garfinkel

Artificial intelligence is a familiar-looking monster, say Henry Farrell and Cosma Shalizi

To Imagine AI, Imagine No AI - Robin Hanson

Why transformative AI is really, really hard to achieve

[Robin Hanson]

[Tyler Cowen]

Object-Level AI Risk Skepticism - LessWrong tag (https://www.lesswrong.com/tag/object-level-ai-risk-skepticism)

A Contra AI FOOM Reading List

AI Forecasting

Katja Grace algorithmic progress

Davidson report on explosive economic growth

Roodman on bioanchors

Transformative AGI by 2043 is <1% likely - Ted Sanders

Will AI cause explosive economic growth? Dietrich Vollrath

Matthew Barnett

Evaluating the historical value misspecification argument

ARC/OP Threat Model

Is the Universal Prior Malign? - Christiano

Worst-case guarantees - Christiano

The Solomonoff Prior is Malign - Mark Xu

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover - Ajeya Cotra

Takeoff speeds - Paul Christiano

Takeoff Speeds and Discontinuities - Sammy Martin and Daniel Eth

DeepMind Threat Models

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms - DeepMind

DeepMind team - threat models and plans

Threat Model Literature Review - DeepMind

Clarifying AI X-risk - DeepMind [higher level]

Value Learning Sequence - Rohin Shah

Things I am personally confused about

Literature Review on Goal-Directedness - Adam Shimi

Deconfusing goal-directedness - Adam Shimi

What is most confusing to you about AI stuff? - Sam Clarke

AI strategy

AI strategy nearcasting - Karnofsky

How might we align transformative AI if it’s developed very soon? - Karnofsky

Important, actionable research questions for the most important century - Karnofsky

Personal thoughts on careers in AI strategy and policy - Carrick Flynn

Alignment or threat model misconceptions

Reward is not the optimisation target - Alex Turner

Talking about inner and outer alignment - Rohin Shah

AI risk specifics

The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables - John Wentworth

Evolution provides no evidence for the sharp left turn - Quintin Pope

Yudkowsky contra Pope

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) - Andrew Critch

Advanced artificial agents intervene in the provision of reward - [Michael K. Cohen](https://onlinelibrary.wiley.com/authored-by/Cohen/Michael+K.), Marcus Hutter, [Michael A. Osborne](https://onlinelibrary.wiley.com/authored-by/Osborne/Michael+A.)

AGI safety career advice - Richard Ngo

Think carefully before calling RL policies "agents" - Alex Turner

Infosec

Schneier on security mindset

Technical alignment field overview

Buck Shlegeris talk on the state of alignment

Eli Lifland overview of alignment

How do we become confident in the safety of a machine learning system? - Evan Hubinger

An overview of 11 proposals for building safe advanced AI - Hubinger

Scalable oversight - @anthrupad

Mech interp - @anthrupad

Alignment theory - @anthrupad

Deceptive alignment

Gradient hacking - Richard Ngo

Distillation of “How likely is deceptive alignment?”

How likely is deceptive alignment? - Hubinger

Monitoring for deceptive alignment - Hubinger

Steinhardt Blog posts

Measurement, Optimization, and Take-off Speed

MIRI Threat Models

Why alignment is hard, and where to start - Yudkowsky talk

The Rocket Alignment Problem - Yudkowsky

Arbital summary - Yudkowsky

The Sequences - Yudkowsky