for AI becoming autonomous and stubborn — untangling two fundamental dynamics of AI progress


Epistemic status

I am about 80% confident ("probably yes") that the world is moving in this direction. Reading Nate Soares's essay "Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense" in summer 2024 resolved many of my key uncertainties, and I've been processing its insights since then.

That said, a few assumptions, uncertainties, and counterarguments could invalidate this view; they are listed in the bullets below. This isn't a comprehensive list, just an exercise to broaden my thinking and nudge myself out of confirmation bias and similar dynamics.