AI becoming autonomous and stubborn


There are two key dynamics in AI development that many researchers have explored, but that don’t seem to reach wider audiences. I think understanding them is crucial to grasping where AI is headed and what that means for the world.

I remember seeing this topic differently myself; something finally shifted after reading this article. This is my attempt to explain these ideas in a way that feels intuitive to people across different backgrounds, levels of technical expertise, and perspectives on AI. My hope is that more people will start thinking about these dynamics and recognize the work needed to steer AI toward a safer, more coordinated future.

I’m fairly certain this is the general direction we’re headed, but there are still assumptions and uncertainties. I discuss these in more detail in the Epistemic Status and Antithesis page.

So, to wrap it in one paragraph… People are going to build AIs that are agentic—AIs that run autonomously and exhibit goal-directed, purposeful behavior. We are after this simply because it frees up human time and attention. Well… this work is very much underway. Examples: AI Digests lets you use an agent to demo buying items from a website; Anthropic is enabling agents to interact with computers directly; OpenAI’s Deep Research completes multi-step research tasks; Hypothetical Minds is research on developing agents’ theory of mind so they can collaborate and compete with other agents; Metta AI is training agents to care in socially complex environments; METR researches when AI systems will be able to independently complete long-horizon projects; Manus is building an AI agent that “bridges minds and actions”. At the same time, AIs will exhibit what I call a stubborn property. Through trial and error, these agents will try to overcome obstacles and continuously attempt to expand the set of tasks they can autonomously fulfill. We want this because the greatest profit and value lie in automating ever new areas—think moving beyond AI that helps write emails to AI systems that autonomously run businesses.

As tasks become more complex, they increasingly require engagement with aspects of reality not yet integrated into digital processes, forcing AI to bridge gaps where no digital infrastructure or pipelines exist. To expand into these complex areas, AIs must actively figure out how to accomplish tasks despite various constraints, stubbornly trying different approaches until they succeed. As Nate Soares writes:

Because the way to achieve long-horizon targets in a large, unobserved, surprising world that keeps throwing wrenches into one's plans, is probably to become a robust generalist wrench-remover that keeps stubbornly reorienting towards some particular target no matter what wrench reality throws into its plans … so you've built a generalized obstacle-surmounting engine. You've built a thing that excels at noticing when a wrench has been thrown in its plans, and at understanding the wrench, and at removing the wrench or finding some other way to proceed with its plans. – Nate Soares, Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense

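To make the “stubborn property” concrete, here is a deliberately minimal sketch of that loop in Python. Everything in it is hypothetical—the Goal type and the propose_approaches and attempt helpers are stand-ins for a planner and an execution environment, not any real agent framework’s API. The point is only the shape of the behavior: keep generating alternative approaches and retry until one works or the budget runs out.

```python
# Toy illustration of the "stubborn" loop: an agent that keeps generating
# and trying alternative approaches to a goal until one succeeds.
# All names here (Goal, propose_approaches, attempt) are hypothetical.

from dataclasses import dataclass
import random


@dataclass
class Goal:
    description: str


def propose_approaches(goal: Goal, failures: list[str]) -> list[str]:
    """Stand-in for a planner that suggests new strategies,
    conditioned on what has already failed."""
    return [f"approach {len(failures) + i} for: {goal.description}" for i in range(3)]


def attempt(approach: str) -> bool:
    """Stand-in for actually executing a strategy in the world."""
    return random.random() < 0.2  # most attempts hit an obstacle


def stubborn_agent(goal: Goal, budget: int = 50) -> bool:
    failures: list[str] = []
    for _ in range(budget):
        for approach in propose_approaches(goal, failures):
            if attempt(approach):
                return True            # obstacle surmounted, goal reached
            failures.append(approach)  # note the "wrench" and reorient
    return False                       # gave up only because the budget ran out


if __name__ == "__main__":
    print(stubborn_agent(Goal("register a supplier account and place an order")))
```

A real agent would replace the random attempt stub with tool calls and learned planning, but the outer retry structure is the part the quote above is describing.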
If AIs are autonomous, bring value, and are priced similarly to today’s models, there will be an enormous number of AI agents, perhaps an order of magnitude more than there are humans today. What might be the consequences of such a large number of AIs trying to autonomously expand the automation frontier, bridging the parts of reality not yet automated?

This doesn’t necessarily imply subjective experience or will—that’s a different and more speculative discussion. However, in practice, it may not matter much since these systems will behave as if they are self-directed. This is because there will be a vast number of autonomous AI agents run by different actors and algorithms, each motivated by different things, with conflicting goals and competing for limited resources. Their sheer scale will make monitoring and control difficult, and their owners may be unwilling to deactivate them or fail to recognize when doing so is necessary.

At this scale, we won’t be noticing the AIs that work as intended—instead we’ll be seeing the ones that are stubborn, constantly finding and exploiting niches and figuring out how to automate more effectively. The automation of any field will involve a lot of trial and error, with systems persistently working toward their goals. Given the sheer number of attempts, some will push forward even when their actions cause harm, create vulnerabilities, or exploit systems.

Crucially, by default, artificial systems are not attuned to the full range of human values and preferences, nor do they understand the ecosystems and interdependencies they act within. This remains true even if the majority of actors have good intentions. First, it’s an ambiguous task, because human needs, values, and preferences change over time, and some conflict across countries, cultures, companies, and social groups. Second, there is a tricky dynamic. An AI’s primary goal—pursuing its core objective—is much simpler than considering its broader impacts. Think of the primary goal as a single point in space, evolving as the AI explores through trial and error, searching for strategies to reach it. In contrast, understanding how the AI’s actions influence reality involves the entire space around that point. It is a much more complex thing to understand and control, because the AI’s actions (be it running a business or persuading people to buy or do something) propagate through our established realities, ecosystems, cultures, organisms, and interconnected world with all its dependencies.

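As a toy illustration of that asymmetry (purely hypothetical code, not any real system’s objective), the goal the agent optimizes can be stated in one line, while the world state it perturbs has many dimensions the objective never reads:

```python
# Toy illustration of the goal-vs-impacts asymmetry.
# The objective reads one number; the world the agent perturbs has many more
# dimensions that the objective never looks at. All names are hypothetical.

world = {
    "revenue": 0.0,            # the only thing the objective sees
    "customer_trust": 1.0,     # everything below is invisible to the objective
    "employee_wellbeing": 1.0,
    "ecosystem_load": 0.0,
    "info_environment_quality": 1.0,
}


def objective(state: dict) -> float:
    return state["revenue"]    # "a single point": simple to state and optimize


def aggressive_strategy(state: dict) -> dict:
    """One of many strategies trial-and-error search might find: it raises the
    objective while degrading variables the objective does not track."""
    new = dict(state)
    new["revenue"] += 10.0
    new["customer_trust"] -= 0.2
    new["info_environment_quality"] -= 0.1
    return new


after = aggressive_strategy(world)
print(objective(after) > objective(world))   # True: looks like pure progress
print(after["customer_trust"])               # side effect the objective never sees
```

A search process that only ever queries objective will happily select strategies like aggressive_strategy, because the degraded variables are simply not part of what it measures.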
There is also a crucial asymmetry—it’s a lot harder to create something constructive than something destructive. In the history of technology, dynamite came before the combustion engine; the atomic bomb came before the nuclear power plant. To make something constructive, one needs to make it safe, control many moving parts, and sync a variety of processes together. This is especially concerning because AI can be used in vulnerable fields such as synthetic biology, social persuasion, weapons development, financial systems, and cybersecurity. If we scale autonomous technology and broaden its influence, we also increase the chances that some agent will do something both harmful and highly impactful.

There will certainly be countermeasures, control systems, and regulations put in place. Most creators will likely make their best efforts—or be incentivized—to implement guardrails and safety protocols. However, the fragility and complexity of our systems, combined with the vast space of possibilities AI will operate in, will make maintaining stability challenging.

All of this may radically transform the world as we know it. It could take many forms and manifest in various ways, but here are some speculative ideas illustrating what it may look like: