It’s become no secret in the past year that code execution is the preferred way for agents to interact with the world, specifically as opposed to tool calling.

Why code execution > tool calling

There are many reasons why having agents execute code is preferable to tool calling, the first and most pertinent is that code execution allows you to arbitrarily chain tools together without having to perform inference on the results in between calls.

Arbitrary chaining tool calls / context preservation

A simple example is scraping a website and using AI to find the industry of the company. This will require two “tools”: 1. web scrape tool, 2. generate AI tool.

const scrapedMarkdown = await scrapeWebsite({ url: "orangeslice.ai" });
const industry = await generateText({
  prompt: "find the industry of this company ${scrapedMarkdown}",
});

return industry // Sales

An agent is able to find the industry of a company while preserving context for any other request. The only tokens it used were the output tokens on generating the code and the input tokens of the exact industry and nothing else.

However, it’s not just context / efficiency. Code execution here enables something that would be impossible for the agent without it: finding the industry of 100 companies. Instead of performing inference of each of the company’s websites (making it dumber every time), it can “offload” the inference to effectively what is a subagent. You can see how much more powerful this is than tool calling quickly.

Existing training data

The other good argument for code execution over tool calling is that models have an enormous corpus of training data for how to write code and interact with a file system. As developers continue to use coding agents to build software, model companies will continue to make coding performance a high priority.

Other less relevant arguments

Other arguments for code execution tend to mention how you don’t have to “inject” all the tool (function) definitions to the agent upfront, but this is solved fairly easily with a “search tools” tool which allows the agent to go and find a tool instead of having them all at the ready in its context.

MCP vs. existing HTTP APIs

MCP was a standard invented by Anthropic that allows you to define the functions/tools you want agents to have access to (including intructions on how to use them) for your app

I used to mistakenly conflate MCP with tool calling, when they are actually two separate concerns. You can easily convert an MCP server to something you can call via code just by wrapping each endpoint in a function.

The arguments for MCP over existing HTTP APIs have more to do with agent specific instructions and auth, something I’m not personally too concerned about and I don’t think is that relevant of a conversation.

The argument for durable code execution