Zi Dong, July 10, 2025

Hi, I’m Zi and I’m building NenAI, a developer platform to help software teams productionize computer use. If you’re reading this, you might be interested in joining me as a founding member of NenAI. Let me break down what we do.

https://www.loom.com/share/69e6488695444fa39c1ff87eb46008f0

Computer use— the technology

Computer use is the ability of Vision Language Action models (VLAs) to take a natural language prompt (e.g., “Book a check-up appointment for my patient Mark Biddle”) and execute it by interacting directly with the computer the way a human would: clicking with the mouse, typing on the keyboard, and so on. The computer use loop runs as follows, repeating until the task is done:

  1. Plan the task based on the prompt (e.g., “To book a check-up appointment, I first need to find the patient page. Then, I need to click the appointments button. Then, I need to…”)
  2. Execute the plan step by step using the appropriate tools (e.g., launch the electronic health record (EHR) application, click the search bar)
  3. Evaluate whether the new state of the UI brings the system closer to completing the task (e.g., “The application shows a login screen, which I need to complete to access the main EHR functionality.”)
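The loop above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration and not NenAI’s implementation. In practice, `plan` and `evaluate` would be calls to a VLA model and `execute` would drive a real mouse and keyboard. Here all three are stubbed out against an invented toy EHR whose screen names and actions exist only for this example. The planner is re-run on every iteration, which lets the loop react to whatever the UI actually shows, such as an unexpected login screen.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Action:
    tool: str    # hypothetical tool name, e.g. "launch", "click", "type"
    target: str  # what the tool acts on: an app, a UI element label, or text


def run_computer_use_loop(
    prompt: str,
    plan: Callable[[str, str], List[Action]],
    execute: Callable[[str, Action], str],
    evaluate: Callable[[str, str], bool],
    initial_state: str,
    max_iterations: int = 10,
) -> Tuple[bool, List[Action]]:
    """Plan -> execute -> evaluate, repeated until done or out of budget."""
    state = initial_state
    history: List[Action] = []
    for _ in range(max_iterations):
        actions = plan(prompt, state)   # 1. Plan remaining steps from the current UI state
        if not actions:
            break                       # planner has nothing left to try
        action = actions[0]
        state = execute(state, action)  # 2. Execute only the next action, then observe
        history.append(action)
        if evaluate(prompt, state):     # 3. Evaluate: has the task been completed?
            return True, history
    return False, history


# Toy stand-ins for the model and the EHR application (all hypothetical).
SCRIPT = {
    "desktop": Action("launch", "EHR"),
    "login": Action("click", "Log in"),
    "home": Action("type", "Mark Biddle"),
    "patient_page": Action("click", "Appointments"),
    "appointments": Action("click", "Book check-up"),
}
TRANSITIONS = {
    ("desktop", Action("launch", "EHR")): "login",
    ("login", Action("click", "Log in")): "home",
    ("home", Action("type", "Mark Biddle")): "patient_page",
    ("patient_page", Action("click", "Appointments")): "appointments",
    ("appointments", Action("click", "Book check-up")): "booked",
}


def toy_plan(prompt: str, state: str) -> List[Action]:
    next_action = SCRIPT.get(state)
    return [next_action] if next_action else []


def toy_execute(state: str, action: Action) -> str:
    return TRANSITIONS.get((state, action), state)  # unknown actions leave the UI unchanged


def toy_evaluate(prompt: str, state: str) -> bool:
    return state == "booked"


done, history = run_computer_use_loop(
    "Book a check-up appointment for my patient Mark Biddle",
    toy_plan, toy_execute, toy_evaluate, initial_state="desktop",
)
print(done, len(history))  # True 5
```

Executing one action per iteration and then re-planning costs more model calls than running a full plan blindly. The benefit is that the loop can recover when the UI diverges from what the planner expected.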

This is an actively evolving area of research. Notable benchmarks include OSWorld and WebVoyager from academia, and Web Bench from product-first companies in the browser space. The general consensus is that the research is now mature enough for real use cases and applications to be built on top of it.

Productionize— why sell to developers?

Imagine you are building a voice agent for primary care physicians. Doctors are busy, and they would much rather dictate follow-ups than enter data into the system through a UI. You need to process the voice data and understand the doctor’s intent. You also need to carry out that intent accurately and reliably in the systems of record (in healthcare, electronic health record systems).

How would you do this? In some cases, you can call an API like POST /appointments to update the system. However, not every system has an API, and not all functionality is exposed through one or fully expressible by one. Instead, you can use computer use to enter the data like a human would.

However, the computer use primitives from providers like Anthropic and OpenAI are raw APIs, not production-ready platforms. They lack the authoring, tooling, and observability capabilities required to operate such a system reliably at scale. Try it yourself:

  1. Go to Anthropic’s computer use demo (https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo) and run a task like the appointment booking above.
  2. Now do it 100 times. Imagine the infrastructure, observability, and tooling needed to make that work reliably.
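To make “do it 100 times” concrete, here is a minimal sketch of the simplest harness that becomes necessary. It runs a task repeatedly, records each run’s outcome, duration, and error, and reports a success rate. `run_many`, `summarize`, and `flaky_booking_task` are hypothetical names. The flaky task is a seeded random stand-in for a real computer use run, and its failure rates are arbitrary, not measurements.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RunRecord:
    run_id: int
    success: bool
    duration_s: float
    error: Optional[str] = None  # set when the run crashed rather than simply failing


def run_many(task: Callable[[int], bool], n_runs: int = 100) -> List[RunRecord]:
    """Run the same task n_runs times, keeping one record per run."""
    records: List[RunRecord] = []
    for run_id in range(n_runs):
        start = time.perf_counter()
        error = None
        try:
            success = task(run_id)
        except Exception as exc:  # a crash counts as a failure, but keep the reason
            success = False
            error = f"{type(exc).__name__}: {exc}"
        records.append(RunRecord(run_id, success, time.perf_counter() - start, error))
    return records


def summarize(records: List[RunRecord]) -> Dict[str, float]:
    failures = [r for r in records if not r.success]
    return {
        "runs": len(records),
        "success_rate": (len(records) - len(failures)) / len(records),
        "crashes": sum(1 for r in failures if r.error is not None),
    }


# Hypothetical flaky task; the probabilities below are arbitrary, not measurements.
rng = random.Random(0)


def flaky_booking_task(run_id: int) -> bool:
    roll = rng.random()
    if roll < 0.05:
        raise TimeoutError("login screen did not load")  # ~5% of runs crash
    return roll >= 0.15  # ~10% fail quietly, ~85% succeed overall


print(summarize(run_many(flaky_booking_task, n_runs=100)))
```

Even this toy harness only answers “how often did it work?” Debugging the failed runs would additionally require per-step traces, such as the screens seen and the actions taken.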

Enterprise UI— the opportunity

Agents and automation have been a key investment theme over the past year. Many early winners have emerged in the enterprise space, including agents that specialize in prospecting and sales, customer support, and enterprise knowledge management. Alongside these more visible companies, there is a parallel wave in vertical AI. These are companies that focus on specific industries such as healthcare, insurance, and logistics, or on functions such as finance, HR, and operations. This market is currently $5.1B (source) and could grow roughly 10x to ~$50B by 2030. The main driver of this growth is the recognition that agentic workflows that automate human labor need to be tailored to the nuances of each workflow, industry, and role (source).

One key technical challenge in building vertical AI is accessing the core systems that serve these industries. Horizontal enterprise software (think Slack, Glean, Rippling, etc.) has converged around standardized APIs, so it can use both conventional engineering infrastructure and emerging agentic infrastructure like MCP. Vertical software, by contrast, is a far more fragmented landscape. Many applications lack APIs entirely, and those that have them often lack full API/UI parity. This makes integration challenging for startups building in the space, especially when their mid-market and enterprise strategy depends on integrating with existing systems.