By Sean Cai

More Private Data Markets Comps/Analysis on Substack or on seancai.com


The “Mechanical Turk,” the 18th-century version of a black box machine for a multi-turn reasoning task (chess)


Industrial historians should be salivating at what is going on in data markets today. We’ve built the bona fide combustion engine and are now prospecting all around the world for data (oil).

The production of white-collar knowledge work is undergoing a radical, Victorian-era-style industrial shift. The TAM is all of human labor, but it is spread across several subcategories developing within human data, where newer players can outcompete generalized incumbents. And even though Mercor offers acqui-hires to many of them, private capital markets, though frothy, are nothing like those of the early 20th century. Even if data titans wanted to use Standard Oil-like acquisition plays to build vertical monopolies, macro trends are driving the supply chain to split.

Data contracts are much easier to eat away at than they were two years ago. The market is more mature, and knowledge asymmetries fade as more miners enter the data markets. Throughout this year, I’ll explore the notion that, absent the ability to innovate at the level of the continual learning paradigm, we need to dramatically redefine how we collect data and realistically transform it into evals that match the new SOTA models and deployment practices.

The Industrial Age and the Information Age

In the late 18th century, as the most optimistic and foolhardy industrialists of the early waves of the Industrial Revolution imagined the limits of industrial innovation, they landed on machines that, through no obvious mechanism, could produce miraculous outcomes. Such were the misguided attempts at creating machines that “spoke” to emulate speech in Viennese courts, the machines by French inventors that sought to emulate human reasoning at some scale, and machines that outcompeted humans on reasoning tasks, such as the infamous Mechanical Turk, made to impress Empress Maria Theresa and shown in the image above.

Evidently, a few technological breakthroughs needed to happen before we could delegate high-level reasoning to machines. Today, we have the best chance yet of realizing what the inventors of the Mechanical Turk imagined in their wildest dreams. Increasingly, we deploy systems that abstract away low-level reasoning. These systems spawn modern-age white-collar Luddites and over-investment bubbles, but they ultimately contribute to massive improvements in white-collar economies of scale.

Coal, iron, sulfur, lead, and a mishmash of other physical materials powered the physical labor revolution of the Victorian industrial age. Data alone, in its many modalities and white-collar representations, will power the white-collar labor revolution of the Information Age. Soon, as general-purpose robotics matures, it might power a blue-collar labor revolution as well, abstracting away low-level reasoning at all levels of the economy.

We always face a tension between pattern-matching on our most applicable historical examples and generalizing to new situations shaped by new technologies. In this case, let’s examine what most probably generalizes from the last economic revolution, and what does not:

What Industrial History Actually Generalizes

What’s different today: