A research finding on how to prevent AI agents from editing source-of-truth documents in structured knowledge bases.


The Problem

AI agents operating on knowledge bases (Notion, Obsidian, Confluence, or any structured document system) cannot distinguish between pages that define rules and pages that follow them. An agent asked to "update the project status" will edit the project template with the same confidence it edits the project report. Both look like documents. The agent has no basis for treating them differently unless you provide one.

The standard approach is per-page tagging: label every page with its role. This fails at scale. A workspace with 500 pages requires 500 labeling decisions, and every new page is untagged until someone remembers to classify it. Missed tags revert the agent to guessing.

The Reframe

This is not a graph inference problem. It is a convention-reading problem.

In a designed knowledge base, the governing-vs-derivative distinction is already encoded in the workspace structure. The workspace designer chose which databases hold rules, which hold deliverables, which hold intake. Every page inherits a role from its container. The information is there. The agent just needs to be told to read it.

The key mechanism: container-level designation, not page-level tagging. The human marks 5-10 databases as "governing." Every row in a governing database inherits the GOVERNING role. Everything else defaults to DERIVATIVE or lower. This scales because the human maintains the convention at the database level (rare, structural decisions) rather than the page level (frequent, high-friction).
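The container-level mechanism can be sketched in a few lines. This is an illustrative sketch, not the system's actual implementation: the names `GOVERNING_DATABASES` and `resolve_role`, and the specific database titles, are assumptions.

```python
# Hypothetical sketch: pages inherit their role from the database that holds them.
# The human maintains this small set (5-10 entries); pages themselves are never tagged.
GOVERNING_DATABASES = {"Policies", "Templates", "Process Docs"}

def resolve_role(container: str) -> str:
    """Return the role a page inherits from its container database."""
    if container in GOVERNING_DATABASES:
        return "GOVERNING"
    return "DERIVATIVE"  # everything else defaults to DERIVATIVE or lower
```

A new page dropped into "Policies" is protected immediately, with no per-page labeling decision.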

Classification Priority Chain

  1. CONTAINER: Which database or location holds this page?
  2. STRUCTURE: Is this a database schema, template, or structural page?
  3. CONTENT: Do keywords or patterns indicate a specific role?
  4. DEFAULT: Still uncertain? Fall back to the workspace-wide default role.

This chain is deliberately ordered by cost of error. Container checks are cheap and high-confidence. Content checks are expensive and lower-confidence. The chain exits early when confidence is high.
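The chain above maps naturally onto an early-exit function. The following is a minimal sketch under stated assumptions: the container set, the keyword list, and the function signature are all hypothetical, and the final step resolves uncertainty to DERIVATIVE per the container-level default described earlier.

```python
GOVERNING_DATABASES = {"Policies", "Templates"}          # hypothetical container set
GOVERNING_KEYWORDS = ("policy", "template", "do not edit")  # hypothetical content signals

def classify(container: str, is_structural: bool, text: str) -> str:
    # 1. CONTAINER: cheap, high-confidence check; exit early on a match.
    if container in GOVERNING_DATABASES:
        return "GOVERNING"
    # 2. STRUCTURE: schemas and templates govern by construction.
    if is_structural:
        return "GOVERNING"
    # 3. CONTENT: expensive, lower-confidence keyword scan, reached only
    #    when the cheaper checks were inconclusive.
    if any(k in text.lower() for k in GOVERNING_KEYWORDS):
        return "GOVERNING"
    # 4. DEFAULT: assumption here - unresolved pages fall back to DERIVATIVE.
    return "DERIVATIVE"
```

Note how the ordering keeps the expensive content scan off the hot path: a page in a governing database never reaches step 3.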

The GOV_W_UPSTREAM Finding

A binary governing/derivative taxonomy is insufficient. Empirical testing on 60 pages with human-labeled ground truth revealed a category that binary models miss: pages that inherit authority from an upstream source but also exercise authority in their own domain.

Example: a department-specific policy page that inherits from an organization-wide policy but adds its own binding rules. It is derivative of the upstream source. It is governing for everything downstream. A binary model must choose one. Either choice produces a dangerous outcome: classifying it as DERIVATIVE allows the agent to edit binding rules; classifying it as GOVERNING prevents legitimate updates that track upstream changes.

The solution is a third category: GOV_W_UPSTREAM (Governing With Upstream). These pages get the protections of governing pages (agent asks before editing) with the additional constraint that changes must not contradict the declared upstream source.
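The three-role taxonomy and its edit rules could be expressed as follows. This is a hedged sketch: the `Role` enum, the `edit_policy` helper, and the shape of the returned rules are illustrative assumptions, not the system's API.

```python
from enum import Enum
from typing import Optional

class Role(Enum):
    GOVERNING = "governing"
    GOV_W_UPSTREAM = "gov_w_upstream"  # governing, but bound to an upstream source
    DERIVATIVE = "derivative"

def edit_policy(role: Role, upstream: Optional[str] = None) -> dict:
    """Hypothetical mapping from a page's role to the agent's edit behavior."""
    if role is Role.DERIVATIVE:
        return {"ask_first": False, "constraint": None}
    if role is Role.GOV_W_UPSTREAM:
        # Same protection as GOVERNING, plus the upstream-consistency constraint.
        return {"ask_first": True,
                "constraint": f"changes must not contradict {upstream}"}
    return {"ask_first": True, "constraint": None}
```

A department policy inheriting from "Org-wide Security Policy" would thus stay editable (tracking upstream changes is legitimate), but only with confirmation and only within the bounds the upstream source sets.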

In testing, a binary taxonomy produced dangerous misclassifications on 10% of pages (6 of 60). Adding GOV_W_UPSTREAM eliminated all 6 dangerous errors.