It can be difficult for a casual reader to tell whether a piece of text was written by an LLM or a human. Watermarking could be a partial solution. I am excited by recent work in the field, especially SynthID, a watermarking tool developed by the good folks at DeepMind, which embeds watermarks (imperceptible to humans yet algorithmically detectable) into AI-generated images, audio, text, and video across Google’s generative AI consumer products. In this blog post, I focus on SynthID-Text, the text watermarking scheme.
We need to make an important distinction: SynthID-Text does not answer the question “Is this text written by an LLM?”, but rather “Is this text written by a specific LLM that uses SynthID-Text watermarking?”. In particular, it cannot detect text generated by other LLMs (GPT-4, Claude, Llama) that don’t use SynthID-Text.
Before discussing watermarking, let’s recap how LLMs generate text. Let $x_{<t} = x_1, \dots, x_{t-1}$ be a sequence of $t-1$ input tokens from the vocabulary. An LLM computes the probability distribution $p_{\text{LM}}(\cdot|x_{<t})$ of the next token $x_t$ given the preceding text $x_{<t}$. Note that $p_{\text{LM}}(\cdot|x_{<t})$ is a probability distribution over all tokens in the vocabulary; it is not $p_{\text{LM}}(x_t|x_{<t})$, which is the probability of the specific token $x_t$ that was actually chosen, i.e. a single number rather than a distribution.
For example, suppose $x_{<t}$ represents “My favorite fruit is”. To generate the next token, the LLM first computes the probability distribution, which could e.g. look like:
| Token | $p_{\text{LM}}(\cdot \mid x_{<t})$ |
| --- | --- |
| mango | 0.50 |
| lychee | 0.30 |
| papaya | 0.15 |
| durian | 0.05 |
| [remaining tokens in vocabulary] | 0 |
Then, $x_t$ is sampled from this distribution. In this example, “mango” is the most likely pick (50%), while “durian” is the least likely among the tokens with nonzero probability (5%). The LLM generates a complete response by repeating this two-step process at each subsequent token position: compute the probability distribution $p_{\text{LM}}(\cdot|x_{<t})$ from the updated context, then sample the next token.
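To make this concrete, here is a minimal sketch of the generation loop in Python. The vocabulary, probabilities, and `next_token_distribution` helper are illustrative stand-ins, not any real LLM API; an actual model would compute the distribution over its full vocabulary with a forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy four-token vocabulary and the hypothetical distribution from the table.
VOCAB = ["mango", "lychee", "papaya", "durian"]

def next_token_distribution(context):
    """Stand-in for the LLM forward pass: returns p_LM(. | x_<t)."""
    return np.array([0.50, 0.30, 0.15, 0.05])

def generate(context, max_new_tokens=3):
    tokens = list(context)
    for _ in range(max_new_tokens):
        p_lm = next_token_distribution(tokens)  # step 1: compute the distribution
        x_t = rng.choice(VOCAB, p=p_lm)         # step 2: sample the next token
        tokens.append(x_t)
    return tokens

print(generate(["My", "favorite", "fruit", "is"]))
# e.g. ['My', 'favorite', 'fruit', 'is', 'mango', 'mango', 'lychee']
```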
LLM text generation. The process stops when the maximum length is reached or an end-of-sequence token is generated. Source: Fig. 1 in the SynthID-Text paper.
Building on this understanding of how LLMs generate text, let’s look at watermarking. As we’ll see, watermarking modifies the generation of each token.
A generative watermarking scheme embeds watermarks into content (text, images, video, or audio) during an AI model’s generation process. For text generation, such a scheme consists of three components: a random seed generator, a sampling algorithm, and a scoring function.
Here’s how the first two components work together during generation: at each generation step $t$, the random seed generator provides a random seed $r_t = \text{hash}(\text{watermarking key}, x_{<t})$. The sampling algorithm then uses $r_t$ to bias the selection of the next token $x_t$ from $p_{\text{LM}}(\cdot|x_{<t})$. This biased sampling creates correlations between $x_t$ and $r_t$.
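Here is a minimal sketch of these two components under a simplified scheme. The key, hash construction, and helper names are all illustrative, and the Gumbel-style sampler below stands in for SynthID-Text’s actual algorithm (Tournament sampling). The idea: derive per-token pseudorandom scores $g_i$ from $r_t$, then pick $\arg\max_i g_i^{1/p_i}$, which still samples from $p_{\text{LM}}$ overall but correlates the chosen token with its score.

```python
import hashlib
import numpy as np

VOCAB = ["mango", "lychee", "papaya", "durian"]
WATERMARKING_KEY = b"hypothetical-secret-key"  # illustrative; kept private in practice

def random_seed(context_tokens):
    """r_t = hash(watermarking key, x_<t)."""
    digest = hashlib.sha256(WATERMARKING_KEY + " ".join(context_tokens).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def g_values(seed):
    """Per-token pseudorandom scores in [0, 1), reproducible from r_t."""
    return np.random.default_rng(seed).random(len(VOCAB))

def watermarked_sample(p_lm, context_tokens):
    """Gumbel-style biased sampling: argmax_i g_i^(1/p_i).

    This still draws x_t from p_LM, but the selected token tends to have a
    high g-value under this context's seed -- the correlation the detector
    will later look for. (SynthID-Text itself uses Tournament sampling;
    this is just a simpler scheme with the same structure.)
    """
    g = g_values(random_seed(context_tokens))
    return VOCAB[int(np.argmax(g ** (1.0 / p_lm)))]

p_lm = np.array([0.50, 0.30, 0.15, 0.05])
print(watermarked_sample(p_lm, ["My", "favorite", "fruit", "is"]))
```

Because the $g$-values depend only on the watermarking key and the context, a detector holding the key can recompute them later without querying the model.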
Generative watermarking during LLM text generation. Source: Fig. 1 in the SynthID-Text paper.
The correlations between $x_t$ and $r_t$ are exactly what the scoring function measures during watermark detection. The detector compares this score against a predefined threshold to classify the text as watermarked or non-watermarked. As with any binary classifier, false positives and false negatives are possible. More on watermark detection later!
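As a preview, here is a matching scoring-function sketch (reusing `random_seed`, `g_values`, and `VOCAB` from the sampler above; the threshold is illustrative, and SynthID-Text’s actual scoring function differs in its details). It recomputes the $g$-values at each position and averages the $g$-value of the token that was actually chosen: unwatermarked text should average around 0.5, watermarked text noticeably higher.

```python
def watermark_score(tokens):
    """Mean g-value of each generated token under its context's seed.

    Unwatermarked text averages around 0.5; text produced by the biased
    sampler above should score systematically higher.
    """
    per_token = []
    for t in range(1, len(tokens)):
        g = g_values(random_seed(tokens[:t]))
        per_token.append(g[VOCAB.index(tokens[t])])
    return float(np.mean(per_token))

THRESHOLD = 0.6  # illustrative; in practice tuned for a target false-positive rate

def is_watermarked(tokens):
    return watermark_score(tokens) > THRESHOLD
```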