Welcome! In this post, we’ll be taking a character-by-character look at the source code of the BioNTech/Pfizer SARS-CoV-2 mRNA vaccine.

Update: The other up-and-coming vaccines are described in The Genetic Code and Proteins of the Other Covid-19 Vaccines.

I want to thank the large cast of people who spent time previewing this article for legibility and correctness. All mistakes remain mine, but I would love to hear about them quickly at bert@hubertnet.nl or @PowerDNS_Bert.

Now, these words may be somewhat jarring - the vaccine is a liquid that gets injected in your arm. How can we talk about source code?

This is a good question, so let’s start off with a small part of the very source code of the BioNTech/Pfizer vaccine, also known as BNT162b2, also known as Tozinameran, also known as Comirnaty.

First 500 characters of the BNT162b2 mRNA. Source:

The BNT162b2 mRNA vaccine has this digital code at its heart. It is 4284 characters long, so it would fit in a bunch of tweets. At the very beginning of the vaccine production process, someone uploaded this code to a DNA printer (yes), which then converted the bytes on disk to actual DNA molecules.

A Codex DNA BioXp 3200 DNA printer

Out of such a machine come tiny amounts of DNA, which after a lot of biological and chemical processing end up as RNA (more about which later) in the vaccine vial. A 30 microgram dose turns out to actually contain 30 micrograms of RNA. In addition, there is a clever lipid (fatty) packaging system that gets the mRNA into our cells.

Update: Derek Lowe of the famous In the Pipeline blog over at Science has written a comprehensive post, “RNA Vaccines And Their Lipids”, which neatly explains the lipid and delivery parts of the vaccines that I am not competent to describe. Luckily Derek is!

Update 2: Jonas Neubert and Cornelia Scheitz have written this awesome page with loads of detail on how the vaccines actually get produced and distributed. Recommended!

RNA is the volatile ‘working memory’ version of DNA. DNA is like the flash drive storage of biology: very durable, internally redundant and very reliable. But just as computers do not execute code directly from a flash drive, before anything happens the code gets copied to a faster, more versatile, yet far more fragile system.

For computers, this is RAM; for biology, it is RNA. The resemblance is striking. Unlike flash memory, RAM degrades very quickly unless lovingly tended to. The reason the Pfizer/BioNTech mRNA vaccine must be stored in the deepest of deep freezers is the same: RNA is a fragile flower.

Each RNA character weighs on the order of 0.53·10⁻²¹ grams, meaning there are around 6·10¹⁶ characters in a single 30 microgram vaccine dose. Expressed in bytes, this is around 14 petabytes, although it must be said this consists of around 13,000 billion repetitions of the same 4284 characters. The actual informational content of the vaccine is just over a kilobyte. SARS-CoV-2 itself weighs in at around 7.5 kilobytes.

Update: In the original post these numbers were off. Here is a spreadsheet with the correct calculations.
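For readers who prefer code to spreadsheets, here is a minimal Python sketch that redoes the arithmetic from the figures quoted above. It is not the author's spreadsheet. Two inputs are assumptions that the text does not state outright: that each character takes 2 bits (because there are four possible letters), and that the SARS-CoV-2 genome is 29,903 characters long (the commonly cited reference genome length).

```python
# Back-of-the-envelope check of the numbers quoted in the text.
MASS_PER_CHARACTER_G = 0.53e-21  # from the text: mass of one RNA character
DOSE_G = 30e-6                   # from the text: 30 microgram dose
SEQUENCE_LENGTH = 4284           # from the text: characters in BNT162b2
BITS_PER_CHARACTER = 2           # assumption: 4 possible letters -> 2 bits each
SARS_COV_2_LENGTH = 29_903       # assumption: reference genome length, not in the text


def size_in_bytes(characters: float) -> float:
    """Storage needed for a number of characters at 2 bits per character."""
    return characters * BITS_PER_CHARACTER / 8


characters_per_dose = DOSE_G / MASS_PER_CHARACTER_G
copies_per_dose = characters_per_dose / SEQUENCE_LENGTH

print(f"characters per dose: {characters_per_dose:.1e}")
print(f"copies of the sequence per dose: {copies_per_dose:.1e}")
print(f"petabytes per dose: {size_in_bytes(characters_per_dose) / 1e15:.1f}")
print(f"bytes of unique information: {size_in_bytes(SEQUENCE_LENGTH):.0f}")
print(f"SARS-CoV-2 in bytes: {size_in_bytes(SARS_COV_2_LENGTH):.0f}")
```

Under these assumptions the results land on the rounded values in the text: roughly 6·10¹⁶ characters, about 14 petabytes, around 13,000 billion copies, and 1071 bytes of actual information.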

The briefest bit of background

DNA is a digital code. Unlike computers, which use 0 and 1, life uses A, C, G and U/T (the ‘nucleotides’, ‘nucleosides’ or ‘bases’).

In computers we store the 0 and 1 as the presence or absence of a charge, as a current, as a magnetic transition, as a voltage, as a modulation of a signal, or as a change in reflectivity. In short, the 0 and 1 are not some kind of abstract concept - they live as electrons and in many other physical embodiments.

In nature, A, C, G and U/T are molecules, stored as chains in DNA (or RNA).
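To make the comparison with bits concrete: four possible letters need 2 bits each, so four letters fit in one byte. The Python sketch below packs a sequence that way. The bit pattern assigned to each letter is an arbitrary choice made for this illustration (nature has no such table), the helper names `pack` and `unpack` are made up, and the sample sequence is invented rather than taken from the vaccine.

```python
# Assumed mapping: any assignment of the four 2-bit patterns would do.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "U": 0b11}
LETTER = {bits: letter for letter, bits in CODE.items()}


def pack(sequence: str) -> bytes:
    """Pack nucleotide letters into bytes, four letters per byte."""
    sequence = sequence.upper().replace("T", "U")  # treat DNA's T as RNA's U
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        byte = 0
        for letter in chunk:
            byte = (byte << 2) | CODE[letter]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        packed.append(byte)
    return bytes(packed)


def unpack(data: bytes, length: int) -> str:
    """Reverse pack(); length is needed to drop padding in the last byte."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(LETTER[(byte >> shift) & 0b11])
    return "".join(letters[:length])


sample = "GAUUACA"  # invented example, not vaccine sequence
packed = pack(sample)
print(packed.hex(), unpack(packed, len(sample)))
```

With this scheme a 4284-character sequence occupies 4284 / 4 = 1071 bytes, which is where a figure of "just over a kilobyte" comes from.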

In computers, we group 8 bits into a byte, and the byte is the typical unit of data being processed.