How Tailscale works

People often ask us for an overview of how Tailscale works. We’ve been putting off answering that, because we kept changing it! But now things have started to settle down.

Let’s go through the entire Tailscale system from bottom to top, the same way we built it (but skipping some zigzags we took along the way). With this information, you should be able to build your own Tailscale replacement… except you don’t have to, since our node software is open source and we have a flexible free plan.

The data plane: WireGuard®

Our base layer is the increasingly popular and excellent open source WireGuard package (specifically the userspace Go variant, wireguard-go). WireGuard creates a set of extremely lightweight encrypted tunnels between your computer, VM, or container (which WireGuard calls an “endpoint” and we’ll call a “node”), and any other nodes in your network.

Let’s clarify that: most of the time, VPN users, including WireGuard users, build a “hub and spoke” architecture, where each client device (say, your laptop) connects to a central “concentrator” or VPN gateway.

Hub-and-spoke networks

Figure 1. A traditional hub-and-spoke VPN.

This is the easiest way to set up WireGuard, because each node in the network needs to know the public key, public IP address, and port number of each other node it wants to connect directly to. If you wanted to fully connect 10 nodes, then that would be 9 peer nodes that each node has to know about, or 90 separate tunnel endpoints. In a hub-and-spoke, you just have one central hub node and 9 outliers (with “spokes” connecting them), which is much simpler. The hub node usually has a static IP address and a hole poked in its firewall so it’s easy for everyone to find. Then it can accept incoming connections from nodes at other IP addresses, even if those nodes are themselves behind a firewall, in the usual way that client-server Internet protocols do.

Hub-and-spoke works well, but it has some downsides. First of all, most modern companies don’t have a single place they want to designate as a hub. They have multiple offices, multiple cloud datacenters or regions or VPCs, and so on. In traditional VPN setups, companies configure a single VPN concentrator, and then set up secondary tunnels (often using IPsec) between locations. So remote users arrive at the VPN concentrator in one place, then have their traffic forwarded to its final destination in another place.

This traditional setup can be hard to scale. First of all, remote users might or might not be close to the VPN concentrator; if they’re far away, then they incur high latency connecting to it. Secondly, the datacenter they want to reach might not be close to the VPN concentrator either; if it’s far away, they incur high latency again. Imagine this case, with a worker in New York trying to reach a server in New York, through the company’s VPN concentrator or BeyondCorp proxy at their head office in San Francisco, not that I’m bitter:

Figure 2(a). Inefficient routing through a traditional VPN concentrator.

Luckily, WireGuard is different. Remember how we said it creates a “set of extremely lightweight tunnels” above? The tunnels are light enough that we can create a multi-hub setup without too much trouble:

Figure 2(b). A WireGuard multipoint VPN routes traffic more efficiently.

The only catch is that now each of the datacenters needs a static IP address, an open firewall port, and a set of WireGuard keys. When we add a new user, we’ll have to distribute the new key to all five servers. When we add a new server, we’ll have to distribute its key to every user. But we can do it, right? It’s only about 5 times as much work as one hub, which is not that much work.

Mesh networks

So that’s hub-and-spoke networks. They’re not too hard to set up with WireGuard, although a bit tedious, and we haven’t really talked about safe practices for key management yet (see below).

Still, there’s something fundamentally awkward about hub-and-spoke: they don’t let your individual nodes talk to each other. If you’re old like me, you remember when computers could just exchange files directly without going to the cloud and back. That was how the Internet all used to work, believe it or not! Sadly, developers have stopped building peer-to-peer apps because the modern Internet’s architecture has evolved, almost by accident, entirely into this kind of hub-and-spoke design, usually with the major cloud providers in the center charging rent.