Autoscaling Monzo: How we optimise our platform to be just the right size

https://monzo.com/blog/2020/10/19/autoscaling-monzo

To keep Monzo running smoothly and continue serving the needs of our customers, it's important our technology platform is scaled appropriately. Practically speaking, when you do anything with your Monzo account, like open the Monzo app or use your debit card, a computer server in our platform has to do some work. There’s a limit to how much work each server can do, so when more customers do more things we need to add more servers.

We've built Monzo in the cloud so we can take advantage of near-infinite capacity, but it is our responsibility to determine exactly what resources we need. We've been on a journey from setting this manually to where we are today, relying on automated systems to take care of it for us.

We’re building Monzo on microservices and Kubernetes

It’s no secret that Monzo is all-in on microservices on Kubernetes. In fact, almost every system we run sits on top of Kubernetes, with the only notable exceptions being some stateful infrastructure like Cassandra and etcd. There are many reasons why Kubernetes works well for us:

Standard APIs for managing software: Kubernetes provides a consistent API for deploying, maintaining and interacting with applications. We’ve become very efficient at operating software using Kubernetes primitives.

Built-in resiliency primitives: Kubernetes solves many of the resiliency concerns that operators of distributed systems need to think about. Built-in control loops manage the software running in the cluster, constantly closing the gap between the declared state and what’s observed live. Having our microservices auto-heal in the face of failures in the underlying infrastructure is both powerful and necessary when operating at scale.

Efficient use of infrastructure: Historically, operators had to decide where to run their application to make best use of the underlying infrastructure. With more monolithic applications, it was reasonable to run each application with a 1:1 mapping against a particular server. With the introduction of service-oriented architectures, running each instance of a microservice on its own server would be enormously inefficient*. Kubernetes helps by managing the placement of workloads into a cluster, with operators only needing to define how much CPU and memory each needs.

* I’m deliberately excluding more recent developments in virtualisation like Firecracker.
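
To make the control-loop idea from the resiliency point above more concrete, here is a minimal Python sketch of a single reconciliation pass. It is a toy model, not the Kubernetes API: the Cluster class, the reconcile function and the workload names are all hypothetical, and the "actions" are just strings describing what a real controller would go and do.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for observed state: workload name -> running replicas."""
    running: dict[str, int] = field(default_factory=dict)


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """Run one pass of a control loop and return the actions it took.

    `desired` is the declared state (workload name -> wanted replicas).
    The function closes the gap between that and what the cluster reports.
    """
    actions: list[str] = []

    # Bring every declared workload to its wanted replica count.
    for name, want in desired.items():
        have = cluster.running.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
        cluster.running[name] = want

    # Remove anything that is running but no longer declared.
    for name in list(cluster.running):
        if name not in desired:
            actions.append(f"stop {cluster.running[name]} x {name}")
            del cluster.running[name]

    return actions
```

If a server fails and takes one of three replicas with it, the next pass sees two running against three declared and starts one more. A pass over a cluster that already matches the declared state returns no actions, which is what lets the loop run continuously.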

Sizing things appropriately is hard

Kubernetes uses a bin packing algorithm to decide where to run each workload. It knows what capacity is free on each server in the cluster, and when an operator deploys something new, it considers how big it is and where it would fit best.
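
As an illustration of the packing decision, here is a small Python sketch of a best-fit heuristic. It is an assumption-laden simplification and not how the Kubernetes scheduler is actually implemented: the Node and Workload classes, the place function, and the choice of "least CPU left over" as the scoring rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical server with its remaining free capacity."""
    name: str
    free_cpu_millis: int
    free_memory_mib: int


@dataclass
class Workload:
    """Hypothetical workload with the resources it asks for."""
    name: str
    cpu_millis: int
    memory_mib: int


def place(workload: Workload, nodes: list[Node]) -> Optional[Node]:
    """Best-fit placement: choose the node that fits the workload most tightly.

    Returns the chosen node (with its free capacity reduced), or None if
    the workload does not fit anywhere.
    """
    # Keep only the nodes with enough free CPU and memory.
    candidates = [
        n for n in nodes
        if n.free_cpu_millis >= workload.cpu_millis
        and n.free_memory_mib >= workload.memory_mib
    ]
    if not candidates:
        return None

    # Prefer the tightest fit: least CPU left over, then least memory left over.
    best = min(
        candidates,
        key=lambda n: (
            n.free_cpu_millis - workload.cpu_millis,
            n.free_memory_mib - workload.memory_mib,
        ),
    )
    best.free_cpu_millis -= workload.cpu_millis
    best.free_memory_mib -= workload.memory_mib
    return best
```

The important point for what follows is the input: the placement is only as good as the CPU and memory figures declared for each workload.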

Conceptually, if each server in the cluster is represented by a van, and each workload by a box to be packed into a van, Kubernetes is responsible for deciding which box goes in which van. This is a robust process, but it raises the question: as an application owner, how do I know how big to make my box?

Typically, the process goes something like:

Make an educated guess
Get it wrong
Wait for your service to exhaust its allocated resources
Add a generous margin so it never* happens again

* It always happens again.

This process is rarely a one-off; changes to the service itself, changes in how it’s used, and/or growth in users mean it’s likely you’ll need to reassess the requirements of your service on an ongoing basis.
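
The "peak plus a generous margin" step can be sketched in a few lines of Python. This is only an illustration of the manual approach described above: the suggest_request function, the 30% default margin and the sample figures are hypothetical and do not come from Monzo's tooling.

```python
def suggest_request(usage_samples: list[float], margin: float = 0.3) -> float:
    """Size a resource request as the observed peak plus a safety margin.

    `usage_samples` are past usage readings in any consistent unit
    (for example CPU millicores). `margin` is a fraction, so 0.3 means 30%.
    """
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    return max(usage_samples) * (1 + margin)


def is_exhausted(request: float, current_usage: float) -> bool:
    """True when current usage has reached or passed the request."""
    return current_usage >= request
```

A request sized from samples peaking at 200 comes out at roughly 260. That holds until usage grows past 260, at which point the figure is stale and the whole exercise has to be repeated.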