https://monzo.com/blog/controlling-outbound-traffic-from-kubernetes

At Monzo, the Security Team's highest priority is to keep your money and data safe. And to achieve this, we're always adding and refining security controls across our banking platform.

Late last year, we wrapped up a major networking project which let us control internal traffic in our platform (read about it here). This gave us a lot of confidence that malicious code or an intruder compromising an individual microservice wouldn't be able to hurt our customers.

Since then, we've been thinking about how we can add similar security to network traffic leaving our platform. A lot of attacks begin with a compromised platform component 'phoning home' — that is, communicating with a computer outside of Monzo that is controlled by the attacker. Once this communication is established, the attacker can control the compromised service and attempt to infiltrate deeper into our platform. We knew that if we could block that communication, we'd stand a better chance of stopping an attack in its tracks.

[Image: a compromised service 'phoning home' to an attacker outside the platform]

In this blog post, we'll cover the journey to build our own solution for controlling traffic leaving our platform (you can read more about our platform here). This project formed part of our wider effort to move towards a platform with less trust, where applications need to be granted specific permissions instead of being allowed to do whatever they like. Our project contributed to this by letting us say that a specific service can communicate with an external provider like GitHub, while others can't.

Check out other posts from the Security Team for more projects we've completed to get closer to zero trust.

We started by identifying where traffic leaves the platform

With a few exceptions, we previously had no filtering on where outbound traffic could go. This meant we started in the dark about which services needed to talk to the internet and which didn't. It's similar to where we started our previous network isolation project, when we didn't know which services talked to which other services.

But unlike when we started the network isolation project, the tools and processes we built for that project were now at our disposal: a combination of calico-accountant and packet logging allowed us to rapidly identify which of our microservices actually talk to the internet, and which external IPs each of them talks to.
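To give a flavour of what this looks like in practice, here's a minimal sketch (not our production tooling) of the kind of aggregation you can run over packet logs. It assumes kernel log lines in the standard iptables LOG format, with SRC=, DST= and DPT= fields; the parsing and the "is this IP external?" heuristic are illustrative assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strings"
)

// isExternal reports whether an IP falls outside private (RFC 1918) and
// loopback ranges, i.e. the traffic is leaving the platform.
func isExternal(ip net.IP) bool {
	return ip != nil && !ip.IsPrivate() && !ip.IsLoopback()
}

func main() {
	// destinations maps a source pod IP to the set of external "ip:port"
	// destinations it was seen talking to.
	destinations := map[string]map[string]bool{}

	// Read iptables-style log lines from stdin, e.g.
	// "... SRC=10.0.1.5 DST=151.101.1.140 ... PROTO=TCP SPT=43210 DPT=443 ..."
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		fields := map[string]string{}
		for _, f := range strings.Fields(scanner.Text()) {
			if k, v, ok := strings.Cut(f, "="); ok {
				fields[k] = v
			}
		}
		dst := net.ParseIP(fields["DST"])
		if !isExternal(dst) {
			continue // only interested in traffic leaving the platform
		}
		src := fields["SRC"]
		if destinations[src] == nil {
			destinations[src] = map[string]bool{}
		}
		destinations[src][fields["DST"]+":"+fields["DPT"]] = true
	}

	for src, dsts := range destinations {
		for d := range dsts {
			fmt.Println(src, "->", d)
		}
	}
}
```

Aggregating a day or two of log lines like this gives a per-source list of external IP and port pairs to investigate; mapping source pod IPs back to service names is a separate lookup against the cluster.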

Even with IP information, it wasn't always trivial to work out which domains each service talked to. This is because many of the external providers our services talk to use CDNs like AWS CloudFront, where a single IP serves many websites. For simpler cases, we'd simply read the code for the service to find out which hostnames it uses. If that wasn't possible, we'd carefully log the service's outgoing packets to identify their destinations.
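As a rough illustration of why IPs alone aren't enough, here's a tiny Go sketch: a reverse DNS lookup on a CDN address (the IP below is a made-up example) typically only tells you which CDN the address belongs to, not which provider the service is actually talking to.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// 13.224.0.1 is a hypothetical CloudFront edge address, used purely
	// to illustrate the problem.
	names, err := net.LookupAddr("13.224.0.1")
	if err != nil {
		fmt.Println("reverse lookup failed:", err)
		return
	}
	// The result is typically a generic edge hostname (something like
	// "server-13-224-0-1....cloudfront.net."), which identifies the CDN
	// but not the website being served from it.
	fmt.Println(names)
}
```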

Once we put together a (surprisingly small) spreadsheet of which services used which external hostnames and ports, we began a design process to think about what an ideal long-term solution would look like.

We had a few requirements:

[Image: our requirements for the solution]

We realised almost no drop-in solution on the market could tick all the boxes, and that implementing our ideal solution would take a fairly long time. So we decided to ship a simple solution first, and iterate on it.

We started with port-based filtering