https://monzo.com/blog/controlling-outbound-traffic-from-kubernetes

At Monzo, the Security Team's highest priority is to keep your money and data safe. And to achieve this, we're always adding and refining security controls across our banking platform.

Late last year, we wrapped up a major networking project which let us control internal traffic in our platform (read about it here). This gave us a lot of confidence that malicious code or an intruder compromising an individual microservice wouldn't be able to hurt our customers.

Since then, we've been thinking about how we can add similar security to network traffic leaving our platform. A lot of attacks begin with a compromised platform component 'phoning home' — that is, communicating with a computer outside of Monzo that is controlled by the attacker. Once this communication is established, the attacker can control the compromised service and attempt to infiltrate deeper into our platform. We knew that if we could block that communication, we'd stand a better chance of stopping an attack in its tracks.

[Image: a compromised service 'phoning home' to an attacker outside the platform]

In this blog post, we'll cover the journey to build our own solution for controlling traffic leaving our platform (you can read more about our platform here). This project formed part of our wider effort to move towards a platform with less trust, where applications need to be granted specific permissions instead of being allowed to do whatever they like. Our project contributed to this by letting us say that a specific service can communicate with an external provider like GitHub, while others can't.

Check out other posts from the Security Team for more projects we've completed to get closer to zero trust.

We started by identifying where traffic leaves the platform

With a few exceptions, we previously had no filtering on where outbound traffic could go. This meant we started in the dark about which services needed to talk to the internet and which didn't. It's similar to where we started our previous network isolation project, when we didn't know which services talked to which other services.

But unlike when we started the network isolation project, the tools and processes we built for that project were now at our disposal: a combination of calico-accountant and packet logging allowed us to rapidly identify which of our microservices actually talk to the internet, and which external IPs each of them talks to.
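To give a flavour of what this looks like in practice, here's a minimal sketch (not our production tooling) of the kind of aggregation you can run over packet logs. It assumes kernel log lines in the standard iptables LOG format, with SRC=, DST= and DPT= fields; the parsing and the "is this IP external?" heuristic are illustrative assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strings"
)

// isExternal reports whether an IP falls outside private (RFC 1918) and
// loopback ranges, i.e. the traffic is leaving the platform.
func isExternal(ip net.IP) bool {
	return ip != nil && !ip.IsPrivate() && !ip.IsLoopback()
}

func main() {
	// destinations maps a source pod IP to the set of external "ip:port"
	// destinations it was seen talking to.
	destinations := map[string]map[string]bool{}

	// Read iptables-style log lines from stdin, e.g.
	// "... SRC=10.0.1.5 DST=151.101.1.140 ... PROTO=TCP SPT=43210 DPT=443 ..."
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		fields := map[string]string{}
		for _, f := range strings.Fields(scanner.Text()) {
			if k, v, ok := strings.Cut(f, "="); ok {
				fields[k] = v
			}
		}
		dst := net.ParseIP(fields["DST"])
		if !isExternal(dst) {
			continue // only interested in traffic leaving the platform
		}
		src := fields["SRC"]
		if destinations[src] == nil {
			destinations[src] = map[string]bool{}
		}
		destinations[src][fields["DST"]+":"+fields["DPT"]] = true
	}

	for src, dsts := range destinations {
		for d := range dsts {
			fmt.Println(src, "->", d)
		}
	}
}
```

Aggregating a day or two of log lines like this gives a per-source list of external IP and port pairs to investigate; mapping source pod IPs back to service names is a separate lookup against the cluster.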

Even with IP information, it wasn't always trivial to work out which domains each service talked to. This is because many of the external providers our services talk to use CDNs like AWS CloudFront, where a single IP serves many websites. For simpler cases, we'd simply read the code for the service to find out which hostnames it uses. If that wasn't possible, we'd carefully log the service's outgoing packets to identify their destinations.
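As a rough illustration of why IPs alone aren't enough, here's a tiny Go sketch: a reverse DNS lookup on a CDN address (the IP below is a made-up example) typically only tells you which CDN the address belongs to, not which provider the service is actually talking to.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// 13.224.0.1 is a hypothetical CloudFront edge address, used purely
	// to illustrate the problem.
	names, err := net.LookupAddr("13.224.0.1")
	if err != nil {
		fmt.Println("reverse lookup failed:", err)
		return
	}
	// The result is typically a generic edge hostname (something like
	// "server-13-224-0-1....cloudfront.net."), which identifies the CDN
	// but not the website being served from it.
	fmt.Println(names)
}
```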

Once we put together a (surprisingly small) spreadsheet of which services used which external hostnames and ports, we began a design process to think about what an ideal long-term solution would look like.

We had a few requirements:

[Image: our requirements for the solution]

We realised almost no drop-in solution on the market could tick all the boxes, and that implementing our ideal solution would take a fairly long time. So we decided to ship a simple solution first, and iterate on it.

We started with port-based filtering