
HashiCorp Nomad scheduled 2,000,000 Docker containers on 6,100 hosts in 10 AWS regions in 22 minutes.

Intro

In March 2016, we demonstrated that HashiCorp Nomad could run 1 million containers across 5,000 hosts. Four years and 10 major releases later, we set out to replicate the original challenge at a higher scale:

            C1M (2016)      C2M (2020)
Containers  1,000,000       2,000,000
Hosts       5,000           6,100

Nomad, borrowing terminology from Google’s Borg scheduler, refers to its basic unit of work as a job. Each job is composed of many tasks. A task is an application to run, which in this test is a Docker container. Nomad can also schedule other task types, such as VMs and standalone binaries, but only Docker containers were used as part of C2M.
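For readers unfamiliar with Nomad’s job specification, the following is a minimal sketch of a job that runs a single Docker container. The job name, image, and resource values are illustrative assumptions, not the actual C2M workload:

```hcl
# Minimal sketch of a Nomad job spec: one job, one group, one Docker task.
# All names and values here are hypothetical, not the real C2M job.
job "example" {
  datacenters = ["dc1"]
  type        = "service"

  group "app" {
    count = 1

    task "web" {
      driver = "docker"        # run this task as a Docker container

      config {
        image = "nginx:1.19"   # any container image
      }

      resources {
        cpu    = 100           # MHz
        memory = 64            # MB
      }
    }
  }
}
```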

Thank you to the AWS Spot team for providing the credits and support necessary to run the infrastructure for C2M on AWS.

C2M Results

Nomad scheduled 2 million containers in 22 minutes and 14 seconds, an average rate of nearly 1,500 containers per second. In the graph above, the Y-axis is the number of containers and the X-axis is the elapsed time. The first line, Scheduled, represents when a container was assigned to a node; the second line, Running, represents when the container was running on its assigned node. The near-linear shape of both lines demonstrates that the number of containers already placed does not negatively affect the placement of future containers.

A Global Deployment on AWS Spot

Our partners at Amazon Web Services generously provided the credits to run C2M on AWS, with all of the workloads placed on AWS Spot Instances. EC2 Spot was chosen for this project because it offers savings of up to 90%, with no commitment beyond pay-by-the-second, on the same EC2 instances used by On-Demand. Spot Instances are spare capacity sold at a discount and may be interrupted by AWS should supply diminish. Stateless, fault-tolerant, and decoupled workloads are a great fit for Spot, and they are a common model for container workloads on Nomad as well as on other scheduling platforms. Total costs for this run were reduced by 68% in aggregate by using popular instance types and sizes in a mix of daytime and nighttime locales. The following map shows the location of all instances, with the white dot representing the schedulers colocated in the us-east-1 region (Northern Virginia).

A distinct auto scaling group (ASG) was created in each of 10 AWS regions spanning the globe to launch the client virtual machines. Target capacity for each ASG was requested by specifying 18 TB of RAM, the constraining resource for the test deployment, rather than a machine or CPU count. Because multiple instance types were defined as eligible Nomad clients, AWS used a capacity-optimized allocation strategy to provision instances from the deepest pools of machine types in each of the different availability zones across the 10 regions. The capacity-optimized allocation strategy can add to the durability of Spot instance fleets. In this particular test, however, none of the 6,100 instances were interrupted.
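To make this concrete, below is a hedged Terraform sketch of how such an ASG might be expressed, weighting each eligible instance type by its memory so that desired capacity can be stated in GiB of RAM rather than in instance count. The resource names, instance types, and values are illustrative assumptions, not the actual C2M configuration:

```hcl
# Hypothetical sketch: an ASG whose capacity is measured in GiB of RAM.
# Each override weights an instance type by its memory, so a
# desired_capacity of 18432 requests roughly 18 TB of RAM in total.
resource "aws_autoscaling_group" "nomad_clients" {
  name                = "c2m-clients"          # illustrative name
  vpc_zone_identifier = var.client_subnet_ids  # assumed variable
  min_size            = 0
  max_size            = 18432
  desired_capacity    = 18432                  # 18 TB expressed in GiB

  mixed_instances_policy {
    instances_distribution {
      on_demand_percentage_above_base_capacity = 0  # 100% Spot
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.nomad_client.id
        version            = "$Latest"
      }

      # Weight each eligible type by its memory in GiB.
      override {
        instance_type     = "m5.8xlarge"   # 128 GiB
        weighted_capacity = "128"
      }
      override {
        instance_type     = "r5.4xlarge"   # 128 GiB
        weighted_capacity = "128"
      }
      override {
        instance_type     = "m5.4xlarge"   # 64 GiB
        weighted_capacity = "64"
      }
    }
  }
}
```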

In the chart below, you can see how that footprint was spread across instance types in each region.

The 3 i3.16xlarge instances represent the 3 Nomad schedulers, run on Reserved Instances. The single m5a.large was a Reserved Instance used for command and control of C2M’s test harness. All other instances were Spot Instances provisioned by the ASGs. The speed with which ASGs can provision thousands of hosts made iteratively developing and testing C2M possible: all 6,100 instances were running and registered as ready to receive work from the Nomad schedulers within 6 minutes and 30 seconds.

C2M was run once all 6,100 instances had registered.

Differences from C1M

Nomad has evolved significantly from version 0.3.1, used for C1M back in 2016, to the 1.0 beta used for C2M. While C2M tried not to diverge drastically from C1M, we did evolve our benchmark to focus more on Nomad’s global scalability and less on raw scheduling throughput (containers placed per second). We shifted our focus after learning that Nomad’s throughput already exceeded users’ expectations, and we decided that global scalability is more relevant to the infrastructure challenges users actually face.

Scheduling Algorithm