Writing code and deploying it to AWS Lambda is as easy as baking a cake (depending on the type of cake). Lambda does the heavy lifting for you, from provisioning to scaling. But where does the magic happen, and how does it actually work under the hood? Let's find out together!

Lambda is split into a control plane and a data plane, each responsible for a specific set of activities in the service. The control plane provides the management APIs (for example, CreateFunction and UpdateFunctionCode) and manages integrations with all AWS services. The data plane is Lambda's Invoke API, which triggers Lambda function invocations. This explanation is still very abstract, but things will become clearer over time.

Deployment Interoperability

When deploying your Lambda function, you can either reference a container image stored in an Amazon ECR registry or deploy the code as a .zip file. For a .zip file, you can specify the location of an object in Amazon S3, either in a CloudFormation template or through the CLI. If you upload the .zip through the console, it is stored in an S3 bucket that you cannot access.
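To make the two packaging options concrete, here is a minimal sketch that builds the keyword arguments for Lambda's CreateFunction API for either a .zip object in S3 or a container image in ECR. The helper name, the default runtime and the default handler are assumptions for illustration; only the request field names (FunctionName, Role, PackageType, Code, S3Bucket, S3Key, ImageUri, Runtime, Handler) come from the API itself.

```python
from typing import Any, Dict, Optional


def build_create_function_request(
    function_name: str,
    role_arn: str,
    *,
    s3_bucket: Optional[str] = None,
    s3_key: Optional[str] = None,
    image_uri: Optional[str] = None,
    runtime: str = "python3.12",   # assumed default, pick your own
    handler: str = "app.handler",  # assumed default, pick your own
) -> Dict[str, Any]:
    """Build CreateFunction arguments for a .zip in S3 or an image in ECR."""
    # The S3 location only makes sense as a bucket/key pair.
    if (s3_bucket is None) != (s3_key is None):
        raise ValueError("s3_bucket and s3_key must be given together")

    is_zip = s3_bucket is not None
    is_image = image_uri is not None
    if is_zip == is_image:
        raise ValueError("provide either an S3 location or an image URI")

    request: Dict[str, Any] = {"FunctionName": function_name, "Role": role_arn}
    if is_zip:
        # .zip packages need a runtime and a handler.
        request.update(
            PackageType="Zip",
            Runtime=runtime,
            Handler=handler,
            Code={"S3Bucket": s3_bucket, "S3Key": s3_key},
        )
    else:
        # Container images carry their own runtime and entry point.
        request.update(PackageType="Image", Code={"ImageUri": image_uri})
    return request
```

With boto3 installed and credentials configured, the resulting dictionary could be passed on as `boto3.client("lambda").create_function(**request)`.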

Function code uploaded in the .zip format is optimized once and then stored in encrypted form, using an AWS-managed key and AES-GCM. An idle Lambda function of this kind is really just a .zip file residing in Amazon S3. Functions uploaded to Lambda in the container image format are optimized as well.

The container image is downloaded from its original source, optimized into chunks, and then stored as encrypted chunks using an encryption method which uses a combination of AES-CTR, AES-GCM, and a SHA-256 MAC. The encryption method allows Lambda to securely deduplicate encrypted chunks.
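The deduplication idea can be illustrated with a small content-addressed chunk store. This is only a sketch of the general technique: it leaves the encryption out entirely, the chunk size is deliberately tiny, and the class and method names are made up for illustration. It does not show how Lambda implements its scheme.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


class ChunkStore:
    """Stores every distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self._chunks: Dict[str, bytes] = {}

    def put_image(self, image: bytes) -> List[str]:
        """Split an image into chunks and return its manifest of digests."""
        manifest = []
        for offset in range(0, len(image), CHUNK_SIZE):
            chunk = image[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Identical chunks hash to the same key, so they are stored once.
            self._chunks.setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def get_image(self, manifest: List[str]) -> bytes:
        """Reassemble an image from its manifest."""
        return b"".join(self._chunks[digest] for digest in manifest)

    def unique_chunks(self) -> int:
        return len(self._chunks)
```

Two images that share a base layer would share most of their chunks, so the second upload adds only the chunks that differ.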

Breaking down the architecture

Lambda creates execution environments on a fleet of EC2 instances called workers. These workers are bare-metal Nitro instances launched in a separate AWS account that customers cannot access. Each worker hosts hardware-virtualized microVMs (MVMs) created by Firecracker, a virtual machine monitor built on Linux's Kernel-based Virtual Machine (KVM). Workers have a maximum lease lifetime of 14 hours. When a worker approaches this maximum, no further invocations are forwarded to it and the worker is terminated. Each execution environment hosts one concurrent invocation, but it is reused across successive invocations of the same function. Lambda never reuses an execution environment across multiple functions.
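The rules above (a 14-hour lease, one concurrent invocation per execution environment, reuse only for the same function) can be captured in a toy model. Everything here is a hypothetical sketch: the class names, the drain margin and the placement logic are assumptions made for illustration, not Lambda's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_LEASE_SECONDS = 14 * 60 * 60  # the 14-hour lease from the text
DRAIN_MARGIN_SECONDS = 10 * 60    # assumed: stop routing shortly before the limit


@dataclass
class ExecutionEnvironment:
    function_name: str
    busy: bool = False


@dataclass
class Worker:
    launched_at: float
    environments: List[ExecutionEnvironment] = field(default_factory=list)

    def accepts_invocations(self, now: float) -> bool:
        """A worker close to its lease limit receives no new invocations."""
        return now - self.launched_at < MAX_LEASE_SECONDS - DRAIN_MARGIN_SECONDS

    def acquire(self, function_name: str, now: float) -> Optional[ExecutionEnvironment]:
        """Return an environment for one invocation, or None if draining."""
        if not self.accepts_invocations(now):
            return None
        # Reuse an idle environment, but only one that belongs to this function.
        for env in self.environments:
            if env.function_name == function_name and not env.busy:
                env.busy = True
                return env
        # Otherwise create a fresh environment (the "cold start" case).
        env = ExecutionEnvironment(function_name, busy=True)
        self.environments.append(env)
        return env

    def release(self, env: ExecutionEnvironment) -> None:
        env.busy = False
```

A second invocation of the same function reuses the environment only once the first has finished; a concurrent invocation, or one for a different function, gets its own environment.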

All communication between workers is encrypted using AES with Galois/Counter Mode (AES-GCM). Customers cannot interact with a worker directly, because it is hosted in a network-isolated VPC in Lambda's own AWS service accounts.

Load Balancing/Scaling

Lambda intentionally concentrates load on the smallest possible number of busy sandboxes. A sandbox is busy in a very binary way: it is either handling an invocation or idle. So just by counting the sandboxes that are currently busy, you get a very clear picture of the load on the system. Lambda also takes advantage of statistical multiplexing by placing very different kinds of workloads on the same server, as opposed to placing many copies of the same workload there. If you run several copies of one function concurrently, it is highly likely that when one spikes in CPU usage the others will too. The more uncorrelated workloads you put on a machine, the better they behave in aggregate, because the spikes flatten out. Because AWS operates at such immense scale, it is able to find these uncorrelated workloads, which is not feasible on a small-scale computing platform. To summarize, AWS picks uncorrelated workloads that pack together well.
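The effect of statistical multiplexing is easy to see in a small simulation. The sketch below compares the peak-to-mean ratio of one server running many copies of the same spiky workload with one running the same number of independent workloads. The load shape, the spike probability and all names are assumptions chosen purely for illustration.

```python
import random
from typing import List, Tuple


def spiky_load(rng: random.Random, ticks: int, base: float = 1.0,
               spike: float = 10.0, spike_probability: float = 0.05) -> List[float]:
    """A workload that idles at `base` and occasionally spikes."""
    return [base + (spike if rng.random() < spike_probability else 0.0)
            for _ in range(ticks)]


def aggregate(workloads: List[List[float]]) -> List[float]:
    """Total load on the server at each tick."""
    return [sum(values) for values in zip(*workloads)]


def peak_to_mean(series: List[float]) -> float:
    return max(series) / (sum(series) / len(series))


def compare(seed: int = 42, ticks: int = 1000, copies: int = 50) -> Tuple[float, float]:
    """Return (correlated, uncorrelated) peak-to-mean ratios."""
    rng = random.Random(seed)
    one = spiky_load(rng, ticks)
    # Correlated: every copy spikes at exactly the same moments.
    correlated = aggregate([one] * copies)
    # Uncorrelated: each workload spikes independently of the others.
    uncorrelated = aggregate([spiky_load(rng, ticks) for _ in range(copies)])
    return peak_to_mean(correlated), peak_to_mean(uncorrelated)


if __name__ == "__main__":
    correlated_ratio, uncorrelated_ratio = compare()
    print(f"correlated:   peak/mean = {correlated_ratio:.2f}")
    print(f"uncorrelated: peak/mean = {uncorrelated_ratio:.2f}")
```

The correlated server has to be sized for a peak far above its average load, while the uncorrelated one runs much closer to its average, which is what makes dense packing possible.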

Worker layers

The machines that our workloads run on are called workers. Firecracker (which we'll dive into later on) carves each worker up into multiple isolated environments. The function code is your own code; the Worker Manager provisions it by downloading and initializing your Lambda package on the worker. The Lambda runtimes are the built-in runtimes that Lambda supports, such as .NET Core, Node.js, and Python. The sandbox contains a Linux guest kernel (AWS has stripped out most of the kernel's features). The next layer is the guest OS (Amazon Linux 2). AWS runs many Amazon Linux guests on a single worker (ranging from hundreds to thousands), isolated from each other by virtualization based on the Linux KVM feature. The bottom two layers are bound to the actual instance: the host OS and the provisioned hardware.

The guest OS layer is bound to a single AWS account, and multiple functions can run on the same guest OS. For security reasons, the boundary between accounts is virtualization. Underneath the sandbox layer is the same kind of technology that powers containerization. As you might know, containers do not really exist as such; they are processes isolated through a set of tools that the Linux kernel provides. Lambda makes use of these tools. One is cgroups, which for example lets Lambda cap the maximum memory footprint of a function. Another is seccomp, which acts as a sort of firewall for the Linux kernel: it can, for example, restrict a syscall to a permitted set of arguments rather than every argument the syscall accepts. Through seccomp, Lambda prevents the isolated environment from making any syscalls beyond those it needs for what it is supposed to be doing.
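The "firewall for syscalls" idea can be modeled in a few lines. To be clear, this is a conceptual model in plain Python, not a real seccomp filter (those are BPF programs loaded into the kernel), and the policy is invented for illustration; it is not the policy Lambda applies.

```python
from typing import Callable, Dict, Tuple

# A check receives the syscall's arguments and decides whether they are allowed.
ArgCheck = Callable[[Tuple[int, ...]], bool]

AF_INET = 2     # Linux address family constant for IPv4 sockets
AF_PACKET = 17  # Linux address family constant for raw packet sockets

# Hypothetical allowlist: anything not listed here is denied outright.
ALLOWED_SYSCALLS: Dict[str, ArgCheck] = {
    "read": lambda args: True,
    "write": lambda args: True,
    # socket() is allowed, but only for one address family.
    "socket": lambda args: len(args) > 0 and args[0] == AF_INET,
}


def is_allowed(syscall: str, args: Tuple[int, ...] = ()) -> bool:
    """Allow a syscall only if it is listed and its arguments pass the check."""
    check = ALLOWED_SYSCALLS.get(syscall)
    return check is not None and check(args)
```

Here `is_allowed("socket", (AF_INET, 1, 0))` passes, `is_allowed("socket", (AF_PACKET, 3, 0))` is rejected because of its argument, and `is_allowed("mount")` is rejected because the syscall is not on the list at all.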

As you can see in the preceding image, 'Firecracker' is mentioned. I have deliberately not touched on this topic yet, as there is a separate section about Firecracker later on.

Synchronous Execution Path

Synchronous invocations are invocations that return a result to a client that is waiting for it. This might be something as simple as invoking one function that computes something and then using the result in another function.
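As a rough sketch of what this looks like from the client side, the helper below calls the Invoke API with the RequestResponse invocation type and waits for the result, then feeds that result into a second function. The helper names are made up for illustration, and `lambda_client` is assumed to be a boto3 Lambda client (or anything exposing the same `invoke` method).

```python
import json
from typing import Any


def invoke_sync(lambda_client: Any, function_name: str, payload: Any) -> Any:
    """Invoke a function synchronously and return its decoded JSON result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # block until the function returns
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read()
    # Lambda sets FunctionError when the function itself raised an error.
    if "FunctionError" in response:
        raise RuntimeError(f"{function_name} failed: {body.decode('utf-8')}")
    return json.loads(body)


def compute_then_use(lambda_client: Any, first: str, second: str, payload: Any) -> Any:
    """Invoke one function, then pass its result to another."""
    intermediate = invoke_sync(lambda_client, first, payload)
    return invoke_sync(lambda_client, second, intermediate)
```

With boto3 installed and credentials configured, the client would typically come from `boto3.client("lambda")`. Passing it in as a parameter keeps the helpers easy to exercise with a stub.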