eBPF enables users to trace application activity down to a very low level for better performance analysis
Let’s say you’re a doctor. You know that the human body is tremendously complex, with multiple systems operating and interacting simultaneously. You also understand that sometimes things can go wrong and a person gets sick. Or there might be symptoms that history suggests are potential signs of trouble. How do you determine what is going on? What metrics can you collect that will reveal medically valuable information? And what tools are available to do that?
The same issue that has challenged physicians for centuries is one that IT professionals now face: When you’re troubleshooting a complex system, what diagnostics do you measure, how do you measure them and what do you do with your findings?
While the human body has not changed much in recent years, that’s not true for IT. And while diagnostic troubleshooting for both medicine and IT has evolved, the systems they measure have changed at different rates. IT complexity has rapidly accelerated in the cloud era, changing the locus of core information about system operation and the metrics needed to monitor it. One method that has been devised for capturing high-value information about container-based workloads is the extended Berkeley Packet Filter, or eBPF, a Linux kernel technology that grew out of the original Berkeley Packet Filter developed at the Lawrence Berkeley National Laboratory. With eBPF, kernel events can be correlated with network flow data to pinpoint which users, processes and containers are showing abnormal behavior.
eBPF is an exciting, newer technology that is gaining popularity in the Linux ecosystem, and rightfully so, as it gives developers a low-level view of their applications without having to do all the heavy lifting. It’s worth underlining that eBPF doesn’t so much offer new functionality as make existing capabilities accessible: getting the same results with older technology would require the developer to work in assembly, which makes the effort particularly demanding.
Let’s wind the clock back. The technology was originally introduced as the Berkeley Packet Filter (BPF), in a paper published in 1992, as a rule-based mechanism to filter and capture network packets. In essence, the paper described a framework in which filters run in a register-based virtual machine inside the kernel. It targeted BSD Unix at the time; Linux adopted the design later. Although the idea of running user-defined programs inside the kernel was indeed ingenious, the original design had certain restrictions, as the instruction set of the VM didn’t keep up with hardware development. So eBPF, or extended BPF, came into existence with a new design that built on both the advances in hardware and the novelty of the idea, bringing increased performance and much wider applicability.
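To make the idea of a filter running in a small register-based virtual machine more concrete, here is a minimal sketch in Python. It is a toy: the opcode names, the run_filter function and the make_frame helper are invented for illustration and are not the real BPF instruction set. It borrows only two ideas from classic BPF: a single accumulator register, and jumps that only go forward, so every filter is guaranteed to terminate.

```python
# Toy register-based packet filter. The opcodes are hypothetical and are
# NOT the real BPF instruction set; they only illustrate the concept.

def run_filter(program, packet):
    """Run `program` over `packet` (bytes); return True to accept it."""
    acc = 0  # accumulator register
    pc = 0   # program counter
    while pc < len(program):
        op, arg = program[pc]
        if op == "LOAD_BYTE":            # acc = packet[arg]
            if arg >= len(packet):
                return False             # out-of-range load rejects the packet
            acc = packet[arg]
        elif op == "LOAD_HALF":          # acc = big-endian 16-bit value at arg
            if arg + 2 > len(packet):
                return False
            acc = int.from_bytes(packet[arg:arg + 2], "big")
        elif op == "JUMP_IF_NOT_EQUAL":  # arg = (value, forward skip)
            value, skip = arg
            if acc != value:
                pc += skip               # forward only, so the loop terminates
        elif op == "RETURN":             # arg is the verdict
            return bool(arg)
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return False  # falling off the end rejects the packet


# Accept only IPv4 TCP packets carried in an Ethernet frame.
ACCEPT_IPV4_TCP = [
    ("LOAD_HALF", 12),                   # EtherType field
    ("JUMP_IF_NOT_EQUAL", (0x0800, 3)),  # not IPv4: jump to the reject
    ("LOAD_BYTE", 23),                   # IPv4 protocol field (14 + 9)
    ("JUMP_IF_NOT_EQUAL", (6, 1)),       # not TCP: jump to the reject
    ("RETURN", 1),                       # accept
    ("RETURN", 0),                       # reject
]


def make_frame(ethertype, protocol):
    """Build a zero-filled 34-byte frame with only the two tested fields set."""
    frame = bytearray(34)
    frame[12:14] = ethertype.to_bytes(2, "big")
    frame[23] = protocol
    return bytes(frame)


if __name__ == "__main__":
    print(run_filter(ACCEPT_IPV4_TCP, make_frame(0x0800, 6)))   # TCP over IPv4
    print(run_filter(ACCEPT_IPV4_TCP, make_frame(0x0800, 17)))  # UDP over IPv4
```

The real in-kernel machine works on the same principle, with a richer instruction set and the filter expressed as bytecode rather than Python tuples.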
eBPF programs run inside the kernel; they are attached to a code path and whenever that code path is traversed, the program executes. This decoupling of the kernel and the eBPF program shortens development time, as the developer doesn’t have to recompile the kernel each time the eBPF program is changed. eBPF is useful for packet processing as well as for performance analysis and monitoring, as eBPF programs can be attached to tracepoints, kprobes and even perf events. As you may have already guessed, running user-supplied programs inside the kernel can cause serious security and stability issues; thus, an in-kernel verifier performs a series of checks on each eBPF program before it’s loaded.
Originally, it was quite difficult to write eBPF programs, but as more developers spent time on the technology, new tools emerged, such as BCC, a toolkit and library for creating kernel tracing and manipulation programs. While coding directly in eBPF bytecode continues to be extremely difficult, BCC allows the developer to write eBPF programs in a restricted form of C, and bpftrace offers its own high-level scripting language; in both cases, the result is compiled to eBPF bytecode. On top of that, BCC offers an interface for Python and Lua, which means that the user can write an eBPF program in C and then compile and load it into the kernel directly from Python. With that level of abstraction and the simplicity of eBPF, the possibilities are endless.
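As a rough illustration of that workflow, the sketch below embeds a few lines of restricted C in a Python script and uses BCC’s Python interface to compile it, attach it to a kprobe and read the results. It assumes the bcc Python bindings are installed and that the script runs as root. The names BPF_PROGRAM, open_counts, count_open and top_openers are made up for this example, as is the --run flag, which is there only so that nothing is loaded into the kernel by accident. It counts openat() calls, on the assumption that the C library on the system opens files through that syscall.

```python
#!/usr/bin/env python3
# Sketch: count openat() calls per process with BCC.
# Assumes the bcc Python bindings are installed and the script runs as root.
import sys
import time

# The kernel side, in restricted C. BCC compiles it to eBPF bytecode at load time.
BPF_PROGRAM = r"""
BPF_HASH(open_counts, u32, u64);   // map: PID -> number of openat() calls

int count_open(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;   // upper 32 bits: process ID
    open_counts.increment(pid);
    return 0;
}
"""


def top_openers(counts, limit=10):
    """Return the `limit` busiest (pid, count) pairs, highest count first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def main(seconds=5):
    from bcc import BPF  # imported here so the helper above works without bcc

    bpf = BPF(text=BPF_PROGRAM)  # compile and load into the kernel
    bpf.attach_kprobe(event=bpf.get_syscall_fnname("openat"),
                      fn_name="count_open")
    time.sleep(seconds)          # let the probe collect data
    counts = {k.value: v.value for k, v in bpf["open_counts"].items()}
    for pid, count in top_openers(counts):
        print(f"{pid:>8} {count}")


# Hypothetical safety gate: only touch the kernel when explicitly asked to.
if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

Saved under a name of your choosing, it would be run with something like sudo python3 script.py --run. The counting happens entirely inside the kernel; Python reads the map only once, at the end, which is a large part of why this style of tracing adds so little overhead.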
BCC has an impressive collection of more than 100 eBPF tracing tools, which users can also take as starting points for writing their own eBPF programs. The image below, albeit a bit outdated, offers a great overview of all the different eBPF tracing tools that exist for different parts of a Linux system.
With eBPF, users can trace application activity down to a very low level, including kernel function calls and Virtual File System calls. The magic when it comes to container monitoring is that we can do all this natively from the Linux kernel, without needing to compile a new kernel module, which some container-optimized operating systems disallow altogether in an attempt to keep the OS footprint as small as possible. Thus, we can run these eBPF programs inside the Linux kernel, collect the data and pass it to user space, where it can be visualized and acted upon. Furthermore, as far as the host OS is concerned, a container is really just a Linux process. The trick for container monitoring is to isolate the events generated by the container processes from those of the rest of the machine, so that we increase the signal-to-noise ratio.
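One simple way to picture that isolation step is to do it in user space: take the process ID attached to each event, look up the process’s cgroup and keep only the events whose cgroup path names a container. The sketch below does this in Python. It rests on a loudly stated assumption, namely that the container runtime places a 64-character hexadecimal container ID in the cgroup path, as Docker and common Kubernetes setups do. The function names and the (pid, event name) event format are hypothetical, and production tools usually filter inside the kernel rather than after the fact.

```python
import re
from collections import Counter

# Assumption: the runtime embeds a 64-character hex container ID in the
# cgroup path. Other runtimes or configurations may use a different layout.
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text):
    """Return the container ID found in /proc/<pid>/cgroup contents, or None."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-id:controllers:path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(0)
    return None


def read_cgroup(pid):
    """Read /proc/<pid>/cgroup; return an empty string if the process is gone."""
    try:
        with open(f"/proc/{pid}/cgroup") as handle:
            return handle.read()
    except OSError:
        return ""


def events_per_container(events, read=read_cgroup):
    """Count events per container ID, dropping events from host processes.

    `events` is an iterable of (pid, event_name) pairs (a hypothetical format).
    """
    counts = Counter()
    for pid, _event_name in events:
        container_id = container_id_from_cgroup(read(pid))
        if container_id is not None:
            counts[container_id] += 1
    return counts
```

Passing the reader in as a parameter keeps the lookup testable without a live /proc, and makes it easy to swap in a cache, since short-lived processes may have exited by the time their events are examined.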
To better illustrate the point, imagine the following scenario. You have created an application, but you notice that the system becomes increasingly slow when it runs. While there are a dozen different possible causes, using eBPF we can see, for example, that our process opens an abnormally large number of file descriptors through the open() syscall. Failing to properly close the files could indeed lead to a system slowdown, as the number of available file descriptors is reduced considerably. Another example would be to use eBPF to simulate a low-level network issue that occurs only intermittently, and is therefore hard to debug properly and to test different solutions against. It is important to note again that, while all of this would be possible with different tools, eBPF makes it easy to write a program and load it directly into the kernel.
Currently, there are a number of open source projects that not only strengthen the user’s monitoring capabilities for a Linux-based system but also enable the user to monitor their container-based applications. Tracee is a good example of a tracing tool: It consists of a high-level Go program that loads the eBPF program and attaches it to the kernel. For every container that is spawned after Tracee starts, the user will be able to see detailed information about the events that the container generates.
Another interesting example is Cilium, which uses eBPF to provide secure network connectivity and load balancing, leveraging the ability of eBPF to filter and drop packets based on rules. Interestingly, eBPF can trace kernel functions but can’t stop them from executing, whereas it can drop incoming network packets. For observability, ntop and InfluxData have partnered to offer eBPF monitoring for containers, while Netdata offers out-of-the-box eBPF monitoring for systems and applications.
As you can see, eBPF truly provides “Linux superpowers,” as coined by Brendan Gregg, senior performance architect at Netflix, in his famous talk about eBPF back in 2016. With the proliferation of container technology and demanding network applications due to the widespread adoption of Kubernetes, eBPF could function as a fundamental component of monitoring technology that can keep up with networking requirements while keeping resource overhead as low as possible.
However, while inspections using eBPF can be valuable, they will often need to be coordinated with independent network metrics to answer specific questions and identify root causes. As the systems themselves are expanding, a monitoring program should be equally comprehensive, providing operators with an expansive view of whatever is going on to detect any problems that might be lurking in the background. The toolkit for making those sorts of inspections and obtaining those types of insights is growing. And just as IT systems’ evolution is bound to continue, so, too, will the kit of diagnostic tools needed to maintain those systems’ health. It may be just what the doctor ordered.