Let’s face it, eBPF is the hot topic of the day in the Linux and Kubernetes world: at KubeCon EU in Valencia it was hard to find a project not talking about it, and KubeCon US in October will probably be even more intense. And on September 28 and 29 there will be the virtual eBPF Summit 2022, a two-day event packed with details, use cases and more, all around eBPF. Sounds like you should absolutely join the fun! But before we do that, let’s have a look at what all the fuss is about. Why are people calling this revolutionary, why is everyone investing heavily in this, and why could this be interesting for you?
In this blog we will shed some light on the basics of eBPF, add in a few examples and use cases, and thus lay the foundation upon which you can continue your eBPF journey yourself.
Some eBPF history
Have you ever tried to add a cool new function to your Linux kernel? Probably not, as not everybody is a Linux kernel hacker… but let’s just pretend.
The comic strip really is spot-on: it is hard to get new features into the Linux kernel, and it takes a long time until they arrive where you need them. In comes BPF: developed almost 30 years ago by Van Jacobson and Steven McCanne, BPF enabled efficient network packet filters on BSD. The architecture of BPF allows a user space process to submit program code that defines which packets should be filtered. The program code is compiled just-in-time and runs in a small VM, so there is no need to recompile the kernel or provide kernel-version-specific modules.
Fast-forward to 2014: this technology was brought to Linux by Daniel Borkmann and Alexei Starovoitov. They extended it to also cover non-networking use cases, hence the name extended BPF, or eBPF. eBPF can be used “to safely and efficiently extend the capabilities of the kernel without requiring to change kernel source code or load kernel modules.” (Source: ebpf.io). This makes it possible to define, run and even change sandboxed programs in the Linux kernel on the fly without changing the kernel source code. And this is revolutionary!
The technology was quickly picked up: early adopters like Cloudflare and Facebook mostly leveraged the advanced networking capabilities eBPF offers.
eBPF and K8s… a perfect match
2014 was also the year of the first Kubernetes commit, and over time it became clear that many K8s use cases in particular can benefit from eBPF: in Kubernetes, networks are created dynamically and IPs are ephemeral, so they cease to be a reliable source of identity for management and monitoring. Tooling needs to catch up with this development: traditional firewalls and load balancers, but also tools used for monitoring, debugging and tracing, struggle to provide value in dynamic, fast-changing, Kubernetes-based environments. One way to catch up and innovate is to use eBPF.
One example of innovation was the creation of Cilium, a CNI and cloud native dataplane for Kubernetes which is built upon eBPF and offers a variety of advanced networking, security and observability use cases.
So how does it work?
As mentioned, eBPF is a framework to extend the Linux kernel. It is a general-purpose engine with a minimal instruction set that runs programs inside the kernel to customize kernel behavior.
The first thing to understand is that eBPF programs are event-driven: they run when the kernel or an application reaches a hook point. Such hooks include system calls, function entries or exits, kernel tracepoints, network events, etc.
eBPF programs can be written in C or Rust. With the help of the LLVM compiler the code is transformed into eBPF bytecode, which is then loaded into the kernel and verified there. Note that in many cases eBPF is not used directly; instead, projects building on it are leveraged. Check out bcc, bpftrace or Cilium for examples.
When an eBPF program is loaded into the kernel, it is just-in-time compiled, as mentioned previously. However, there is one step just before that: verification. After all, eBPF programs run in kernel space; if they misbehave, they can take down the entire kernel and thus the entire machine. To avoid this, the verifier ensures that eBPF programs are safe to run. It checks multiple areas:
- Is termination of the program ensured? For example, unbounded loops must be prevented.
- Is memory safety guaranteed? For example, memory must not be accessed out of bounds.
- Is type safety ensured? For example, type confusion bugs must be prevented.
This is by no means a simple task. When non-root users are allowed to inject eBPF programs, the verifier becomes an attractive target for attempts to bypass security features in the OS kernel. Making the verifier work correctly, for example ensuring that all loops are properly checked, is also non-trivial.
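To make the first two checks a bit more tangible, here is a minimal sketch of a static check in plain user space C. Everything in it is an assumption made for illustration: the three-instruction "toy" instruction set, the names `toy_insn` and `toy_verify`, and the rule that only forward jumps are allowed. The real eBPF verifier is far more elaborate and works on real eBPF bytecode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction set, invented for this sketch (not real eBPF bytecode). */
enum toy_op { TOY_LOAD, TOY_JUMP, TOY_EXIT };

struct toy_insn {
    enum toy_op op;
    int arg; /* TOY_LOAD: byte offset into a buffer; TOY_JUMP: relative jump */
};

/*
 * Accept a program only if every path provably ends and stays in bounds:
 *  - jumps must go forward, so no loop can form (termination)
 *  - jumps must land inside the program
 *  - loads must read inside a buffer of buf_len bytes (memory safety)
 *  - the last instruction must be TOY_EXIT, so execution cannot run off the end
 */
bool toy_verify(const struct toy_insn *prog, size_t len, size_t buf_len)
{
    if (len == 0 || prog[len - 1].op != TOY_EXIT)
        return false;

    for (size_t i = 0; i < len; i++) {
        switch (prog[i].op) {
        case TOY_LOAD:
            if (prog[i].arg < 0 || (size_t)prog[i].arg >= buf_len)
                return false;
            break;
        case TOY_JUMP:
            /* The target is i + 1 + arg; a negative arg is a backward jump. */
            if (prog[i].arg < 0 || i + 1 + (size_t)prog[i].arg >= len)
                return false;
            break;
        case TOY_EXIT:
            break;
        default:
            return false; /* unknown opcode */
        }
    }
    return true;
}
```

The important design point is that the check happens before the program ever runs: a program with a backward jump or an out-of-bounds load is rejected at load time instead of failing at run time.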
Once the program has been verified and just-in-time compiled, it can run. But while the program is running, how is data exchanged? After all, if we want to use the program to generate observability data, how do we get at that data from the depths of the kernel? This is where eBPF maps come in, one of the big reasons for the “extended” in eBPF.
Maps are generic data structures that are defined by the eBPF programs. Simply put, they are key-value stores that can be used to share data between kernel space and user space. Examples of data to be moved are metrics from the eBPF program to user space, or configuration data from user space to the eBPF program.
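The data flow can be modelled in a few lines of ordinary C. The following is only a user space toy, not the kernel's map implementation: the names `toy_map`, `toy_count_event` and `toy_read_count` are made up for this sketch, and the "kernel side" and "user side" are simply two functions working on the same structure. It assumes the metric of interest is an event counter per process ID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAP_SLOTS 64 /* fixed capacity, comparable to max_entries of a real map */

struct toy_slot { bool used; uint32_t key; uint64_t value; };
struct toy_map  { struct toy_slot slots[TOY_MAP_SLOTS]; };

/* Locate the slot for key with linear probing; optionally claim a free one. */
static struct toy_slot *toy_find(struct toy_map *m, uint32_t key, bool insert)
{
    for (size_t n = 0; n < TOY_MAP_SLOTS; n++) {
        struct toy_slot *s = &m->slots[(key + n) % TOY_MAP_SLOTS];
        if (s->used && s->key == key)
            return s;
        if (!s->used) {
            if (!insert)
                return NULL;
            s->used = true;
            s->key = key;
            s->value = 0;
            return s;
        }
    }
    return NULL; /* key absent and no free slot left */
}

/* "Kernel side": called once per event, bumps the counter for this PID. */
bool toy_count_event(struct toy_map *m, uint32_t pid)
{
    struct toy_slot *s = toy_find(m, pid, true);
    if (!s)
        return false; /* map is full */
    s->value++;
    return true;
}

/* "User side": read the metric back; returns false if the PID was never seen. */
bool toy_read_count(struct toy_map *m, uint32_t pid, uint64_t *out)
{
    struct toy_slot *s = toy_find(m, pid, false);
    if (!s)
        return false;
    *out = s->value;
    return true;
}
```

In a real setup the two sides live in different worlds: the eBPF program updates the map from inside the kernel, and a user space process reads the same map through the kernel's interface.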
There is one more way eBPF programs can retrieve data: eBPF helper functions. They interact with the system or the context in which the program runs and can, for example, print debug messages or get the IDs of running processes.
To get a better idea of how this all works, let’s have a look at a real-life eBPF program, opensnoop. Yes, there is some C code involved, but don’t let this scare you away! Opensnoop traces “open()” syscalls and was written by Brendan Gregg as part of the BCC project. It outputs which processes are opening which files; this can be a lot, depending on the load of your machine:
PID    COMM             FD  ERR  PATH
1920   systemd-oomd     10  0    /proc/meminfo
4270   ThreadPoolForeg  162 0    /home/liquidat/.config/google-chrome/Default/IndexedDB/https_play.instruqt.com_0.indexeddb.leveldb
4270   ThreadPoolForeg  -1  2    /home/liquidat/.config/google-chrome/Default/IndexedDB/https_play.instruqt.com_0.indexeddb.blob
2059   abrt-dump-journ  4   0    /var/log/journal/64cdbe2295fa408d91e2c6e432915875/system.journal
How is this information generated? Opensnoop attaches an eBPF program to the open syscalls (there are two). The source code is quite straightforward:
SEC("tracepoint/syscalls/sys_enter_open")
int tracepoint__syscalls__sys_enter_open(struct trace_event_raw_sys_enter* ctx)
The first line identifies the hook to attach to, here the tracepoint for entering the open syscall; the second one defines the function to run when that hook is hit. The function’s argument defines what kind of information the program receives.
Inside the function, a helper function identifies the process ID, and finally the data is written to an eBPF map (source code):
bpf_map_update_elem(&start, &pid, &args, 0);
Afterwards, all you have to do is run some user space code that attaches to the right eBPF map, listens on the event buffer for new entries and prints them to the screen!
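The printing step on the user space side could look roughly like the following sketch. The `open_event` struct and the `format_open_event` function are hypothetical names for this post, not the tool's actual definitions, and the sketch assumes that each event carries the raw return value of the syscall: a file descriptor on success, or a negated error number on failure. That assumption would explain the FD and ERR columns in the output above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical event layout for this sketch; the real tool defines its own. */
struct open_event {
    uint32_t pid;
    int ret;         /* assumed: fd on success, negated errno on failure */
    char comm[16];   /* process name */
    char path[256];  /* file that was opened */
};

/* Render one event as "PID COMM FD ERR PATH"; returns what snprintf returns. */
int format_open_event(const struct open_event *e, char *buf, size_t len)
{
    int fd  = e->ret >= 0 ? e->ret : -1;  /* no descriptor on failure */
    int err = e->ret >= 0 ? 0 : -e->ret;  /* turn the negated errno positive */

    return snprintf(buf, len, "%u %s %d %d %s",
                    (unsigned)e->pid, e->comm, fd, err, e->path);
}
```

With this split, a failed open with return value -2 shows up as FD -1 and ERR 2, just like the third row of the sample output.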
In 2014, eBPF brought superpowers to the Linux kernel. The revolution that started back then is still under way and keeps extending to new use cases. More and more projects that work close to the hardware layer or have needs in networking, security or observability are looking at eBPF, and Kubernetes is right in the middle of it.
If you want to get a better understanding of how eBPF works, I recommend checking out the book “What is eBPF” by Liz Rice. The opensnoop example above is taken from that book, and Liz goes into far more detail there, also examining how the user space side of the tooling works. There is also a hands-on lab accompanying the book which walks you through some of the opensnoop example steps.
Also, join the eBPF Summit, a virtual event, targeted at DevOps, SecOps, platform architects, security engineers, and developers on September 28 and 29, 2022.
Last but not least, there was recently a webinar series by some of the project creators and maintainers of eBPF (and Cilium), called “How the Hive came to be”: one about the history of eBPF, one deep dive and one looking at Cilium as a real-world use case of eBPF. Find the recordings at isovalent.com/events by filtering for “Webinars”.