I have followed eBPF development from afar, so I don't know exactly where it's at. I have a ... semantics question: do people really refer to eBPF as BPF?
This is probably bothering me more than it should, but why overload terms when a more correct solution is available from the start? The BPF virtual machine is not exactly new. For example, tcpdump supports BPF. Not eBPF, though.
If we start referring to eBPF as BPF, then pretty soon other OSes using BPF will be referred to as having "incomplete BPF implementations" because Linux has eBPF and we incorrectly refer to it as BPF.
Understandable confusion, I've seen it referred to both ways. We decided to go with BPF for this blog post since that appears to be the official abbreviation.
From "BPF Performance Tools" by Brendan Gregg:
"Extended BPF is often abbreviated as eBPF, but the official abbreviation is still BPF, without the "e," so throughout this book I use BPF to refer to extended BPF. The kernel contains only one execution engine, BPF (extended BPF), which runs both extended BPF and "classic" BPF programs."
Just a quick note that Ubuntu 19.04 reached EOL on Jan 23rd, 2020. Ubuntu has several releases that support 4.18+ kernels [0]. I recommend using 19.10, but you can also use a more recent HWE kernel on 18.04 LTS [1].
What's the threat model being addressed here? If someone is trying to act maliciously there must be a thousand ways around calling exec (for example just mapping a program and jumping to its main function accomplishes the same thing).
We're trying to raise structured behavioral information about what is happening in a session to the cluster administrator.
That means we don't just provide information about what's executing, but also which files are being opened and which TCP connections are being established. Other avenues this feature may expand into: how files were changed, support for other protocols, and support for other events (bind, listen, accept).
To be clear, we are not claiming this approach is unsubvertable, but we do want to raise the bar for attackers and make it easier for cluster administrators to understand what is happening within their system.
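To give a flavor of what capturing these events looks like at the BPF level, here is a minimal sketch (illustrative only, not our actual implementation; the event layout and names are made up) of a libbpf-style program that reports execve calls to user space:

    // exec_trace.bpf.c -- illustrative sketch only, not Teleport's code.
    // A BPF program attached to the execve entry tracepoint that pushes
    // a small event (pid, comm, filename) to user space via a perf buffer.
    // Build with something like: clang -O2 -g -target bpf -c exec_trace.bpf.c
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct exec_event {
        __u32 pid;
        char comm[16];
        char filename[256];
    };

    struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(__u32));
    } events SEC(".maps");

    // Hand-rolled layout of the sys_enter_execve tracepoint context:
    // 8 bytes of common fields, the syscall number, then the arguments.
    struct sys_enter_execve_ctx {
        __u64 unused;
        __s32 syscall_nr;
        __u32 pad;
        const char *filename;
        const char *const *argv;
        const char *const *envp;
    };

    SEC("tracepoint/syscalls/sys_enter_execve")
    int trace_execve(struct sys_enter_execve_ctx *ctx)
    {
        struct exec_event ev = {};

        ev.pid = bpf_get_current_pid_tgid() >> 32;
        bpf_get_current_comm(&ev.comm, sizeof(ev.comm));
        // On kernels >= 5.5 you would use bpf_probe_read_user_str here.
        bpf_probe_read_str(&ev.filename, sizeof(ev.filename), ctx->filename);

        bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                              &ev, sizeof(ev));
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";

A user-space loader attaches this and polls the perf buffer, turning the raw events into a structured session log; the open and connect events work the same way with different attach points.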
Doesn't that just end up calling open() and mmap()? You might not have access to the args passed through at that point, but that's going to leave a trail, and of course anything interesting the mapped program does will end up going through syscalls (opening other "files").
Though it should be noted that's not quite the same thing as execve. Execve does a lot of things in addition to running the main function (privilege transitions like setuid being just one example).
Of course; in addition to kernel setup this will also skip over initializers in the binary and other things that the C runtime does before main. Needless to say, this is mostly only useful as a fun side effect of PIE executables.
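For the curious, here's a rough sketch of the trick (assuming a statically linked PIE with no PT_INTERP; there's no relocation, BSS zeroing, or stack/auxv setup, so most real binaries will crash, but it shows control reaching the program without execve()):

    /* map_and_jump.c -- rough sketch of the "map and jump" trick discussed
     * above. Assumes a statically linked PIE (e.g. gcc -static-pie) with
     * no PT_INTERP, and 4 KiB pages. Error handling omitted. */
    #include <elf.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <static-pie-binary>\n", argv[0]);
            return 1;
        }

        /* These calls are exactly the "trail" mentioned above: open()
         * and mmap() are still visible to any syscall-level tracer. */
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        /* Map the file once, read-only, to parse the ELF headers. */
        void *file = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        Elf64_Ehdr *eh = file;
        Elf64_Phdr *ph = (Elf64_Phdr *)((char *)file + eh->e_phoff);

        /* Reserve one contiguous region covering all PT_LOAD segments. */
        size_t span = 0;
        for (int i = 0; i < eh->e_phnum; i++)
            if (ph[i].p_type == PT_LOAD && ph[i].p_vaddr + ph[i].p_memsz > span)
                span = ph[i].p_vaddr + ph[i].p_memsz;
        char *base = mmap(NULL, span, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        /* Map each loadable segment at its vaddr within the reservation.
         * (RWX and no BSS handling: fine for a sketch, wrong for a loader.) */
        for (int i = 0; i < eh->e_phnum; i++) {
            if (ph[i].p_type != PT_LOAD)
                continue;
            size_t skew = ph[i].p_vaddr & 0xfffUL; /* in-page offset */
            mmap(base + ph[i].p_vaddr - skew, ph[i].p_filesz + skew,
                 PROT_READ | PROT_WRITE | PROT_EXEC,
                 MAP_PRIVATE | MAP_FIXED, fd, ph[i].p_offset - skew);
        }

        /* Jump to the ELF entry point: no execve(), so no setuid
         * handling, no fresh address space, none of the kernel setup. */
        void (*entry)(void) = (void (*)(void))(base + eh->e_entry);
        entry();
        return 0;
    }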
Mr Jones, this and all of your articles are just delightful. Can you share any early feedback from the field or end-user testing? Have folks been happy that this ticks compliance checkboxes even if the current solution may be subverted by root users?
So far we have gotten positive feedback. While this feature does not protect against root doing something malicious, it does allow admins to capture what root was doing up until they did something malicious and link that information to an identity (if using SSO).
Along with this feature we rolled out a Workflow API [1] that can be used to request role elevation. Once you add in session termination (which we are aiming for in the next release), you will have a powerful set of features: you can start users out with limited access to your cluster, give them the ability to request more privileges, and potentially automatically terminate their session (and user) if they're found to be doing something malicious.
When is a capability like this turned on? Is teleport running all of the time (seems like a huge log of data, no?), or only when anomalous behavior is detected? or... something else?
Since the author, russjones, seems to be here, I'd like to ask a question regarding writing the actual BPF programs. I've been writing a term paper about BPF verification, the in-kernel verifier and research like PREVAIL [1], so I'm curious.
Is writing valid BPF programs really that "hard"? E.g., does one often have to rewrite programs because the verifier won't accept them?
Do you see a need to extend BPF with more capabilities? (Bounded loops were added in kernel 5.3, but maybe something else.)
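To make the question concrete, here is a toy example of the loop pattern I mean (illustrative only, not from the article):

    // loop_demo.bpf.c -- toy example of the loop restriction in question.
    // The verifier simulates every path; before kernel 5.3 any back-edge
    // was rejected, so loops had to be fully unrolled at compile time.
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("socket")
    int count_nonzero(struct __sk_buff *skb)
    {
        int count = 0;

        // On kernels >= 5.3 this plain loop verifies because the bound
        // is a constant the verifier can prove. On older kernels you'd
        // need `#pragma unroll` so clang emits no back-edge at all.
        for (int i = 0; i < 16; i++) {
            __u8 byte = 0;

            // Helper-based packet access keeps the verifier happy
            // without manual bounds checks against skb->data.
            if (bpf_skb_load_bytes(skb, i, &byte, sizeof(byte)) == 0 &&
                byte != 0)
                count++;
        }

        // Socket filters return how many bytes of the packet to keep:
        // pass the whole packet if any of the first 16 bytes is nonzero.
        return count > 0 ? skb->len : 0;
    }

    char LICENSE[] SEC("license") = "GPL";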
I never thought about needing streams of information like this, but now that I am, this is a great way to provide general trace-tooling for containers and other things!