Using BPF to Transform SSH Sessions into Structured Events

russjones · on Feb 27, 2020

Author of the post here, happy to answer any questions.

marios · on Feb 27, 2020

I have followed eBPF development from afar, so I don't exactly where it's at. I have a ... semantics question: do people really refer to eBPF as BPF ? This is probably bothering me more than it should, but why do overload terms when a more correct solution is available from the start ? The BPF virtual machine is not exactly new. For example, tcpdump supports BPF. Not eBPF, though. If we start referring to eBPF as BPF, then pretty soon other OSes using BPF will be referred as having "incomplete BPF implementations" because Linux has eBPF and we incorrectly refer to it as BPF.

russjones · on Feb 27, 2020

Understandable confusion, I've seen it referred to both ways. We decided to go with BPF for this blog post since that appears to be the official abbreviation.

From "BPF Performance Tools" by Brendan Gregg:

"Extended BPF is often abbreviated as eBPF, but the official abbreviation is still BPF, without the "e," so throughout this book I use BPF to refer to extended BPF. The kernel contains only one execution engine, BPF (extended BPF), which runs both extended BPF and "classic" BPF programs."

botch_san · on Feb 27, 2020

Here is a short explanation from Brendan Gregg https://youtu.be/7pmXdG8-7WU?t=885

dr_hooo · on Feb 27, 2020

afaik, BPF has been retroactively renamed to classic BPF (cBPF).

Zebdan · on Feb 28, 2020

Great article!

Just a quick note that 19.04 Ubuntu reached EOL on Jan 23rd, 2020. Ubuntu has several releases that support 4.18+ kernels [0]. I recommend using 19.10, but you can also use a more recent HWE kernel for the 18.04 LTS [1].

[0] https://ubuntu.com/about/release-cycle

[1] https://wiki.ubuntu.com/Kernel/LTSEnablementStack

thedance · on Feb 27, 2020

What's the threat model being addressed here? If someone is trying to act maliciously there must be a thousand ways around calling exec (for example just mapping a program and jumping to its main function accomplishes the same thing).

russjones · on Feb 27, 2020

We're trying to raise structured behavioral information about what is happening in a session to the cluster administrator.

That means we don't just provide information about what's executing, but also what files are being opened and TCP connections being established. Other avenues of expansion you may see this feature venture into: how were files changed, support for other protocols, support for other events (bind, listen, accept).

However we are not claiming this approach is not subvertable, but we do want to raise the bar for attackers and make it easier for cluster administrators to understand what is happening within their system.

jsjohnst · on Feb 28, 2020

This product is more for compliance (like PCI-DSS) than it is for “real” security.

Rapzid · on Feb 28, 2020

> for example just mapping a program and jumping to its main function accomplishes the same thing

Without a syscall? Perhaps to open()..

ghostpepper · on Feb 28, 2020

Those two are functionally equivalent but they aren't really the same level of difficulty, are they?

saagarjha · on Feb 28, 2020

Here's how you'd exec:

  execv("/path/to/binary", (char *[]){"binary", NULL});

And here's a way to do that without exec:

  (((int (*)(int, char **))dlsym(dlopen("/path/to/binary", RTLD_LAZY), "main")))(1, (char *[]){"binary", NULL});

A bit uglier, but not all that much harder.

Rapzid · on Feb 28, 2020

> dlopen

Doesn't that just end up calling open() and mmap()? Might not have access to the args passed through at that point, but that's going to leave a trail and of course anything interesting the mapped program does will end up going through syscalls(opening other "files").

saagarjha · on Feb 28, 2020

Trying to get stuff into your memory that wasn't there before is going to require at least one syscall.

cyphar · on Feb 28, 2020

Though it should be noted that's not quite the same thing as execve. Execve does a lot of things in addition to running the main function (privilege transitions like setuid being just one example).

saagarjha · on Feb 28, 2020

Of course; in addition to kernel setup this will also skip over initializers in the binary and other things that the C runtime does before main. Needless to say, this is mostly only useful as a fun side effect of PIE executables.

aberoham · on Feb 27, 2020

Mr Jones, this and all of your articles are just delightful. Can you share any early feedback from the field or end-user testing? Have folks been happy that this ticks compliance checkboxes even if the current solution may be subverted by root users?

russjones · on Feb 27, 2020

Thank you Abe! The kind comments are appreciated!

So far we have gotten positive feedback. While this feature does not protect against root doing something malicious, it does allow admins to capture what root was doing up until they did something malicious and link that information to an identity (if using SSO).

Along with this feature we rolled out a Workflow API [1] that can be used to request role elevation. Once you add in session termination (which we are aiming for in the next release), you will have a powerful set of features that will allow you to start users out with limited access to your cluster with the ability to request more privileges and potentially automatically termination your session (and user) if you're found to be doing some malicious.

[1] https://gravitational.com/blog/workflow_api/

0xff00ffee · on Feb 28, 2020

When is a capability like this turned on? Is teleport running all of the time (seems like a huge log of data, no?), or only when anomalous behavior is detected? or... something else?

anitil · on Feb 28, 2020

Really interesting use of BPF! A nice coincidence that my work's just started using Teleport.

kalium_xyz · on Feb 28, 2020

BPF is extremely awesome. I cant wait to see more projects using it.

justlexi93 · on Feb 28, 2020

It's just that Linux's eBPF system has been extended far, far beyond packet filtering.

cptwunderlich · on Feb 28, 2020

Since the author, russjones, seems to be here, I'd like to ask a question regarding writing the actual BPF programs. I've been writing a term paper about BPF verification, the in-kernel verifier and research like PREVAIL [1], so I'm curious.

Is writing valid BPF programs really that "hard"? E.g., does one often have to rewrite programs bc. the verifier wouldn't accept them? Do you see a need to extend BPF with more capabilities? (bounded loops have been added in Kernel 5.3, but maybe something else)

Thank you.

[1] https://vbpf.github.io/

saber6 · on Feb 28, 2020

I never thought about needing streams of information like this, but now that I am, this is a great way to provide general trace-tooling for containers and other things!

Very interesting post. Thanks for sharing.

lstamour · on Feb 28, 2020

Sure thing, but just to clarify a point you might not be making, don't put shells and SSH in your containers: https://github.com/GoogleContainerTools/distroless#why-shoul...