BPF

Learning objectives:

  1. Understand the purpose of the Linux kernel's BPF subsystem

  2. Understand how to perform basic tracing and analysis with the bpftrace tool

Overview

  1. What is BPF?

  2. Where did BPF come from?

  3. Major features of BPF

  4. Probes

  5. BCC: BPF Compiler Collection

  6. Using bpftrace and some internals

What is BPF?

  1. BPF originally stood for BSD Packet Filter (also known as the Berkeley Packet Filter)

  2. Original BPF: packet filter

  3. Modern BPF: general-purpose in-kernel virtual machine

History of BPF: motivation

Efficient packet filtering: drop unwanted packets in the kernel rather than copying every packet to user space

History of BPF: origins

1993 paper by Steven McCanne and Van Jacobson

Large performance gains over existing packet filters on SunOS

History of BPF: in Linux

2013 rewrite

  1. Generalize BPF instruction set

  2. Programs compiled from C to BPF bytecode (e.g. by LLVM), verified by the kernel, then just-in-time compiled to native code

  3. Kernel subsystems can provide helper functions for BPF programs to call

Major features

Tracing and observability

  1. Main focus of later examples

Event-driven programming rather than task-driven

Load-time verification: the kernel checks every program for safety before it can run

Networking

  1. Example: XDP (eXpress Data Path)

Performance profiling

  1. e.g. perf

Dynamic instrumentation: History

Dynamic instrumentation considered harmful

  1. Used since the 1990s: the same technique debuggers use to place breakpoints

  2. e.g. kerninst, which has a really old website

Dynamic instrumentation: Linux rejects

DProbes by IBM rejected in 200

  1. Many preferred C over bytecode

  2. Rejection used as case study in this paper

Dynamic instrumentation: rising popularity

Promoted by Sun first, and not for Linux!

  1. DTrace released to wide acclaim in Solaris 10

  2. DTrace's scripting language is called D

  3. Not to be confused with the D programming language

Note on next couple of examples

Looking at probe types, not syntax details

kprobes

Dynamic kernel code instrumentation

  1. Replace the first byte of the probed instruction with a breakpoint instruction (int3 on x86)

  2. Similar concept: kretprobes instrument function returns

demo: tracing do_nanosleep

do_nanosleep() source

uprobes

Dynamic userspace program instrumentation

demo: tracing bash's readline function

Readline definition

tracepoints

Statically defined: a more stable interface than kprobes

  1. sys_enter#sname in include/linux/syscalls.h

  2. tracepoint metadata defined here: include/trace/events/syscalls.h

demo: tracepoint probe type for open syscall

  1. open syscall definition

Using BCC: BPF Compiler Collection

A set of tools and libraries for writing BPF programs

demo

bcc/tools/opensnoop.py

bpftrace

The simpler, more lightweight option

demo

bpftrace/tools/opensnoop.bt

  1. Has fewer features than the BCC version

bpftrace syntax

Similar to awk: probe /filter/ { action }

demo

simple awk program

bpftrace invocation

  1. To write a script, start it with #!/bin/bpftrace

  2. To run a one-liner from the shell: bpftrace -e

  3. To list available probes: bpftrace -l

    1. Can use * wildcards

bpftrace maps

Key-value store

  1. Named with @<id>, e.g. @counts

  2. Can be unnamed: just @

demo

Count syscalls invoked per-process

demo

Histogram of bytes read by a process

bpftrace live stacks

Use kstack and ustack builtins

Note

bpftrace reads on a per-cpu basis

demo

sudo bpftrace -e 'k:ksys_read {printf("%s\n", kstack);}'

demo

sudo bpftrace -e 'k:ksys_read {printf("%s\n", ustack);}'

bpftrace internals

bpftrace compilation

bpftrace code to AST

bpftrace lexer source and parser source

  1. Generates LLVM Intermediate Representation (IR) using LLVM's BPF target

demo

The bpftrace -d option

BTF

BPF Type Format: type information exported by the kernel

demo

strace a bpf program: bpf(2) system call in action

BPF JIT

BPF programs are JIT-compiled to native code on most systems

  1. An interpreter also exists; kernels built with CONFIG_BPF_JIT_ALWAYS_ON leave it out

In-kernel verification

Hundreds of error returns!

  1. kernel/bpf/verifier.c

In-kernel representation

  1. LLVM IR to BPF bytecode (in user space, by LLVM)

  2. Bytecode to native code (in the kernel, by the JIT)

demo

Use bpftool to see the translated (xlated) and JIT-compiled (jited) BPF program

BPF kernel entry points

Example: definition of bpf_get_current_pid_tgid()

Further exploration

include/uapi/linux/{bpf,bpf_common,filter}.h

detour: unlikely()?

Branch-prediction hint that lets the compiler arrange code so the expected path falls through

  1. See include/linux/compiler.h

Entry point for further exploration

See more: BPF kselftests (tools/testing/selftests/bpf)

  1. General selftest info

  2. Good first contribution

Summary

Linux BPF is an in-kernel, general-purpose execution engine

BPF programming is event-driven

  1. Various probe types provide different triggers for invoking BPF programs

    1. kprobes provide dynamic kernel instrumentation

    2. uprobes provide dynamic userspace instrumentation

    3. tracepoints provide a more stable, static kernel tracing interface

BPF refers to Linux eBPF, a major rewrite of classic BPF

  1. Classic BPF was designed for efficient network packet filtering

BCC and bpftrace are two common frontends to the kernel's BPF subsystem

  1. Use BCC for more serious tool development

  2. Use bpftrace for quick interactive use and for prototyping tools

bpftrace is an awk-like scripting language

  1. Provides a quick, modern way to see what's going on inside the kernel

End

Check out Brendan Gregg's work

  1. Good talk on more internals

  2. Good book to dive into more depth