Understand the purpose of the Linux kernel's BPF subsystem
Understand how to perform basic tracing and analysis with the bpftrace tool
What is BPF?
Where did BPF come from?
Major features of BPF
Probes
BCC: BPF Compiler Collection
Using bpftrace and some internals
BPF stands for BSD Packet Filter (often expanded as Berkeley Packet Filter)
Original BPF: packet filter
Modern BPF: general-purpose in-kernel virtual machine
Efficient packet filtering
1993 paper
Huge performance gain over SunOS's existing packet capture mechanism (NIT)
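To make "efficient packet filtering" concrete, here is a toy interpreter for a few classic BPF operations, written in Python for readability. The tuple encoding and string op names are assumptions for this sketch, not the kernel's struct sock_filter layout. The program mirrors the shape of the classic filter for the tcpdump expression `ip`: load the EtherType halfword, compare it with 0x0800, then accept or drop. The accept value (262144) is an assumed snapshot length.

```python
import struct

# Each instruction is (op, k, jt, jf), loosely mirroring classic BPF's
# struct sock_filter; the string op names are a simplification for this sketch.
def run_filter(prog, packet):
    """Run a tiny classic-BPF-style program; return bytes to accept (0 = drop)."""
    acc = 0  # the accumulator register "A"
    pc = 0
    while pc < len(prog):
        op, k, jt, jf = prog[pc]
        if op == "ldh":
            # Load a big-endian 16-bit halfword from absolute offset k.
            if k + 2 > len(packet):
                return 0  # out-of-bounds load: drop the packet
            acc = struct.unpack_from("!H", packet, k)[0]
            pc += 1
        elif op == "jeq":
            # Jump offsets are relative to the next instruction, as in cBPF.
            pc += 1 + (jt if acc == k else jf)
        elif op == "ret":
            return k
        else:
            raise ValueError(f"unsupported op {op!r}")
    return 0  # fell off the end (a real cBPF checker rejects such programs)

# Accept IPv4 frames: the EtherType at offset 12 of an Ethernet header is 0x0800.
IPV4_FILTER = [
    ("ldh", 12, 0, 0),
    ("jeq", 0x0800, 0, 1),
    ("ret", 262144, 0, 0),  # accept (assumed snapshot length)
    ("ret", 0, 0, 0),       # drop
]

ipv4_frame = bytes(12) + b"\x08\x00" + bytes(20)
arp_frame = bytes(12) + b"\x08\x06" + bytes(28)
print(run_filter(IPV4_FILTER, ipv4_frame))  # 262144
print(run_filter(IPV4_FILTER, arp_frame))   # 0
```

The whole decision is a handful of register-machine steps per packet, which is the kind of cheap in-kernel check that let BPF discard unwanted packets before copying them to userspace.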
2013 rewrite
Generalized the BPF instruction set
C compiled to BPF bytecode, checked by an in-kernel verifier, then just-in-time compiled to native code
Kernel subsystems can expose helper functions to BPF programs
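The verification step can be sketched as a toy static checker, written in Python for readability. The instruction tuples and the helper allowlist below are hypothetical; the real verifier also tracks register types and memory bounds and explores every path through the program. Like early kernel verifiers, this toy rejects every backward jump (later kernels accept some bounded loops), and it only admits calls to helpers exposed to the program type.

```python
# Toy static checks in the spirit of the kernel's BPF verifier.
# Hypothetical instruction shapes: ("mov", reg, imm), ("call", helper),
# ("jeq", reg, imm, off), ("ja", off), ("exit",).
MAX_INSNS = 4096  # same value as BPF_MAXINSNS in include/uapi/linux/bpf_common.h

def verify(prog, allowed_helpers):
    """Return None if prog is accepted, otherwise a rejection reason."""
    if not prog or len(prog) > MAX_INSNS:
        return "bad program length"
    for pc, insn in enumerate(prog):
        op = insn[0]
        if op in ("ja", "jeq"):
            off = insn[-1]  # jump offset relative to the next instruction
            if off < 0:
                return f"back-edge (possible loop) at insn {pc}"
            if pc + 1 + off >= len(prog):
                return f"jump out of range at insn {pc}"
        elif op == "call":
            # Each program type may only call the helpers exposed to it.
            if insn[1] not in allowed_helpers:
                return f"helper {insn[1]} not allowed at insn {pc}"
        elif op not in ("mov", "exit"):
            return f"unknown op {op} at insn {pc}"
    # Jumps only go forward and stay in range, so ending in exit means
    # every path terminates.
    if prog[-1][0] != "exit":
        return "control can fall off the end of the program"
    return None

HELPERS = {"bpf_get_current_pid_tgid", "bpf_ktime_get_ns"}  # hypothetical allowlist
ok = [("call", "bpf_get_current_pid_tgid"), ("mov", 0, 0), ("exit",)]
loop = [("mov", 0, 0), ("ja", -2), ("exit",)]
print(verify(ok, HELPERS))    # None
print(verify(loop, HELPERS))  # back-edge (possible loop) at insn 1
```

Rejecting programs up front, rather than trapping faults at run time, is what makes it acceptable to run the resulting native code directly in the kernel.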
Tracing and observability
Event-driven programming rather than task-driven
Runtime verification
Networking
Example: XDP (eXpress Data Path)
Performance profiling
perf
Dynamic instrumentation considered harmful
Used in 1990s: same technique debuggers use to place breakpoints
e.g. kerninst, which has a really old website
DProbes by IBM rejected in 200
Many preferred C over bytecode
Rejection used as case study in this paper
Promoted by Sun first, and not for Linux!
DTrace released to wide acclaim in Solaris 10
Scripting language for DTrace was called D
Not to be confused with the D programming language
Looking at probe types, not syntax details
Dynamic kernel code instrumentation
Replace the first byte of the probed instruction with a breakpoint (int3 on x86)
Similar concept: kretprobes for function return
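The breakpoint trick can be modeled in a few lines of Python. In this toy, "kernel text" is a bytearray and every byte counts as one instruction; the class and function names are hypothetical. Arming the probe saves the original byte and writes 0xCC (the x86 int3 opcode). On a hit the handler runs, and then the displaced instruction executes. Real kprobes execute a copy of that instruction out of line (single-stepping it) before resuming; this model simply "executes" the saved byte in place.

```python
INT3 = 0xCC  # x86 one-byte breakpoint opcode

class ToyKprobe:
    """Simulates arming a kprobe by patching one byte of 'kernel text'."""

    def __init__(self, text, addr, handler):
        self.text, self.addr, self.handler = text, addr, handler
        self.saved = None

    def arm(self):
        self.saved = self.text[self.addr]  # remember the displaced byte
        self.text[self.addr] = INT3

    def disarm(self):
        self.text[self.addr] = self.saved  # restore the original instruction
        self.saved = None

def execute(text, probes):
    """Step through 1-byte toy 'instructions'; return the opcodes actually run."""
    executed = []
    for pc in range(len(text)):
        op = text[pc]
        probe = probes.get(pc)
        if op == INT3 and probe is not None and probe.saved is not None:
            probe.handler(pc)  # breakpoint trap: run the probe handler
            op = probe.saved   # then execute the displaced instruction
        executed.append(op)
    return executed

text = bytearray(b"\x55\x48\x89\xe5\xc3")  # arbitrary bytes standing in for a function
hits = []
kp = ToyKprobe(text, 0, hits.append)
kp.arm()
ran = execute(text, {0: kp})
kp.disarm()
print(hits, bytes(ran) == bytes(text))  # [0] True
```

The instrumented function still runs exactly its original instruction stream; the only visible difference is the handler call, which is why kprobes can be attached and removed on a live kernel.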
do_nanosleep
do_nanosleep() source
Dynamic userspace program instrumentation
Readline definition
statically defined: more stable interface
"sys_enter"#sname in include/linux/syscalls.h
tracepoint metadata defined here: include/trace/events/syscalls.h
Set of tools and libraries
bcc/tools/opensnoop.py
The simpler, more lightweight option
bpftrace/tools/opensnoop.bt
Similar to awk
simple awk program
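The awk-like shape of a bpftrace program (optional BEGIN and END blocks, then rules of the form probe /filter/ { action }) can be mimicked with a small Python dispatcher. This is a sketch under simplifying assumptions: events arrive as (probe name, context dict) pairs, and an action appends to an output list instead of calling printf. None of these names are bpftrace APIs.

```python
def run_script(rules, events, begin=None, end=None):
    """Dispatch events awk-style: each rule is (probe, filter or None, action)."""
    state, out = {}, []
    if begin:
        begin(state, out)                # like bpftrace's BEGIN block
    for probe, ctx in events:            # event-driven: one callback per event
        for rule_probe, pred, action in rules:
            if rule_probe == probe and (pred is None or pred(ctx)):
                action(state, ctx, out)  # like { action }
    if end:
        end(state, out)                  # like bpftrace's END block
    return out

# Roughly: sys_enter_openat /comm == "bash"/ { print the filename }
rules = [
    ("sys_enter_openat",
     lambda ctx: ctx["comm"] == "bash",                  # /filter/
     lambda st, ctx, out: out.append(ctx["filename"])),  # { action }
]
events = [
    ("sys_enter_openat", {"comm": "bash", "filename": "/etc/passwd"}),
    ("sys_enter_openat", {"comm": "vim", "filename": "/tmp/notes"}),
    ("sys_enter_read", {"comm": "bash", "filename": None}),
]
result = run_script(rules, events,
                    begin=lambda st, out: out.append("Tracing..."),
                    end=lambda st, out: out.append("Done"))
print(result)  # ['Tracing...', '/etc/passwd', 'Done']
```

As in awk, there is no main loop to write: the program is a set of reactions to events, and the filter decides cheaply whether the action runs at all.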
Write a script with a #!/bin/bpftrace shebang line
To run a one-liner from the shell: bpftrace -e
To list possible probes: bpftrace -l, optionally with a * wildcard pattern
Maps: a key-value store
Name a map with @<id>
A map can be unnamed: just @
Count syscalls invoked per-process
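The usual bpftrace one-liner for this is bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }', which uses an unnamed map keyed by process name. The Python sketch below models only what that map accumulates and how its contents are printed at exit; the input list of process names is made-up sample data, and the output format is an approximation of bpftrace's.

```python
from collections import Counter

def count_by_comm(events):
    """Model of @[comm] = count(): one map slot per process name."""
    counts = Counter()
    for comm in events:  # one entry per sys_enter hit
        counts[comm] += 1
    return counts

def print_map(counts):
    """Approximate bpftrace's exit-time dump: sorted by value, largest last."""
    return [f"@[{comm}]: {n}" for comm, n in sorted(counts.items(), key=lambda kv: kv[1])]

hits = ["bash", "sshd", "bash", "cron", "bash", "sshd"]
for line in print_map(count_by_comm(hits)):
    print(line)
# @[cron]: 1
# @[sshd]: 2
# @[bash]: 3
```

In the real tool the counting happens in the kernel, so only the final per-process totals cross into userspace, not one record per system call.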
Histogram of bytes read by a process
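bpftrace's hist() groups values into power-of-two buckets: [0], [1], [2, 4), [4, 8), and so on, with negative values in a separate bucket. This is typically applied to the return value of read in a sys_exit_read tracepoint, filtered by pid. The Python sketch models only the bucketing; the real output also abbreviates large bounds (K, M) and draws ASCII bars, and the sample values are invented.

```python
def bucket(v):
    """Return (sort_key, label) for a power-of-two histogram bucket."""
    if v < 0:
        return (float("-inf"), "(..., 0)")
    if v < 2:
        return (v, f"[{v}]")          # 0 and 1 get their own buckets
    lo = 1 << (v.bit_length() - 1)    # largest power of two <= v
    return (lo, f"[{lo}, {2 * lo})")

def hist(values):
    """Model of @bytes = hist(ret): count values per power-of-two bucket."""
    counts = {}
    for v in values:
        key = bucket(v)
        counts[key] = counts.get(key, 0) + 1
    return [(label, n) for (_, label), n in sorted(counts.items())]

read_sizes = [0, 1, 1, 2, 3, 5, 8, 100, -4]  # sample read() returns; -4 is an error
for label, n in hist(read_sizes):
    print(f"{label:<10} {n}")
```

Logarithmic buckets keep the map small (about one slot per power of two) while still showing the shape of a distribution that spans many orders of magnitude.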
Use kstack and ustack builtins
bpftrace reads on a per-cpu basis
sudo bpftrace -e 'k:ksys_read {printf("%s\n", kstack);}'
sudo bpftrace -e 'k:ksys_read {printf("%s\n", ustack);}'
bpftrace lexer source and parser source
The bpftrace -d option
BPF Type Format (BTF) from the kernel
strace a BPF program: the bpf(2) system call in action
BPF programs are JIT-compiled to native code when the kernel's BPF JIT is enabled
Hundreds of error returns!
LLVM IR to bytecode
Bytecode to native code
Use bpftool to see the translated and JITed BPF program
Example: definition of bpf_get_current_pid_tgid()
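In kernel/bpf/helpers.c this helper returns (u64) task->tgid << 32 | task->pid. The kernel's tgid is what userspace calls the process ID, and the kernel's pid is the thread ID; bpftrace's pid and tid builtins correspond to these two halves. The Python sketch below shows the packing and the split that BPF programs commonly do; the function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF

def pack_pid_tgid(tgid, pid):
    """Mirror of the helper's return value: (u64)tgid << 32 | pid."""
    return ((tgid & MASK32) << 32) | (pid & MASK32)

def unpack_pid_tgid(value):
    """Split the u64 back into (tgid, pid)."""
    return value >> 32, value & MASK32

v = pack_pid_tgid(tgid=1234, pid=1240)  # thread 1240 of process 1234
print(hex(v), unpack_pid_tgid(v))       # 0x4d2000004d8 (1234, 1240)
```

Returning both IDs in one 64-bit value lets a single helper call serve tools that key by process (upper half) and tools that key by thread (lower half).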
include/uapi/linux/{bpf,bpf_common,filter}.h
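These headers define the instruction encoding. Each eBPF instruction is an 8-byte struct bpf_insn: an 8-bit opcode, two 4-bit register fields (dst_reg, src_reg), a signed 16-bit offset and a signed 32-bit immediate. The Python sketch packs the two-instruction program "r0 = 0; exit" using opcode constants from bpf_common.h and bpf.h. It assumes a little-endian host, where dst_reg occupies the low nibble of the register byte; bpf_insn() is a hypothetical helper, not a library API.

```python
import struct

# Opcode pieces from include/uapi/linux/bpf_common.h and bpf.h
BPF_JMP = 0x05
BPF_K = 0x00
BPF_ALU64 = 0x07
BPF_MOV = 0xB0
BPF_EXIT = 0x90

def bpf_insn(code, dst=0, src=0, off=0, imm=0):
    """Pack one 8-byte struct bpf_insn (little-endian layout assumed)."""
    regs = ((src & 0x0F) << 4) | (dst & 0x0F)  # dst_reg in the low nibble
    return struct.pack("<BBhi", code, regs, off, imm)

# "return 0": r0 = 0; exit
prog = bpf_insn(BPF_ALU64 | BPF_MOV | BPF_K, dst=0, imm=0) + bpf_insn(BPF_JMP | BPF_EXIT)
print(prog.hex())  # b7000000000000009500000000000000
```

The opcodes 0xb7 and 0x95 are what bpftool's translated dump renders in readable form as r0 = 0 and exit.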
Compiler optimization for code arrangement
See more: bpftrace kselftests
General selftest info
Good first contribution
Linux BPF is an in-kernel general-purpose execution engine
BPF programming is event-driven
kprobes provide dynamic kernel instrumentation
uprobes provide dynamic userspace instrumentation
tracepoints provide a more stable, static kernel tracing interface
BPF refers to Linux eBPF, a major rewrite of classic BPF
BCC and bpftrace are two common frontends to the kernel's BPF subsystem
Use BCC for more serious tool development
Use bpftrace for quick interaction and prototyping tools
bpftrace is an awk-like scripting language
Check out Brendan Gregg's work
Good talk on more internals
Good book to dive into more depth