Before we start, I want to highlight the difference between bpftrace and eBPF programs.
eBPF programs: These are a bit more complex and require writing two separate pieces of code - one for userspace and one for kernel space. They are not limited to tracing and can be used for a wide range of tasks, from performance work to in-depth security monitoring.
bpftrace: It focuses solely on tracing and abstracts away much of the complexity, so we can start instrumenting and tracing right out of the box with an easy scripting language.
Alright - now that we have that out of the way, let's look at some userspace instrumentation. We'll cover uprobe and uretprobe, which instrument the beginning and the end (return) of a user-level function respectively. That is, uprobe fires when the function is entered, and uretprobe - as the name suggests - fires when it returns.
What is instrumentation and why would we want to instrument?
In this blog series, I will intentionally digress from the core subject to give the reader a broader understanding. Instrumentation refers to the process of adding monitoring and measuring capabilities to a program. Think of it like a debugger, except that where something like gdb pauses the program so an engineer can inspect or change anything in the process flow, uprobe-based instrumentation happens entirely at runtime - execution is not paused. In the context of a uprobe, when execution reaches the probed instruction (the start of the function), the probe temporarily redirects execution to a specified handler (in this case, the eBPF program). At that point we can collect data from the system, modify the process state, or even change the instruction entirely. As discussed earlier, bpftrace limits us to tracing and reading the system state at these points. Instrumenting code in this way adds a performance cost. Let's move on.
Optional reading: for a deep dive into uprobe tracing, see https://docs.kernel.org/trace/uprobetracer.html
Finding functions to instrument
You can use objdump or nm to find functions to instrument. objdump is a tool that displays information about object files. An object file is a file containing object code, that is, the result of compiling source code. It includes machine code (instructions that the processor can execute), data sections, symbols, and other metadata depending on how it was compiled. Object files are usually an intermediate step while building a full program. A linker takes one or more object files and combines them to produce a final executable or a library.
└──╼ $objdump -tT /bin/bash | grep readline
0000000000137960 g DO .bss 0000000000000008 Base rl_readline_state
00000000000bfb50 g DF .text 0000000000000312 Base readline_internal_char
00000000000bef90 g DF .text 0000000000000157 Base readline_internal_setup
000000000008c6e0 g DF .text 000000000000009c Base posix_readline_initialize
00000000000c0530 g DF .text 000000000000009a Base readline
0000000000135cd4 g DO .bss 0000000000000004 Base bash_readline_initialized
000000000012da50 g DO .data 0000000000000008 Base rl_readline_name
0000000000128930 g DO .data 0000000000000004 Base rl_readline_version
000000000008c9b0 g DF .text 000000000000084b Base initialize_readline
000000000012e018 g DO .bss 0000000000000004 Base current_readline_line_index
000000000012e028 g DO .bss 0000000000000008 Base current_readline_prompt
00000000000954f0 g DF .text 0000000000000041 Base pcomp_set_readline_variables
000000000012e020 g DO .bss 0000000000000008 Base current_readline_line
00000000000bf0f0 g DF .text 0000000000000120 Base readline_internal_teardown
000000000012892c g DO .data 0000000000000004 Base rl_gnu_readline_p
-t prints the symbol table entries of the file
-T prints the dynamic symbol table entries of the file. This is only meaningful when shared / dynamically linked libraries are involved.
g - Global - a global symbol that is accessible from other object files
DO - Dynamic Object - the D marks a dynamic symbol and the O marks a data object (here in .bss or .data)
DF - Dynamic Function - the D marks a dynamic symbol and the F marks a function (here in the .text code section)
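When the listing gets long, it can help to keep only the rows that can actually be probed. The helper below is a minimal sketch, not part of the original post: the function name list_text_functions is made up for illustration, and it assumes the column layout shown in the listing above (address, scope, type, section, size, version, name). It keeps the rows marked DF in .text and prints just the symbol name.

```shell
# Print the names of function symbols that live in .text, reading
# `objdump -T`-style output on stdin.
# Assumption: whitespace-separated columns as in the listing above, with the
# symbol type in column 3, the section in column 4, and the name last.
list_text_functions() {
    awk '$3 == "DF" && $4 == ".text" { print $NF }'
}

# Example usage (not executed here):
#   objdump -T /bin/bash | list_text_functions | grep readline
```

Each name this prints is a candidate for the function part of a uprobe:/bin/bash:NAME probe.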
To learn more about sections and symbols, read up on object files and assembly. It's good to know as much as you can.
Let's hook the entry and return points of the readline function in bash.
┌─[root@darkmatter]─[~]
└──╼ #bpftrace -e 'uprobe:/bin/bash:readline { printf("arg0: %s\n", str(arg0)); }'
Attaching 1 probe...
arg0: ┌─[apocalypse0@darkma
arg0: ┌─[apocalypse0@darkma
arg0: ┌─[apocalypse0@darkma
arg0: ┌─[apocalypse0@darkma
If you run the command above, open another terminal window, and type a few commands, the tracer prints your prompt each time. This output suggests that arg0 of the readline() function, when called from bash, is my prompt. Let's confirm it by running man 3 readline, whose synopsis is: char *readline(const char *prompt);
So readline takes one argument, the prompt, and that is indeed what arg0 holds. I have no idea why it's truncated, and I will not be pursuing that. On some of my systems it is truncated, on others it is not.
Now to get the return value. The same man page, further down states
"RETURN VALUE readline returns the text of the line read. A blank line returns the empty string. If EOF is encountered while reading a line, and the line is empty, NULL is returned. If an EOF is read with a non-empty line, it is treated as a newline."
Let's try and get that
┌─[✗]─[root@darkmatter]─[~]
└──╼ #bpftrace -e 'uretprobe:/bin/bash:readline { printf("readline: \"%s\"\n", str(retval)); }'
Attaching 1 probe...
readline: "clear"
readline: "ls"
readline: "top"
readline: "id"
readline: "whoami"
readline: "who"
readline: "last"
According to the docs at https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md#3-uprobeuretprobe-dynamic-tracing-user-level, the arguments can be accessed as follows
uprobe: arg0, arg1, ..., argN
uretprobe: retval
We can get the prompt and the command as follows
┌─[root@parrot]─[/home/apocalypse0/Documents/BPF]
└──╼ #cat bpftrace_bash.bt
uprobe:/bin/bash:readline { printf("%s: ", str(arg0)); }
uretprobe:/bin/bash:readline { printf("%s\n", str(retval)); }
┌─[root@parrot]─[/home/apocalypse0/Documents/BPF]
└──╼ #bpftrace bpftrace_bash.bt
Attaching 2 probes...
┌─[apocalypse0@parrot: id
┌─[apocalypse0@parrot: whoami
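The two probes above print independently, so with several bash sessions open the prompt from one shell and the command from another could end up on the same line. One possible variation, offered as a sketch rather than the approach used in this post, saves the prompt pointer in a map keyed by thread ID at entry and prints both values together at return. The map name @prompt and the file name readline_pair.bt are assumptions chosen for illustration.

```shell
# Write the paired-probe script to a file (the file name is illustrative).
cat > readline_pair.bt <<'EOF'
// Entry: remember the prompt pointer for this thread.
uprobe:/bin/bash:readline
{
    @prompt[tid] = arg0;
}

// Return: only act if we saw the matching entry, then print the saved
// prompt together with the line that readline returned.
uretprobe:/bin/bash:readline
/@prompt[tid]/
{
    printf("%s %s\n", str(@prompt[tid]), str(retval));
    delete(@prompt[tid]);
}
EOF

# Run it as root, like the earlier examples (not executed here):
#   bpftrace readline_pair.bt
```

Deleting the map entry on return keeps the map from growing and stops bpftrace from dumping leftover entries when it exits.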
Filters
Now let's write our first filter. Since we are looking at bash and instrumenting readline to get the prompt as well as the command entered, how about we attempt to capture stdout too?
One way to do this is to trace the write() syscall. We know that stdout is represented by file descriptor 1. Let's take a look at a couple of things. The first is the man 2 write page, which shows that write() takes a file descriptor, a buffer, and a byte count.
The second is the kernel tracing format file for the entry of the write() syscall, found at /sys/kernel/tracing/events/syscalls/sys_enter_write/format. By now you should be familiar with all of this, so I will not explain things in much detail, and the screenshots alone should suffice.
Let us attempt to write our first filter:
Syntax: /filter/
Filters (also known as predicates) can be added after probe names. The probe still fires, but it will skip the action unless the filter is true.
┌─[root@parrot]─[/sys/kernel/tracing/events/syscalls/sys_enter_write]
└──╼ #bpftrace -e 't:syscalls:sys_enter_write /args->fd == 1/ { printf("%s\n", str(args->buf)); }'
In the above example, we are attaching a tracepoint at the entry of the write() syscall. As indicated in the screenshot above, the format of this includes some of the variables available to us - which are:
fd : File descriptor
buf : What is being written
args is a pointer to a struct holding these fields, which is why we access them with ->. fd is a plain integer, while buf is itself a pointer to the data being written, so we pass it through str() to read the string before printing it. In the filter section, we specify that the action should run only if the file descriptor is equal to 1. That is, we only want to print what goes to stdout. Beware: depending on what your system is doing, this output could be incredibly noisy. It's an interesting way to take a sneak peek under the hood and see what's happening inside the system - it's like spying on all the programs on your computer! If your system is doing pretty much nothing, fire up another terminal window and run a few commands - you should see the output of those commands.
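If the output is too noisy, the filter can carry more than one condition. The sketch below is one way to narrow things down, under a couple of assumptions: the process name "ls" and the file name stdout_ls.bt are picked purely for illustration, and comm is the bpftrace builtin holding the current process name. It also passes the count field from the same tracepoint to str() so the read stops at the end of what was written, since write() buffers are not NUL-terminated.

```shell
# Write the narrowed-down tracer to a file (the file name is illustrative).
cat > stdout_ls.bt <<'EOF'
tracepoint:syscalls:sys_enter_write
/args->fd == 1 && comm == "ls"/
{
    // Bound the read by the number of bytes passed to write().
    printf("%s", str(args->buf, args->count));
}
EOF

# Run it as root, then run ls in another terminal (not executed here):
#   bpftrace stdout_ls.bt
```

Long writes will still be cut off at bpftrace's string size limit, so treat this as a peek at the output rather than a full capture.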
Try out different combinations of this - try and combine knowledge from the previous lessons with this to write some programs, play about, break things, instrument things - and grow your understanding.
Conclusion
We learned about userspace instrumentation in this blog. It is a powerful feature. You could instrument routines in libc and intercept pretty much anything that you'd like - if you know what you want to look for.