Lecture 04: Monday, 10 June 2024

Announcements

  1. E0 initial submission: due tonight

  2. P0 released: due in two weeks

  3. CLARIFICATION: When re-submitting your peer review, reply to the original cover letter and not to your own email

    1. When grading your peer review, we will only look at the latest reply to the latest cover letter

    2. Change the line "set sort=threads" to "set sort=reverse-threads" to make the newest emails appear first, which may help

  4. Suggestion for homework workflow: make a private fork of the ILKD_submissions repository and push your local changes to GitHub as a backup

  5. Please don't hesitate to ask any questions in #questions

    1. There are no stupid questions

  6. Take note of the new "Practical Reference" section at the bottom of this page


Review and quick detour

  1. Last week, we started building a minimal Linux distribution

    1. During Lecture 02, we compiled and booted the kernel using QEMU and a stub init program packaged in an initramfs

    2. During Lecture 03, we compiled busybox to upgrade our userspace with basic core Linux utilities and a shell

  2. As a reminder, busybox is a single binary containing many common Linux CLI utilities like mkdir, ln, and cat

    1. Some of these utilities, like ps and reboot, don't work because we are missing the /proc filesystem

  3. Let's take a brief detour to add a couple of kernel-backed filesystems to our system

    1. A kernel-backed filesystem is one whose contents are generated by the kernel rather than read from a storage medium

    2. The /proc filesystem provides access to internal kernel data structures and exposes a number of configuration knobs

    3. The sysfs filesystem mounted at /sys is similar but structured differently

    4. The most important feature is that /proc provides information about running processes

    5. We can use the mount utility to add /proc and /sys to our system

    6. The kernel immediately populates the contents of both

    7. We can add commands to our init script to mount these filesystems automatically

    8. We will return to /proc and /sys later in the course


Lecture Summary

  1. We add a C compiler and C library to our system

    1. We use the Tiny C Compiler (tcc), since it is tiny and has some fun features

    2. We use The GNU C Library (glibc) as our C library, since it's the most widely used

  2. Build the C compiler (tcc) and C library (glibc) and install into the initramfs

    1. We create a directory tree to package as a cpio archive that we can use as the kernel's initial filesystem (initramfs)

    2. Without a dynamic linker, executable binaries must be statically linked

    3. The C runtime includes the headers and libraries necessary to run a C program, as well as a dynamic linker

    4. First, we configure, build, and install tcc into our root filesystem directory tree

    5. Second, we configure, build, and install glibc into our root filesystem directory tree

  3. We compile and run a "Hello world" C program to demonstrate that our system works

    1. Our first attempt yields a couple of "file not found" errors that we can fix by specifying additional include and link paths

    2. We eliminate the need to specify these options at tcc invocation by defining environment variables in our init script

    3. With these fixes, tcc works as expected and we build and run "hello world" successfully

  4. With our VM containing core utilities, a C compiler, and a C runtime, we have a minimal Linux distribution ready to roll

    1. This is the main purpose of this lecture

  5. We now turn our attention to some of the advanced features of C used frequently by the Linux kernel

    1. We will look at stringification and token/string concatenation

    2. We will see some variations of for_each*

    3. We will see examples of assembly source files that combine usage of assembly and C macros


Practical Reference

This section contains a rundown of the commands and scripts we use in this demo.

To begin, create a directory and either link to or install a built Linux source tree in the linux subdirectory.

Starting the system quickly: start_vm.sh

To avoid rebuilding the cpio archive by hand and retyping the QEMU command every time, we can use a simple script that packages whatever is in the rootfs directory as a usable initramfs and immediately runs QEMU.

find . will list files inside the subtree of the filesystem starting at the current directory. All paths listed in the output are relative to the current directory.

The cpio utility requires a list of paths to files to include in the archive. The program reads this list from the standard input stream. Therefore, we can pipe the standard output stream from find . into cpio -co to create a cpio-formatted archive of a filesystem directory tree starting from the current directory.

$ cat start_vm.sh
#!/bin/sh

# package our initial root filesystem tree for use as initramfs
cd rootfs
find . | cpio -co > ../rootfs.cpio
cd ..

# invoke QEMU with our kernel image and the initramfs from above
# options are documented here because a comment cannot follow a line-continuation backslash
#   -machine virt       machine type (virt is a general purpose option)
#   -cpu cortex-a53     cpu model (cortex-a53 is an arbitrary choice -- it's used in one of the raspberry pi computers)
#   -smp 1              smp = symmetric multi-processing, and we only require a single virtual CPU core
#   -m 1024             m = memory, and we only need 1024MB
#   -kernel ...         path to the Linux kernel image
#   -initrd ...         path to the file containing either the initial root filesystem (initramfs) or initial ramdisk (initrd)
#   -display none       don't display any video output
#   -serial stdio       connect the terminal's standard input/output to the serial console
#   -no-reboot          exit instead of rebooting
#   -append "..."       add these arguments to the Linux kernel boot commandline options
#       console=ttyAMA0 will use the AMA0 device serial port as the main system console
#                       (this is the main serial port on the raspberry pi that we are sort of virtualizing)
#       panic=-1        will set the kernel to reboot immediately in the case of a kernel panic
qemu-system-aarch64 \
    -machine virt \
    -cpu cortex-a53 \
    -smp 1 \
    -m 1024 \
    -kernel linux/arch/arm64/boot/Image \
    -initrd rootfs.cpio \
    -display none \
    -serial stdio \
    -no-reboot \
    -append "console=ttyAMA0 panic=-1"

More information about the arguments can be found in this documentation

Our init script

This is the /init we used at the end of L03. This will be installed in our root filesystem as /init.

$ cat rootfs/init
#!/bin/ash

exec ash

Recall that /bin/ash is a symlink to /bin/busybox generated by the busybox build system. We initially booted our system without these symlinks (with just the /busybox binary), and therefore we needed a different init script:

$ cat rootfs/init
#!/busybox ash

exec /busybox ash

I made the mistake of attempting to use the first script in this situation, which led to some confusion. The second script fixed the problem and booted correctly, though using the system was annoying since every command needed to be prefixed by /busybox, e.g. /busybox mkdir.

This /init script is enhanced to automatically mount /proc and /sys:

$ cat rootfs/init
#!/bin/ash
# mount <device> <path> -t <filesystem type>

mkdir /proc
mount none /proc -t proc

mkdir /sys
mount none /sys -t sysfs

exec ash

We have a short article about /proc available containing material we will return to later in the course.

Make sure this script is executable! Otherwise, the kernel will fail to run /init and fall back to trying to execute several other paths before reaching a panic.

Build busybox for our rootfs:

Build busybox from source.

$ git clone git://busybox.net/busybox.git
$ cd busybox && git checkout 1_36_stable    # Use the latest stable branch instead of master
$ make defconfig                            # Generate the default build configuration file .config

We patch the default configuration to build busybox as a statically-linked binary and disable the tc utility that breaks compilation.

@@ -40,7 +40,7 @@ CONFIG_FEATURE_SYSLOG=y
 #
 # Build Options
 #
-# CONFIG_STATIC is not set
+CONFIG_STATIC=y
 # CONFIG_PIE is not set
 # CONFIG_NOMMU is not set
 # CONFIG_BUILD_LIBBUSYBOX is not set
@@ -968,8 +968,8 @@ CONFIG_PSCAN=y
 CONFIG_ROUTE=y
 CONFIG_SLATTACH=y
 CONFIG_SSL_CLIENT=y
-CONFIG_TC=y
-CONFIG_FEATURE_TC_INGRESS=y
+# CONFIG_TC is not set
+# CONFIG_FEATURE_TC_INGRESS is not set
 CONFIG_TCPSVD=y
 CONFIG_UDPSVD=y
 CONFIG_TELNET=y

These options should be set using a tool like make menuconfig.

For make menuconfig, users of newer gcc versions may need to patch the busybox source like so:

diff --git a/scripts/kconfig/lxdialog/check-lxdialog.sh b/scripts/kconfig/lxdialog/check-lxdialog.sh
index 5075ebf2d..4e138366d 100755
--- a/scripts/kconfig/lxdialog/check-lxdialog.sh
+++ b/scripts/kconfig/lxdialog/check-lxdialog.sh
@@ -47,7 +47,7 @@ trap "rm -f $tmp" 0 1 2 3 15
 check() {
         $cc -x c - -o $tmp 2>/dev/null <<'EOF'
 #include CURSES_LOC
-main() {}
+int main() {}
 EOF
        if [ $? != 0 ]; then
            echo " *** Unable to find the ncurses libraries or the"       1>&2

To build and install in the rootfs, assuming busybox and the rootfs reside in the same parent directory:

make
make install
cp -r _install/* ../rootfs

This is where we left off at the end of L03, but with the addition of /proc and /sys mounted by /init seen above.

Clone tinycc

git clone git://repo.or.cz/tinycc.git

If for some reason the mob branch is broken, this was the HEAD commit used in this demo: 3b943bec5de423e234b5f92d9a8f110ad66a85a1

Configure and build:

./configure
make

Note: There is no longer any need to compile a statically linked binary since we are about to add a dynamic linker. One may choose to build either a statically linked or a dynamically linked binary at their own discretion.

Since there is no obvious equivalent of DESTDIR we can use this simple script to install tcc in our rootfs:

#!/bin/sh

ROOTFS=../rootfs

mkdir -p $ROOTFS/bin
mkdir -p $ROOTFS/lib/tcc/include
mkdir -p $ROOTFS/include

cp tcc $ROOTFS/bin/tcc

cp libtcc1.a $ROOTFS/lib/tcc/libtcc1.a
cp runmain.o $ROOTFS/lib/tcc/runmain.o

cp bt-exe.o $ROOTFS/lib/tcc/bt-exe.o
cp bt-log.o $ROOTFS/lib/tcc/bt-log.o
cp bcheck.o $ROOTFS/lib/tcc/bcheck.o

cp include/float.h $ROOTFS/lib/tcc/include/float.h
cp include/stdalign.h $ROOTFS/lib/tcc/include/stdalign.h
cp include/stdarg.h $ROOTFS/lib/tcc/include/stdarg.h
cp include/stdatomic.h $ROOTFS/lib/tcc/include/stdatomic.h

cp include/stdbool.h $ROOTFS/lib/tcc/include/stdbool.h
cp include/stddef.h $ROOTFS/lib/tcc/include/stddef.h
cp include/stdnoreturn.h $ROOTFS/lib/tcc/include/stdnoreturn.h
cp include/tccdefs.h $ROOTFS/lib/tcc/include/tccdefs.h
cp include/tgmath.h $ROOTFS/lib/tcc/include/tgmath.h
cp include/varargs.h $ROOTFS/lib/tcc/include/varargs.h
cp tcclib.h $ROOTFS/lib/tcc/include/tcclib.h

cp libtcc.a $ROOTFS/lib/libtcc.a
cp libtcc.h $ROOTFS/include/libtcc.h

Get the GNU C library (glibc) and create build and staging directories:

git clone git://sourceware.org/git/glibc.git
mkdir glibc-build /tmp/glibc-staging
cd glibc && git checkout release/2.39/master

glibc has a reputation for being difficult to build from source, but we have relatively few obstacles to deal with for our purposes.

Note of little importance: GNU/Hurd mentioned in INSTALL

Configure glibc with the standard /usr prefix, then build and install it (including headers) into the staging directory and copy the result into the rootfs:

cd ../glibc-build
../glibc/configure --prefix=/usr
DESTDIR=/tmp/glibc-staging make install
DESTDIR=/tmp/glibc-staging make install-headers
cp -r /tmp/glibc-staging/* ../rootfs

At this point, we have a dynamic linker, so we no longer need to compile busybox as a static binary. We can recompile busybox to use the dynamic linker and our system will continue to work.

Because of some configuration quirks, we will need to pass -I/lib/tcc/include -L/lib/tcc as arguments to each invocation of tcc.

To avoid the need to type this every time, we can modify init like so:

#!/bin/ash

# mount <device> <path> -t <filesystem type>

mkdir /proc
mount none /proc -t proc

mkdir /sys
mount none /sys -t sysfs

export CPATH="/lib/tcc/include"
export LIBRARY_PATH="/lib/tcc"

exec ash

At this point, we can compile and run C programs in our minimal Linux distribution.

One fun little feature of tcc is the ability to run C files as a script.

Using a hello world program like this:

$ cat hello.c
#!/bin/tcc -run -I/lib/tcc/include -L/lib/tcc

#include <stdio.h>

int main(void) {
    printf("Hello, world!\n");
    return 0;
}

We can execute as follows:

$ chmod +x hello.c
$ ./hello.c
Hello, world!

Funky C interlude

Is this valid?

int main(void) {
    int;
    ;short
    ;;;;;int;;;
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;;float;
    void;;;;;;
    return 0;
}

What is the meaning of this C expression: 0["Hello"]?

How about: 5["Hello"]?

This course assumes advanced knowledge of the C language and compilation process.

We have a short article that breaks down the four stages of the C source to binary compilation process for anyone who would like a quick refresher.

A quick preprocessor-centric tour of some funky looking kernel code

  1. Functions defined by macros using token concatenation, such as the first few macros defined in the arm64-specific atomic.h. Also be aware of stringification and of the fact that adjacent string literals are concatenated into a single literal by the compiler right after preprocessing

  2. for_each* macros like for_each_prime_number and list_for_each

  3. A combination of C macros and arm64 assembly macros in the arm64 entry source, where Linux defines the entry points for arm64 syscalls. We will return to this later.

