Another Japan in the World

Jun Aruga's blog.

qemu-user-static - part 2 register

qemu-user-static enables us to run containers for other architectures, such as ARM (aarch64), on an x86_64 host, like this:

$ docker run --rm --privileged multiarch/qemu-user-static:register --reset

$ docker run --rm -t multiarch/ubuntu-debootstrap:arm64-bionic uname -a
Linux 28c784e9c7bc 4.4.0-101-generic #124~14.04.1-Ubuntu SMP Fri Nov 10 19:05:36 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux

In part 1, we learned that the qemu-user-static project creates small containers that include a QEMU binary (qemu-foo-static) or its tar.gz file, and then uploads them to the Docker Hub container registry.

In part 2, we dig into how the register container works by reading its source code.

What does the register container do?

First, the register/Dockerfile file in [1] is the top-level file. busybox is a very small base image that bundles many common Unix utilities into a single binary.

FROM busybox
ENV QEMU_BIN_DIR=/usr/bin
ADD ./register.sh /register
ADD https://raw.githubusercontent.com/qemu/qemu/master/scripts/qemu-binfmt-conf.sh /qemu-binfmt-conf.sh
RUN chmod +x /qemu-binfmt-conf.sh
ENTRYPOINT ["/register"]

What does the line ADD https://raw.githubusercontent.com/qemu/qemu/master/scripts/qemu-binfmt-conf.sh /qemu-binfmt-conf.sh add to the image?

Let's download the file to check it.

$ wget -O register/qemu-binfmt-conf.sh https://raw.githubusercontent.com/qemu/qemu/master/scripts/qemu-binfmt-conf.sh

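It is a script maintained in the QEMU GitHub repository [2]: https://github.com/qemu/qemu/blob/master/scripts/qemu-binfmt-conf.sh

Next, ENTRYPOINT ["/register"] makes /register (the register.sh file added by the first ADD line) the entry point. Arguments given to docker run after the image name are appended to it, so in the case above, register.sh is called with --reset.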

register/register.sh

The steps in the file are:

  1. Check if /proc/sys/fs/binfmt_misc directory exists
  2. mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc if the /proc/sys/fs/binfmt_misc/register file does not exist
  3. find /proc/sys/fs/binfmt_misc -type f -name 'qemu-*' -exec sh -c 'echo -1 > {}' \; if the --reset option is specified.
  4. exec /qemu-binfmt-conf.sh --qemu-suffix "-static" --qemu-path="${QEMU_BIN_DIR}" $@
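The four steps above can be sketched as shell functions. This is a minimal sketch, not the actual register.sh: BINFMT_DIR and all function names are hypothetical, added so that the reset step can be tried on a scratch directory without root. The find call passes the path as an argument to sh -c instead of pasting {} into the command string.

```shell
#!/bin/sh
# Minimal sketch of the four steps of register/register.sh (not the real file).
# BINFMT_DIR and the function names are hypothetical; the real script
# hard-codes /proc/sys/fs/binfmt_misc.
BINFMT_DIR=${BINFMT_DIR:-/proc/sys/fs/binfmt_misc}
QEMU_BIN_DIR=${QEMU_BIN_DIR:-/usr/bin}

# Step 1: without this directory the kernel has no binfmt_misc support.
check_binfmt_dir() {
    if [ ! -d "$BINFMT_DIR" ]; then
        echo "ERROR: $BINFMT_DIR does not exist" 1>&2
        return 1
    fi
}

# Step 2: mount the binfmt_misc file system unless its register file is present.
mount_binfmt_misc() {
    if [ ! -f "$BINFMT_DIR/register" ]; then
        mount binfmt_misc -t binfmt_misc "$BINFMT_DIR"
    fi
}

# Step 3 (--reset): writing -1 to an entry file removes that entry.
# The path is passed as "$1" to sh -c rather than pasted into the string.
reset_qemu_entries() {
    find "$BINFMT_DIR" -type f -name 'qemu-*' -exec sh -c 'echo -1 > "$1"' _ {} \;
}

# Step 4: replace this shell with QEMU's script, forwarding all arguments.
run_qemu_binfmt_conf() {
    exec /qemu-binfmt-conf.sh --qemu-suffix "-static" --qemu-path="$QEMU_BIN_DIR" "$@"
}
```

Chained together, the flow would be roughly check_binfmt_dir && mount_binfmt_misc && reset_qemu_entries && run_qemu_binfmt_conf "$@". Quoting "$@" keeps each argument intact; the original's unquoted $@ works for --reset but would split an argument that contains spaces.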

Check if /proc/sys/fs/binfmt_misc directory exists

When I checked the /proc/sys/fs/binfmt_misc/ directory on my host OS (Fedora), this was the result.

$ ls -l /proc/sys/fs/binfmt_misc/
total 0
-rw-r--r-- 1 root root 0 Apr  7 14:55 qemu-aarch64
-rw-r--r-- 1 root root 0 Apr  7 14:55 qemu-aarch64_be
...
-rw-r--r-- 1 root root 0 Apr  7 14:55 qemu-xtensaeb
--w------- 1 root root 0 Apr  7 14:55 register
-rw-r--r-- 1 root root 0 Apr  7 14:55 status

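So step 2 (mount binfmt_misc) makes the register and status files available, and step 3 (find ...) removes the existing qemu-* entries, because writing -1 to an entry file unregisters it. The qemu-* files themselves are created later, when qemu-binfmt-conf.sh registers each interpreter. They show a size of 0 only because they are virtual files; reading one with cat prints its settings.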
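Although ls reports every entry as 0 bytes, reading an entry such as qemu-aarch64 prints lines like enabled, interpreter /usr/bin/qemu-aarch64-static, flags:, offset, magic and mask. The sketch below assumes exactly that layout (first line enabled or disabled, one "interpreter <path>" line); the function names are hypothetical.

```shell
# Minimal sketch: read a binfmt_misc entry file such as
# /proc/sys/fs/binfmt_misc/qemu-aarch64 (layout assumed, see above).

# Print the interpreter path, taken from the "interpreter <path>" line.
binfmt_interpreter() {
    sed -n 's/^interpreter //p' "$1"
}

# Succeed if the first line of the entry is "enabled".
binfmt_is_enabled() {
    [ "$(head -n 1 "$1")" = "enabled" ]
}
```

On a host like the one above, a loop such as for f in /proc/sys/fs/binfmt_misc/qemu-*; do echo "$f: $(binfmt_interpreter "$f")"; done would list which static QEMU binary handles each architecture.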

What is "binfmt_misc"? I found two references, [3] and [4].

According to [3]:

The binfmt-support package contains a helper script to easily register/unregister binary formats with the kernel using the binfmt_misc module.

Install qemu, binfmt-support, and qemu-user-static:

apt-get install qemu binfmt-support qemu-user-static

Debian ships qemu-user-static as a separate binary package built from the qemu source package: https://packages.debian.org/sid/qemu-user-static

According to [4]:

5.3.9.2. /proc/sys/fs/ This directory contains an array of options and information concerning various aspects of the file system, including quota, file handle, inode, and dentry information. The binfmt_misc/ directory is used to provide kernel support for miscellaneous binary formats.

exec /qemu-binfmt-conf.sh --qemu-suffix "-static" --qemu-path="${QEMU_BIN_DIR}" $@

With QEMU_BIN_DIR=/usr/bin from the Dockerfile and the --reset argument, the command actually executed is:

/qemu-binfmt-conf.sh --qemu-suffix "-static" --qemu-path="/usr/bin" --reset

To check qemu-binfmt-conf.sh's behavior, I built the register container myself.

I used the qemu-binfmt-conf.sh downloaded above.

Add set -x to both scripts for debugging.

$ vi register/qemu-binfmt-conf.sh
...
set -x
...

$ vi register/register.sh

$ git diff register/register.sh
 
set -x

QEMU_BIN_DIR=${QEMU_BIN_DIR:-/usr/bin}

$ vi register/Dockerfile
...
# ADD https://raw.githubusercontent.com/qemu/qemu/master/scripts/qemu-binfmt-conf.sh /qemu-binfmt-conf.sh <= Comment out this line.
ADD ./qemu-binfmt-conf.sh /qemu-binfmt-conf.sh <= Add this line.
RUN chmod +x /qemu-binfmt-conf.sh
ENTRYPOINT ["/register"]

$ docker build --rm -t junaruga/qemu-user-static:register register
...
Successfully tagged junaruga/qemu-user-static:register

$ docker image ls -a  | grep junaruga/qemu-user-static
junaruga/qemu-user-static    register              50384f9a6262        9 seconds ago       1.23MB

$ docker run --rm --privileged  junaruga/qemu-user-static:register --reset

Here is the output log.

Looking at the bottom lines of qemu-binfmt-conf.sh, the two main steps (functions) are:

qemu_check_bintfmt_misc
qemu_set_binfmts

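qemu_check_bintfmt_misc checks that the system is ready: it loads the binfmt_misc kernel module if the directory is missing, mounts the binfmt_misc file system if the register file is missing, and then checks that the register file is accessible.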

qemu_check_bintfmt_misc() {
    # load the binfmt_misc module
    if [ ! -d /proc/sys/fs/binfmt_misc ]; then
      if ! /sbin/modprobe binfmt_misc ; then
          exit 1
      fi
    fi
    if [ ! -f /proc/sys/fs/binfmt_misc/register ]; then
      if ! mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc ; then
          exit 1
      fi
    fi

    qemu_check_access /proc/sys/fs/binfmt_misc/register
}
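The body of qemu_check_access is not shown here. A writability check of the kind its call suggests could look like the following minimal sketch; check_writable is a hypothetical name, and unlike a script-level check it returns 1 instead of exiting, so it can be tried in an interactive shell.

```shell
# Minimal sketch of a writability check like the qemu_check_access call above.
# check_writable is hypothetical; the upstream function may differ
# (for example by calling exit instead of return).
check_writable() {
    if [ ! -w "$1" ]; then
        echo "ERROR: cannot write to $1" 1>&2
        return 1
    fi
}
```

Writing to /proc/sys/fs/binfmt_misc/register needs privileges, which is one reason the docker run commands above pass --privileged.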

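qemu_set_binfmts loops over ${qemu_target_list} and, for each CPU whose family differs from the host's family, runs $BINFMT_SET (= the qemu_register_interpreter function). The interpreter path is $QEMU_PATH/qemu-$cpu$QEMU_SUFFIX, for example /usr/bin/qemu-aarch64-static; i486 reuses qemu-i386.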

qemu_target_list="i386 i486 alpha arm armeb sparc32plus ppc ppc64 ppc64le m68k \
mips mipsel mipsn32 mipsn32el mips64 mips64el \
sh4 sh4eb s390x aarch64 aarch64_be hppa riscv32 riscv64 xtensa xtensaeb \
microblaze microblazeel or1k x86_64"
qemu_set_binfmts() {
    # probe cpu type
    host_family=$(qemu_get_family)

    # register the interpreter for each cpu except for the native one

    for cpu in ${qemu_target_list} ; do
        magic=$(eval echo \$${cpu}_magic)
        mask=$(eval echo \$${cpu}_mask)
        family=$(eval echo \$${cpu}_family)

        if [ "$magic" = "" ] || [ "$mask" = "" ] || [ "$family" = "" ] ; then
            echo "INTERNAL ERROR: unknown cpu $cpu" 1>&2
            continue
        fi

        qemu="$QEMU_PATH/qemu-$cpu"
        if [ "$cpu" = "i486" ] ; then
            qemu="$QEMU_PATH/qemu-i386"
        fi

        qemu="$qemu$QEMU_SUFFIX"
        if [ "$host_family" != "$family" ] ; then
            $BINFMT_SET
        fi
    done
}
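The family check is what keeps a host from registering an emulator for its own architecture. The sketch below reproduces the eval indirection from the loop with a tiny hypothetical table; it assumes that x86_64 belongs to family i386 and aarch64 to family arm, and host_family_of is a simplified stand-in for qemu_get_family, not the upstream mapping.

```shell
# Minimal sketch of the family check in qemu_set_binfmts.
# The tiny table and host_family_of are hypothetical stand-ins: the real script
# has many more cpus, plus <cpu>_magic and <cpu>_mask variables.
i386_family=i386
x86_64_family=i386
arm_family=arm
aarch64_family=arm
ppc64le_family=ppcle
target_list="i386 x86_64 arm aarch64 ppc64le"

# Map a uname -m value to a family (small subset, for illustration only).
host_family_of() {
    case "$1" in
        i[3-6]86|x86_64|amd64) echo i386 ;;
        arm|armv7l|aarch64) echo arm ;;
        ppc64le) echo ppcle ;;
        *) echo "$1" ;;
    esac
}

# Print the cpus that would get an interpreter registered on host arch $1.
targets_for_host() {
    host_family=$(host_family_of "$1")
    for cpu in $target_list; do
        # Indirect lookup, as in the original: eval runs "echo $arm_family".
        family=$(eval echo \$${cpu}_family)
        if [ "$host_family" != "$family" ]; then
            echo "$cpu"
        fi
    done
}
```

With this table, targets_for_host x86_64 prints arm, aarch64 and ppc64le: an x86_64 host gets neither a qemu-i386 nor a qemu-x86_64 entry, since those binaries already run natively.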

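qemu_register_interpreter registers an interpreter by writing one line to the /proc/sys/fs/binfmt_misc/register procfs file. Redirecting output to this file with > does not behave like writing to a normal file: instead of storing the text, the kernel parses the line and creates a new entry file, /proc/sys/fs/binfmt_misc/qemu-$cpu.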

qemu_register_interpreter() {
    echo "Setting $qemu as binfmt interpreter for $cpu"
    qemu_generate_register > /proc/sys/fs/binfmt_misc/register
}
qemu_generate_register() {
    flags=""
    if [ "$CREDENTIAL" = "yes" ] ; then
        flags="OC"
    fi
    if [ "$PERSISTENT" = "yes" ] ; then
        flags="${flags}F"
    fi

    echo ":qemu-$cpu:M::$magic:$mask:$qemu:$flags"
}
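The line follows the binfmt_misc format :name:type:offset:magic:mask:interpreter:flags. Here is a minimal sketch of the same generator with its inputs passed as arguments instead of globals; generate_register is a hypothetical name, and it uses printf %s so that backslash escapes in the magic and mask are copied verbatim for the kernel to decode.

```shell
# Minimal sketch of qemu_generate_register with its inputs passed as arguments
# (the original reads the globals $cpu, $magic, $mask, $qemu, $CREDENTIAL and
# $PERSISTENT). generate_register is a hypothetical name.
# Usage: generate_register CPU MAGIC MASK INTERPRETER [CREDENTIAL] [PERSISTENT]
generate_register() {
    flags=""
    # C: compute credentials from the binary, not the interpreter (implies O).
    if [ "${5:-no}" = "yes" ]; then
        flags="OC"
    fi
    # F: the kernel opens the interpreter at registration time.
    if [ "${6:-no}" = "yes" ]; then
        flags="${flags}F"
    fi
    # Type M matches on magic bytes; the empty offset field means 0.
    # printf %s copies any \x.. escapes in magic/mask without interpreting them.
    printf ':qemu-%s:M::%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$flags"
}
```

For example, generate_register arm MAGIC MASK /usr/bin/qemu-arm-static yes yes prints :qemu-arm:M::MAGIC:MASK:/usr/bin/qemu-arm-static:OCF, while omitting the last two arguments leaves the flags field empty.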

According to Wikipedia (https://en.wikipedia.org/wiki/Binfmt_misc):

The executable formats are registered through the special purpose file system binfmt_misc file-system interface (usually mounted under part of /proc). This is either done directly by sending special sequences to the register procfs file or using a wrapper like Debian-based distributions binfmt-support package or systemd's systemd-binfmt.service.

In short, for each target CPU, the register container does the equivalent of:

echo ":qemu-$cpu:M::$magic:$mask:$qemu:$flags" > /proc/sys/fs/binfmt_misc/register
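To see how such a line is interpreted, its fields can be split back out. This minimal sketch assumes ':' is the delimiter and that no field contains a literal ':' (the magic and mask use \x.. escapes); describe_register is a hypothetical helper.

```shell
# Minimal sketch: split a registration line back into its fields.
# Assumes ':' is the delimiter and no field contains a literal ':'.
# describe_register is a hypothetical helper.
describe_register() {
    printf '%s\n' "$1" | {
        # The leading ':' yields an empty first field, discarded into _.
        IFS=: read -r _ name type offset magic mask interpreter flags
        printf 'name=%s\n' "$name"
        printf 'type=%s\n' "$type"
        printf 'offset=%s\n' "${offset:-0}"
        printf 'magic=%s\n' "$magic"
        printf 'mask=%s\n' "$mask"
        printf 'interpreter=%s\n' "$interpreter"
        printf 'flags=%s\n' "$flags"
    }
}
```

The name field decides the entry file name, so a line whose name is qemu-arm creates /proc/sys/fs/binfmt_misc/qemu-arm, which is one of the files seen in the ls output earlier.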

In the next post, I will dig into the compatible images. [5]

References