We sometimes write.

Of course we cannot always share details about our work with customers, but nevertheless it is nice to show our achievements and share some solutions.

Debian Wheezy LXC container not running on Stretch host with Kernel 4.18 bpo anymore (a vsyscall story)

Published on May 13th 2019 - see original post


While migrating an old host still running with Debian Wheezy (7) and one remaining Linux container (also running with Wheezy), I came across an interesting problem.

The problem

The new host runs with Debian Stretch (9) and a Kernel 4.18 from stretch-backports to better support the AMD Ryzen processor (see article Linux going AMD Ryzen with Debian 9 (Stretch)). After the rootfs of the container was synced from the Wheezy LXC host to the Stretch LXC host, the container didn't start up. Or better said, it started up, but then died a few seconds later:

root@stretch ~ # lxc-start -n wheezycontainer -d
root@stretch ~ # lxc-ls -f|grep wheezycontainer
wheezycontainer  RUNNING 1         -      192.168.15.177 -   
root@stretch ~ # lxc-ls -f|grep wheezycontainer
wheezycontainer  STOPPED 1         -      -              -   

To figure out what's going on, I decided to chroot into the container and see for myself.

root@stretch ~ # mount --bind /proc /var/lib/lxc/wheezycontainer/rootfs/proc
root@stretch ~ # mount --bind /sys /var/lib/lxc/wheezycontainer/rootfs/sys
root@stretch ~ # mount --bind /dev /var/lib/lxc/wheezycontainer/rootfs/dev
root@stretch ~ # chroot /var/lib/lxc/wheezycontainer/rootfs /bin/sh
\[\033[01;31m\]\u\[\033[01;33m\]@\[\033[01;36m\]\h \[\033[01;33m\]\w \[\033[01;35m\]$ \[\033[00m\]
\[\033[01;31m\]\u\[\033[01;33m\]@\[\033[01;36m\]\h \[\033[01;33m\]\w \[\033[01;35m\]$ \[\033[00m\]
\[\033[01;31m\]\u\[\033[01;33m\]@\[\033[01;36m\]\h \[\033[01;33m\]\w \[\033[01;35m\]$ \[\033[00m\]ps auxf
Signal 11 (SEGV) caught by ps (procps-ng version 3.3.3).
ps:display.c:59: please report this bug

Holy sh!t, I did not see that coming! All the commands I tried returned in a segmentation fault:

\[\033[01;31m\]\u\[\033[01;33m\]@\[\033[01;36m\]\h \[\033[01;33m\]\w \[\033[01;35m\]$ \[\033[00m\]top
signal 11 (SEGV) was caught by top, please
see http://www.debian.org/Bugs/Reporting

The reason

A look into dmesg (at least this command works!) revealed something interesting:

 \[\033[01;31m\]\u\[\033[01;33m\]@\[\033[01;36m\]\h \[\033[01;33m\]\w \[\033[01;35m\]$ \[\033[00m\]dmesg | tail
[494840.491954] ps[24332] vsyscall attempted with vsyscall=none ip:ffffffffff600400 cs:33 sp:7ffe33393128 ax:ffffffffff600400 si:1000 di:0
[494880.210536] top[24794] vsyscall attempted with vsyscall=none ip:ffffffffff600400 cs:33 sp:7ffc63fe5158 ax:ffffffffff600400 si:2244e90 di:7ffc63fe5178
\[\033[01;31m\]\u\[\033[01;33m\]@\[\033[01;36m\]\h \[\033[01;33m\]\w \[\033[01;35m\]$ \[\033[00m\]

Here we can see that both commands (ps and top) used before attempted to use a vsyscall. What exactly is a vsyscall? On my research I came across a very good explanation in a mailing list post by Nathaniel Smith. Here are some excerpts of it:

The "vsyscall" mechanism is a clever hack/trick that Linux uses to speed up some syscalls (most notably gettimeofday), and involves the kernel injecting some code in processes' memory maps, that glibc then knows to call. [...] the kernel always injected it at the same fixed address, so glibc was hard-coded to just "know" that e.g. gettimeofday was at 0xffffffffff600400. [...]

In these less trusting times, these hard-coded addresses are considered a security risk (they violate ASLR etc.), so they were deprecated a long time ago.

[...] if you try to run an old binary that blindly uses the hardcoded addresses then it will segfault as soon as it tries to call gettimeofday.

Debian has recently flipped the switch to disable this on their kernels, so if you're running a recent Debian testing or unstable (kernel 4.8 or better) [...]

The workaround is to reboot and add the option 'vsyscall=emulate' to the kernel command line.

Another good hint can be found in a public forum post which describes the same dmesg log entries.

I'm running several Debian Stretch servers and this Ryzen machine is the only one using the 4.18 Kernel from backports, this Wheezy container issue only happens on the newer Kernel. But in general that means that in the future (probably with Debian Buster becoming stable) we will see compatibility issues with Wheezy containers.

The Workarounds

So what's the solution? There is no satisfying solution to this. There are workarounds:

Whatever workaround you may chose, you'll have to do more work than you probably expected. ;-)

What about other distributions or Kernel versions?

Does this only happen on a newer Debian system or on others as well? You can check /boot/config-version-arch and grep for VSYSCALL. When CONFIG_LEGACY_SYSCALL_EMULATE is set to "y", then this is the same as adding the option "vsyscall=emulate" to the Kernel's cmdline (see above). When CONFIG_LEGACY_VSYSCALL_NONE is set to "y", this will tell the Kernel to block legacy vsyscalls.

Here's a comparison of the 4.18 Kernel from stretch-backports and the Kernel 4.9 from the original repositories:

root@stretch ~ # cat /boot/config-4.18.0-0.bpo.1-amd64  | grep -i CONFIG_LEGACY_VSYSCALL
# CONFIG_LEGACY_VSYSCALL_EMULATE is not set
CONFIG_LEGACY_VSYSCALL_NONE=y

root@stretch ~ # cat /boot/config-4.9.0-8-amd64  | grep -i CONFIG_LEGACY_VSYSCALL
# CONFIG_LEGACY_VSYSCALL_NATIVE is not set
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set

Let's check out some other distributions and Kernel versions:

Distribution / Version
 Kernel Version 
 Legacy vsyscall disabled?
Debian 9 (Stretch)
 4.9.x  No
Debian 9 (Stretch)
 4.18.x.bpo  Yes!
Debian 10 (Buster) RC1
 4.19.0
 Yes!
Ubuntu 14.04 (Trusty)
 3.19.0  N/A (options not set or not available)
Ubuntu 16.04 (Xenial)
 4.4.0  No
Ubuntu 18.04 (Bionic)
 4.15.0  No
Ubuntu 19.04 (Disco)
 5.0.0
 No
CentOS 7
 3.10.0  N/A (options not set or not available)
RHEL 7
 3.10.0  N/A (options not set or not available)
RHEL 8
 4.18.0
 No
OpenSuSE Leap 15
 4.12.14
 No
Fedora 30
 5.0.14
 No