Debian Wheezy LXC container not running on Stretch host with Kernel 4.18 bpo anymore (a vsyscall story)

Published on May 13th 2019

While migrating an old host still running Debian Wheezy (7), with one remaining Linux container (also running Wheezy), I came across an interesting problem.

The problem

The new host runs Debian Stretch (9) with Kernel 4.18 from stretch-backports for better support of the AMD Ryzen processor (see article Linux going AMD Ryzen with Debian 9 (Stretch)). After the container's rootfs was synced from the Wheezy LXC host to the Stretch LXC host, the container didn't start. Or rather: it started, but died a few seconds later:

root@stretch ~ # lxc-start -n wheezycontainer -d
root@stretch ~ # lxc-ls -f|grep wheezycontainer
wheezycontainer RUNNING 1 - 192.168.15.177 -
root@stretch ~ # lxc-ls -f|grep wheezycontainer
wheezycontainer STOPPED 1 - - -

To figure out what's going on, I decided to chroot into the container and see for myself.

root@stretch ~ # mount --bind /proc /var/lib/lxc/wheezycontainer/rootfs/proc
root@stretch ~ # mount --bind /sys /var/lib/lxc/wheezycontainer/rootfs/sys
root@stretch ~ # mount --bind /dev /var/lib/lxc/wheezycontainer/rootfs/dev
root@stretch ~ # chroot /var/lib/lxc/wheezycontainer/rootfs /bin/sh
# ps auxf
Signal 11 (SEGV) caught by ps (procps-ng version 3.3.3).
ps:display.c:59: please report this bug

Holy sh!t, I did not see that coming! Every command I tried ended in a segmentation fault:

# top
signal 11 (SEGV) was caught by top, please
see http://www.debian.org/Bugs/Reporting

The reason

A look into dmesg (at least this command works!) revealed something interesting:

# dmesg | tail
[494840.491954] ps[24332] vsyscall attempted with vsyscall=none ip:ffffffffff600400 cs:33 sp:7ffe33393128 ax:ffffffffff600400 si:1000 di:0
[494880.210536] top[24794] vsyscall attempted with vsyscall=none ip:ffffffffff600400 cs:33 sp:7ffc63fe5158 ax:ffffffffff600400 si:2244e90 di:7ffc63fe5178
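To see at a glance which binaries trip over this check, the kernel log can be filtered for these entries. This is a minimal POSIX shell sketch, assuming the message format shown above (it may differ between Kernel versions); the function name vsyscall_offenders is made up for this example:

```shell
#!/bin/sh
# List the process names that hit the "vsyscall attempted" check.
# Assumes kernel log lines in the dmesg format shown above, e.g.
#   [494840.491954] ps[24332] vsyscall attempted with vsyscall=none ip:...
# Reads the log from stdin, so it can be fed directly from dmesg.
vsyscall_offenders() {
    grep 'vsyscall attempted' |
        # Keep only the command name in front of "[pid]".
        sed -n 's/^\[[^]]*\] *\([^[ ]*\)\[[0-9]*\].*/\1/p' |
        sort -u
}
```

Usage: dmesg | vsyscall_offenders. For the two log lines above it prints ps and top, one per line.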

Here we can see that both commands used before (ps and top) attempted a vsyscall. What exactly is a vsyscall? During my research I came across a very good explanation in a mailing list post by Nathaniel Smith. Here are some excerpts:

The "vsyscall" mechanism is a clever hack/trick that Linux uses to speed up some syscalls (most notably gettimeofday), and involves the kernel injecting some code in processes' memory maps, that glibc then knows to call. [...] the kernel always injected it at the same fixed address, so glibc was hard-coded to just "know" that e.g. gettimeofday was at 0xffffffffff600400. [...]

In these less trusting times, these hard-coded addresses are considered a security risk (they violate ASLR etc.), so they were deprecated a long time ago.

[...] if you try to run an old binary that blindly uses the hardcoded addresses then it will segfault as soon as it tries to call gettimeofday.

Debian has recently flipped the switch to disable this on their kernels, so if you're running a recent Debian testing or unstable (kernel 4.8 or better) [...]

The workaround is to reboot and add the option 'vsyscall=emulate' to the kernel command line.
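The fixed address is easy to observe: as long as legacy vsyscalls are allowed, the Kernel maps the vsyscall page (starting at 0xffffffffff600000, the page that contains the 0xffffffffff600400 address from the quote) into every 64-bit process, and it shows up as [vsyscall] in /proc/<pid>/maps. With vsyscall=none the page is not mapped at all. A small sketch to check this; the function name vsyscall_page_status is hypothetical:

```shell
#!/bin/sh
# Report whether the legacy vsyscall page is mapped into processes.
# With vsyscall=none the Kernel does not map the page, so the
# "[vsyscall]" entry is missing from /proc/<pid>/maps.
# The maps file is a parameter (default: the maps of the grep process
# reading /proc/self) so a saved copy can be checked as well.
vsyscall_page_status() {
    maps=${1:-/proc/self/maps}
    if grep -q '\[vsyscall\]' "$maps"; then
        echo "mapped"
    else
        echo "not mapped"
    fi
}
```

Running vsyscall_page_status on the host, or inside the chroot with /proc bind-mounted as above, shows what the current Kernel does.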

Another good hint can be found in a public forum post which describes the same dmesg log entries.

I'm running several Debian Stretch servers, and this Ryzen machine is the only one using the 4.18 Kernel from backports; the Wheezy container issue only happens with this newer Kernel. In general, though, this means that we will see compatibility issues with Wheezy containers in the future (probably once Debian Buster becomes stable).

The Workarounds

So what's the solution? There is no satisfying solution to this, only workarounds:

- As mentioned by Nathaniel Smith above, you can add the option vsyscall=emulate to the Kernel's command line (on Debian usually via GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub). But this allows vsyscalls at a fixed address again, which violates ASLR. And ASLR (address space layout randomization) is a good mechanism against zero day exploits!
- You can still run the Wheezy system as a full virtual machine instead of a container, because a VM boots its own Kernel instead of sharing the host's. This requires installing a Kernel inside the former container.
- Upgrade or migrate the Wheezy container and the software running on it to a newer Debian release. Of course that's wishful thinking and not always possible (depending on the age of the running software).
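For the first workaround, the edit of /etc/default/grub can be scripted. This is a sketch, assuming GNU sed (for -i) and a single-line, double-quoted GRUB_CMDLINE_LINUX_DEFAULT as on a default Debian install; the function name add_vsyscall_emulate is hypothetical. It keeps a backup and leaves the file alone if a vsyscall= option is already present:

```shell
#!/bin/sh
# Append vsyscall=emulate to GRUB_CMDLINE_LINUX_DEFAULT unless a
# vsyscall= option is already set there.
# Assumes GNU sed and a line like: GRUB_CMDLINE_LINUX_DEFAULT="quiet"
add_vsyscall_emulate() {
    grubfile=${1:-/etc/default/grub}
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*vsyscall=' "$grubfile"; then
        echo "vsyscall= already set in $grubfile, not touching it" >&2
        return 0
    fi
    cp "$grubfile" "$grubfile.bak"   # keep a backup of the original file
    # Insert the option right before the closing double quote.
    sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 vsyscall=emulate"/' "$grubfile"
}
```

The change only takes effect after running update-grub and rebooting.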

Whichever workaround you choose, you'll have to do more work than you probably expected. ;-)

What about other distributions or Kernel versions?

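Does this only happen on newer Debian systems or on others as well? You can check the Kernel config in /boot/config-<version>-<arch> (for the running Kernel: /boot/config-$(uname -r)) and grep for VSYSCALL. When CONFIG_LEGACY_VSYSCALL_EMULATE is set to "y", legacy vsyscalls are emulated by default, which is the same as adding the option "vsyscall=emulate" to the Kernel's command line (see above). When CONFIG_LEGACY_VSYSCALL_NONE is set to "y", the Kernel blocks legacy vsyscalls by default (like "vsyscall=none"). A vsyscall= option on the Kernel command line overrides this compiled-in default.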
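Since a vsyscall= option on the Kernel command line takes precedence over the compiled-in default, checking what the running Kernel actually does means looking at both. A sketch with the hypothetical function name vsyscall_mode; both paths are parameters so copies from another machine can be inspected, too:

```shell
#!/bin/sh
# Print the legacy vsyscall mode the running Kernel most likely uses.
# A vsyscall=... parameter on the Kernel command line overrides the
# default chosen at build time via CONFIG_LEGACY_VSYSCALL_*.
vsyscall_mode() {
    config=${1:-/boot/config-$(uname -r)}
    cmdline=${2:-/proc/cmdline}
    # 1) An explicit vsyscall= option on the command line wins
    #    (if given several times, the last one counts).
    mode=$(tr ' ' '\n' < "$cmdline" | sed -n 's/^vsyscall=//p' | tail -n 1)
    if [ -n "$mode" ]; then
        echo "$mode (from kernel command line)"
        return 0
    fi
    # 2) Otherwise fall back to the compiled-in default.
    mode=$(sed -n 's/^CONFIG_LEGACY_VSYSCALL_\([A-Z]*\)=y$/\1/p' "$config" |
        tr '[:upper:]' '[:lower:]')
    if [ -n "$mode" ]; then
        echo "$mode (kernel config default)"
    else
        echo "unknown (no CONFIG_LEGACY_VSYSCALL_* option set)"
    fi
}
```

With the 4.18 bpo config shown below and no vsyscall= option on the command line, this prints "none (kernel config default)".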

Here's a comparison of the 4.18 Kernel from stretch-backports and the Kernel 4.9 from the original repositories:

root@stretch ~ # cat /boot/config-4.18.0-0.bpo.1-amd64 | grep -i CONFIG_LEGACY_VSYSCALL
# CONFIG_LEGACY_VSYSCALL_EMULATE is not set
CONFIG_LEGACY_VSYSCALL_NONE=y

root@stretch ~ # cat /boot/config-4.9.0-8-amd64 | grep -i CONFIG_LEGACY_VSYSCALL
# CONFIG_LEGACY_VSYSCALL_NATIVE is not set
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set

Let's check out some other distributions and Kernel versions:

Distribution / Version	Kernel Version	Legacy vsyscall disabled?
Debian 9 (Stretch)	4.9.x	No
Debian 9 (Stretch)	4.18.x.bpo	Yes!
Debian 10 (Buster) RC1	4.19.0	Yes!
Ubuntu 14.04 (Trusty)	3.19.0	N/A (options not set or not available)
Ubuntu 16.04 (Xenial)	4.4.0	No
Ubuntu 18.04 (Bionic)	4.15.0	No
Ubuntu 19.04 (Disco)	5.0.0	No
CentOS 7	3.10.0	N/A (options not set or not available)
RHEL 7	3.10.0	N/A (options not set or not available)
RHEL 8	4.18.0	No
OpenSuSE Leap 15	4.12.14	No
Fedora 30	5.0.14	No
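To build a similar overview for all Kernels installed on one machine, a loop over the /boot/config-* files is enough. A sketch with the hypothetical function name vsyscall_defaults (the directory is a parameter), printing each Kernel version and the CONFIG_LEGACY_VSYSCALL_* option that is set to "y":

```shell
#!/bin/sh
# Show the compiled-in legacy vsyscall default of every Kernel config
# found in a directory (default: /boot), tab-separated.
vsyscall_defaults() {
    dir=${1:-/boot}
    for cfg in "$dir"/config-*; do
        [ -e "$cfg" ] || continue   # the glob matched nothing
        mode=$(sed -n 's/^CONFIG_LEGACY_VSYSCALL_\([A-Z]*\)=y$/\1/p' "$cfg")
        printf '%s\t%s\n' "${cfg##*/config-}" "${mode:-not set}"
    done
}
```

On the Stretch host above, this would list 4.18.0-0.bpo.1-amd64 with NONE and 4.9.0-8-amd64 with EMULATE.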

Wheezy LXC container on Debian Buster

Updated July 2nd, 2021

The same problem also happens on Debian Buster, where it results in errors like these in the host's syslog:

Jul 2 08:09:51 buster systemd-udevd[4078]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 2 08:09:51 buster systemd-udevd[4078]: Using default interface naming scheme 'v240'.
Jul 2 08:09:51 buster systemd-udevd[4078]: Could not generate persistent MAC address for veth4KV37O: No such file or directory
Jul 2 08:09:51 buster systemd-udevd[4079]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 2 08:09:51 buster systemd-udevd[4079]: Using default interface naming scheme 'v240'.
Jul 2 08:09:51 buster libvirtd[1233]: Failed to open file '/sys/class/net/veth4KV37O/operstate': No such file or directory
Jul 2 08:09:51 buster libvirtd[1233]: unable to read: /sys/class/net/veth4KV37O/operstate: No such file or directory

The workaround is the same: enable legacy vsyscall emulation on the Kernel command line:

root@buster ~ # grep "GRUB_CMDLINE_LINUX_DEFAULT" /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet cgroup_enable=memory swapaccount=1 vsyscall=emulate nouveau.modeset=0"

Followed by update-grub and a reboot:

root@buster ~ # update-grub
root@buster ~ # reboot

Once the server is back up, the Kernel command line should contain vsyscall=emulate:

root@buster ~ # cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.10.0-0.bpo.7-amd64 root=UUID=f8a58d53-7ca3-42ab-8c51-daa5d27ce7c3 ro quiet cgroup_enable=memory swapaccount=1 vsyscall=emulate nouveau.modeset=0

And the Wheezy container now starts successfully:

root@buster ~ # lxc-start -n wheezy -d
root@buster ~ # lxc-ls -f | grep wheezy
wheezy RUNNING 1 - 192.168.15.179 - false

Infiniroot Blog: We sometimes write, too.