Infiniroot Blog: We sometimes write, too.

Of course we cannot always share details about our work with customers, but nevertheless it is nice to show our technical achievements and share some of our implemented solutions.

Nginx 1.14 on Debian 10 (Buster) filling error logs with nchan_store_init_worker assertion failed

Published on November 13th 2020


Let's start this post with a love declaration: We love Linux packages. Whether these are Debian based deb packages, rpm's or other formats, installation of additional software is easy and fast. With the help of package managers such as apt, yum, zypper and others, it's even easier and dependencies are installed with the package. Great.

Some of the packages are very easy to maintain. They contain a single binary, maybe a few man pages and maybe a configuration file. That's it. As soon as upstream changes something in the source code, the maintainer can build a new package.

But there are packages with a much higher complexity around. Apache and Nginx are such packages. They include multiple modules where each module is itself its own open source project. These "sub-projects" may change fast and dependencies on core versions might change, too. To oversee all this requires a lot of dedication and time. That's not easy - especially in the open source community where a lot of work is done benevolent.

In this particular example we stumbled on a problem between Debian's Nginx 1.14 package and the embedded nchan module - and dedicated a blog post for it. Important: This is not a blame post! We want to show technical reasons why this problem occurred and present a workaround to handle the problem.

Nginx going crazy: Thousands of error log events

Our monitoring alerted us, that a container running Debian Buster (the current Debian Stable as of this writing) was going crazy and disk space quickly started to fill up:

Nginx error logs filling up disk quickly

After a quick verification it turned out that the Nginx error log was logging thousands of entries:

2020/11/13 13:45:10 [alert] 8368#8368: worker process 21797 exited on signal 6
2020/11/13 13:45:10 [alert] 8368#8368: shared memory zone "memstore" was locked by 21797
ter process /usr/sbin/nginx -g daemon on; master_process on;: /build/nginx-Cjs4TR/nginx-1.14.2/debian/modules/nchan/src/store/memory/memstore.c:701: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
2020/11/13 13:45:10 [alert] 8368#8368: worker process 21798 exited on signal 6
2020/11/13 13:45:10 [alert] 8368#8368: shared memory zone "memstore" was locked by 21798
ter process /usr/sbin/nginx -g daemon on; master_process on;: /build/nginx-Cjs4TR/nginx-1.14.2/debian/modules/nchan/src/store/memory/memstore.c:701: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
2020/11/13 13:45:10 [alert] 8368#8368: worker process 21799 exited on signal 6
2020/11/13 13:45:10 [alert] 8368#8368: shared memory zone "memstore" was locked by 21799
ter process /usr/sbin/nginx -g daemon on; master_process on;: /build/nginx-Cjs4TR/nginx-1.14.2/debian/modules/nchan/src/store/memory/memstore.c:701: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
2020/11/13 13:45:10 [alert] 8368#8368: worker process 21800 exited on signal 6
2020/11/13 13:45:10 [alert] 8368#8368: shared memory zone "memstore" was locked by 21800
ter process /usr/sbin/nginx -g daemon on; master_process on;: /build/nginx-Cjs4TR/nginx-1.14.2/debian/modules/nchan/src/store/memory/memstore.c:701: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
2020/11/13 13:45:10 [alert] 8368#8368: worker process 21801 exited on signal 6
2020/11/13 13:45:10 [alert] 8368#8368: shared memory zone "memstore" was locked by 21801
ter process /usr/sbin/nginx -g daemon on; master_process on;: /build/nginx-Cjs4TR/nginx-1.14.2/debian/modules/nchan/src/store/memory/memstore.c:701: nchan_store_init_worker: Assertion `procslot_found == 1' failed.

The same errors were logged over and over again, just with different worker PIDs. According to these logged events, the errors seem to come from the nchan module. As the disk was filling up very quickly, we needed to find a way to solve this - fast.

The quick workaround: Unload nchan and restart Nginx

Research led to issue #446 on the nchan repository. The relevant hint came from the nchan maintainer (slact):

The Debian repo is very far behind on Nchan updates, I think something like this issue was fixed over a year ago. If you don't need to use Nchan, just unload it.

As we don't use any of the nchan features in this particular setup, we decided to just do this as suggested: Unload the (dynamic) module and restart Nginx. This is quickly done:

root@nginx:~# ll /etc/nginx/modules-enabled/|grep nchan
lrwxrwxrwx 1 root root 49 Apr 20  2020 50-mod-nchan.conf -> /usr/share/nginx/modules-available/mod-nchan.conf

root@nginx:~# rm /etc/nginx/modules-enabled/50-mod-nchan.conf
rm: remove symbolic link '/etc/nginx/modules-enabled/50-mod-nchan.conf'? y

root@nginx:~# systemctl restart nginx

No more errors were logged after this.

The why: Research why this happened

Based on slact's comment, the obvious breadcrumb is to find the version of nchan itself. But as nchan is bundled with the Nginx package, dpkg is not helpful and only shows the version of the main Nginx package:

root@nginx:~# dpkg -l|grep nchan
ii  libnginx-mod-nchan    1.14.2-2+deb10u3     amd64    Fast, flexible pub/sub server for Nginx

To find out which module version was used, the debian build package of the Nginx package needs to be downloaded and unpacked:

ckadm@mintp ~/build $ wget http://deb.debian.org/debian/pool/main/n/nginx/nginx_1.14.2-2+deb10u3.debian.tar.xz
ckadm@mintp ~/build $ xz -d nginx_1.14.2-2+deb10u3.debian.tar.xz
ckadm@mintp ~/build $ tar -xf nginx_1.14.2-2+deb10u3.debian.tar

The nchan module can be found in the extracted directory debian/modules/nchan. Inside this directory is not only the module's source code but also the changelog. Let's take a closer look at this:

ckadm@mintp ~/build $ head debian/modules/nchan/changelog.txt
1.0.8 (Nov. 28 2016)
 fix: possible crash under severely heavy load, introduced in 1.0.7 with stack-overflow fix
1.0.7 (Nov. 27 2016)
 fix: memory leak after websocket publisher uncleanly aborts connection
 fix: misbehaving websocket publisher with nchan_publisher_upstream_request
 fix: potential stack overflow with very large message buffers
 fix: invalid memory access with empty nchan_publisher_upstream_request for websocket publisher
 fix: incorrect handling of chunked response from nchan_publisher_upstream_request
 fix: publishing through websocket too fast may result in buffered messages that never arrive
 fix: DELETE to multiplexed channel should delete all listed channels

So this shows that nchan version 1.0.8 (from 2016) was used to build this Nginx 1.14.2 package. However there is one big problem with this version combination. On GitHub issue #460 the module maintainer mentions the following:

The Nginx internals changed around a little in 1.13.10: http://hg.nginx.org/nginx/rev/20f139e9ffa8

And also the nchan changelog mentions a couple of compatibility problems with Nginx > 1.13.10:

1.2.1 (Aug. 2 2018)
 fix: authentication failure body not forwarded for Nginx > 1.13.10

1.2.0 (Jul. 23 2018)
 fix: Publisher upstream compatibility for Nginx > 1.13.10

Conclusion

This means that any Nginx 1.13 and above should use nchan module with at least version 1.2.1 or newer. And this unfortunately also means that the current Debian Stable (Debian Buster) is still shipping an incompatible nchan module within the Nginx package.

For the upcoming Debian release (static codename: sid) however, this problem seems to be solved. According to Debian's Nginx repository on their GitLab, nchan was upgraded to 1.2.7 (see commit). However as of today, the Nginx package in the current Debian Stable is still bundled with an incompatible nchan module version.

If you are using Debian Buster and need Nginx with the nchan module, you should (re-)build the deb package yourself.