Of course we cannot always share details about our work with customers, but nevertheless it is nice to show our achievements and share some solutions.
Our monitoring informed me about an HTTP 500 error from a central reverse proxy running Nginx. Checking the error logs revealed the following issue:
2019/05/09 08:43:35 [crit] 25655#0: *524505514 open() "/usr/share/nginx/html/50x.html" failed (24: Too many open files)
2019/05/09 09:04:27 [alert] 28720#0: *59757 socket() failed (24: Too many open files) while connecting to upstream,
This basically means that the Nginx process had too many files open, which could also be seen on the Nginx status page. Here is the graph from check_nginx_status.pl:
The default is set to a limit of 4096 files per (worker) process, which can be seen in /etc/default/nginx:
# cat /etc/default/nginx
# Note: You may want to look at the following page before setting the ULIMIT.
# Set the ulimit variable if you need defaults to change.
# Example: ULIMIT="-n 4096"
However, don't be fooled: changing this file doesn't help. Instead, the limit needs to be set in /etc/security/limits.conf:
# tail /etc/security/limits.conf
#@faculty hard nproc 50
#ftp hard nproc 0
#ftp - chroot /ftp
#@student - maxlogins 4
# Added Nginx limits
nginx soft nofile 30000
nginx hard nofile 50000
# End of file
Here a soft limit of 30k and a hard limit of 50k files are defined per nginx process.
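To double-check which nofile entries are actually active in the file (commented lines are easy to overlook), a small helper along these lines can be used. This is only a sketch: the function name nofile_entries is made up for illustration, and it reads a single file, so entries under /etc/security/limits.d/ would have to be checked separately.

```shell
#!/bin/sh
# Hypothetical helper: print "TYPE VALUE" for every active nofile entry
# of a given domain in a limits.conf-style file.
# Usage: nofile_entries DOMAIN FILE
nofile_entries() {
    awk -v domain="$1" '
        /^[ \t]*#/ { next }   # ignore commented lines
        $1 == domain && $3 == "nofile" { print $2, $4 }
    ' "$2"
}

# Example call; only runs if the file is readable on this system.
if [ -r /etc/security/limits.conf ]; then
    nofile_entries nginx /etc/security/limits.conf
fi
```

With the two entries shown above, this prints the soft and the hard limit, one per line.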
Note: I first tried this with www-data (the user under which Nginx runs), but this didn't work, even though a user name can be used as a "domain" in this config file.
Additionally, Nginx itself should be told how many files it may open. In the main config file /etc/nginx/nginx.conf, add:
# head /etc/nginx/nginx.conf
# 2019-05-09 Increase open files
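The listing above only shows the comment; the directive itself is missing from the output. Nginx's directive for raising the open files limit of worker processes is worker_rlimit_nofile, so that is presumably what was added (an assumption, as is the helper name below). A minimal sketch to check whether and how it is set:

```shell
#!/bin/sh
# Hypothetical helper: print the value of worker_rlimit_nofile from an
# Nginx config file. Assumes the directive and its value are on one line.
# Usage: rlimit_nofile_setting FILE
rlimit_nofile_setting() {
    awk '$1 == "worker_rlimit_nofile" { sub(/;$/, "", $2); print $2 }' "$1"
}

# Example call; only runs if the file is readable on this system.
if [ -r /etc/nginx/nginx.conf ]; then
    rlimit_nofile_setting /etc/nginx/nginx.conf
fi
```

If nothing is printed, the directive is not set (or is commented out) in that file.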
After restarting Nginx (service nginx restart), the limits of the worker processes can be checked:
# ps auxf | grep nginx
root 7027 0.0 0.3 103620 13348 ? Ss 09:21 0:00 nginx: master process /usr/sbin/nginx
www-data 7028 8.6 1.0 127900 40724 ? R 09:21 2:37 \_ nginx: worker process
www-data 7029 8.9 1.0 127488 40536 ? S 09:21 2:44 \_ nginx: worker process
www-data 7031 9.5 1.0 127792 40896 ? S 09:21 2:53 \_ nginx: worker process
www-data 7032 8.1 1.0 128472 41244 ? S 09:21 2:29 \_ nginx: worker process
# cat /proc/7028/limits | grep "open files"
Max open files 30000 30000 files
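Checking each worker by hand gets tedious with many workers. As a rough sketch, the loop below prints the current number of open files next to the soft and hard limit for every worker. The helper names and the PROC_ROOT variable are invented for this example; run it as root, since /proc/PID/fd of other users is not readable otherwise.

```shell
#!/bin/sh
# PROC_ROOT is a hypothetical override, useful for trying this out
# against a copy of /proc; it defaults to the real /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Count the file descriptors a process currently holds open.
count_open_fds() {
    ls "$PROC_ROOT/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Print "SOFT HARD" from the "Max open files" line of a process.
open_files_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$PROC_ROOT/$1/limits"
}

# Report usage and limits for every Nginx worker process.
for pid in $(pgrep -f 'nginx: worker' 2>/dev/null); do
    echo "worker $pid: $(count_open_fds "$pid") open, limits: $(open_files_limits "$pid")"
done
```

A worker whose open count keeps creeping towards its soft limit is a good early warning before the "Too many open files" errors show up in the log.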
The "too many open files" errors disappeared from the Nginx logs after this change.
But what caused this sudden problem? As the graph above shows, the Writing (and Waiting) connections suddenly increased sharply. It turned out that an upstream server behind this reverse proxy had stopped working while this particular virtual host was receiving a lot of traffic. This caused general slowness and kept files open until Nginx gave up waiting on the upstream with a timeout (504 in this case).
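To watch for such a spike without a graph, the Writing and Waiting counters can be read straight from the status page. The sketch below assumes the status page is served by Nginx's stub_status module; the function name and the URL in the usage comment are placeholders, not values from this setup.

```shell
#!/bin/sh
# Hypothetical helper: read stub_status output on stdin and print
# the Writing and Waiting connection counts as "WRITING WAITING".
writing_waiting() {
    awk '$1 == "Reading:" { print $4, $6 }'
}

# Usage (the URL is an assumption, adjust it to your status location):
#   curl -s http://localhost/nginx_status | writing_waiting
```

Logging these two numbers at regular intervals makes it easy to see when a stalled upstream starts piling up connections.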