We sometimes write.

Of course we cannot always share details about our work with customers, but nevertheless it is nice to show our achievements and share some solutions.

Nginx socket() failed (24: Too many open files)

Published on May 9th 2019 - see original post

Our monitoring informed me about a HTTP 500 error from a central reverse proxy running with Nginx. Checking the error logs revealed the following issue:

2019/05/09 08:43:35 [crit] 25655#0: *524505514 open() "/usr/share/nginx/html/50x.html" failed (24: Too many open files)
2019/05/09 09:04:27 [alert] 28720#0: *59757 socket() failed (24: Too many open files) while connecting to upstream,

This basically means that the Nginx process had too many files open, which could also be checked on the Nginx status page. Here the graph from check_nginx_status.pl:

Nginx stats

The default is set to a limit of 4096 files per (worker) process, which can be seen in /etc/default/nginx:

# cat /etc/default/nginx
# Note: You may want to look at the following page before setting the ULIMIT.
#  http://wiki.nginx.org/CoreModule#worker_rlimit_nofile
# Set the ulimit variable if you need defaults to change.
#  Example: ULIMIT="-n 4096"
#ULIMIT="-n 4096"

However don't be fooled. Changing this file doesn't help. Instead this needs to be set in /etc/security/limits.conf:

# tail /etc/security/limits.conf
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#ftp             -       chroot          /ftp
#@student        -       maxlogins       4

# Added Nginx limits
nginx       soft    nofile  30000
nginx       hard    nofile  50000

# End of file

Here a soft limit of 30k and a hard limit of 50k files are defined per nginx process.

Note: I tried this here with www-data first (the user under which Nginx runs), but this didn't work. Although a user name could be used as a "domain" in this config file...

Additionally Nginx should be told how many files can be opened. In the main config file /etc/nginx/nginx.conf add:

# head /etc/nginx/nginx.conf
user www-data;
worker_processes 4;
pid /run/nginx.pid;

# 2019-05-09 Increase open files
worker_rlimit_nofile 30000;

After a service nginx restart the limits of the worker processes can be checked:

 # ps auxf | grep nginx
root      7027  0.0  0.3 103620 13348 ?        Ss   09:21   0:00 nginx: master process /usr/sbin/nginx
www-data  7028  8.6  1.0 127900 40724 ?        R    09:21   2:37  \_ nginx: worker process
www-data  7029  8.9  1.0 127488 40536 ?        S    09:21   2:44  \_ nginx: worker process
www-data  7031  9.5  1.0 127792 40896 ?        S    09:21   2:53  \_ nginx: worker process
www-data  7032  8.1  1.0 128472 41244 ?        S    09:21   2:29  \_ nginx: worker process

# cat /proc/7028/limits | grep "open files"
Max open files            30000                30000                files     

The "too many open files" errors disappeared from the Nginx logs after this change.

But what did cause this sudden problem? As you can see in the graph above this Writing (and Waiting) connections suddenly sharply increased. It turned out that an upstream server behind this reverse proxy did not work anymore and this particular virtual host received a lot of traffic, causing general slowness and holding files open while waiting for a timeout from Nginx (504 in this case).