Infiniroot Blog: We sometimes write, too.

Of course we cannot always share details about our work with customers, but nevertheless it is nice to show our technical achievements and share some of our implemented solutions.

Keepalived scripts and how to run them as non root user

Published on September 3rd 2020


Keepalived is a Linux implementation of the VRRP protocol, which network devices usually use for fail-over IP addresses. We've been using Keepalived for many years now and it has been running very stably (but beware of a lost VIP on systemd updates!).

But Keepalived does more than just talk to the other "members" of the VRRP cluster. It also allows configuring local "tracking scripts" to determine the health of the local node.

Note: There are also other types of scripts, notably "notify scripts" but this article won't cover them.

Tracking scripts or "Am I healthy?"

A script can be defined using the vrrp_script option. There is no limit on how many scripts can be defined, as long as they all have a unique name. The following example shows a script called chk_nginx that checks for the PID of Nginx:

vrrp_script chk_nginx {
  script "pidof nginx"
  interval 1       # check every second
  weight 3         # add 3 points of prio if OK
}

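To explain these options: script is the command to execute, interval defines how often (in seconds) the command is run, and weight defines by how many points the local VRRP priority changes depending on the result. With a positive weight, as in this example, the points are added to the priority as long as the script succeeds.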
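Keepalived judges a tracking script only by its exit code: 0 counts as success, anything else as failure. Below is a minimal Python sketch of that check, not Keepalived's own implementation. The function name script_ok and its timeout parameter are assumptions for illustration, and splitting the command string with shlex and running it without a shell is a simplification.

```python
import shlex
import subprocess


def script_ok(script: str, timeout: float = 5.0) -> bool:
    """Return True if the tracking command exits with code 0.

    Simplification: the command string is split with shlex and run
    directly (no shell); Keepalived's own parsing may differ in details.
    """
    try:
        result = subprocess.run(
            shlex.split(script),
            stdout=subprocess.DEVNULL,  # only the exit code matters
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing/non-executable binary or a hanging check counts as failed here.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Same command as in the chk_nginx example above.
    print("chk_nginx OK" if script_ok("pidof nginx") else "chk_nginx FAILED")
```

Note that the output of the command is discarded; pidof printing PIDs or nothing at all makes no difference, only its exit status does.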

To activate the defined script for the VIP, the track_script option needs to be defined inside the vrrp_instance:

vrrp_instance 50 {
  interface ens160
  state MASTER            # MASTER on nginx1, BACKUP on nginx2
  virtual_router_id 50
  priority 101            # 101 on master, 100 on backup
  advert_int 1
  virtual_ipaddress {
    192.168.50.100
  }
  track_script {
    chk_nginx
  }

}

What happens is the following: Keepalived runs the command pidof nginx every second. If the command is successful (exit code 0), the weight is added to the local priority. Now suppose Nginx suddenly stops on the current master node. In this case the chk_nginx script fails and its 3 points are no longer added to the local priority. With the priorities from the example, the master drops from 104 (101 + 3) to 101, while the backup node, where Nginx is still running, keeps 103 (100 + 3). The backup now has the higher priority and the VRRP failover is initiated.
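The priority arithmetic from this scenario can be sketched in a few lines of Python. This is a simplified model, not Keepalived's code: it assumes a single tracking script per node, models only non-zero weights (a positive weight is added while the script succeeds, a negative one is subtracted while it fails) and ignores priority ties. The function names are hypothetical.

```python
def effective_priority(base: int, weight: int, check_ok: bool) -> int:
    """Priority of a node after applying one tracking script's weight."""
    if weight > 0:
        # Positive weight: bonus points only while the script succeeds.
        return base + weight if check_ok else base
    if weight < 0:
        # Negative weight: penalty points only while the script fails.
        return base if check_ok else base + weight
    raise ValueError("weight 0 is not modelled in this sketch")


def master(nodes: dict[str, int]) -> str:
    """Name of the node with the highest effective priority (ties not modelled)."""
    return max(nodes, key=nodes.get)


if __name__ == "__main__":
    # Numbers from the example: base priorities 101 and 100, weight 3.
    both_ok = {
        "nginx1": effective_priority(101, 3, check_ok=True),   # 104
        "nginx2": effective_priority(100, 3, check_ok=True),   # 103
    }
    nginx1_down = {
        "nginx1": effective_priority(101, 3, check_ok=False),  # 101
        "nginx2": effective_priority(100, 3, check_ok=True),   # 103
    }
    print(master(both_ok), master(nginx1_down))  # nginx1 nginx2
```

The gap between the two base priorities (1 point) is smaller than the weight (3 points); that is what makes a failing check on the master enough to trigger the failover.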

Running the VRRP scripts as non-root user

Interestingly, these vrrp_scripts run as root by default. There are certainly situations where this is required, but simply checking whether a process such as Nginx is running does not need root privileges, and running such a check as root does not make sense from a security point of view:

ck@nginx1:~$ pidof nginx
30991 30990 30989 30988 30987 30986 30985 30984 24805

So why are these vrrp_scripts run as root by default, when even the keepalived man page says this could be dangerous:

There are significant security implications if scripts are executed with root privileges, especially if the scripts themselves are modifiable or replaceable by a non root user.

Actually, the default is not to run the scripts as root, as one can read in the man page:

By default the scripts will be executed by user keepalived_script if that user exists, or if not by root, but for each script the user/group under which it is to be executed can be specified.

Fun fact: This user "keepalived_script" is not created by the keepalived packages, at least not on Ubuntu and Debian (both tested as of this writing). Because of this, Keepalived falls back to the root user, as described in the man page.
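To find out which user Keepalived falls back to on a given host, it is enough to check whether the keepalived_script account exists (on the shell, getent passwd keepalived_script does the same). The following Python sketch only mirrors the rule quoted from the man page; it does not ask Keepalived itself, and the function names are hypothetical.

```python
import pwd


def user_exists(name: str) -> bool:
    """True if a local account with this name exists in the passwd database."""
    try:
        pwd.getpwnam(name)
    except KeyError:
        return False
    return True


def default_script_user() -> str:
    """Fallback user per the man page: keepalived_script if it exists, else root.

    Assumes neither script_user nor a per-script user is configured.
    """
    return "keepalived_script" if user_exists("keepalived_script") else "root"


if __name__ == "__main__":
    print("vrrp_scripts without an explicit user run as:", default_script_user())
```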

One possibility would be to create a local user "keepalived_script", but then all scripts would run under that user and, depending on the defined commands and the permissions they need, this might break some health checks.

Another possibility is to define a different default user with the script_user option inside Keepalived's global_defs:

global_defs {
  router_id nginx1        # my own hostname
  script_user nagios      # run all scripts as nagios user
}

And yet another possibility is to define a user on a per script basis (as mentioned in the quote above):

vrrp_script chk_nginx {
  script "pidof nginx"
  interval 1       # check every second
  weight 3         # add 3 points of prio if OK
  user nagios      # run this particular script as nagios user
}

In the above example, the existing script "chk_nginx" was modified and a user option was added. This means that for this particular health check, the user nagios will be used. All the other scripts continue to use the default user.
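The resulting precedence (a per-script user beats the global script_user, which beats the man page default) can be modelled as a small Python function. This is a sketch of the behaviour described in this article, not Keepalived's actual code; the function name and the second script chk_other are hypothetical, and the optional group names are not modelled.

```python
from typing import Optional


def resolve_script_user(
    per_script_user: Optional[str],
    global_script_user: Optional[str],
    keepalived_script_exists: bool,
) -> str:
    """Return the user a vrrp_script would run as (model, not Keepalived code)."""
    if per_script_user:          # "user" inside the vrrp_script block
        return per_script_user
    if global_script_user:       # "script_user" inside global_defs
        return global_script_user
    # Neither option set: default from the man page.
    return "keepalived_script" if keepalived_script_exists else "root"


if __name__ == "__main__":
    # chk_nginx has "user nagios"; chk_other is a hypothetical script without one.
    scripts = {"chk_nginx": "nagios", "chk_other": None}
    for name, user in scripts.items():
        print(name, "->", resolve_script_user(user, None, keepalived_script_exists=False))
    # chk_nginx -> nagios
    # chk_other -> root
```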

Certainly a much better (more secure) approach for running vrrp scripts.