Infiniroot Blog: We sometimes write, too.

Monitoring of Windows services with NSClient and what to do if services should NOT be started

Published on September 18th 2020


Windows services can easily be monitored using the NSClient++ monitoring agent. NSClient supports active checks on the Windows host through both check_nt and check_nrpe.

In this article we describe how to monitor specific services and explain the differences between the two methods when NSClient is used as the monitoring agent.

Windows services to be monitored using NSClient

Monitoring Windows services using check_nt and SERVICESTATE

The check_nt monitoring plugin (part of the official nagios-plugins and monitoring-plugins packages) connects to the NSClient server, which is configured in nsclient.ini. An example:

; Section for NSClient (NSClientServer.dll) (check_nt) protocol options
[/settings/NSClient/server]
; ENABLE SSL ENCRYPTION - This option controls if SSL should be enabled
use ssl = 0
; PERFORMANCE DATA - Send performance data back to Nagios (set this to 0 to remove all performance data)
performance data = 1
; PORT NUMBER - Port to use for check_nt.
port = 12489

In this case the NSClient server listens on port tcp/12489 and waits for commands sent by check_nt (see ./check_nt --help for a list). To monitor a Windows service, the relevant variable is -v SERVICESTATE, followed by the name of the service to check, given with -l service.

In the following example a single service "WORKSPACE-JBASSVC" is checked:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_nt -H windows1 -v SERVICESTATE -l WORKSPACE-JBASSVC
 OK: All 1 service(s) are ok.

NSClient running on windows1 executed the local check, confirming that this service is indeed running. On another host (windows2), where this service isn't running, the check will return CRITICAL:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_nt -H windows2 -v SERVICESTATE -l WORKSPACE-JBASSVC
 CRITICAL: WORKSPACE-JBASSVC: Stopped, delayed ()

If a service doesn't exist on the target Windows host, the check reports this accordingly:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_nt -H windows1 -v SERVICESTATE -l SOMETHING
 Failed to open service SOMETHING: 424: The specified service does not exist as an installed service.
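
The OK and CRITICAL results above are also reflected in the plugin's exit code. The following is a minimal sketch, assuming the usual monitoring-plugins convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The helper name status_name is hypothetical and not part of check_nt:

```shell
#!/bin/sh
# Map a plugin exit code to its status name.
# Assumption: the standard monitoring-plugins return code convention applies.
# status_name is a hypothetical helper for illustration only.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Example (not executed here): run the check, then translate its exit code.
# /usr/lib/nagios/plugins/check_nt -H windows2 -v SERVICESTATE -l WORKSPACE-JBASSVC
# status_name $?
```

This is mostly useful in scripts that call the plugin directly, outside of the monitoring core.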

Multiple services can be checked at once by passing a comma-separated list with -l service1,service2,...:

ck@inf-monm01-p:~$ /usr/lib/nagios/plugins/check_nt -H windows1 -v SERVICESTATE -l nscp,server,WORKSPACE-JBASSVC
 OK: All 3 service(s) are ok.
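
When the service names come from a script or an inventory, the comma-separated list can be built programmatically. A small sketch; join_services is a hypothetical helper, and only the -l list format is taken from the example above:

```shell
#!/bin/sh
# Join service names into the comma-separated list that -l expects.
# join_services is a hypothetical helper, not part of check_nt.
join_services() {
    list=""
    for svc in "$@"; do
        # ${list:+$list,} expands to "$list," only when list is non-empty,
        # so no leading comma is produced for the first service.
        list="${list:+$list,}$svc"
    done
    printf '%s\n' "$list"
}

# Example (not executed here):
# /usr/lib/nagios/plugins/check_nt -H windows1 -v SERVICESTATE \
#     -l "$(join_services nscp server WORKSPACE-JBASSVC)"
```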

The advantage of this method with check_nt and SERVICESTATE: it's very easy and quick to understand and implement.
However, it has limitations: services can only be checked for whether they're running or not, and the check assumes that all the given services must be running.

This is where the NRPE method comes into play.

Monitoring Windows services using check_nrpe and check_service

NSClient also "embeds" an NRPE server which can be enabled in nsclient.ini:

; Section for NRPE (NRPEServer.dll) (check_nrpe) protocol options.
[/settings/NRPE/server]
; ENABLE SSL ENCRYPTION - This option controls if SSL should be enabled
use ssl = 1
; ALLOW INSECURE CIPHERS AND ENCRYPTION - Only enable this if you are using legacy check_nrpe client.
insecure = 1
; COMMAND ALLOW NASTY META CHARS - This option determines whether or not that we will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow nasty characters = 1
; COMMAND ARGUMENT PROCESSING - This option determines whether or not that we will allow clients to specify arguments to commands that are executed.
allow arguments = 1
; PORT NUMBER - Port to use for NRPE.
port = 5666
; EXTENDED RESPONSE - Send more than 1 return packet to allow response to go beyond payload (requires modified client, if legacy is true this defaults to false).
extended response = 1

But NSClient didn't simply add an NRPE server/listener; it also added many pre-defined and very sophisticated check commands. The relevant NRPE check command to monitor Windows services is check_service. This command is provided by the CheckSystem module. To make sure this module is loaded, verify that it is enabled in nsclient.ini:

; Modules
[/modules]
CheckExternalScripts = 1
CheckHelpers = 1
CheckNSCP = 1
CheckDisk = 1
CheckSystem = 1
CheckWMI = 1
NSClientServer = 1
CheckEventLog = 1
NSCAClient = 1
NRPEServer = 1
CheckLogFile = 1
SimpleFileWriter = 1
SimpleCache = 1

Just as in the Linux world (using nagios-nrpe-server on Debian and derivatives or nrpe on RHEL/CentOS), a check command can work with additional parameters (arguments). For most checks this is required - and is quite essential for deeper monitoring. This means that "allow arguments" should be set to 1 in the [/settings/NRPE/server] section of nsclient.ini, as shown above.

A basic check of a Windows service using check_service simply verifies that the given service (service=name) is running:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_nrpe -H windows1 -c check_service -a "service=WORKSPACE-JBASSVC"
OK: All 1 service(s) are ok.|'WORKSPACE-JBASSVC'=4;0;0

However, there is a problem: if the service is NOT running (but should be), the check will not alert:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_nrpe -H windows2 -c check_service -a "service=WORKSPACE-JBASSVC"
OK: All 1 service(s) are ok.|'WORKSPACE-JBASSVC'=1;0;0

The only difference is the performance data value (1 instead of 4). Even though the service is not running on windows2, the check returned OK, which is not good!
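
The performance value can be decoded in a script. The sketch below assumes that the number matches the Windows service state codes (1 = stopped, 4 = running, and so on), which is consistent with the two outputs above but is not confirmed here for every state. service_state_name is a hypothetical helper and handles output for a single service only:

```shell
#!/bin/sh
# Extract the numeric state from check_service output and name it.
# Assumption: the perfdata value follows the Windows service state codes.
# service_state_name is a hypothetical helper for illustration only.
service_state_name() {
    perf=${1##*\|}       # keep the perfdata after the last '|'
    value=${perf##*=}    # keep what follows the last '=', e.g. "4;0;0"
    value=${value%%;*}   # cut at the first ';', leaving e.g. "4"
    case "$value" in
        1) echo "stopped" ;;
        2) echo "start pending" ;;
        3) echo "stop pending" ;;
        4) echo "running" ;;
        5) echo "continue pending" ;;
        6) echo "pause pending" ;;
        7) echo "paused" ;;
        *) echo "unknown" ;;
    esac
}

# Example (not executed here):
# out=$(/usr/lib/nagios/plugins/check_nrpe -H windows2 -c check_service -a "service=WORKSPACE-JBASSVC")
# service_state_name "$out"
```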

To handle this properly, a filter must be added. The filters tell NSClient what exactly has to be considered as OK, WARNING or CRITICAL. This is clearly more complex than the check_nt/SERVICESTATE method, but allows fine tuning of the service check:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_nrpe -H windows2 -c check_service -a "service=WORKSPACE-JBASSVC" "ok=state='started'" "warning=not state='started'" "critical=not state='started'"
CRITICAL: WORKSPACE-JBASSVC=stopped (delayed), delayed ()|'WORKSPACE-JBASSVC'=1;0;0

Using these filters the criteria can be reversed, too.

Monitor a Windows service and alert if it is running

So what if a certain service should be stopped and monitoring should only alert if this service has been started? Practical examples for this scenario are a disaster recovery server or environment, or avoiding multiple Windows servers using the same software license at the same time.

In this case the filter criteria can be reversed: OK if the state is stopped, alert otherwise:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_nrpe -H windows2 -c check_service -a "service=WORKSPACE-JBASSVC" "ok=state='stopped'" "warning=not state='stopped'" "critical=not state='stopped'" 
OK: All 1 service(s) are ok.|'WORKSPACE-JBASSVC'=1;0;0
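
Both variants (service must be running, service must be stopped) differ only in the expected state, so they can share one wrapper. A minimal sketch: check_service_state and the CHECK_NRPE variable are hypothetical names, and the filter syntax is copied from the commands above:

```shell
#!/bin/sh
# Path to check_nrpe as used in this article; override for other installations.
CHECK_NRPE=${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}

# check_service_state HOST SERVICE EXPECTED
# Hypothetical wrapper: OK if SERVICE is in state EXPECTED ('started' or
# 'stopped'), CRITICAL otherwise. Returns 3 (UNKNOWN) on invalid input.
check_service_state() {
    host=$1 service=$2 expected=$3
    case "$expected" in
        started|stopped) ;;
        *) echo "UNKNOWN: expected state must be 'started' or 'stopped'"; return 3 ;;
    esac
    "$CHECK_NRPE" -H "$host" -c check_service \
        -a "service=$service" \
           "ok=state='$expected'" \
           "warning=not state='$expected'" \
           "critical=not state='$expected'"
}

# Example (not executed here):
# check_service_state windows1 WORKSPACE-JBASSVC started   # must be running
# check_service_state windows2 WORKSPACE-JBASSVC stopped   # must NOT be running
```

The wrapper passes the exit code of check_nrpe through unchanged, so it can be used as a check command itself.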

So check_nt or check_nrpe?

For a simple check that a service which should be running actually is running, check_nt with the SERVICESTATE variable is much easier and does the job quickly.

But for fine tuning and reversed (service should NOT be running) checks, check_nrpe with the check_service command needs to be used.

Note: There are other ways to monitor Windows services, but this article describes how to achieve this with NSClient as the monitoring agent.