Infiniroot Blog: We sometimes write, too.

Of course we cannot always share details about our work with customers, but nevertheless it is nice to show our technical achievements and share some of our implemented solutions.

How to monitor MySQL or MariaDB Galera Cluster writes and avoid deadlocks

Published on May 24th 2019


check_mysql only checks read operation

To use a simple monitoring of a MySQL server (whether this is MySQL itself, MariaDB, Percona or other fancy forks), the standard plugin is - obviously - check_mysql, which is part of the monitoring-plugins package. However check_mysql only makes a read query as it retrieves some basic statistics from the MySQL server.

To monitor write operations as well (e.g. to catch a full file system) I've been using a simple shell script in the past few years: check_mysql_write.sh .

check_mysql_write

This shell script (from now on mentioned as plugin) is executed from the monitoring server and establishes a remote connection to the given database server:

$ ./check_mysql_write.sh -H dbhost -P port -u dbuser -p dbpass -d database

Note: Of course this can also be executed locally on the DB server.

The plugin relies on a table monitoring in a dedicated database. Here are the statements to prepare this monitoring table:

MariaDB [(none)]> CREATE TABLE mymonitoring.monitoring ( id INT(1), mytime INT(13) );
MariaDB [(none)]> INSERT INTO mymonitoring.monitoring (id, mytime) VALUES (1, 1421421409);

The plugin check_mysql_write would then update the table with the current timestamp, with the following UPDATE query:

UPDATE monitoring SET mytime=$curtime WHERE id=1

This results in the following table with content:

MariaDB [mymonitoring]> select * from monitoring;
+------+------------+
| id   | mytime     |
+------+------------+
|    1 | 1558676432 |
+------+------------+
1 row in set (0.00 sec)

Unless there was an error during the UPDATE query, the plugin returns OK and - hurray - you have a working MySQL server.

The problem with multi-master nodes

However there's a problem when running multi master nodes, or a Galera cluster. As the plugin is executed on multiple database hosts of the same cluster, this table is the same across the cluster. And because there's only one row with a unique ID (1) in this table, this causes problems.

On a three node Galera cluster with 4 monitoring servers and a check interval of 10 seconds, this single row is updated at least 12 times within 10 seconds. Forcibly the write operations overlap and this creates a deadlock. The UPDATE query would then get the following error:

ERROR 1213 (40001) at line 1: Deadlock found when trying to get lock; try restarting transaction

And the plugin would return "CRITICAL: There was an error trying to write into mymonitoring.monitoring.  Do a manual check.".

Adjusting the check_mysql_write plugin

To overcome that deadlock situation, I adapted the structure of the monitoring table and the plugin itself. Instead of using a hard-coded unique ID, the table now has columns host and mytime. The host column is a VARCHAR type which has the values of the hostname on which the plugin runs. The new table is created like this:

MariaDB [(none)]> CREATE TABLE mymonitoring.monitoring ( host VARCHAR(100), mytime INT(13) );

The plugin now tries to update the timestamp of the row where the local hostname matches the hostname in the table:

UPDATE monitoring SET mytime=$curtime WHERE host = '$(hostname)'

Obviously if the plugin runs for the first time, there is no row with the local hostname and the plugin would fail. This is why the plugin itself creates the row at the very first run, if it can't find it in the table:

INSERT INTO monitoring (host, mytime) VALUES ('$(hostname)', $curtime)

Running the plugin on a Galera cluster

Running the plugin is straightforward:

$ /usr/lib/nagios/plugins/check_mysql_write.sh -H galeranode1 -u monitoring -p secret -d mymonitoring
OK: Write query successful (UPDATE monitoring SET mytime=1558683558 WHERE host='icinga1')

Hurray, the write operation was successful. The plugins which run all over different monitoring hosts created the following entries in the monitoring table:

MariaDB [mymonitoring]> select * from monitoring;
+---------------------------+------------+
| host                      | mytime     |
+---------------------------+------------+
| icinga1                   | 1558680145 |
| icinga2                   | 1558680107 |
| icinga3                   | 1558680109 |
| icinga4                   | 1558680131 |
| haproxy1                  | 1558680149 |
| haproxy2                  | 1558680148 |
+---------------------------+------------+
6 rows in set (0.00 sec)

The plugin is now public

The plugin check_mysql_write can now be used by everyone interested. You can find it on GitHub: https://github.com/Napsty/check_mysql_write .

Need help in Galera troubleshooting?

Problems in Galera Clusters are not always easy to spot. Need help troubleshooting a Galera cluster? Contact us on Infiniroot.com.