Infiniroot Blog: We sometimes write, too.

Of course we cannot always share details about our work with customers, but nevertheless it is nice to show our technical achievements and share some of our implemented solutions.

Convert database and tables from utf8 to utf8mb4 and solve collation errors in Jira and Confluence

Published on September 29th 2020


Note: This guide applies to both Jira and Confluence. This article can also be used in general to migrate from utf8 to utf8mb4.

After Jira's database was migrated from a MariaDB 10.3 to a MySQL 5.7 and Jira was updated from 8.5 to 8.12, issues started when creating a ticket with special characters (such as an emoticon) in it:

(SQL Exception while executing the following:INSERT INTO jiraaction (ID, issueid, AUTHOR, actiontype, actionlevel, rolelevel, actionbody, CREATED, UPDATEAUTHOR, UPDATED, actionnum) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) (Incorrect string value: '\xF0\x9F\x98\x8A \x0A...' for column 'actionbody' at row 1))

The reason for this is that Jira now supports UTF8MB4 character encoding in combination with MySQL 5.7 (see Connecting Jira applications to MySQL 5.7). In MySQL 5.6 this character encoding caused application issues and setting the Jira database to UTF8 was mandatory (see Connecting Jira applications to MySQL 5.6).

Jira and database / collation compatibility

As a general rule of thumb, the supported MySQL character sets and collations by Jira are (as of September 2020):

Additionally it depends on your Jira (and Confluence) version, whether or not it can communicate with a utf8mb4 MySQL database. The following list indicates the minimum release version to be able to use utf8mb4 from an application point of view:

Older releases, especially when using a MySQL 5.6 database, must continue to use utf8.

Changing the database to utf8mb4

Note: This basically applies to all databases, not only to Jira.

In this case we created a database dump, dropped the Jira database and re-created it with UTF8MB4 (as described here):

mysql> CREATE DATABASE Jira CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

Afterwards the database dump was imported into this new empty database.

Note: It would also have been possible to simply alter the database using:

mysql> ALTER DATABASE Jira DEFAULT CHARACTER SET = utf8mb4 DEFAULT COLLATE = utf8mb4_bin;

See article How to change the collation of a MySQL database for more information.

This is of course much faster than to export and import a database dump.

Error: Unsupported collation: utf8mb4_bin

However after a Jira restart, the instance health checks showed an error in the database collation check:

Your mysql database is currently using an unsupported collation: utf8mb4_bin. You should change this to a supported collation: utf8_bin

Jira health check database error: unsupported collation utf8mb4_bin

In order that Jira starts to use the newer UTF8MB4 character encoding, it needs to be told to use a MySQL 5.7 (using mysql57 as database type) now. This needs to be defined in Jira's dbconfig.xml:

root@jira:~# head /var/atlassian/application-data/jira/dbconfig.xml
<!--?xml version="1.0" encoding="UTF-8"?-->

<jira-database-config>
<name>defaultDS</name>
<delegator-name>default</delegator-name>
<database-type>mysql57</database-type>
<jdbc-datasource>
<url>jdbc:mysql://dbserver:3306/Jira?useUnicode=true&characterEncoding=UTF8&sessionVariables=default_storage_engine=InnoDB</url>
<driver-class>com.mysql.jdbc.Driver</driver-class>
<username>jirauser</username>
</jdbc-datasource>
</jira-database-config>

Note: The previous database-type was just 'mysql'.

But this alone did not solve it...

Error: 'utf8_bin' is unsupported by Jira

After database-type was changed in dbconfig.xml and Jira restarted, the next error was shown in the database health check:

The table collation: 'utf8_bin' is unsupported by Jira. The Database collation: 'utf8mb4_bin' is supported by Jira.

Jira health check database error: unsupported table collation utf8_bin

Important here is the mention of "table collation". This error is showing because although the database's encoding was set to utf8mb4, all the tables (imported from the dump) still have the old utf8 collation active.

Confluence equivalent error

Confluence might run into the same kind of error, but the error look somewhat different. The following screenshot would be shown if Confluence was not able to start due to a conflict with table collations:

Confluence not starting up due to wrong character encodings detected in database tables

Detected tables with non-default character encoding. See https://confluence.atlassian.com/x/TABrFw

Verifying table encoding and collations

According to Jira's health check, the table collations are still set to utf8_bin. This can be verified using the following SQL query (limited to 15 tables):

mysql> SELECT DISTINCT table_name,table_collation FROM information_schema.tables WHERE table_schema = "Jira" LIMIT 0,15;
+--------------------------------+-----------------+
| table_name                     | table_collation |
+--------------------------------+-----------------+
| AO_013613_ACTIVITY_SOURCE      | utf8_bin        |
| AO_013613_EXPENSE              | utf8_bin        |
| AO_013613_EXP_CATEGORY         | utf8_bin        |
| AO_013613_FAVORITES            | utf8_bin        |
| AO_013613_HD_SCHEME            | utf8_bin        |
| AO_013613_HD_SCHEME_DAY        | utf8_bin        |
| AO_013613_HD_SCHEME_MEMBER     | utf8_bin        |
| AO_013613_PERMISSION_GROUP     | utf8_bin        |
| AO_013613_PROJECT_CONFIG       | utf8_bin        |
| AO_013613_SAVED_REPORT         | utf8_bin        |
| AO_013613_SAVED_REPORT_V2      | utf8_bin        |
| AO_013613_WA_SL_VALUE          | utf8_bin        |
| AO_013613_WA_VALUE             | utf8_bin        |
| AO_013613_WL_SCHEME            | utf8_bin        |
| AO_013613_WL_SCHEME_DAY        | utf8_bin        |
+--------------------------------+-----------------+
15 rows in set (0.01 sec)

But that isn't enough. Not only the tables have a collation - the columns inside the tables (may) also have character encoding defined. The following SQL query helps to find all the column names and the relevant tables (here limited to 15 results):

mysql> SELECT DISTINCT column_name,table_name,character_set_name,collation_name FROM information_schema.columns WHERE table_schema = "Jira" AND character_set_name IS NOT NULL LIMIT 0,15;
+-------------+----------------------------+--------------------+----------------+
| column_name | table_name                 | character_set_name | collation_name |
+-------------+----------------------------+--------------------+----------------+
| SOURCE_ID   | AO_013613_ACTIVITY_SOURCE  | utf8               | utf8_bin       |
| SOURCE_TYPE | AO_013613_ACTIVITY_SOURCE  | utf8               | utf8_bin       |
| CREATED_BY  | AO_013613_EXPENSE          | utf8               | utf8_bin       |
| DESCRIPTION | AO_013613_EXPENSE          | utf8               | utf8_bin       |
| SCOPE_TYPE  | AO_013613_EXPENSE          | utf8               | utf8_bin       |
| NAME        | AO_013613_EXP_CATEGORY     | utf8               | utf8_bin       |
| ITEM        | AO_013613_FAVORITES        | utf8               | utf8_bin       |
| TYPE        | AO_013613_FAVORITES        | utf8               | utf8_bin       |
| USER        | AO_013613_FAVORITES        | utf8               | utf8_bin       |
| DESCRIPTION | AO_013613_HD_SCHEME        | utf8               | utf8_bin       |
| NAME        | AO_013613_HD_SCHEME        | utf8               | utf8_bin       |
| DESCRIPTION | AO_013613_HD_SCHEME_DAY    | utf8               | utf8_bin       |
| NAME        | AO_013613_HD_SCHEME_DAY    | utf8               | utf8_bin       |
| TYPE        | AO_013613_HD_SCHEME_DAY    | utf8               | utf8_bin       |
| USER_KEY    | AO_013613_HD_SCHEME_MEMBER | utf8               | utf8_bin       |
+-------------+----------------------------+--------------------+----------------+
15 rows in set (0.02 sec)

This means: Not only the tables need to be altered for the new UTF8MB4 encoding, but also the table columns shown above.

Note: The advantage here is that all existing tables and columns use utf8. If some of the encodings would still use latin1, this might cause migration problems (and should probably first be migrated from latin1 to utf8).

Time to backup!

Before continuing and starting the character conversion in the MySQL database, Jira should be stopped and a full database dump/backup should be created.

root@jira:~# mysqldump --routines --events --single-transaction --quick --max_allowed_packet=512M Jira | gzip > Jira.20200929.sql.gz

utf8 to utf8mb4 conversion

A couple of good examples on how to migrate from utf8 to utf8mb4 can be found in the responses of a DBA Stackexchange question. A good way was shared by Stackexchange user mrjingles87, but it needs to be slightly adjusted for Jira.

First create a file, we name it migrate-to-utf8mb4.sql, with the following content (make sure you're using the correct database name, here 'Jira'):

root@jira:~# cat migrate-to-utf8mb4.sql
use information_schema;
SELECT concat("ALTER DATABASE `",table_schema,"` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_bin;") as _sql
FROM `TABLES` where table_schema like "Jira" and TABLE_TYPE='BASE TABLE' group by table_schema;
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name,"` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;") as _sql  
FROM `TABLES` where table_schema like "Jira" and TABLE_TYPE='BASE TABLE' group by table_schema, table_name;
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name, "` CHANGE `",column_name,"` `",column_name,"` ",data_type,"(",character_maximum_length,") CHARACTER SET utf8mb4 COLLATE utf8mb4_bin",IF(is_nullable="YES"," NULL"," NOT NULL"),";") as _sql
FROM `COLUMNS` where table_schema like "Jira" and data_type in ('varchar','char');
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name, "` CHANGE `",column_name,"` `",column_name,"` ",data_type," CHARACTER SET utf8mb4 COLLATE utf8mb4_bin",IF(is_nullable="YES"," NULL"," NOT NULL"),";") as _sql
FROM `COLUMNS` where table_schema like "Jira" and data_type in ('text','tinytext','mediumtext','longtext');

Note: It's very important to use COLLATE utf8mb4_bin. The original collate of the Stackexchange article above uses utf8mb4_unicode_ci, which is not compatible with Jira.

Now run the sql file against the MySQL server:

root@jira:~# mysql < migrate-to-utf8mb4.sql | egrep "^ALTER" > finalalterquery.sql

This generates the ALTER TABLE queries for each table and each column (needing a conversion) and saves it in an additional file (finalalterquery.sql). The queries also keep existing data type limits, such as varchar character limits:

root@jira:~# grep varchar finalalterquery.sql |tail
ALTER TABLE `Jira`.`userpickerfiltergroup` CHANGE `groupname` `groupname` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;
ALTER TABLE `Jira`.`versioncontrol` CHANGE `vcsname` `vcsname` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;
ALTER TABLE `Jira`.`versioncontrol` CHANGE `vcsdescription` `vcsdescription` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;
ALTER TABLE `Jira`.`versioncontrol` CHANGE `vcstype` `vcstype` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;
ALTER TABLE `Jira`.`workflowscheme` CHANGE `NAME` `NAME` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;
ALTER TABLE `Jira`.`workflowschemeentity` CHANGE `WORKFLOW` `WORKFLOW` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;
ALTER TABLE `Jira`.`workflowschemeentity` CHANGE `issuetype` `issuetype` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;
ALTER TABLE `Jira`.`worklog` CHANGE `AUTHOR` `AUTHOR` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;
ALTER TABLE `Jira`.`worklog` CHANGE `grouplevel` `grouplevel` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;
ALTER TABLE `Jira`.`worklog` CHANGE `UPDATEAUTHOR` `UPDATEAUTHOR` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;

The finalalterquery.sql file can now be run against the MySQL server. Before you do that, make sure that the application using this database (Jira in this case) is stopped. An ALTER TABLE will lock the table and the application might stop working correctly.

root@jira:~# mysql < finalalterquery.sql

This took 8 minutes to complete on a ~4GB Jira database.

Error 1832 foreign key constraint

While running finalalterquery.sql on MySQL, you might get the following error (seen with Confluence database):

root@confluence:~# mysql < finalalterquery.sql
ERROR 1832 (HY000) at line 65: Cannot change column 'BELONG_SUB_CALENDAR_ID': used in a foreign key constraint 'fk_ao_950dc3_tc_custom_ev_types_belong_sub_calendar_id'

In this case adjust finalalterquery.sql and at the very first line add: SET FOREIGN_KEY_CHECKS=0;
As the very last line of finalalterquery.sql, add: SET FOREIGN_KEY_CHECKS=1;

To verify:

root@confluence:~# head -n 1 finalalterquery.sql
SET FOREIGN_KEY_CHECKS=0;
root@confluence:~# tail -n 1 finalalterquery.sql
SET FOREIGN_KEY_CHECKS=1;

Then run mysql < finalalterquery.sql again. This time it should run through.

Jira start and health checks are green

After Jira was started again (without any changes in dbconfig.xml), the instance health checks finally showed all checks passed.

Jira database migrated to utf8mb4 - health checks pass

Atlassian Jira or Confluence Hosting needed?

Looking for professional Atlassian Jira or Confluence server hosting in Switzerland? Head on over to Infiniroot!