Of course we cannot always share details about our work with customers, but nevertheless it is nice to show our technical achievements and share some solutions.
Doing Nginx access log analysis in Kibana is great. When using the correct log parsers (either using a Filebeat agent or using Logstash with a a grok filter), each line of the Nginx access log is split into several fields. Each field then becomes searchable. See article Using ELK to collect Nginx logs and show TLS version and ciphers used by HTTP clients for an implementation example.
Note: This also applies to Apache2 access logs by the way. Both use by default the "combined" log format.
In Elasticsearch, the full access log line is stored in the "message" field:
127.0.0.1 - - [28/Sep/2020:15:41:55 +0200] "GET /robots.txt HTTP/1.0" 200 1358 "-" "Mozilla/5.0 (compatible; heritrix/3.3.0-SNAPSHOT-20150302-2206 +http://127.0.0.1)"
If correctly inserted into Elasticsearch (see above for methods), this message is additionally split into the following fields. Here using Filebeat 7 as log shipper (see Using ELK to colle:
This allows to search for very specific log entries, for example showing all the requests to "/robots.txt":
However when the request path is set to just a simple "/" (forward slash), representing the default/home page, Kibana would not show any results:
Why does this happen?
It turns out that "/" is a special character in Elasticsearch's search syntax, used as a regular expression pattern. Adding a "/" into a search query (whether using a filter or direct search) simply tells Elasticsearch to start using regular expression for the following search term. As there is nothing following the "/", there is no search term for Elasticsearch - hence Elasticsearch has no results.
Regular expression patterns can be embedded in the query string by wrapping them in forward-slashes ("/"):
The best explanation we came across (so far) which explains data types vs. keyword types is the article When to use the keyword type vs text datatype in Elasticsearch by ObjectRocket:
The primary difference between the text datatype and the keyword datatype is that text fields are analyzed at the time of indexing, and keyword fields are not. What that means is, text fields are broken down into their individual terms at indexing to allow for partial matching, while keyword fields are indexed as is.
One useful application of the text datatype is for product descriptions. Imagine a user searching for pajamas– chances are, they’ll simply use “pajamas” as their search terms, and you’ll want your results to give them all products that have the word “pajamas” somewhere in their description.
The keyword datatype can come in handy for cases where a user will be querying for exact matches. A good example would be a “state” field. A user will search for “North Carolina” but not for the word “North” by itself. Email addresses are also good candidates for the keyword datatype for similar reasons.
As the wanted search term is a static "/" in the request path, the request.keyword field should be used.
Now that the fact about the keyword (=exact matching) was learned, this is definitely the correct (at least better!) way to filter the access log. Unless someone wants to do relative searches in the request URI of course. But in this case the forward slash is an exact match which we want to see in the results. And - it works:
Results are finally showing up in Kibana: