How to use Log Analytics tool

This page lists the requirements for the Matomo (Piwik) Log Analytics tool and explains how to run it to import your server logs into Matomo.

Requirements

  • Install Matomo (or update it). This should take around 5 minutes.
  • To execute the script you need access to the server via SSH, or some other way of executing scripts on your server.
  • Python 2.6 is required. Note: the script that loads and parses the log files is written in Python, but Matomo itself, behind the API, is written in PHP 5.
  • You will also need one or more log files to parse and analyze with Matomo (inside each log file the log lines must be ordered by date).
  • Note: we recommend that you use the extended log format, which includes the user agent, referrer URL, and full URLs (including hostnames) in the logs. If these fields are missing from the logs, analytics data in Matomo will be less accurate.
  • Set up Geo Location for accurate country and city detection. Matomo normally guesses visitors' countries from the browser language, but this information is not available in access logs, so Geo Location is a must-have.
  • Matomo 1.7.2 is the minimum required version, but we always recommend updating to the latest version.
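
The requirement that log lines be ordered by date can be checked before an import. The following is a minimal sketch, assuming Python 3 and the bracketed timestamp used by the common and combined access log formats (for example `[10/Oct/2023:13:55:36 +0000]`). The helper names `parse_timestamp` and `is_ordered_by_date` are hypothetical and are not part of the Matomo importer.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp of common/combined log format lines,
# e.g. [10/Oct/2023:13:55:36 +0000]. Other log formats need another pattern.
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_timestamp(line):
    """Return the timestamp of a log line, or None if none is found."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def is_ordered_by_date(lines):
    """Return True if timestamps never go backwards.

    Lines without a recognizable timestamp are skipped.
    """
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue
        if previous is not None and current < previous:
            return False
        previous = current
    return True
```

An open file object can be passed directly, since the function only iterates over lines, for example `is_ordered_by_date(open("access.log"))`.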

Differences between using Log Analytics and the JavaScript client

When importing server logs (compared to JavaScript Tracking), a few user data points are missing: screen resolutions, browser plugins, and page titles are not available (the report Actions > Page Titles will be mostly empty). Tracking cookies cannot be used either, which results in a few more missing data points. See also this FAQ.

How to: run the Log File analysis script with default options

Once you have Matomo running, you will find the script in misc/log-analytics/import_logs.py

  $ python /path/to/piwik/misc/log-analytics/import_logs.py

This will display the help information. The only required parameter is

  --url=http://analytics.example.com

to specify the Matomo base URL. Then, you can specify one or many log files to import.

There are many more options available. See the help output, and the README for more information and explanations about available parameters.

For example, to track all requests (static files, bot requests, HTTP errors, and HTTP redirects), use the following command:

  $ python /path/to/piwik/misc/log-analytics/import_logs.py --url=http://analytics.example.com \
      --idsite=1234 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static \
      --enable-bots access.log
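
When the import is launched from another program rather than typed by hand, the same command can be assembled as an argument list. This is only a sketch, assuming Python 3: the function `build_import_command` and its parameters are hypothetical, and only the flags shown in the command above are used.

```python
def build_import_command(script, url, idsite, log_files,
                         recorders=None, track_everything=False):
    """Assemble the importer command line as a list of arguments.

    `script` is the path to import_logs.py; every flag below is taken
    from the example command in the text.
    """
    command = ["python", script, "--url=" + url, "--idsite=%d" % idsite]
    if recorders is not None:
        command.append("--recorders=%d" % recorders)
    if track_everything:
        # Also track HTTP errors, redirects, static files and bots.
        command += [
            "--enable-http-errors",
            "--enable-http-redirects",
            "--enable-static",
            "--enable-bots",
        ]
    command += list(log_files)
    return command


# To run the import, pass the list to subprocess, for example:
#   import subprocess
#   subprocess.check_call(build_import_command(...))
```

Building a list rather than a single string avoids shell quoting problems when a log file path contains spaces.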

How to: import more data including bots, static files, and HTTP errors tracking

By default, the script does not track static files (JS, CSS, images, etc.) and excludes all bot traffic.

You can enable these, and tune the import, using the following options:

  • --enable-bots will track search and spam bots in Matomo, using a custom variable with the name of the bot. When enabled, the log file will take longer to process since all bot page views are sent to Matomo. Matomo detects whether a log line comes from a bot by looking at the User-Agent field.

Figure 1: Example of a Custom Variables report showing bot user agents.

  • --enable-static will track all static files (images, JS, CSS) in Matomo. This adds some time to the overall log file processing.

  • --enable-http-errors will track HTTP errors (4xx and 5xx statuses) as page views in Matomo, with a custom variable HTTP-code set to 404, 500, etc. The page title for such a page view will show the referrer URL if it is present in the log file (which can help you find out which pages link to a 404, for example).

  • --enable-http-redirects will track HTTP redirects (301, 302, 3xx) as page views, with a custom title and a custom variable. Note: HTTP status 304 responses (“Not modified”) are tracked as page views.

  • --enable-reverse-dns will enable reverse DNS lookups (used to generate the Visitors > Providers report). Expect a big performance hit, as reverse DNS is very slow.

  • --recorders=N specifies the number of threads. We recommend setting it to the number of CPU cores in the system (or slightly more or less, depending on your server configuration).

  • --recorder-max-payload-size=N sets how many pageviews (or log lines) are sent to Matomo at once. The importer uses the bulk tracking feature of Matomo to achieve greater speed, and by default sends 300 at a time. You can experiment with this number to try to achieve better performance, but there is an upper limit to the speed you can get.
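
As a rough illustration of the last two options, the sketch below (assuming Python 3) suggests a starting value for the number of recorders from the CPU core count and estimates the minimum number of bulk requests for a given number of log lines. The helper names are hypothetical, and the estimate assumes every request is filled to the payload limit; the importer may send more requests than this in practice.

```python
import math
import os

DEFAULT_MAX_PAYLOAD_SIZE = 300  # default payload size stated in the text


def suggested_recorders():
    """Return the CPU core count, or 1 if it cannot be determined."""
    return os.cpu_count() or 1


def min_bulk_requests(log_lines, max_payload_size=DEFAULT_MAX_PAYLOAD_SIZE):
    """Lower bound on the bulk requests needed to send `log_lines` hits."""
    return math.ceil(log_lines / max_payload_size)
```

For instance, 1000 log lines at the default payload size need at least 4 requests, while raising the payload size to 500 brings that down to 2.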

How to: exclude some particular log lines

There are several ways to exclude particular log lines or visitors from being tracked.

  • you can exclude specific IP addresses or IP ranges from being tracked. To configure excluded IPs, log into Matomo as Super User, then click Administration > Websites.
  • the script provides an option to exclude visits with specific User Agent HTTP headers: --useragent-exclude
  • the script provides an option to enforce a whitelist of URL hostnames that should be considered; log lines with a hostname not in the list will not be imported. See the option --hostname
  • it is also possible to exclude log lines where the URL path matches a particular pattern. See the option --exclude-path

For example, to exclude all files under the URL example.org/assets/ you would write --exclude-path="/assets*"

To exclude two paths you would write: --exclude-path="path1/here" --exclude-path="/sub/path2"
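
To preview which URL paths a set of patterns would exclude, the matching can be approximated in a few lines of Python 3. This sketch assumes the patterns are shell-style wildcards matched against the whole URL path, which the `*` in the example above suggests; the importer's actual matching rules may differ, and `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_excluded(path, exclude_patterns):
    """Return True if `path` matches any shell-style exclude pattern.

    Assumption: patterns are matched against the full URL path,
    case-sensitively, with `*` matching any run of characters.
    """
    return any(fnmatchcase(path, pattern) for pattern in exclude_patterns)


patterns = ["/assets*", "/sub/path2"]
for candidate in ["/assets/css/site.css", "/sub/path2", "/index.html"]:
    print(candidate, is_excluded(candidate, patterns))
```

Under this assumption a pattern without a wildcard, such as "/sub/path2", matches only that exact path, so a trailing `*` is needed to cover everything beneath a directory.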

Using Log Analytics

Learn more about how to use the script, how to import logs automatically every day, and advanced setups in our Log Analytics Readme.

Frequently Asked Questions

For more information and guides, check out our Log Analytics tool FAQs

If you have feature requests for better Server Log processing with Matomo, please let us know using the feedback form below. We look forward to your feedback and hope Matomo will deliver huge value for all server logs.