
Lumar Log File Integration: Logz.io

At Lumar, we understand that log file analysis is one of the most critical data sources in search engine optimization. Log file data allows SEO teams to identify how search engine crawlers are accessing a website and helps troubleshoot crawling and indexing issues.

To help our customers easily pull log file data into Lumar, we have partnered with Logz.io.

Our team has written this guide to answer the most common questions about Logz.io and to walk through the setup process.

What is Logz.io?

Logz.io is a company that provides log management and log analysis services. Its platform combines the ELK stack, delivered as a cloud service, with machine learning to derive new insights from machine data.

What is ELK?

ELK is an acronym used to describe a log management platform consisting of three popular open-source projects: Elasticsearch, Logstash, and Kibana.

  • Elasticsearch is a search and analytics engine.
  • Logstash is a log collection, parsing, and forwarding pipeline.
  • Kibana is a data visualization tool.

ELK gives customers the ability to:

  1. Aggregate logs from all systems and applications
  2. Analyze these logs, and
  3. Create visualizations for application and infrastructure monitoring.

 

How does Logz.io work with Lumar?

Logz.io and Lumar work together as follows:

  1. An account is created in Logz.io.
  2. Log files are shipped from the webserver to Logz.io.
  3. Log files are aggregated and stored in Logz.io.
  4. An API token is generated in the Logz.io account.
  5. This API token is saved in Lumar.
  6. The API token is used to set up the Logz.io connection in Lumar.
  7. A query is then created in Lumar to fetch log file data through the API.
  8. Lumar sends an authentication request to the API using the API token.
  9. The Logz.io API accepts the token and allows Lumar to start requesting log file data based on the query.
  10. The log file data is crawled, processed, and visualized in Lumar, along with other data sources.
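The token-based handshake in steps 4–9 can be sketched as a single authenticated search request. The endpoint and `X-API-TOKEN` header below follow Logz.io's public Search API, but the Elasticsearch-style payload, and field names such as `request` and `agent`, are illustrative assumptions based on the query builder defaults described later in this guide, not Lumar's actual implementation:

```python
import json

# Logz.io Search API endpoint (region-specific variants exist)
LOGZIO_SEARCH_URL = "https://api.logz.io/v1/search"

def build_search_request(api_token, ua_regex, days=30, max_urls=10000):
    """Build the headers and JSON body for a Logz.io log query.

    The payload shape and the "request"/"agent" field names are
    illustrative sketches, not Lumar's internal implementation.
    """
    headers = {
        "X-API-TOKEN": api_token,  # API token generated in the Logz.io account
        "Content-Type": "application/json",
    }
    payload = {
        "size": max_urls,  # maximum number of log rows to fetch
        "query": {
            "bool": {
                "must": [
                    {"regexp": {"agent": ua_regex}},                     # user-agent filter
                    {"range": {"@timestamp": {"gte": f"now-{days}d"}}},  # date range filter
                ]
            }
        },
        "_source": ["request", "agent", "@timestamp"],  # columns Lumar needs
    }
    return headers, json.dumps(payload)

headers, body = build_search_request("example-token", ".*Googlebot.*")
```

In this sketch the request is only built, not sent; a real client would POST the body to the endpoint with those headers and page through the results.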

 

How to set up and configure Logz.io and Lumar

When integrating with Logz.io, it is crucial to make sure:

  1. Log files are shipped to the Logz.io ELK stack, and
  2. Lumar is set up to access log files from Logz.io.

Let’s go through these two steps in detail:

1. Shipping log files to Logz.io

Before Lumar can crawl log files, they need to be shipped to the Logz.io ELK stack for storage and processing.

Logz.io provides a list of all the shipping solutions in their documentation and the dashboard.

Hover over “Logs” in the left side navigation bar:
how to integrate Lumar and Logz.io - go to Logz.io navigation bar

Mouse down to “Send your logs” and click on it in the pop-up menu:

Logz.io Nav Menu - Manage Data - Send your Logs
Our team strongly recommends going through the shipping options with your engineering or IT teams to better understand how to ship logs into Logz.io for your web server.

The most common log shipping methods for Logz.io are Filebeat and Amazon S3 fetcher.

Filebeat

Filebeat is a lightweight open-source log shipping agent installed on a customer’s HTTP server. Logz.io recommends it because it is the easiest way to get logs into their system.

Logz.io also provides documentation on how to set up Filebeat on the most common web servers.
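As a rough sketch of what shipping with Filebeat involves, a minimal filebeat.yml for Nginx access logs might look like the following. The shipping token, listener host, and certificate path are placeholders that vary by account and region, so copy the exact values from the Logz.io documentation for your setup rather than from this example:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/access.log         # adjust to your web server's access log
    fields:
      logzio_codec: plain
      token: <YOUR-LOGZIO-SHIPPING-TOKEN> # account shipping token, not the API token
    fields_under_root: true

output.logstash:
  hosts: ["<listener-host>:5015"]         # region-specific Logz.io listener
  ssl:
    certificate_authorities: ["<path-to-logzio-ca-certificate>"]
```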

Amazon S3 fetcher

Amazon Web Services can be used to ship logs from certain services (for example, CloudFront log files) to an S3 bucket, where the Logz.io Amazon S3 fetcher can be used to fetch them.

Our team recommends going through the documentation and technical setup around this option with your engineering and/or IT teams.

2. Allowing Lumar access to Logz.io

Once the log files have been shipped to the ELK stack in Logz.io, the Lumar connection needs to be set up.

1. Go to the Connected Apps page in Lumar.

Connect Logz.io with Lumar

2. Click “Add Logz.io account” and navigate to Logz.io.

Enter Logz.io token

3. Go to the Logz.io dashboard and click the cog (“Settings”) icon in the navigation pane.

Logz.io Nav Menu - Settings

Mouse over and click on “Manage Tokens” in the pop-up menu:
Logz.io Navigation Menu for Lumar integration- Manage Tokens

4. On the “Manage Tokens” screen, click “API Tokens”, then “+ New API Token”.
Logz.io menu for integration with Lumar - Manage API tokens

Choose a name for the new token and click “Add.”
Logz.io navigation menu - Add new API token for integration with Lumar

5. Copy and paste the API token into the Logz.io account info on the Connected Apps page in Lumar.

Add Logz.io API token to Lumar

6. The API token will then be saved in the Connected Apps page in your account.

Logz.io Added to Lumar
 

Adding log file data to a crawl

1. To select log files from Logz.io as a data source, navigate to Step 2 in the advanced settings.

Logz.io Data Source

2. Scroll down to the Log Summary source, select Logz.io Log Manager and click on the “Add Logz.io Query” button.

Add Logz.io Query

3. The “Add Logz.io Query” button will open a query builder which, by default, contains pre-filled values for the most common server log file setup (more information about values below).

Logz.io and Lumar Query Builder

4. Once the query builder is configured, hit the save button to allow Lumar to crawl URLs from the Logz.io API.
 

How to configure the query builder

The query builder can be used to customize the default values in Lumar.

Query Builder Data Fields

The query builder requires analysis and editing to make sure that Lumar can pull in log file data from Logz.io.

Our team has described each field below, along with how to check that each one is set up correctly in the query builder.

Base URL

The base URL value is the HTTP scheme (i.e. https://) and domain (www.lumar.com), which will be prepended to relative URLs (e.g. /example.html) in the log file data.

If it is left blank, the primary domain in the project settings will be used by default.

Please make sure that the URLs in the log file data are from the primary domain you want to crawl; otherwise, Lumar will flag a large number of crawl errors.
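To illustrate how the base URL is applied, the sketch below (assuming www.lumar.com as the primary domain) resolves relative log file URLs against it; this is not Lumar's internal code:

```python
from urllib.parse import urljoin

# Base URL from the query builder: HTTP scheme plus domain
base_url = "https://www.lumar.com"

# Relative URLs as they typically appear in server log file data
relative_urls = ["/example.html", "/blog/", "/pricing"]

# Prepend the base URL to each relative path
absolute_urls = [urljoin(base_url, path) for path in relative_urls]
```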

Token

The Logz.io token value is the API token used to connect Lumar to your Logz.io account.

Please make sure that the API token used is still active, and is created in the correct account.

Date Range

The date range value is measured in days; Lumar will collect logs from the number of days entered into this field.

By default, the date range is set to 30 days. Check if the date range used in your Logz.io account contains log file data.

Desktop user-agent regex

This field tells Lumar to fetch only data from Logz.io that has a specific desktop user-agent string.

Our team has provided the regex for desktop-only Googlebot hits below:

(.*Mozilla\/5.0 \(compatible; Googlebot\/2.1; \+http:\/\/www.google.com\/bot.html\).*)|(.*AppleWebKit.*Googlebot.*Safari/537.36.*)

If you require customized user-agent strings, please get in contact with your customer success manager.

Mobile user-agent regex

This field tells Lumar to fetch only data from Logz.io that has a specific mobile user-agent string.

Our team has provided the regex for mobile-only Googlebot hits below:

.*Android.*Googlebot.*

If you require customized user-agent strings, please get in contact with your customer success manager.
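You can sanity-check both regexes locally before saving the query. The user-agent strings below are based on Google's documented Googlebot formats (the Chrome version number is a placeholder); the point is that the desktop pattern should match desktop hits only, and the mobile pattern mobile hits only:

```python
import re

# Desktop user-agent regex from this guide
DESKTOP_RE = re.compile(
    r"(.*Mozilla\/5.0 \(compatible; Googlebot\/2.1; "
    r"\+http:\/\/www.google.com\/bot.html\).*)"
    r"|(.*AppleWebKit.*Googlebot.*Safari/537.36.*)"
)
# Mobile user-agent regex from this guide
MOBILE_RE = re.compile(r".*Android.*Googlebot.*")

desktop_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
mobile_ua = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

assert DESKTOP_RE.search(desktop_ua)     # desktop pattern matches the desktop hit
assert MOBILE_RE.search(mobile_ua)       # mobile pattern matches the mobile hit
assert not DESKTOP_RE.search(mobile_ua)  # and the two patterns do not overlap
assert not MOBILE_RE.search(desktop_ua)
```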

Max number of URLs

This is the maximum number of URLs you want to fetch from the Logz.io API.

Please be aware that this field will not override the total URL limit in the project settings.

URL field name

The URL field name is the name of the URL column in the Logz.io database. This field helps Lumar look up the column that lists all relative URL rows in Logz.io, and fetch the pages which are in the log files.

By default, the query builder will look for a column called “request”. For most websites, this will allow Lumar to look up the right column and fetch the relevant URL rows.

Query Builder URL Field Name

However, each website is unique, and different tech stacks can cause these columns to be named differently. This means that sometimes, the URL field name will need to be updated.

To do this, navigate to the Logz.io dashboard > Logs > Kibana:

Logz.io nav - see Kibana drop-down menu - to update URL names

Click on the arrow icon next to the top row of the log file data.

Logz.io nav menu - click on small arrow icon next to log file item

The arrow icon will open up all the columns and data for that specific hit:

Logz.io log file detailed data page - opened from larger list

In this drop-down, look for the column containing the URL that was requested in the log file data. Be careful not to mix it up with the source URL column.

Logz.io log file detailed data page - identify the URL
Once you have identified the URL, make a note of the name of the column. In the example screenshot above, this is “request”.

Go back to the query builder in Lumar and make sure the URL field name matches the name of the column.

Query Build URL Field Name Field

User agent field name

The user agent field name is the name of the user agent column in the Logz.io database. This field helps Lumar look up the column that lists all the user agent strings in Logz.io and apply the user agent regex to filter out particular bot hits.

By default, the query builder will look for a column called “agent”. For most websites, this will allow Lumar to look up the right column and fetch the relevant URLs with particular user agents.

Query Build Agent Name Field

However, each website is unique, and the different tech stacks can cause these columns to be named differently. This means that sometimes the user agent field name will need to be updated.

To do this, navigate to the Logz.io dashboard > Kibana > Discover.
Logz.io nav - see Kibana drop-down menu - to update URL names

Click on the arrow icon next to the top row of log file data.

Logz.io Kibana Discover Dropdown

The arrow will open up all the columns and data for that specific hit.

Logz.io Fields

In this drop-down, look for the column containing the user agent string that made the request in the log file data.

Logz.io log file - find the user agent

Once you have identified the user agent column, please make a note of the name it is using. Go back to the query builder in Lumar and make sure the user agent field name matches the name of the column.

Query Builder User Agent Field Name
 

Filtering log file data in the query builder

The Lumar query builder can also filter data that is fetched from Logz.io using JSON.

Lumar Query Filter

For example, you could filter on a specific domain or subdomain.

Our team recommends getting in touch with the customer success team if you want to filter using JSON.
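As a purely illustrative example, a JSON filter restricting results to a single subdomain might look like the snippet below. The `host` field name is an assumption about your log schema, so confirm it in Kibana (and with the customer success team) before using anything like it:

```json
{
  "term": {
    "host": "www.example.com"
  }
}
```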
 

Frequently Asked Questions

Should I run a sample crawl?

Yes, our team always recommends running a sample crawl when a new Logz.io query has been set up as a crawl source.

Running a sample crawl will prevent you from having to wait for a massive crawl to finish only to discover there was an issue with the settings. It also helps to reduce the number of wasted credits.

Why is Lumar not pulling in log file data?

These are the most common reasons why Lumar may not be pulling in log file data:

  • The API token is not from the correct account.
  • The user agent regex is not correct.
  • The URL fields are not correct.

If log files are still not being pulled in after these issues have been resolved, then we recommend getting in touch with your customer success manager.

Why is Lumar not able to crawl log file data?

Sometimes log file data is being pulled in correctly, but due to other issues, the crawl still fails.

Our team also recommends reading the “how to debug blocked crawls” and “how to fix failed website crawls” documentation.
 

Further questions on Logz.io?

If you’re still having trouble setting up Logz.io, then please don’t hesitate to get in touch with our support team.

Adam Gent

Product Manager & SEO Professional

Search Engine Optimization (SEO) professional with over 8 years’ experience in the search marketing industry. I have worked with a range of client campaigns over the years, from small and medium-sized enterprises to FTSE 100 global high-street brands.
