Wednesday, April 12, 2017

How to write and debug grok patterns in ELK.

ELK is Elasticsearch, Logstash, and Kibana, three key open-source components from Elastic. Logstash processes the logs, Elasticsearch provides indexing, storage, search, and related features, and Kibana displays the results in a web UI.

I introduced how to set up an ELK 5 server in my previous post, so I won't explain the installation and setup here, but feel free to check that post before you start.

If you are just getting started with grok patterns, I recommend checking out the Elastic web site first to pick up the basic concepts.
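
As a quick refresher, a grok expression has the form %{SYNTAX:SEMANTIC}: SYNTAX is the name of a pattern (built-in or custom) and SEMANTIC is the field name the matched text is stored under. For example:

%{BASE10NUM:PossibleContainers}   # matches a decimal number and stores it in the field "PossibleContainers"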

In this post, I will show you how to write and debug a grok pattern with different tools, and how to give your logs proper tags that allow searching, indexing, or even preparation for further use later on, such as machine learning.


0. Outline


  1. Preparation
  2. Design
  3. Implementation
    1. Regular Expression - debugging
    2. Grok Pattern
      1. Method 1: online debug tools
        1. First step
        2. Second step
        3. Third step
        4. Review the result
      2. Method 2: Check Logs


1. Preparation


After you finish the installation, you should be able to see the folders and files under /etc/logstash/ as below.

$ ls -larth
total 44K
-rw-rw-r--   1 root       root       1.7K Nov 11 14:25 startup.options
-rw-rw-r--   1 root       root       1.7K Nov 11 14:25 jvm.options
-rw-rw-r--   1 root       root       1.4K Feb 24 09:41 log4j2.properties
-rw-rw-r--   1 root       root       2.8K Feb 24 23:56 logstash.yml.dpkg-old
lrwxrwxrwx   1 root       root         86 Feb 25 00:04 patterns -> /usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-4.0.2/patterns
-rw-rw-r--   1 root       root       4.4K Mar 13 16:49 logstash.yml
drwxrwxr-x   4 root       root       4.0K Mar 13 17:01 .
drwxr-xr-x 110 root       root       4.0K Apr  5 10:16 ..
drwxrwxr-x   2 swiftstack swiftstack 4.0K Apr 12 17:39 conf.d
drwxr-xr-x   2 swiftstack swiftstack 4.0K Apr 12 18:05 extra_patterns

There are three major folders you need to be aware of.

a. patterns:
As you can see, it's a soft link into your Logstash installation directory, since Logstash ships with native patterns for you to leverage.

b. extra_patterns:
This is where you prepare your customized patterns, which are named regular-expression mappings you can leverage in your grok patterns.

c. conf.d:
Your grok pattern mappings live under this folder. e.g:
$ ls ./conf.d/ -larth
total 48K
-rw-r--r-- 1 swiftstack swiftstack  271 Feb 27 16:18 30-elk5-output.conf
-rw-r--r-- 1 swiftstack swiftstack  409 Feb 27 20:41 10-rsyslog-input.conf
drwxrwxr-x 4 root       root       4.0K Mar 13 17:01 ..
-rw-rw-r-- 1 swiftstack swiftstack  29K Apr 12 17:39 20-rsyslog-filter.conf
drwxrwxr-x 2 swiftstack swiftstack 4.0K Apr 12 17:39 .


2. Design

The design all depends on you. In my design I separate input, filter, and output into three different *.conf files, but you can either put them together or split each grok pattern into its own *.conf, so you might have, e.g., input, filter_1, filter_2, ... and output.


In my case, I leverage rsyslog to forward the logs to Logstash; the details you should be able to find here. A minimal forwarding sketch is shown below.
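
Since the linked details are not reproduced in this post, here is a minimal rsyslog forwarding sketch under stated assumptions: the Logstash server is reachable at the placeholder IP 10.0.0.1, the file name is hypothetical, and the JSON field names mirror the ones that show up in the example event later in this post (message, sysloghost, severity, facility, programname, procid).

$ cat /etc/rsyslog.d/60-logstash.conf
# Build one JSON object per syslog message; field names match the example event in this post.
template(name="json-template" type="list") {
    constant(value="{")
    constant(value="\"@timestamp\":\"")     property(name="timereported" dateFormat="rfc3339")
    constant(value="\",\"@version\":\"1")
    constant(value="\",\"message\":\"")     property(name="msg" format="json")
    constant(value="\",\"sysloghost\":\"")  property(name="hostname")
    constant(value="\",\"severity\":\"")    property(name="syslogseverity-text")
    constant(value="\",\"facility\":\"")    property(name="syslogfacility-text")
    constant(value="\",\"programname\":\"") property(name="programname")
    constant(value="\",\"procid\":\"")      property(name="procid")
    constant(value="\"}")
}
# A single "@" means UDP, matching the udp {} input below; 10.0.0.1 is a placeholder IP.
*.* @10.0.0.1:10514;json-template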

a. input:
The input conf takes the logs from rsyslog and identifies the format, which is JSON, line by line.

e.g: $ cat 10-rsyslog-input.conf
# This input block will listen on port 10514 for logs to come in.
# host should be an IP on the Logstash server.
# codec => json_lines indicates that we expect the lines we're receiving to be in JSON format
# type => "rsyslog" is an optional identifier to help identify messaging streams in the pipeline.

input {
  udp {
    port => 10514
    codec => json_lines
    type => "rsyslog"
  }
}
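
To sanity-check this input without a live rsyslog feed, you can hand-craft one JSON line and push it over UDP. A quick sketch using netcat (the message and field values are taken from the object-expirer example later in this post):

$ echo '{"message":" Pass beginning; 0 possible containers; 0 possible objects","programname":"object-expirer","severity":"info"}' | nc -u -w1 127.0.0.1 10514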

b. filter
The filter conf is the key part where we put the grok patterns. I will cover it in detail in a later section.

e.g: $ cat 20-rsyslog-filter.conf
# This is an empty filter block.  You can later add other filters here to further process
# your log lines

filter {
################################
# Object Expirer patterns
################################
#
# EXAMPLE LOG ENTRIES
################################
# OBJ_EXP_START Oct 15 13:13:00 localhost.localdomain object-expirer: Pass beginning; 0 possible containers; 0 possible objects
# OBJ_EXP_FINISH Oct 15 13:13:00 localhost.localdomain object-expirer: Pass completed in 0s; 0 objects expired
################################
  if [programname] == "object-expirer" {
    if "beginning" in [message] {
      grok {
        patterns_dir => ["/etc/logstash/patterns", "/etc/logstash/extra_patterns"]
        match => { "message" => " Pass beginning; %{BASE10NUM:PossibleContainers} possible containers; %{BASE10NUM:PossibleObject} possible objects" }
        add_tag => ["swift", "object", "expirer", "OBJ_EXP_START"]
        remove_tag => ["_grokparsefailure"]
        remove_field => ["PossibleContainers", "PossibleObject"]
      }
    }
    else {
      grok {
        match => { "message" => " Pass completed in %{NUMBER:CompletedSeconds}s; %{BASE10NUM:ObjectsExpired} objects expired" }
        remove_tag => ["_grokparsefailure"]
        add_tag => ["swift", "object", "expirer", "OBJ_EXP_FINISH"]
        remove_field => ["CompletedSeconds", "ObjectsExpired"]
      }
    }
  }
}

c. output
The output conf takes the logs parsed by Logstash and forwards them to Elasticsearch, so you can see the output points at the Elasticsearch server (localhost) and port.

e.g: $ cat 30-elk5-output.conf
# This output block will send all events of type "rsyslog" to Elasticsearch at the configured
# host and port, into daily indices (with this configuration, the default pattern "logstash-YYYY.MM.DD")

output {
  if [type] == "rsyslog" {
    elasticsearch {
      hosts => [ "localhost:9200" ]
    }
  }
}
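
Once events flow, you can confirm they are landing by asking Elasticsearch to list its indices (the standard _cat API; adjust host and port if yours differ):

$ curl 'localhost:9200/_cat/indices?v'
# you should see the daily logstash-YYYY.MM.DD indices growing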


3. Implementation

In this section I will show you how to write a grok pattern and how to debug it. I struggled with grok pattern implementation for a while, and found that these steps and tools helped me a lot.

Let's take an example from the un-parsed logs. As you can see, most of the data was matched by Logstash's defaults, but the message itself wasn't; no pattern matched it, so it errored out with the tag _grokparsefailure.
{
  "_index": "logstash-2017.04.12",
  "_type": "rsyslog",
  "_id": "AVtkdWQiSkb40H9LtWNZ",
  "_score": null,
  "_source": {
    "severity": "info",
    "@timestamp": "2017-04-12T23:17:46.000Z",
    "@version": "1",
    "programname": "proxy-server",
    "procid": "-",
    "host": "127.0.0.1",
    "sysloghost": "cloud-pac01",
    "message": " Copying object from /v1/AUTH_sme/stepan/Folders_pack/03 with special characters ~`!@#$%^&()_+-={}[];',._over_PHP.zzz to /v1/AUTH_sme/stepan/.trash/org/Folders_pack/03%20with%20special%20characters%20%7E%60%21%40%23%24%25%5E%26%28%29_%2B-%3D%7B%7D%5B%5D%3B%27%2C._over_PHP.zzz (txn: txae6773cb5efd47ae86c1b-0058eeb59a)",
    "type": "rsyslog",
    "facility": "local0",
    "tags": [
      "_grokparsefailure"
    ]
  },
  "fields": {
    "@timestamp": [
      1492039066000
    ]
  },
  "highlight": {
    "message": [
      " @kibana-highlighted-field@Copying@/kibana-highlighted-field@ @kibana-highlighted-field@object@/kibana-highlighted-field@ @kibana-highlighted-field@from@/kibana-highlighted-field@ /v1/AUTH_sme/stepan/Folders_pack/03 with special characters ~`!@#$%^&()_+-={}[];',._over_PHP.zzz to /v1/AUTH_sme/stepan/.trash/org/Folders_pack/03%20with%20special%20characters%20%7E%60%21%40%23%24%25%5E%26%28%29_%2B-%3D%7B%7D%5B%5D%3B%27%2C._over_PHP.zzz (txn: txae6773cb5efd47ae86c1b-0058eeb59a)"
    ]
  },
  "sort": [
    1492039066000
  ]
}

The message above, " Copying object from /v1/AUTH_sme/stepan/Folders_pack/03 with special characters ~`!@#$%^&()_+-={}[];',._over_PHP.zzz to /v1/AUTH_sme/stepan/.trash/org/Folders_pack/03%20with%20special%20characters%20%7E%60%21%40%23%24%25%5E%26%28%29_%2B-%3D%7B%7D%5B%5D%3B%27%2C._over_PHP.zzz (txn: txae6773cb5efd47ae86c1b-0058eeb59a)", is basically a string telling us "Copying object from "A" to "B" (txn: TXN)"; that's it. However, we notice some features, listed in the bullet items below.

a. There are special characters in the message, e.g. "(" and ")", which we need to escape.
b. There are special characters in the message, e.g. "...$%^...".
c. There might be foreign letters, so Unicode needs to be covered by our patterns.


3.1 Regular Expression

First of all, we need to design a regular expression pattern that maps what we want and allows us to re-use it in later mappings.

Before we start, I have to introduce you to an easy online tool for parsing your regex (regular expression): https://regex101.com

a. Leverage existing patterns; I call it eating your own dog food.
# TXN is the transaction number
# In OpenStack Swift we use it a lot, so I design an extra pattern to leverage it all the time
# This example is very straightforward: just two WORD patterns concatenated with a dash.
TXN %{WORD}[-]%{WORD}


# The a + b + c bullet items above can be summarized in the single pattern below:
# copy-from object path which allows unicode
#\w matches any word character (equal to [a-zA-Z0-9_])
#\s matches any whitespace character (equal to [\r\n\t\f\v ])
#\/ matches the character / literally (case sensitive)
#\. matches the character . literally (case sensitive)
#\p{L} matches any kind of letter from any language
#\p{N} matches any kind of numeric character in any script
#\p{S} matches any math symbols, currency signs, dingbats, box-drawing characters, etc
#\p{P} matches any kind of punctuation character
COPYFROMUNICODE [\w\s\/\.\p{L}\p{N}\p{S}\p{P}]+

Via the online regex tool we can practice and debug our regex to see whether it matches the requirement or not.

The tool has a detailed explanation of how the regex maps to your test string:
/
[\w\s\/\.\p{L}\p{N}\p{S}\p{P}]+
/
Match a single character present in the list below 
[\w\s\/\.\p{L}\p{N}\p{S}\p{P}]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\w matches any word character (equal to [a-zA-Z0-9_])
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\/ matches the character / literally (case sensitive)
\. matches the character . literally (case sensitive)
\p{L} matches any kind of letter from any language
\p{N} matches any kind of numeric character in any script
\p{S} matches any math symbols, currency signs, dingbats, box-drawing characters, etc
\p{P} matches any kind of punctuation character

This is how you debug the special string. Try this link.
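
If you prefer a local check, grok's regex engine (Oniguruma) is Ruby-flavored, so a quick sketch with the ruby CLI can confirm the character class. This assumes Ruby is installed, and the sample path is shortened from the log above:

$ ruby -e 'msg = %q(/v1/AUTH_sme/stepan/03 with special characters ~!@#$%^&()_+.zzz); puts(msg =~ /\A[\w\s\/\.\p{L}\p{N}\p{S}\p{P}]+\z/ ? "match" : "no match")'
match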

After we sort out the pre-defined mapping patterns, we put them under /etc/logstash/extra_patterns/...

e.g: $ cat ../extra_patterns/swift
# Swift logs documentation: http://docs.openstack.org/developer/swift/logs.html

# TXN is the transaction number
TXN %{WORD}[-]%{WORD}

# Copy from unicode object path which allow unicode
#\w matches any word character (equal to [a-zA-Z0-9_])
#\s matches any whitespace character (equal to [\r\n\t\f\v ])
#\/ matches the character / literally (case sensitive)
#\. matches the character . literally (case sensitive)
#\p{L} matches any kind of letter from any language
#\p{N} matches any kind of numeric character in any script
#\p{S} matches any math symbols, currency signs, dingbats, box-drawing characters, etc
#\p{P} matches any kind of punctuation character
COPYFROMUNICODE [\w\s\/\.\p{L}\p{N}\p{S}\p{P}]+
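
With TXN defined, a grok match can reference it like any built-in pattern. For example, a hypothetical tightening of the txn capture used in the next section would be:

match => { "message" => " Copying object from %{COPYFROMUNICODE:copy_from_obj_path} to %{NOTSPACE:copy_to_obj_path} \(txn: %{TXN:txn}\)" }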


3.2 Grok Patterns

After we prepare extra_patterns, we can add grok patterns to the filter conf from section 2.b.

Before you start, you should know the general structure of your filter, so below is an example for you to reference. In my case, if the pattern matches, I tag the event with "swift", "proxy" and "PROXY_SER_SIDE_COPY".

e.g: $ cat 20-rsyslog-filter.conf
filter {
################################
# proxy server patterns info
################################
  if [programname] == "proxy-server" and [severity] == "info" {
    if "Copying object from" in [message] and "to" in [message] {
      grok {
        patterns_dir => ["/etc/logstash/patterns", "/etc/logstash/extra_patterns"]
        match => { "message" => " Copying object from %{COPYFROMUNICODE:copy_from_obj_path} to %{NOTSPACE:copy_to_obj_path} \(txn: %{NOTSPACE:txn}\)" }
        remove_tag => ["_grokparsefailure"]
        add_tag => ["swift", "proxy", "PROXY_SER_SIDE_COPY"]
      }
    }
  }
}

However, how do we know the grok pattern is correct? There are two major ways to test it.

Method 1: leverage an online debug tool, e.g. https://grokdebug.herokuapp.com/
The first field is your source, which is the message content from my example log:

 Copying object from /v1/AUTH_sme/stepan/Folders_pack/03 with special characters ~`!@#$%^&()_+-={}[];',._over_PHP.zzz to /v1/AUTH_sme/stepan/.trash/org/Folders_pack/03%20with%20special%20characters%20%7E%60%21%40%23%24%25%5E%26%28%29_%2B-%3D%7B%7D%5B%5D%3B%27%2C._over_PHP.zzz (txn: txae6773cb5efd47ae86c1b-0058eeb59a)

The second field is the grok pattern I wrote:

 Copying object from %{COPYFROMUNICODE:copy_from_obj_path} to %{NOTSPACE:copy_to_obj_path} \(txn: %{NOTSPACE:txn}\)

In my case I leverage extra patterns, so I check "Add custom patterns"
and put my extra patterns in the text box:

COPYFROMUNICODE [\w\/\.\p{L}\p{N}\p{S}\p{P}\s]+
TXN %{WORD}[-]%{WORD}

Third, after adding the extra patterns, you can click Go.

Finally, you should be able to see the result as below.


{
  "copy_from_obj_path": [
    [
      "/v1/AUTH_sme/stepan/Folders_pack/03 with special characters ~`!@#$%^&()_+-={}[];',._over_PHP.zzz"
    ]
  ],
  "copy_to_obj_path": [
    [
      "/v1/AUTH_sme/stepan/.trash/org/Folders_pack/03%20with%20special%20characters%20%7E%60%21%40%23%24%25%5E%26%28%29_%2B-%3D%7B%7D%5B%5D%3B%27%2C._over_PHP.zzz"
    ]
  ],
  "txn": [
    [
      "txae6773cb5efd47ae86c1b-0058eeb59a"
    ]
  ]
}


Method 2: check the Logstash log to trace grok pattern bugs.
$ sudo service logstash restart
Then tail the Logstash log to see whether there are any clues:
$ sudo tail -f /var/log/logstash/logstash-plain.log

If you don't see any errors and the restart log shows success as below, then you are done!
[2017-04-12T18:03:19,124][INFO ][logstash.pipeline        ] Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
[2017-04-12T18:03:19,133][INFO ][logstash.pipeline        ] Pipeline main started
[2017-04-12T18:03:19,133][INFO ][logstash.inputs.udp      ] Starting UDP listener {:address=>"0.0.0.0:10514"}
[2017-04-12T18:03:19,150][INFO ][logstash.inputs.udp      ] UDP listener started {:address=>"0.0.0.0:10514", :receive_buffer_bytes=>"106496", :queue_size=>"2000"}
[2017-04-12T18:03:19,188][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
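
You can also ask Logstash to validate the configuration before restarting. A quick sketch (the binary and settings paths below match a typical package install, but may vary on your system):

$ sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit
# prints "Configuration OK" when the config files parse cleanly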

That's it! You should now see correctly mapped logs with tags, which in my case are "swift", "proxy", and "PROXY_SER_SIDE_COPY".




Reference:
https://regex101.com
https://grokdebug.herokuapp.com/
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html