
Logging for the masses

(I really need to update the blog template 🙁 )

Problem: there are several sources of logs that you want to consult and search in a centralized way. Those logs should also be correlated to detect events and raise alerts.

At first glance there are two alternatives: Splunk, probably the leader among logging systems, and ArcSight Logger, already installed in the Poland RDC.

The former is ridiculously expensive (at least for my miserable budget) and the latter is a bureaucratic hell.

Both are expensive, proprietary and closed solutions, so sometimes it pays to look for an inexpensive and free (as in speech) alternative.

The free solution involves using Logstash, Elasticsearch and Kibana for logging, storing and presentation.

Web Server Logging

We have about 80 log feeds from 15 web applications and 30 servers; the goal is to log everything and be able to search by app, server, date, IP,…

The good news is that all those logs follow the same pattern.

The architecture follows this scheme (the configuration files are sanitized):

[Architecture diagram: logstash-infr]

Logstash-forwarder: formerly known as lumberjack. It’s an application that tails logs and sends them over a secure channel to Logstash on a TCP port, keeping an offset for each log.

Logstash (as shipper): receives all the log streams and stores them in a Redis data store.

Redis: it works here as a message queue between shipper and indexer. A thousand times easier to set up than ActiveMQ.

Logstash (as indexer): pulls entries from the Redis queue and processes the data: parses, maps and stores them in an Elasticsearch database.

ElasticSearch: the database where logs are stored, indexed and made searchable.

Kibana: a web frontend for ES that allows the creation and customization of dashboards, queries and filters.

 

Logstash works as both shipper and indexer. Why split those functions into two different processes?

  • Because we don’t want to lose data.
  • Because the indexer can do some serious, CPU-intensive work per entry.
  • Because the shipper and indexer throughputs are different and not synchronized.
  • Because the logs can be unstructured and the match model can have errors, reporting null pointers and finally dying from out-of-memory errors or turning into a zombie process (as happened when I tried to add some JBoss log4j logs).

For those reasons there is a queue between shipper and indexer, so the infrastructure is resilient to downtimes and the indexer isn’t saturated by the shipper throughput.

Logstash-forwarder configuration

A JSON config file, declaring the shipper host, a certificate (shared with the shipper) and which paths are being forwarded.

One instance per server

{
  "network": {
    "servers": [ "<SHIPPER>:5000" ],
    "ssl certificate": "/opt/logstash-forwarder/logstash.pub",
    "ssl key": "/opt/logstash-forwarder/logstash.key",
    "ssl ca": "/opt/logstash-forwarder/logstash.pub",
    "timeout": 15
  },
  "files": [
    {
      "paths": [
        "/opt/httpd/logs/App1/access.log",
        "/opt/httpd-sites/logs/App2/access_ssl.log"
      ],
      "fields": { "type": "apache", "app": "App1" }
    },
    {
      "paths": [
        "/opt/httpd/logs/App2/access.log",
        "/opt/httpd/logs/App2/access_ssl.log"
      ],
      "fields": { "type": "apache", "app": "App2" }
    }
  ]
}
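The certificate and key referenced above are just a self-signed pair shared with the shipper. As a hedged sketch (the filenames, paths and CN are assumptions matching this config; adjust them to your hosts), they can be generated with OpenSSL, and the forwarder is started by pointing it at the JSON file:

# generate a self-signed certificate/key pair, reused as "ssl ca" on the forwarder side
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
  -keyout logstash.key -out logstash.pub -subj "/CN=<SHIPPER>"

# run one forwarder instance per server
/opt/logstash-forwarder/logstash-forwarder -config /etc/logstash-forwarder.conf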

Logstash as shipper

Another config file, this time in Logstash’s own syntax: it accepts the log streams and stores them in a Redis datastore.

input {
  lumberjack {
    port => 5000
    ssl_certificate => "/etc/ssl/logstash.pub"
    ssl_key => "/etc/ssl/logstash.key"
    codec => json
  }
}

output {
  stdout { codec => rubydebug }
  redis { host => "localhost" data_type => "list" key => "logstash" }
}
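Starting the shipper is just a matter of feeding that file to a Logstash agent. Assuming the flat-jar distribution of that era (the jar name and config path are placeholders):

java -jar logstash-flatjar.jar agent -f /etc/logstash/shipper.conf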

 

Redis

I think it is out of the scope of this blog entry; it’s really dead easy, and the default config was enough. It would need scaling depending on the throughput.
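Since the shipper pushes onto a Redis list and the indexer pops from it, a quick way to see whether the indexer keeps up is to check the length of that list (the key name is the one configured above):

redis-cli llen logstash

A value that keeps growing means the indexer can’t cope with the shipper’s throughput.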

Logstash as indexer

Here the input is the output of the shipper and the output is the ES database; in between sits the matching section, where we filter the entries (we map them, drop the health checks coming from the F5 balancers and tag entries with 503 errors). Yes, the output can be multiple too: not only do we store the matched entries, the 503s are also sent to a zabbix output, which in turn forwards them to our Zabbix server.

input {
  redis {
    host => "<REDIS_HOST>"
    type => "redis"
    data_type => "list"
    key => "logstash"
  }
}

filter {
  grok {
    match => [ "message", "%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response:int} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{QS:jsessionid} %{QS:bigippool} %{NUMBER:reqtimes:int}/%{NUMBER:reqtimems:int}" ]
  }
}

filter {
  if [request] == "/f5.txt" {
    drop { }
  }
}

filter {
  if [response] == "503" {
    alter {
      add_tag => [ "zabbix-sender" ]
    }
  }
}

output {
  stdout { }

  elasticsearch {
    cluster => "ES_WEB_DMZ"
  }

  zabbix {
    # only process events with this tag
    tags => "zabbix-sender"
    # specify the hostname or ip of your zabbix server
    # (defaults to localhost)
    host => "<ZABBIX_SERVER>"
    # specify the port to connect to (default 10051)
    port => "10051"
    # specify the path to zabbix_sender
    # (defaults to "/usr/local/bin/zabbix_sender")
    zabbix_sender => "/usr/bin/zabbix_sender"
  }
}
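To make the grok pattern less opaque, this is the kind of access log line it expects (all values here are made up): the Apache combined format followed by two extra quoted fields, the JSESSIONID and the F5 BIG-IP pool, plus the request time in seconds/milliseconds, as the reqtimes/reqtimems field names suggest:

10.1.2.3 - - [10/Feb/2014:13:55:36 +0100] "GET /App1/login.do HTTP/1.1" 200 2326 "https://portal.example.com/App1/" "Mozilla/5.0" "0000AbCdEf.node01" "pool_app1_http" 0/12345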

 

ElasticSearch

The configuration file for a basic service is easy. Depending on the needs, the throughput and how many searches per second it has to serve, it gets complicated (shards, masters, nodes,…), but for very occasional use this line is enough:

cluster.name: ES_WEB_DMZ
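Once it is running, a quick check against the REST API confirms that the node has joined the right cluster (default port assumed):

curl 'http://<ES_HOST>:9200/_cluster/health?pretty'

The response includes the cluster name, its status (green/yellow/red) and the number of nodes and shards.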

Kibana

Another easy configuration: it only needs to know the ES address, “http://<ES_HOST>:9200”, and that’s all. Dashboards and queries are saved in the ES database. The frontend files and directories can be read-only.
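For reference, assuming the Kibana 3 series, that address lives in config.js; only these settings matter here (kibana-int being the default ES index where the dashboards get saved):

elasticsearch: "http://<ES_HOST>:9200",
kibana_index: "kibana-int"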

This post was originally published on my company intranet and showed two dashboards/screenshots that I can’t reproduce here:

  1. A simple dashboard showing how the logs are distributed per application and server, how many entries there are and their response times. Each facet can be inspected to drill deeper.
  2. A dashboard showing the application errors (error codes 5XX).

 

Risk management

In a migration there are three risks that must be controlled no matter what.

In the development environment: using an original service while thinking you are using a migrated one. In this scenario the tests will work flawlessly and the go-live day will be plagued with errors, because the migration hasn’t been properly tested; actually, it hasn’t been tested at all!

In production there are two: corrupting the original, live production environment with test data, and, after the go-live, saving production data in already-migrated or development systems.

Solutions to mitigate or avoid the problems related to those risks?

The first is simple and cosmetic, but it works wonders. Everybody remembers OMG Ponies! So choose an official RDC migration colour, the camper the better, and change the background colour on the migration branch of each app. Nobody will be able to claim they ran tests against a real app by mistake, thinking they were hitting its migrating counterpart. Also, if by error someone promotes an app with real properties (from a source code branch other than the migration one), it will be detected instantly.

The second is like putting a prophylactic over your systems. Yes, internal firewalls. Netfilter/iptables for the win. Using Windows? Sorry. Twice (the first one for having to use it). Iptables is a bit rough, so I prefer using Shorewall, a collection of scripts for configuring iptables through policies and high-level objects.

The different systems will be in different networks (if they aren’t, either the migration is trivial or there are more serious issues to think about… like abandoning ship as fast as possible). In the development environment, rejecting (and logging) egress connections to the original network range is usually enough. If the connection is dropped instead of rejected you get timeouts; a quick log entry with a connection reset in it is usually better than waiting minutes without knowing what’s happening. Production’s rules have a different scope: all outward connections are blocked by default, and only selected network ranges are opened explicitly.
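As a sketch of the development-side rules in raw iptables terms (Shorewall generates something equivalent; the 10.0.0.0/16 range standing in for the original network is made up):

# log and then reject egress traffic towards the original, non-migrated range
iptables -A OUTPUT -d 10.0.0.0/16 -j LOG --log-prefix "MIGR-REJECT: "
iptables -A OUTPUT -d 10.0.0.0/16 -p tcp -j REJECT --reject-with tcp-reset
iptables -A OUTPUT -d 10.0.0.0/16 -j REJECT

Rejecting (instead of dropping) is what gives you the immediate connection reset rather than a timeout.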

Derivative accesses are still a risk, more so if they are managed by other departments with other standards. Some can be tracked per application… others are a matter of faith.

Tracking all those rejected connections can be done easily in Kibana. Just add to the automated process for configuring shorewall/iptables on all the servers (I use Fabric) the option to relay the shorewall log to lumberjack/logstash (see Warlogs).
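Relaying the firewall log is just one more stanza in the logstash-forwarder config shown earlier. Where the Shorewall/Netfilter messages end up depends on your syslog setup, so the path below is an assumption:

{
  "paths": [ "/var/log/kern.log" ],
  "fields": { "type": "shorewall" }
}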

 

Warlogs

One of the first things to do in a migration is to be able to handle a lot of logs without having to check each one manually.

If you don’t have one, then you need it. In a final state maybe the log of an application is enough, but during a migration you will need to watch the firewall, security,… and on multiple servers, because your services should usually be clustered or at least balanced. Multiply that by the number of applications and you get the idea.

There may be an official centralized log system, but sometimes realpolitik or strict rules of use (format, size, origin, usage,…) advise against using it.

A temporary solution is using Logstash, temporary in the sense of scaffolding. A permanent solution needs a lot of capacity planning and a serious logging strategy, out of the scope of this entry (maybe another day).

You need a server with a few GB of space, but don’t fret about it: you can always delete the saved logs. Remember, the goal isn’t keeping a history, it’s detecting errors, usually while testing. Another reason to view it as temporary, or as a demo of things to come.

The first thing is to install a standalone Elasticsearch server, the database which keeps the logs. Carlos Spitzer, a Red Hat engineer (whom I met in a previous project, by the way), explains how to create an RPM for RHEL/CentOS, and for Debian the Internet is full of examples. Actually, all these applications are very basic and simple to install.

Logstash receives the logs and records them in the Elasticsearch database. It’s a Java application, so you need a JRE, but at least the installation, configuration and maintenance are as simple as swapping the underlying jar. Its configuration file has three sections: input, filter (the transformation) and output.

  1. Input section: forget about putting one Logstash per server, just one instance is enough. You need to configure a lumberjack input and an SSL certificate and that’s all.
  2. Filter section: patterns, matches, transformations. It’s the most complex section and I haven’t exploited all its capabilities yet. If I had one piece of advice to give, it would be to use a grok debugger, the best way to test grok regex patterns.
  3. Output section: just don’t use the embedded Elasticsearch; point it to the previous Elasticsearch server (a minimal sketch of the whole file follows this list).
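Putting the three sections together, a minimal single-Logstash config for this scaffolding setup could look like the sketch below (host and certificate paths are placeholders, and the filter is whatever your logs need):

input {
  lumberjack {
    port => 5000
    ssl_certificate => "/etc/ssl/logstash.pub"
    ssl_key => "/etc/ssl/logstash.key"
  }
}

filter {
  # grok patterns / matches per log type go here
}

output {
  elasticsearch {
    host => "<ES_HOST>"
  }
}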

A last tip: the latest Logstash releases come with two modes, agent (for logging; this mode needs the config file) and web (which starts an embedded Kibana server). You would have two Logstash processes running.
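For reference, with the flat jar both can be started from the same command line (jar name and config path are placeholders), or as two separate processes if you prefer:

java -jar logstash-flatjar.jar agent -f /etc/logstash/logstash.conf -- web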

Lumberjack, the log feeder: you need to download it from GitHub and compile it with Go, but after that the executable is all you’ll need. No libs, no dependencies, no runtimes. Just the executable, a JSON config file declaring the logstash server and port, the SSL cert and which log files to track, and it’s ready. If the log files are not very active you may need to restart the lumberjack process after the logs are refreshed, but in general it manages how many entries to send, and if the logstash server is down it keeps them until it’s online again, so no logs are lost. The executable, the certs and the config file are the only files you’ll need to deploy on each server; a Fabric task or, as in our case, installing it with the script for automatic app deployment is enough.
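Building it is little more than a clone and a go build. The repository URL below is the one used at the time and may have moved since, so take this as a sketch:

git clone https://github.com/elasticsearch/logstash-forwarder.git
cd logstash-forwarder
go build -o logstash-forwarder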

For accessing the logs I use Kibana, the embedded one that comes with Logstash. Sorry, I don’t like Ruby platforms; well, I don’t like Python eggs or Perl’s CPAN either: a production server is no place for compilers. They always mean extra work, so a flat jar and a JRE runtime is a good trade-off for me.