Visualizing Logdata With Logstash, Statsd and Graphite

Inspired and passionate

Inspired by Etsy’s blog post Measure Anything, Measure Everything, I have given a lot of thought to metrics and how to extract them. I work at Mintra Trainingportal, where I am responsible for operating our LMS application, which is written in Java. The team I work on consists mostly of developers, so much of my work involves building bridges between operations and development.

After reading about Metrics and watching Coda Hale’s presentation on Metrics, I immediately knew I had to get this into our system. It did not take me long to find out that the rest of my team did not share my passion for implementing Metrics. Do not get me wrong: they love visualizing metrics and like the idea of measuring our application, but the Metrics library just did not seem like the right fit for them.

I had to come up with something.

Logstash

Logstash is a tool for managing events and logs. I had played a bit with it earlier, but I did not know I could gather metrics from logs and ship them to statsd (or Graphite). An article in the Logstash documentation made me realize that this was exactly what I was looking for.

A few weeks ago, one of our developers added new entries to our application log to debug problems we had encountered with indexing.

04 Jul 09:56:01,088 INFO  LuceneIndex-JMS - Indexing on MASTER took (sync: true): 3602
04 Jul 09:56:10,969 INFO  LuceneIndex-JMS - Indexing on MASTER took (sync: true): 2922
04 Jul 09:56:38,762 INFO  LuceneIndex-JMS - Indexing on MASTER took (sync: true): 2697
04 Jul 09:56:43,985 INFO  LuceneIndex-JMS - Indexing on MASTER took (sync: true): 2706

These logs were the perfect entry point for us to start visualizing what our application does in production.

On the server that writes these logs, I set up a simple Logstash agent that reads the log files (the indexing log and the Tomcat access log), parses them with a few grok filters I wrote, and ships the resulting metrics to statsd. Statsd collects and aggregates the metrics and forwards them to Graphite, where we use the render URL API to visualize what is happening.

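input {
  # Application log containing the "Indexing on MASTER took ..." lines
  file {
    path => '/var/log/tomcat/lucene-jms.log'
    type => 'indexing-stats'
  }

  # Tomcat access log in Apache combined format
  file {
    path => '/var/log/tomcat/access.log'
    type => 'access-log'
  }
}

filter {
  # Custom LUCENEJMS pattern (loaded from patterns_dir) extracts indextime
  grok {
    type => 'indexing-stats'
    patterns_dir => '/home/user/logstash/patterns'
    pattern => '%{LUCENEJMS}'
  }

  # Built-in pattern; extracts response and bytes, among other fields
  grok {
    type => 'access-log'
    pattern => '%{COMBINEDAPACHELOG}'
  }
}

output {
  # Without a type, every output receives every event, so each statsd
  # output is limited to the log type that actually carries its field.
  statsd {
    type => 'access-log'
    host => 'graphite.example.org'
    count => [ "tomcat.bytes", "%{bytes}" ]
  }

  statsd {
    type => 'access-log'
    host => 'graphite.example.org'
    increment => "tomcat.response.%{response}"
  }

  statsd {
    type => 'indexing-stats'
    host => 'graphite.example.org'
    timer => [ "tomcat.indextime", "%{indextime}" ]
  }
}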
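Under the hood, statsd speaks a tiny plain-text protocol over UDP: one line per metric, shaped like `<name>:<value>|<type>`, where `c` is a counter and `ms` is a timer. As a rough sketch of what the three statsd outputs above end up sending, the Python below builds those lines by hand. The host is taken from the config, port 8125 is statsd's usual default, and the helper names are hypothetical. Note that Logstash's statsd output typically also prefixes a namespace and the sender host to each name, so the keys you see in Graphite will be longer. The log line does not state a unit for the index time; statsd's `ms` type simply treats the value as milliseconds.

```python
import socket

# Host from the Logstash config; 8125/UDP is statsd's conventional port.
STATSD_HOST = "graphite.example.org"
STATSD_PORT = 8125


def format_metric(name, value, metric_type):
    """Build one statsd line: <name>:<value>|<type> ('c' counter, 'ms' timer)."""
    if metric_type not in ("c", "ms"):
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def metrics_for_access_event(response, nbytes):
    """Mirror the two access-log outputs: count bytes, increment the status code."""
    lines = []
    # COMBINEDAPACHELOG matches "-" instead of a number when no body was sent.
    if nbytes.isdigit():
        lines.append(format_metric("tomcat.bytes", int(nbytes), "c"))
    lines.append(format_metric(f"tomcat.response.{response}", 1, "c"))
    return lines


def metrics_for_index_event(indextime):
    """Mirror the indexing-stats output: one timer sample."""
    return [format_metric("tomcat.indextime", int(indextime), "ms")]


def send(lines):
    """Fire-and-forget over UDP, one metric per datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), (STATSD_HOST, STATSD_PORT))
```

For the first indexing line in the excerpt, `metrics_for_index_event("3602")` yields `["tomcat.indextime:3602|ms"]`. Because it is UDP, a statsd that is down never slows the application or the Logstash agent; metrics are simply dropped.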

Our application does not use a standard date format, so I had to write these simple grok patterns, which go in a file inside the patterns_dir from the configuration above.

LOG4JTIME %{MONTHDAY} %{MONTH} %{TIME}
LUCENEJMS %{LOG4JTIME} %{WORD:severity} %{DATA:message} %{NUMBER:indextime}
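Grok patterns compile down to regular expressions, so a quick way to sanity-check what LUCENEJMS pulls out of a line is to write a rough regex equivalent and run it against the sample log lines. The sketch below is a hand-written approximation, not grok's actual expansion: MONTHDAY, MONTH, TIME, WORD, DATA and NUMBER are simplified, and `\s+` after the severity is an assumption added to absorb log4j's padded severity column (the original pattern copes with it because DATA swallows the extra space).

```python
import re
from typing import Optional

# Simplified Python stand-in for:
#   LOG4JTIME %{MONTHDAY} %{MONTH} %{TIME}
#   LUCENEJMS %{LOG4JTIME} %{WORD:severity} %{DATA:message} %{NUMBER:indextime}
LUCENEJMS = re.compile(
    r"(?P<day>\d{1,2}) (?P<month>[A-Z][a-z]{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3})"
    r" (?P<severity>\w+)\s+(?P<message>.*?) (?P<indextime>\d+(?:\.\d+)?)$"
)

# The four lines from the log excerpt above.
SAMPLE = [
    "04 Jul 09:56:01,088 INFO  LuceneIndex-JMS - Indexing on MASTER took (sync: true): 3602",
    "04 Jul 09:56:10,969 INFO  LuceneIndex-JMS - Indexing on MASTER took (sync: true): 2922",
    "04 Jul 09:56:38,762 INFO  LuceneIndex-JMS - Indexing on MASTER took (sync: true): 2697",
    "04 Jul 09:56:43,985 INFO  LuceneIndex-JMS - Indexing on MASTER took (sync: true): 2706",
]


def parse(line: str) -> Optional[dict]:
    """Return the named fields for a matching line, or None if it does not match."""
    match = LUCENEJMS.search(line)
    return match.groupdict() if match else None
```

The lazy `message` group stops right before the last space-separated number, so `indextime` is always the trailing value (3602, 2922, 2697 and 2706 for the excerpt), which is the field the statsd timer output reads.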

Our statsd and Graphite installations are completely standard, with no custom configuration, so getting them up and running is up to you. There are plenty of resources about both, so look them up if you need more information.
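With a stock statsd, timer samples are buffered and flushed to Graphite every 10 seconds (the default flushInterval), and for each timer statsd writes summary values such as count, lower, upper, sum and mean, plus percentile variants like upper_90. The Python below is a simplified reimplementation of just that core subset, not statsd's own code, applied to the four index times from the excerpt. Depending on the statsd version and its namespace settings, these typically end up under Graphite paths like `stats.timers.<metric>.mean`.

```python
def aggregate_timer(samples):
    """Simplified per-flush summary of one statsd timer.

    Covers only count/lower/upper/sum/mean; real statsd also emits
    percentile, median and standard deviation values.
    """
    if not samples:
        return {"count": 0}
    ordered = sorted(samples)
    return {
        "count": len(ordered),
        "lower": ordered[0],
        "upper": ordered[-1],
        "sum": sum(ordered),
        "mean": sum(ordered) / len(ordered),
    }
```

If all four lines from the excerpt fell into one flush, `aggregate_timer([3602, 2922, 2697, 2706])` would give a count of 4, lower 2697, upper 3602, sum 11927 and mean 2981.75. In practice they span about 40 seconds, so statsd would spread them over several flushes.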

Visualizing with Graphite

With all the components up and running, we can now visualize the metrics with Graphite. The Graphite render URL API is packed with functions for transforming and visualizing data.

Here are a few examples.
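Each of these charts is just a Graphite render URL with one or more `target` expressions. As a sketch, the helper below builds such URLs. The metric path, including the `app01` sender segment, is hypothetical because the real path depends on your statsd and Logstash namespace settings, and the targets are plausible reconstructions using Graphite's `holtWintersForecast`, `summarize`, `stdev` and `movingAverage` functions, not the exact targets behind these charts.

```python
from urllib.parse import urlencode

# Hypothetical Graphite host and metric path.
GRAPHITE = "http://graphite.example.org"
INDEXTIME = "stats.timers.logstash.app01.tomcat.indextime.mean"


def render_url(targets, from_="-24h", width=800, height=400):
    """Build a render API URL that plots every target expression as a PNG."""
    params = [("target", t) for t in targets]
    params += [("from", from_), ("width", width), ("height", height), ("format", "png")]
    return f"{GRAPHITE}/render?{urlencode(params)}"


# Illustrative target sets, one per chart type below.
CHARTS = {
    "holt_winters": [INDEXTIME, f"holtWintersForecast({INDEXTIME})"],
    "summarized": [f'summarize({INDEXTIME},"1h","avg")', f'summarize({INDEXTIME},"1h","max")'],
    "stdev": [f"stdev({INDEXTIME},10)", f"stdev({INDEXTIME},100)"],
    "moving_average": [f"movingAverage({INDEXTIME},10)", f"movingAverage({INDEXTIME},100)"],
}
```

`render_url(CHARTS["stdev"])` produces a URL with two `target` parameters, which Graphite draws as two lines on the same graph. Repeating `target` is how the render API overlays several series.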

Total index time

Index time with Holt Winters forecast

Index time with summarized data

  • Green line - Average index time per hour
  • Red line - Maximum index time per hour
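The hourly average and maximum lines come from bucketing the raw series by hour, which in Graphite is presumably done with `summarize(series, "1h", "avg")` and `summarize(series, "1h", "max")`. A minimal Python sketch of that bucketing, assuming (timestamp, value) pairs and treating `None` as a gap to skip; the year in the sample data is assumed because the log omits it:

```python
from collections import defaultdict
from datetime import datetime


def summarize_hourly(points):
    """Return {hour: (average, maximum)} for (datetime, value) points.

    None values are treated as gaps and ignored.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        if value is None:
            continue
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {
        hour: (sum(vals) / len(vals), max(vals))
        for hour, vals in sorted(buckets.items())
    }


# Index times from the log excerpt (year 2012 assumed).
SAMPLE = [
    (datetime(2012, 7, 4, 9, 56, 1), 3602),
    (datetime(2012, 7, 4, 9, 56, 10), 2922),
    (datetime(2012, 7, 4, 9, 56, 38), 2697),
    (datetime(2012, 7, 4, 9, 56, 43), 2706),
]
```

All four sample lines fall in the 09:00 bucket, so `summarize_hourly(SAMPLE)` gives an average of 2981.75 and a maximum of 3602 for that hour. Plotting the maximum next to the average makes occasional slow indexing runs stand out instead of being smoothed away.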

Index time with standard deviation

  • Green line - Standard deviation for the past 10 datapoints
  • Red line - Standard deviation for the past 100 datapoints
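Graphite's `stdev(series, N)` draws the standard deviation over a sliding window of the last N datapoints. The sketch below is a simplified Python take on that idea, not Graphite's implementation: it uses the population standard deviation, skips `None` gaps, and still reports partial windows at the start of the series.

```python
from collections import deque
from statistics import pstdev


def rolling_stdev(values, window):
    """Population standard deviation of the last `window` points at each step.

    Returns None for a step whose window holds no non-None values.
    """
    recent = deque(maxlen=window)
    out = []
    for v in values:
        recent.append(v)
        valid = [x for x in recent if x is not None]
        out.append(pstdev(valid) if valid else None)
    return out
```

With a window of 2 on the excerpt's index times, `rolling_stdev([3602, 2922, 2697, 2706], 2)` gives roughly 0.0, 340.0, 112.5 and 4.5. A short window reacts quickly to a single slow run, while a window of 100 shows whether indexing times are becoming more erratic over time.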

Index time with moving average

  • Green line - Average for the past 10 datapoints
  • Red line - Average for the past 100 datapoints
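The moving averages correspond to Graphite's `movingAverage(series, N)` with a window of N datapoints. A minimal Python sketch of the same idea, assuming a plain list of values with `None` for gaps; unlike some Graphite versions, it simply averages whatever is available while the window is still filling:

```python
from collections import deque


def moving_average(values, window):
    """Mean of the last `window` non-None points at each step (None if no data)."""
    recent = deque(maxlen=window)
    out = []
    for v in values:
        recent.append(v)
        valid = [x for x in recent if x is not None]
        out.append(sum(valid) / len(valid) if valid else None)
    return out
```

For example, `moving_average([3602, 2922, 2697, 2706], 2)` returns `[3602.0, 3262.0, 2809.5, 2701.5]`. Putting a 10-point and a 100-point average on the same graph shows short-term spikes against the longer-term trend.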

Do you measure your application?

Pål-Kristian Hamre

blog.pkhamre.com

DevOps, Thoughts, Tools and stuff.