Humans have an amazingly high tolerance for pain.
By that, I mean that we put up with quite a lot without even realizing how uncomfortable it is or how much it may be costing us. In the past, I’ve watched mediocre sysadmins repeatedly copy/paste snippets of shell code into their terminal, rather than automating the entire task. I’ve had to administer Windows Server machines back when scripting Windows was more of a joke than a feature.
And, until last week, we would watch 2,000 lines per second of log output stream past our development screens while we sat there pretending to be Cypher, from The Matrix, looking for patterns in the data.
I’ve always wondered just how the Wachowski brothers would have explained the idea that you could look at a 15″ LCD display and, without any sort of filtering, see and comprehend whatever part of The Matrix was of interest. As engineers, we know that it’s just plain bullshit creative license. We humans just aren’t cool enough to pull off that kind of information processing.
But Graylog2 is.
Log analytics is a bit of a hot topic right now.
Here’s the scene: you have 50 servers (or 5, or 1, or 5000) hosting your web app or running MapReduce or otherwise generating an epic amount of log data. There’s an insane amount of intelligence in there but your lean, mean Platform Engineering team is too busy solving the world’s contact information problem to sift through all the noise to find it.
There are a lot of ways to attack this logging problem, including open source tools like Apache Flume and the Storm Project, and even a few cool startups cropping up to handle it all as a service, like Papertrail and Loggly.
Our favorite right now is Graylog2, and here’s why:
It hauls ass.
We set Graylog2 up on one server (an EC2 m1.xlarge instance) and in less than an hour, we had it indexing over 9,000 log messages per second!
It’s built upon great technologies.
Graylog2 indexes log messages using elasticsearch. If you don’t know about elasticsearch, go check it out now. I’ll wait here for you. When we run out of disk space or memory for the 60 MILLION messages we stuff into the system every day, we simply add more servers and let elasticsearch rebalance all the shards automatically.
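Because it’s plain elasticsearch under the hood, you can also query the index directly with the standard search API. Here’s a minimal sketch; the index name `graylog2`, the `message` and `created_at` field names, and elasticsearch running on `localhost:9200` are all assumptions for illustration, not anything Graylog2 guarantees:

```python
import json
import urllib.request

def build_error_query():
    # Hypothetical query: messages containing "error" from the last
    # hour. Field names here are assumptions about the index mapping.
    return {
        "query": {
            "bool": {
                "must": [{"match": {"message": "error"}}],
                "filter": [{"range": {"created_at": {"gte": "now-1h"}}}],
            }
        },
        "size": 20,
    }

def search(host="localhost", port=9200, index="graylog2"):
    # POST the query to elasticsearch's _search endpoint.
    body = json.dumps(build_error_query()).encode("utf-8")
    req = urllib.request.Request(
        "http://%s:%d/%s/_search" % (host, port, index),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The point is that none of your log data is locked away in a proprietary store; anything that speaks elasticsearch can read it.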
It’s backed by XING.
Streaming filters, dynamic dashboards, and alerting.
Since elasticsearch can index so fast, Graylog2 can offer cool streaming features that apply filters and rules to the message stream, notifying you if anything out of the ordinary happens. We’re even wiring this up to our customer management systems so, for example, we can tell our customers if their webhooks endpoint is running into errors.
We tag every API request with a unique ID. Graylog2 parses this out and allows us to reconstruct the history of an API request, no matter how many servers get involved in the process. We can do the same for API keys, customer accounts, webhook endpoints, you name it.
We’re really liking Graylog2. It’s hard to imagine how we got along without it. Check it out!