When you throw a web crawler to a DevOps — ELK stack

Part 1 — Supervisor

After adding Supervisor to improve the stability of the web crawler, we now tackle another major problem: observability.

Supervisor does capture the log output of our code. But how are we going to read those logs? With grep? By opening them in Notepad and pressing Ctrl-F to hunt for bugs? Raw logs are hard to read; they need to be organized before they become useful.

Here we introduce the ELK stack. It is a bit of overkill for this task, but that is fine for a tutorial.

What is ELK

ELK is a combination of three components:

  • E — Elasticsearch
  • L — Logstash
  • K — Kibana

These three components are responsible for different tasks. Put simply:

  • Logstash: collects the log data and routes it into Elasticsearch
  • Elasticsearch: a powerful search engine, used here to index the logs
  • Kibana: a visualization tool that works on top of Elasticsearch
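To make the Logstash role concrete, here is a minimal Python sketch of a client shipping one log event to Logstash. It assumes a Logstash pipeline with a `tcp` input using the `json_lines` codec on localhost:5000 (the port used in the supervisord.conf snippet later in this post); your docker-elk version may use a different port or codec, so adjust to match. The function names and event fields are hypothetical, chosen for illustration.

```python
import json
import socket
from datetime import datetime, timezone


def make_event(message: str, level: str = "INFO", **fields) -> bytes:
    """Encode one log event as a JSON line (field names are hypothetical)."""
    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        **fields,
    }
    # The json_lines codec expects one JSON document per line, ending in "\n"
    return (json.dumps(event) + "\n").encode("utf-8")


def send_to_logstash(payload: bytes, host: str = "localhost", port: int = 5000) -> None:
    """Send the payload to the assumed Logstash tcp input."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    line = make_event("fetched page", url="https://example.com", status=200)
    print(line.decode("utf-8"), end="")
    # send_to_logstash(line)  # uncomment once the stack is running
```

Each call opens a fresh connection, which keeps the sketch simple; a long-running crawler would more likely keep one connection open or use a logging handler.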

Each tool has its own strengths, and they complement each other naturally. You can also add more components to the stack, for example:

  • Beats: lightweight data shippers for different kinds of data, such as log files and metrics
  • Elasticsearch SQL: lets you query Elasticsearch with SQL
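As an illustration of the SQL option, the sketch below builds a request for the Elasticsearch SQL REST endpoint (`_sql` in Elasticsearch 7.x and later; 6.x exposed it under `_xpack/sql`). The `logstash-*` index pattern and the `level` field are assumptions about how your logs are indexed, and it assumes security is disabled; docker-elk enables authentication by default, so you may need to add credentials.

```python
import json
import urllib.request


def build_sql_request(query: str, es_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a POST request for the Elasticsearch SQL endpoint, asking for plain-text output."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url}/_sql?format=txt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical query: count log events per level in the assumed logstash-* indices
    req = build_sql_request('SELECT level, COUNT(*) AS n FROM "logstash-*" GROUP BY level')
    print(req.get_method(), req.full_url)
    # Uncomment once Elasticsearch is reachable (add auth if security is enabled):
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     print(resp.read().decode("utf-8"))
```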

If you don't like the Kibana interface, there is a tutorial on how to customize its layout (it may no longer apply to Kibana versions after 6.1.3).

Alternatively, you can choose another visualization tool such as Grafana or Graphite.
A comparison of the three tools: https://stackshare.io/stackups/grafana-vs-graphite-vs-kibana

Using ELK

A common way to run ELK is to use the Docker images and start all three services with docker-compose up, for example with the docker-elk project:

https://github.com/deviantony/docker-elk

The stack exposes the following ports:

  • 9200: Elasticsearch HTTP
  • 9300: Elasticsearch TCP transport
  • 5601: Kibana

Once the containers are running, go to localhost:5601 to access Kibana.
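Before wiring anything into the stack, it can help to confirm that Elasticsearch is actually up. A minimal sketch using only the Python standard library: it queries the `_cluster/health` API on port 9200 and reads the `status` field (`green`, `yellow` or `red`). It assumes no authentication is required; recent docker-elk versions enable security by default, in which case you would need to add credentials.

```python
import json
import urllib.request


def parse_health(body: bytes) -> str:
    """Extract the cluster status from a _cluster/health JSON response."""
    return json.loads(body)["status"]


def cluster_status(es_url: str = "http://localhost:9200", timeout: float = 5.0) -> str:
    """Query Elasticsearch's _cluster/health endpoint (assumes no auth is required)."""
    with urllib.request.urlopen(f"{es_url}/_cluster/health", timeout=timeout) as resp:
        return parse_health(resp.read())


if __name__ == "__main__":
    try:
        print("Elasticsearch status:", cluster_status())
    except OSError as exc:  # urllib's URLError and HTTPError are subclasses of OSError
        print("Elasticsearch not reachable:", exc)
```

On a single-node setup like docker-elk, `yellow` is common and usually harmless: it typically means replica shards have no second node to live on.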

Combining Supervisor with ELK

To route the logs from Supervisor to Logstash, use supervisor-logstash-notifier (https://github.com/dohop/supervisor-logstash-notifier) and add this snippet to supervisord.conf:

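[eventlistener:logging]
environment=LOGSTASH_SERVER="localhost",LOGSTASH_PORT="5000",LOGSTASH_PROTO="tcp"
command=logstash_notifier --capture-output
events=PROCESS_STATE,PROCESS_LOG
; PROCESS_LOG events are only emitted if the crawler's [program:...] section
; also sets stdout_events_enabled=true (and stderr_events_enabled=true)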
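To see roughly what an event listener such as logstash_notifier does, here is a simplified Python sketch of Supervisor's event listener protocol: the listener prints `READY`, reads a header line and then a payload of `len` bytes from stdin, handles the event, and acknowledges it with `RESULT 2\nOK` (or `RESULT 4\nFAIL`). This is not the library's actual code; the JSON field names, the localhost:5000 TCP target and the function names are assumptions for illustration.

```python
import json
import os
import socket
import sys


def parse_tokens(line: str) -> dict:
    """Parse a space-separated 'key:value' line, as used in Supervisor event headers."""
    return dict(token.split(":", 1) for token in line.split())


def event_to_json(headers: dict, payload: str) -> str:
    """Turn one event into a JSON line; PROCESS_LOG payloads carry the output after the first newline."""
    meta_line, _, data = payload.partition("\n")
    record = {"eventname": headers["eventname"], **parse_tokens(meta_line)}
    if data:
        record["message"] = data
    return json.dumps(record) + "\n"


def main(host: str = "localhost", port: int = 5000) -> None:
    stdin = sys.stdin.buffer  # read bytes, because 'len' counts bytes, not characters
    while True:
        sys.stdout.write("READY\n")  # ask supervisord for the next event
        sys.stdout.flush()
        header_line = stdin.readline().decode("utf-8")
        if not header_line:
            break  # stdin closed: supervisord is shutting down
        headers = parse_tokens(header_line)
        payload = stdin.read(int(headers["len"])).decode("utf-8", errors="replace")
        try:
            with socket.create_connection((host, port), timeout=5) as sock:
                sock.sendall(event_to_json(headers, payload).encode("utf-8"))
            result = "OK"
        except OSError:
            result = "FAIL"  # supervisord rebuffers the event and sends it again later
        sys.stdout.write(f"RESULT {len(result)}\n{result}")
        sys.stdout.flush()


# Only enter the event loop when started by supervisord, which sets SUPERVISOR_ENABLED
if __name__ == "__main__" and os.environ.get("SUPERVISOR_ENABLED"):
    main()
```

The listener must never print anything else to stdout, since supervisord reads that stream as protocol messages; diagnostics belong on stderr.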

After reloading Supervisor's configuration, the logs from Supervisor show up in Kibana.


You can even create charts that summarize the logs.
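Kibana's charts are built on Elasticsearch aggregations. As a rough sketch of what a chart such as "log events per process" asks Elasticsearch for, the function below builds a `_search` request body with a `terms` aggregation over a recent time window and `size` 0, so only bucket counts come back rather than documents. The `processname.keyword` field and the `logstash-*` index pattern are assumptions; check the actual field names in Kibana's Discover view for your data.

```python
import json


def events_per_process_query(field: str = "processname.keyword", top_n: int = 10, last: str = "24h") -> dict:
    """Build a _search body counting log events per process over a recent time window."""
    return {
        "size": 0,  # return only aggregation buckets, not the matching documents
        "query": {"range": {"@timestamp": {"gte": f"now-{last}"}}},
        "aggs": {
            "per_process": {"terms": {"field": field, "size": top_n}},
        },
    }


if __name__ == "__main__":
    # Send this body with: POST http://localhost:9200/logstash-*/_search (assumed index pattern)
    print(json.dumps(events_per_process_query(), indent=2))
```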
