When you throw a web crawler to a DevOps — ELK stack
After adding supervisor to enhance the stability of the web crawler, we are going to solve another major problem — observability.
Supervisor does capture the logs from our code, but how are we going to read them? Using grep? Or opening them in a notepad and pressing Ctrl-F to hunt for bugs? Raw logs like these are hardly readable; they need to be stored and presented in an organized way.
Here we introduce the ELK stack. It is a bit overkill for this task, but that's fine as this is just a tutorial.
What is ELK
ELK is a combination of three components:
- E — Elasticsearch
- L — Logstash
- K — Kibana
These three modules are responsible for different tasks. Put simply:
- Logstash — routes the log data into Elasticsearch
- Elasticsearch — a powerful search engine, used here for indexing the logs
- Kibana — a visualization tool tightly integrated with Elasticsearch
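To make the routing role concrete, here is a minimal Logstash pipeline sketch. The port, codec, and index name are arbitrary choices for this illustration, not something mandated by the stack:

```
# logstash.conf — minimal pipeline sketch; port and index name are arbitrary
input {
  tcp {
    port  => 5000        # listen for log events over TCP
    codec => json_lines  # expect one JSON document per line
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]         # the Elasticsearch instance in the stack
    index => "crawler-logs-%{+YYYY.MM.dd}"  # one index per day
  }
}
```

Anything that can write JSON lines to that TCP port will end up indexed in Elasticsearch and searchable from Kibana.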
Each tool has unique and powerful features, and it is natural to use the three together. You can add even more modules to this stack, e.g.
- beats, a data shipper to ship different kinds of data
- Elastic SQL, to access the features of Elasticsearch using SQL
If you don’t like the Kibana interface, there is a way to customize the layout (that tutorial may no longer be valid for Kibana versions after 6.1.3).
Or you may choose other visualization tools like Grafana or Graphite.
Comparison of the three tools: https://stackshare.io/stackups/grafana-vs-graphite-vs-kibana
A common practice for running ELK is to download the Docker images and bring up the three instances with docker-compose up.
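A compose file for this could look roughly like the sketch below. The image tags and the Logstash pipeline path are assumptions (the 6.1.3 tag matches the Kibana version mentioned earlier); adjust them to the ELK version you actually use:

```
# docker-compose.yml — a minimal sketch; image tags and mount paths are
# assumptions, adjust to your ELK version
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.1.3
    environment:
      - discovery.type=single-node   # single-node cluster for local use
    ports:
      - "9200:9200"   # HTTP API
      - "9300:9300"   # TCP transport
  logstash:
    image: docker.elastic.co/logstash/logstash:6.1.3
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    ports:
      - "5000:5000"   # pipeline TCP input
    depends_on:
      - elasticsearch
  kibana:
    image: docker.elastic.co/kibana/kibana:6.1.3
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
```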
The stack exposes the following ports:
- 9200: Elasticsearch HTTP
- 9300: Elasticsearch TCP transport
- 5601: Kibana
Then just open http://localhost:5601 in your browser to access Kibana.
Combining Supervisor with ELK
You can route the logs from Supervisor to Logstash using https://github.com/dohop/supervisor-logstash-notifier
by adding a snippet like this to your supervisord.conf:
events = PROCESS_STATE,PROCESS_LOG
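That events line belongs inside an [eventlistener:...] section. Here is a sketch of what the relevant parts of supervisord.conf might look like; the environment variable names follow the supervisor-logstash-notifier README as I recall them, so verify them against the repo, and the [program:crawler] block is a hypothetical example:

```
; supervisord.conf — sketch; check the notifier's README for exact variable names
[eventlistener:logstash-notifier]
command = logstash_notifier
environment = LOGSTASH_SERVER="127.0.0.1",LOGSTASH_PORT="5000"
events = PROCESS_STATE,PROCESS_LOG

[program:crawler]
command = python crawler.py        ; hypothetical program name
stdout_events_enabled = true       ; required for PROCESS_LOG events to fire
stderr_events_enabled = true
```

Note the stdout_events_enabled / stderr_events_enabled flags: without them, Supervisor will not emit PROCESS_LOG events for your program's output.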
Then the logs from Supervisor will appear in Kibana, and you can even create charts to summarize them.