TIGeR on guard for performance — Part 2
In the previous part of this series, “TIGeR on guard for performance — Part 1”, we sorted out why we need to test performance, why it should be tested automatically, and formulated our initial objective.
In this (second) part, we will take a closer look at the architecture of the selected solution’s components (“the TIGeR anatomy”), and we will also describe how to install the components and set them up in their basic configuration (“caging the TIGeR”).
Part 2. Studying the TIGeR
More details on the “TIGeR anatomy” (component architecture)
Let us briefly recall the components that make up the TIGeR bundle, namely Telegraf + InfluxDB + Grafana:
· Telegraf is an agent that collects all sorts of system metrics.
· InfluxDB is a database specializing in time-series storage.
· Grafana is a web server designed to visualize data.
Telegraf
Telegraf is an agent designed to collect system metrics from stand-alone stations. In our case, we will install it on the server side of the product under test. As you can see on the chart, the agent transfers the collected metrics into the InfluxDB database in a format both of them understand:
Telegraf consists of an executable file and a configuration file. Therefore, no special installation is required.
Plug-ins are among the main advantages of this agent: there are a lot of them (more than 100 at the moment). However, there are occasional annoyances: for example, there is no output plug-in for MSSQL as a target database, so metrics cannot be stored there.
As for the capabilities Telegraf offers out of the box, we can highlight the following:
· Outgoing data formats, i.e. where metrics can be sent: InfluxDB, JSON, Graphite, Elasticsearch, a plain text file, Prometheus, etc.
· Incoming data formats, i.e. metrics sources: win_perf_counters, MSSQL, RabbitMQ, Memcached, MongoDB, MySQL, plain HTTP, JSON, filestat (various file statistics), Docker, Cassandra, SMART, DNS, etc.
A more detailed list is available at https://docs.influxdata.com/telegraf/ and https://github.com/influxdata/telegraf/tree/master/plugins
Plug-ins that are not built into Telegraf are only available as source files, which is both an advantage and a disadvantage: on the one hand, you can fine-tune such a plug-in flexibly; on the other hand, you will have to spend extra time integrating it.
One more thing: unfortunately, such plug-ins cannot be compiled into a separate library file and dropped into the folder next to the executable. You have to include their source files in the Telegraf sources and recompile the entire agent. The process is described in more detail on GitHub (see the link above), but we will not dwell on this peculiarity here, as Telegraf’s built-in capabilities are enough for our purposes.
InfluxDB
Let us have a look at a simplified structural chart of this database:
· There is a database;
· Inside it, there are tables (measurements), each with its own name and retention policy;
· Each table has a Time column, the main and mandatory column, where timestamps are stored;
· There are Fields columns (not indexed), where the actual metric values are stored;
· Tags columns (indexed) usually store grouping values, for example, the server a metric came from, the user it belongs to, the process it describes, etc.
To query data from the database, an SQL-like language (InfluxQL) is used.
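To make the model above more concrete, here is a small sketch (all measurement, tag, and field names are made up for illustration): one point written in InfluxDB’s line protocol, and an InfluxQL query that reads such points back.

```
# line protocol: measurement,tag=value field=value [timestamp in ns]
cpu_load,host=server01,region=eu value=0.64 1465839830100400200

-- InfluxQL: fields hold the values, tags filter and group them
SELECT mean("value") FROM "cpu_load"
WHERE "host" = 'server01' AND time > now() - 1h
GROUP BY time(10s), "region"
```

Tags being indexed is exactly why filters and GROUP BY clauses in InfluxQL are normally built on tags rather than on fields.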
Among the advantages of InfluxDB, we can highlight the following:
· Quick installation (just unpack the archive with the executable and configuration files);
· High performance and low resource utilization;
· Reading and writing data over HTTP (data can be accessed through a browser, curl, etc.);
· Support for various data sources besides Telegraf and similar agents (see the list in the previous part, in the chapter “What tools, designed for performance testing, exist?”); it is also possible to work with the database through ready-made client libraries, including .NET/C#.
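As an illustration of the HTTP interface, here is a minimal Python sketch using only the standard library. The database name, metric names, and addresses are assumptions for this example; it expects InfluxDB on its default port 8086 with no authentication.

```python
import urllib.parse
import urllib.request

INFLUX = "http://localhost:8086"  # assumption: default local InfluxDB

def make_line(measurement, tags, fields):
    """Build one point in InfluxDB line protocol: name,tags fields."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str}"

def write_point(db, line):
    """POST a line-protocol payload to /write (HTTP 204 on success)."""
    req = urllib.request.Request(
        f"{INFLUX}/write?db={db}", data=line.encode(), method="POST")
    return urllib.request.urlopen(req)

def query(db, q):
    """Run an InfluxQL query via GET /query; the response body is JSON."""
    params = urllib.parse.urlencode({"db": db, "q": q})
    return urllib.request.urlopen(f"{INFLUX}/query?{params}").read()

line = make_line("cpu_load", {"host": "server01"}, {"value": 0.64})
# write_point("mydb", line)                        # needs a running InfluxDB
# print(query("mydb", 'SELECT * FROM "cpu_load"'))
```

The network calls are commented out so the sketch can be read without a server; the same `/write` and `/query` endpoints are what Telegraf and Grafana use under the hood.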
Among the disadvantages, we can highlight the lack of a graphical user interface (GUI) for working with the database (interestingly, it was available in earlier builds but was removed in the latest versions), meaning that all administration has to be done through the command-line interface (CLI).
That is actually all. Simple, isn’t it?
Grafana
Grafana consists of a client (usually a browser) and a server that acts as a proxy to the database (it is also possible to connect directly from the client; we will talk about this mode in a little more detail below). It has a simple role model, supports various data sources, and organizes sets of panels into dashboards.
The entire rendering is performed on the client side. Here is a sample of one of those dashboards (the full range of options for building beautiful charts in Grafana is a topic for a separate article):
Besides InfluxDB, various databases can be used as data sources (almost all of the most widespread ones, except MSSQL), as well as plain text files. Out of the box, the list of supported data sources is short, but InfluxDB is among them, and that will do for us.
Grafana’s capabilities can be extended by installing plug-ins (unlike with Telegraf, this is much easier: you just run the plug-in installation command from the command line, for example, grafana-cli plugins install followed by the plug-in name). Besides additional data sources, there are plug-ins that add various panels for even more flexible and beautiful visualization than what is available by default.
Now, let us briefly describe the data request modes mentioned above:
· Proxy: the client opens Grafana, and the Grafana server executes all requests to InfluxDB itself;
· Direct: the request goes from the browser straight to InfluxDB (useful, for example, when Grafana does not have access to InfluxDB but the client does):
Among the advantages of Grafana, we can highlight the following:
· Easy installation (just unpack the archive);
· Very convenient, flexible, and rich dashboard customization;
· A (relatively) simple and flexible query builder for data sources;
· An alert system;
· Snapshots: very detailed, shareable reports on a completed test, accessible via a link:
Despite all these advantages, many people choose Grafana primarily for the beauty of its visualizations:
As for the disadvantages, we can only point out the complicated configuration system. There are lots of instructions, articles, tips, and discussions on this topic, but in our opinion, this shortcoming is more than compensated by the advantages.
“Caging” the TIGeR (installation and basic configuration)
From the previous chapter, it is clear that the installation of all three components consists of:
· Downloading the archives:
o Telegraf and InfluxDB — https://portal.influxdata.com/downloads
o Grafana — https://grafana.com/grafana/download
· Unpacking them into any directory;
· Setting them up via the configuration files (and, for Grafana, dashboards as well).
The general chart of interaction with our test product will look as follows:
· A load client (or several) applies load to the product under test, on whose server the Telegraf agent is installed
· Telegraf collects the predefined metrics and writes them to InfluxDB
· Observers or operators open Grafana in a browser
· Grafana queries the InfluxDB database, receives the data, and visualizes it
In the example above, InfluxDB and Grafana are on the same machine, but actually, they can be on different ones.
InfluxDB
Everything is really simple in this case: at the minimum, you do not have to change anything in the configuration file, just run:
influxd.exe -config influxdb.conf
Important remark: out of the box, none of the TIGeR components can run as a Windows service, so you have to use third-party tools to “wrap” the executables into services (for example, NSSM, the Non-Sucking Service Manager). We will not dwell on this issue further in this article; decide for yourself whether to leave things as they are or to configure the components to run as services. In our company, for example, we run InfluxDB and Grafana as services, and Telegraf as a regular application.
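For illustration, wrapping InfluxDB into a service with NSSM looks roughly like this (the service name and paths below are made-up examples; adjust them to your own layout):

```
:: register influxd.exe as a Windows service named "InfluxDB" and start it
nssm install InfluxDB "C:\TIGeR\influxdb\influxd.exe" -config "C:\TIGeR\influxdb\influxdb.conf"
nssm start InfluxDB
```

NSSM will then restart the process automatically if it crashes, which is one of the main reasons to prefer a service over a console window on an unattended server.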
We will not go into the InfluxDB configuration parameters either: there are lots of them, but most will be of no use to us right now. Later on, we can change the port, authentication, logging, etc.
The executable file influx.exe is the command-line interface to InfluxDB, which lets you manage databases and their data. We do not need it for the time being, but we will return to it when necessary.
Telegraf
Almost everything with Telegraf is the same as with InfluxDB, but we will have to tweak the default settings in the configuration file telegraf.conf a little:
· logfile = “path to the folder and the log file name” (better to set your own value);
· In the [[outputs.influxdb]] section:
urls (correct it to the real address of the server where InfluxDB is installed);
database: the database name (specify any; if it is not set, a database with the default name is created automatically);
Further down, the file contains the list of Windows Performance Counters; you can leave those values as they are:
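Put together, the relevant fragment of telegraf.conf looks roughly like this (the paths, URL, and database name are examples for this walkthrough, and the counters block mirrors the structure of the default win_perf_counters section):

```toml
# fragment of telegraf.conf -- values are examples, adjust to your setup
logfile = "C:/TIGeR/telegraf/telegraf.log"

[[outputs.influxdb]]
  # address of the machine where InfluxDB is running
  urls = ["http://localhost:8086"]
  database = "telegraf"

# the Windows Performance Counters section from the default config
# can stay as is, e.g.:
[[inputs.win_perf_counters]]
  [[inputs.win_perf_counters.object]]
    ObjectName = "Processor"
    Counters = ["% Idle Time", "% Processor Time"]
    Instances = ["*"]
    Measurement = "win_cpu"
```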
Now run:
telegraf.exe -config telegraf.conf
Let the agent collect the metrics (Windows Performance Counters) for the time being, and we will go on.
Grafana
In this case, it is even simpler and more difficult at the same time. We can run the server right away (grafana-server.exe, located in the bin folder of the unpacked archive)
and follow the link http://localhost:3000 (login: admin, password: admin; replace the host name with the real one if you are accessing Grafana from another machine).
The only difficult part is the further configuration of dashboards and metric display. In this part of the article, we will just launch a ready-made dashboard to understand how it all works, so there is nothing difficult for now.
After we log in to Grafana, the first thing to do is to create a data source:
· Source name (any); type: InfluxDB; URL, in our example, http://localhost:8086 (replace it with your own if InfluxDB is not installed locally or listens on another port);
· Access type: usually Proxy (we discussed these types above);
· If authentication is set up in InfluxDB (by default it is not), you also have to specify the required user credentials;
· Database name (see the Telegraf configuration file); time interval: 10 seconds.
That will do for now:
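The same data source can also be created through Grafana’s HTTP API instead of the web form; here is a hedged Python sketch (the data source name, database, addresses, and the default admin/admin credentials are the assumptions of this walkthrough):

```python
import base64
import json
import urllib.request

def datasource_payload(name, influx_url, database):
    """JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,
        "type": "influxdb",
        "access": "proxy",   # the Grafana server queries InfluxDB itself
        "url": influx_url,
        "database": database,
    }

def create_datasource(grafana_url, user, password, payload):
    """Create the data source over HTTP with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST")
    return urllib.request.urlopen(req).read()

payload = datasource_payload("influx", "http://localhost:8086", "telegraf")
# create_datasource("http://localhost:3000", "admin", "admin", payload)
```

Scripting this is handy when you need to roll out identical Grafana setups on several test stands; the final call is commented out because it needs a running Grafana instance.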
Next, download a ready-made dashboard template (you will have time to create your own later; for now, it is better to start from an example): https://grafana.com/dashboards/1902
Dashboards are distributed as files with the .json extension and contain ordinary JSON data.
The package (on the dashboard page) also contains a configuration file for Telegraf: you can compare it with yours and correct the points that differ, or simply replace your file with the one from the package (do not forget to re-apply the parameters mentioned above in the chapter on launching Telegraf).
After that, we have to import the downloaded json file into Grafana, specify the dashboard name, and select the data source (we have already created it above):
Unfortunately, not all panels from this template will work at once — some of them will return the “No data points” message:
We will have to open the settings of the problematic panels and change the host value to /^server$/
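Under the hood, that host value becomes a regular-expression filter in the panel’s InfluxQL query, roughly like this (the measurement and field names here follow Telegraf’s default win_perf_counters naming and are assumptions, not taken from the template itself):

```sql
-- "=~" matches a regex; /^server$/ matches a host tag equal to "server"
SELECT mean("Percent_Processor_Time") FROM "win_cpu"
WHERE "host" =~ /^server$/ AND time > now() - 5m
GROUP BY time(10s)
```

So the fix simply makes the panel’s host filter match the tag value that Telegraf actually writes into InfluxDB.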
Then, everything will operate as it should:
In the upper right corner, we can flexibly adjust the display time and the data refresh rate.
Now, we can observe how the charts are drawn and do something on the server where Telegraf is running, just for a change. To get a better feel of Grafana, we can “play” with various settings and parameters right on the test dashboard.
In the second part of our article, we had a closer look at the TIGeR components architecture and also explained how to install and set them up in their basic configuration.
Next, in the third part of the article, we will set up Telegraf to read specific Windows Performance Counters for our product, apply load to the tested server (i.e. “feed the TIGeR”), learn how to display the values returned by the counters in Grafana, and also consider some of Grafana’s fine-tuning issues relating to visualization and notifications (the first “TIGeR training”).
Author: Eugene Klimov, System Engineer, ITA Labs