Securely sending application logs and metrics across the internet

Geoff Bourne
Jan 8, 2018 · 9 min read


Using TLS, Logstash, Telegraf, and friends

Here’s what I wanted to achieve:

  • Run a Spring Boot web application maximizing availability
  • Monitor the application’s logs and metrics for errors and just general activity

Since I work at Rackspace, I get a few monthly credits on their public cloud. That roughly allows for running a Docker Swarm of two 2GB, 2-core nodes, which is plenty for my web application and its MongoDB. What that capacity doesn’t allow is running the telemetry pieces, such as Graylog, Elasticsearch, InfluxDB, Kibana, Chronograf, and so on.

Speaking of Graylog: while it is a very nice looking and thorough system for log monitoring, analysis, and alerting, from the start it felt too resource intensive. So, in an early version of this endeavor I abandoned Graylog in favor of plain old Elasticsearch + Kibana. Actually, now that the Elastic products are up in the 6.x series they are pretty slick and cover all my needs anyway.

So, my application hosting is easily satisfied by my public cloud budget. That leaves me with the challenge of where to run the telemetry gathering and visualization pieces. Here’s where I get to cheat a bit because I already had an Intel NUC sitting under my desk at home running Ubuntu. With a dual-core i5, 16 GB RAM, and a 256 GB SSD it can run a surprising amount of stuff without making a noise.

The NUC will be a great place to run a dual stack of logging/metrics collection and visualization; however, that will be running without the safety net of a hosting provider. There are two things I can leverage though: 1) my entire home network is already behind the NAT of my cable modem and 2) public key cryptography + TLS.

The NAT port forwarding is fairly boring, so here’s the quick run down on that:

  • I assigned a reserved IP for my NUC in the cable modem’s configuration. That way I can be lazy and let the NUC use DHCP, yet get a fixed IP address.
  • I picked a block of 1000 ports to forward to the NUC’s IP address
  • My self-imposed rule is that ports in that 1000-port range, which are now exposed to the public internet, are secured by TLS and client certificate authentication

Brief Overview of Public Key Infrastructure (PKI)

Establishing Mutual Trust using PKI

There are a lot of things in this diagram, but that’s because it includes all of the ingredients needed to establish mutual trust between a client and a server over the untrusted wilds of the internet. That’s all possible thanks to two intense sounding acronyms, PKI and TLS. If this already melts your brain, you can safely skip this entire section.

TLS is a whole topic in itself, so here’s the short version: TLS stands for Transport Layer Security and is the successor to SSL. It provides a way for two network devices to handshake and establish an encrypted communication channel between them. Part of that handshake involves public key cryptography, which is where the concept overlaps with PKI.

PKI, or Public Key Infrastructure, refers to the cryptographic ability to use a publicly visible certificate to verify a message that was signed with the private key held by the certificate’s owner.

Secure communication is all about trust or rather the lack thereof. In short, trust no one. So, first, how does a client trust that the certificate presented by a server is really theirs? This is where the Infrastructure in PKI comes into play.

DNS is actually part of that infrastructure and for the sake of this discussion we have to assume we trust our DNS server and its answers. The client knows a server by its hostname, but an IP connection needs an IP address. We’re still skeptical, but the first step towards trust is that our DNS server has told us the IP address of the server we think we want.

Now, the client can connect to the server over IP, and during an early part of that TLS handshake the server presents its public certificate. Certificates include a field called the common name (CN) of the holder of that certificate. For server certificates that CN is typically the server’s hostname, but not always. There are also optional extensions to certificates, one of which is the Subject Alternative Name (SAN) list. If the expected hostname of the server matches the CN or one of the DNS entries in the SAN list, then the client can further trust the server is the one expected.

But wait, just because the certificate looks valid and contains the expected hostname, how can the client trust the certificate in the first place? That can be answered by resolving a chain-of-trust.

Certificates (like C, above) are issued by a certificate authority and that certificate authority has a certificate (B) and that certificate is issued by a certificate authority…and that’s why it is called a chain. Like the diagram above, that chain of issuing can’t go on forever, so in the majority of cases there are well known root and intermediate certificate authorities. How are those well known? Typically because your operating system or browser says so. For example, if we trust B because it is bundled with our browser, then we can trust C because it was issued by B.
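If you want to see that chain-of-trust check in action, openssl can do it from the command line. This is just an illustration, using the CA and server certificate files that get generated later in this article; nothing in the setup depends on it:

# Verify that the server certificate chains up to (was issued by) our own CA
openssl verify -CAfile certs/ca.pem certs/logstash-server.pem

On success it simply prints the file name followed by “OK”.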

Using self-signed certificates to build a triad of trust

There was a lot going on in that previous section and that only scratched the surface. If you skipped that section, glance back at the diagram “Establishing Mutual Trust using PKI”. Items within that yellow, center cloud are things that my client and server trust and can even share “in the clear”. Of particular interest is that Issuer Certificate, which I can trust, because I created it.

The following diagram shows the triad of trust I need to implement mutually authenticated TLS across my own systems:

The diagram only lists the public certificate file name, but there are actually three files in each set:

The public certificate file, such as “ca.pem”, which contains the header line:

-----BEGIN CERTIFICATE-----

The key file, such as “ca-key.pem”, which contains the header line:

-----BEGIN RSA PRIVATE KEY-----

and the bundled certificate file, such as “ca-bundle.pem”, which contains both the public certificate content and the key:

-----BEGIN CERTIFICATE-----
MIIDgjCCAmqgAwIBAgIUYoFix22qYHwMLqW7MFSm9WrCDxswDQYJKoZIhvcNAQEL
...
-----END CERTIFICATE-----
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA7LTksyDmYwH8ITenwTdsTzPxC1X9X8kBHisEdQpeQlAY+GjD
...
-----END RSA PRIVATE KEY-----

The set of three files covers all of the configuration cases: some software, such as Logstash, works with separate certificate and key files, while other software, such as HAProxy, requires a bundled file.
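If you ever lose track of which file is which, openssl can decode them. This is just a sanity check I find handy, not something the setup below depends on:

# Show who a public certificate was issued to, who issued it, and when it expires
openssl x509 -in certs/logstash-server.pem -noout -subject -issuer -dates

# Confirm a key file really contains a well-formed RSA private key
openssl rsa -in certs/logstash-server-key.pem -check -noout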

With a tiny helper script, I used CFSSL to generate the three sets of files. (Later I’ll rewrite that helper script to be a makefile.) In the past I had written my own bash script that wrapped openssl to create certificates, but cfssl was much easier. One part that makes it easier is that you define the certificate signing request (CSR) as a JSON file. With that you can tweak, re-run, and source control the definition.

Here is the CSR JSON file I used to generate the “logstash-server” files (with fake hostnames):

{
  "CN": "full.host.name",
  "hosts": [
    "full.host.name",
    "127.0.0.1"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "US",
      "L": "Your Town",
      "O": "Your Org",
      "OU": "logstash",
      "ST": "Your State"
    }
  ]
}

The CSR files for “ca” and “logstash-client” are the same, except that I removed the hosts field and replaced the CN with “CA” and “client”, respectively.

The first step in actually generating the certificate file sets is to generate the self-signed “CA” certificate:

mkdir -p certs
cfssl gencert -initca csr-ca.json | cfssljson -bare certs/ca

The cfssljson tool in that pipeline picks out the public certificate and key parts from the output and saves them each to a file, under a certs sub-directory in my case.
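Assuming that worked, the certs directory should now hold something like the following (cfssljson also writes out the CSR it generated along the way):

ls certs/
# ca.csr  ca-key.pem  ca.pem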

With the “CA” certificate I can issue the client and server certificates:

cfssl gencert -profile=client \
-ca=certs/ca.pem -ca-key certs/ca-key.pem \
csr-logstash-client.json | cfssljson -bare certs/logstash-client

cfssl gencert -profile=www \
-ca=certs/ca.pem -ca-key certs/ca-key.pem \
csr-logstash-server.json | cfssljson -bare certs/logstash-server

Using the -profile option leverages a default configuration in cfssl to generate certificates with specific “uses” for each case.
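If you’re curious what those built-in profiles actually grant, cfssl can print its default configuration; the “www” and “client” profiles in that output are the ones referenced above:

# Dump cfssl's built-in signing configuration, including the www and client profiles
cfssl print-defaults config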

The “-bundle” member of each set is created by concatenating the two generated files:

for t in ca logstash-client logstash-server; do
  cat certs/${t}.pem certs/${t}-key.pem > certs/${t}-bundle.pem
done

Publishing certificates and keys into the Docker Swarm

Docker Swarm makes it very easy to publish sensitive content into a Swarm of services. First, I published the top of the triad, “ca.pem” BUT NOT its key. Even though all private key files should be stored and transmitted carefully, the CA key file should be treated with extra special care since it is effectively the keys to the kingdom.

This is the command to create a secret containing the “ca.pem” content:

docker secret create ca.pem ca.pem

The first “ca.pem” is the name of the secret and the second is the source file of the content. Using a filename as the secret name works out nicely since the default behavior is to create a file named for the secret under /var/run/secrets. In the application swarm, I did the same for the public, key, and bundle files for the client. In the telemetry swarm, I did the same for the server files.
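For completeness, the client-side publication amounts to something like this (“ca.pem” was already published above, so only the client files remain; the bundle’s secret name here is just my consistent-naming guess):

docker secret create logstash-client.pem certs/logstash-client.pem
docker secret create logstash-client-key.pem certs/logstash-client-key.pem
docker secret create logstash-client-bundle.pem certs/logstash-client-bundle.pem

# Lists only the secret names; the contents are never shown back
docker secret ls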

Deploying the application swarm

For the application swarm, I am deploying a Docker Swarm Stack defined in a Docker Compose file. In this section I’ll show only snippets of the compose file.

First, the compose file is opened with the schema version and an example of the application service. Notice it only needs to refer to the logs and metrics receivers by their service names, since Docker Swarm takes care of DNS resolution amongst the containers:

version: "3.4"

services:
  app:
    environment:
      LOGGING_GELF_SERVER: logstash
      SPRING_METRICS_EXPORT_STATSD_HOST: telegraf

The logstash service is declared as:

  logstash:
    image: docker.elastic.co/logstash/logstash-oss:6.1.1
    configs:
      - source: logstash
        target: /usr/share/logstash/pipeline/logstash.conf
    secrets:
      - ca.pem
      - logstash-client.pem
      - logstash-client-key.pem
    ports:
      - "127.0.0.1:12201:12201/udp"

and telegraf as:

  telegraf:
    image: telegraf:1.5.0
    secrets:
      - ca.pem
      - logstash-client.pem
      - logstash-client-key.pem
    configs:
      - source: telegraf
        target: /etc/telegraf/telegraf.conf
        mode: 0660
    ports:
      - "127.0.0.1:8125:8125/udp"

Since we published the certificate and key content with docker secret commands manually, the secrets are declared in the compose file as external:

secrets:
  ca.pem:
    external: true
  logstash-client.pem:
    external: true
  logstash-client-key.pem:
    external: true

Finally, we reference the respective configuration files for logstash and telegraf. Unlike the secrets, we’ll let the file content get published as service configuration during stack deployment:

configs:
  logstash:
    file: ./logstash.conf
  telegraf:
    file: ./telegraf.conf

Since secrets declared on Swarm services are situated under /var/run/secrets by default, the following logstash configuration references them there:

input {
  gelf {
  }
}

output {
  tcp {
    host => "..."
    port => ...
    codec => "json_lines"
    ssl_enable => true
    ssl_verify => true
    ssl_cacert => "/var/run/secrets/ca.pem"
    ssl_cert => "/var/run/secrets/logstash-client.pem"
    ssl_key => "/var/run/secrets/logstash-client-key.pem"
  }
}
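As a quick smoke test, and assuming your shell is bash with its /dev/udp support, you can hand-roll a minimal GELF message and send it to the locally published gelf input, which listens on UDP port 12201 by default:

# Send a minimal GELF 1.1 message to the gelf input published on localhost
echo -n '{"version":"1.1","host":"smoke-test","short_message":"hello from the app swarm"}' \
  > /dev/udp/127.0.0.1/12201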

The telegraf configuration likewise references the certificate files in /var/run/secrets:

# Configuration for telegraf agent
[agent]
  interval = "10s"

[[inputs.statsd]]
  protocol = "udp"
  service_address = ":8125"

[[outputs.influxdb]]
  urls = ["https://your.host.name:5086"]
  database = "telegraf"

  ssl_ca = "/var/run/secrets/ca.pem"
  ssl_cert = "/var/run/secrets/logstash-client.pem"
  ssl_key = "/var/run/secrets/logstash-client-key.pem"
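Likewise, a throwaway statsd counter is an easy way to confirm Telegraf is listening (again assuming bash’s /dev/udp support):

# Bump a test counter via the statsd input published on localhost port 8125
echo -n "myapp.smoke_test:1|c" > /dev/udp/127.0.0.1/8125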

With all that, the Swarm is deployed (and re-deployed) using:

docker stack deploy --compose-file=docker-compose.yml app
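A couple of commands I find useful right after a deploy to confirm everything settled; the service names follow Swarm’s stack_service convention for the “app” stack used above:

# List the services in the stack and their replica counts
docker stack services app

# Follow the logs of the stack's logstash service
docker service logs -f app_logstash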

Deploying the telemetry swarm

The telemetry swarm follows a very similar process, so I’ll skip over the docker secret publication; it also includes the “ca.pem” file, but with the “-server” files instead of the “-client” ones.
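In other words, the skipped step is roughly:

docker secret create ca.pem certs/ca.pem
docker secret create logstash-server.pem certs/logstash-server.pem
docker secret create logstash-server-key.pem certs/logstash-server-key.pem
docker secret create logstash-server-bundle.pem certs/logstash-server-bundle.pem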

Logstash is defined much in the same way in the docker compose file:

version: '3.4'

services:
  logstash:
    image: docker.elastic.co/logstash/logstash-oss:6.1.1
    configs:
      - source: logstash
        target: /usr/share/logstash/pipeline/logstash.conf
    ports:
      - 5555:5555
    secrets:
      - ca.pem
      - logstash-server-key.pem
      - logstash-server.pem

The configuration of InfluxDB is more awkward since it doesn’t (yet, as of 1.4.2) support TLS client authentication. Instead I’m using an open source tool from Square called ghostunnel. The outside world accesses port 5086, serviced by ghostunnel, which in turn tunnels valid connections through to port 8086 of InfluxDB:

  influxdb:
    image: influxdb:1.4.2

  influxdb_tls_auth:
    image: squareup/ghostunnel
    command: >
      server
      --listen 0.0.0.0:5086
      --unsafe-target
      --target influxdb:8086
      --keystore /certs/logstash-server-bundle.pem
      --cacert /certs/ca.pem
      --allow-all
    secrets:
      # Can't use the default /var/run/secrets
      # since the ghostunnel image is based on alpine
      - source: ca.pem
        target: /certs/ca.pem
      - source: logstash-server-bundle.pem
        target: /certs/logstash-server-bundle.pem
    ports:
      - 5086:5086

Again, the telemetry swarm is deployed as a stack:

docker stack deploy --compose-file=docker-compose.yml telemetry
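With both swarms up, two quick checks confirm the TLS plumbing end to end. Both run from the application side using the client certificate set, and assume port 5555 is the TLS-secured Logstash input and that your.host.name stands in for the real hostname, as elsewhere in this article:

# Confirm the Logstash input presents our server certificate and accepts our client one
openssl s_client -connect your.host.name:5555 \
  -CAfile certs/ca.pem \
  -cert certs/logstash-client.pem -key certs/logstash-client-key.pem </dev/null

# Hit InfluxDB's /ping endpoint through the ghostunnel listener
curl -v \
  --cacert certs/ca.pem \
  --cert certs/logstash-client.pem \
  --key certs/logstash-client-key.pem \
  https://your.host.name:5086/ping

A “Verify return code: 0 (ok)” from s_client and a 204 No Content from curl mean the handshakes, the client certificate checks, and the tunnel into InfluxDB are all working.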

Lots of details omitted

A lot of details have been purposely omitted from this article, so please feel free to ask any and all questions in the comments. I maintain the Docker compose files of my application and telemetry swarms in these two spots:
