Monitoring your home network with InfluxDB on Raspberry Pi with Docker

Pete Shima
Dec 28, 2017 · 10 min read

Backstory

Over a year ago I was having all sorts of networking problems at home: major packet loss, complete outages and more. They were intermittent, unpredictable and hard to diagnose. I wasn’t sure whether the cause was my desktop, laptop, phone, wireless, or my internet provider.

So I decided to put a Raspberry Pi 2 on my network running InfluxDB, Telegraf and Grafana, with some network monitors in place. I needed historical data because the issues I was seeing were sporadic.

I’ve used it over the past year to debug many issues and this year I’m polishing it up with a new setup on a Raspberry Pi 3.

Preface

These instructions are written for a Mac and a Pi 2 or 3 running Raspbian 9 (Stretch). They assume you know the basics of running Linux and a Raspberry Pi.

Initial Setup

First we download Etcher and the Raspbian image. We use Etcher to write the image to the SD card that goes in the Pi. We want the Lite image, though the full image may work just as well.

Next, on a Pi 3, we need to connect it to the wifi.

Then we enable SSH to remotely connect to the Pi.
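Both steps can also be done headlessly before the first boot. This is a minimal sketch, assuming the freshly written SD card's boot partition is still mounted on the Mac (usually at /Volumes/boot, but check yours). Raspbian turns on SSH when it finds an empty file named ssh on the boot partition, and it picks up a wpa_supplicant.conf placed there. The function name prepare_boot, the country code and the SSID and passphrase arguments are placeholders of mine, not values from this setup.

```shell
# prepare_boot BOOT_DIR SSID PASSPHRASE
# Hypothetical helper: drops the two files Raspbian looks for on first boot.
prepare_boot() {
    boot_dir=$1
    ssid=$2
    psk=$3

    # An empty file named "ssh" on the boot partition enables the SSH server.
    : > "$boot_dir/ssh"

    # Raspbian moves this file into /etc/wpa_supplicant/ when it boots.
    # country=US is an assumption; use your own two-letter country code.
    cat > "$boot_dir/wpa_supplicant.conf" <<EOF
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country=US

network={
    ssid="$ssid"
    psk="$psk"
}
EOF
}
```

Calling prepare_boot /Volumes/boot "MyNetwork" "MyPassphrase" and then ejecting the card should be enough for the Pi to join the network with SSH enabled. Note that the passphrase is stored in plain text on the card until the first boot.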

After that we can connect remotely.

% ssh pi@172.16.1.200
pi@172.16.1.200's password:
Linux raspberrypi 4.9.59-v7+ #1047 SMP Sun Oct 29 12:19:23 GMT 2017 armv7l
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Tue Dec 26 19:19:27 2017 from 172.16.1.86
pi@raspberrypi:~ $

Now that the Pi is set up, it’s time to install our software packages.

Setup Docker

Install docker-ce via https://docs.docker.com/engine/installation/linux/docker-ce/debian/#install-using-the-convenience-script

The shortcut is:

pi@raspberrypi:~/docker $ curl -fsSL https://get.docker.com -o get-docker.sh
pi@raspberrypi:~/docker $ sudo sh get-docker.sh

Check that docker is running

pi@raspberrypi:~ $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
pi@raspberrypi:~ $

Setup InfluxDB

For this we use a prebuilt container: https://github.com/hypriot/rpi-influxdb

sudo docker run -d --volume=/var/influxdb:/data -p 8086:8086 hypriot/rpi-influxdb

Now when running this I got this error:

pi@raspberrypi:~ $ sudo docker run -d --volume=/var/influxdb:/data -p 8086:8086 hypriot/rpi-influxdb
9499d9a93862749fa6573dde8d0b6303751d2985e74d7612a84c3ac866f1a07e
docker: Error response from daemon: cgroups: memory cgroup not supported on this system: unknown.

Which led me to this issue: https://github.com/moby/moby/issues/35587

and to this specific comment on how to fix it: https://github.com/moby/moby/issues/35587#issuecomment-353976863

After that, reboot.
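I won’t reproduce the linked comment here, but before and after the reboot it helps to check whether the kernel actually reports the memory cgroup as enabled. This is a minimal sketch, assuming the usual /proc/cgroups layout (subsystem name, hierarchy, number of cgroups, enabled flag). The function name is made up for illustration.

```shell
# memory_cgroup_enabled [FILE]
# Succeeds when the "memory" row of a /proc/cgroups-style file has enabled=1.
# FILE defaults to /proc/cgroups; the argument exists so the check can be
# tried against a saved copy.
memory_cgroup_enabled() {
    awk '$1 == "memory" { found = ($4 == 1) } END { exit found ? 0 : 1 }' \
        "${1:-/proc/cgroups}"
}
```

On the Pi it can be used as: memory_cgroup_enabled && echo "memory cgroup enabled" || echo "memory cgroup disabled". If it still reports disabled after the reboot, the fix has not taken effect and the docker run will fail with the same error.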

And now we’ve got influxdb running

pi@raspberrypi:~ $ sudo docker run -d --volume=/var/influxdb:/data -p 8086:8086 hypriot/rpi-influxdb
c10b584e321034f5f360be506ba36b80142054af11162828a792bb66bb64146f
pi@raspberrypi:~ $ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c10b584e3210 hypriot/rpi-influxdb "/usr/bin/entry.sh /…" 23 seconds ago Up 21 seconds 0.0.0.0:8086->8086/tcp sad_neumann

Now we need to follow the rest of the instructions to initialize the influxdb database

pi@raspberrypi:~ $ sudo docker exec -it c10b584e3210 /usr/bin/influx
Connected to http://localhost:8086 version 1.2.2
InfluxDB shell version: 1.2.2
> CREATE DATABASE db1
> SHOW DATABASES
name: databases
name
----
_internal
db1
> USE db1
Using database db1
> CREATE USER root WITH PASSWORD 'passhere' WITH ALL PRIVILEGES
> GRANT ALL PRIVILEGES ON db1 TO root
> SHOW USERS
user admin
---- -----
root true

Setting up Telegraf

We’re going to use telegraf as our local agent to collect metrics and statistics. https://hub.docker.com/r/arm32v7/telegraf/

We have to use the arm32v7 version, since the Pi is a 32-bit ARM machine (armv7l in the login banner above).

pi@raspberrypi:~ $ sudo docker run --net=container:c10b584e3210 arm32v7/telegraf
Unable to find image 'arm32v7/telegraf:latest' locally
latest: Pulling from arm32v7/telegraf
Digest: sha256:7da91efbb3e228a31d3859daa0609c91f8c76970dd966a20d9e65e5193cff40a
Status: Downloaded newer image for arm32v7/telegraf:latest
2017/12/26 21:35:15 I! Using config file: /etc/telegraf/telegraf.conf
2017-12-26T21:35:15Z I! Starting Telegraf v1.5.0
2017-12-26T21:35:15Z I! Loaded outputs: influxdb
2017-12-26T21:35:15Z I! Loaded inputs: inputs.mem inputs.processes inputs.swap inputs.system inputs.cpu inputs.disk inputs.diskio inputs.kernel
2017-12-26T21:35:15Z I! Tags enabled: host=c10b584e3210
2017-12-26T21:35:15Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"c10b584e3210", Flush Interval:10s

Now let’s generate a config file we can mount into the container

sudo docker run --rm arm32v7/telegraf telegraf config > telegraf.conf

We can modify it as needed, though we won’t do that quite yet. For now we can run the container with the local config file mounted:

docker run --net=container:c10b584e3210 -v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro arm32v7/telegraf
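The default config lists every plugin, which makes it a long file to scroll through. As a sketch, telegraf can also generate a config limited to named plugins with its --input-filter and --output-filter flags. The DOCKER variable and the function name are conveniences I’ve introduced so the command can be dry-run, and the plugin list in the usage line is only an example.

```shell
# Command used to invoke docker; set DOCKER=echo to print instead of run.
DOCKER=${DOCKER:-sudo docker}

# generate_telegraf_conf INPUTS
# INPUTS is a colon-separated list of input plugins, e.g. cpu:mem:ping.
# Prints a config containing only those inputs and the influxdb output.
generate_telegraf_conf() {
    $DOCKER run --rm arm32v7/telegraf telegraf \
        --input-filter "$1" --output-filter influxdb config
}
```

For example: generate_telegraf_conf cpu:mem:ping:dns_query > telegraf.conf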

Setup Grafana

There are now third-party Grafana containers on Docker Hub, which makes this considerably easier:

https://hub.docker.com/r/fg2it/grafana-armhf/

docker run -i -p 3000:3000 --name grafana fg2it/grafana-armhf:v4.1.2

Now we can test by hitting the grafana endpoint and logging in with admin:admin

Voila

But adding the data source fails. Trying to run the container with the existing influxdb networked container fails as well.

pi@raspberrypi:~ $ docker run --net=container:c10b584e3210 -i -p 3000:3000 --name grafana fg2it/grafana-armhf:v4.1.2
docker: Error response from daemon: conflicting options: port publishing and the container type network mode.
See 'docker run --help'.

This is a subtlety of Docker networking. Docker has several networking modes, and a container that joins another container’s network (--net=container:...) cannot publish ports of its own. In this case we are just going to use host networking for simplicity.

Pulling things together

So far we’ve booted each of the three software packages to make sure they run. Now we need to reconfigure them to work together and to start on boot. After that we can start customizing our configuration.

First we need to stop all the containers we have running. The first steps were just to make sure everything ran.

pi@raspberrypi:~ $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c10b584e3210 hypriot/rpi-influxdb "/usr/bin/entry.sh /…" About an hour ago Up About an hour 0.0.0.0:8086->8086/tcp sad_neumann
pi@raspberrypi:~ $ docker stop c10b584e3210
c10b584e3210
pi@raspberrypi:~ $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
pi@raspberrypi:~ $

Since we are only planning on running monitoring on this pi, we can just use docker host networking.

Start influxdb using host networking.

pi@raspberrypi:~ $ sudo docker run --rm -d --name=influxdb --net=host --volume=/var/influxdb:/data hypriot/rpi-influxdb
2e30695d09b2baac36653fca83fc5d797c9bf3d6243c831543beeb7121463811

And run telegraf

pi@raspberrypi:~ $ sudo docker run --rm -d --net=host --name telegraf -v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro arm32v7/telegraf
616a096391fcda76c111528e6e568dff9c6159a49f7c18ef1498fd9f108e6487

Next let’s create some persistent storage for grafana

pi@raspberrypi:~ $ sudo docker run -d -v /var/lib/grafana --name grafana-storage busybox:latest

And then run grafana with the persistent storage

pi@raspberrypi:~ $ sudo docker run --rm -d --net=host --name grafana --volumes-from grafana-storage fg2it/grafana-armhf:v4.1.2
de31ad82e994f579bb3aff0a3765152c8af3d290b1ce66fd04bb728c99a72e3d
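One caveat: the commands above use --rm, so the containers are removed when they stop and will not come back after a reboot. Docker refuses to combine --rm with --restart, so one way to get start-on-boot is to drop --rm and add a restart policy instead. This is a sketch of that variant, not what I ran above; the DOCKER variable and the start_stack name are mine, and it assumes the Docker service itself starts at boot and that any earlier containers with the same names have been stopped and removed.

```shell
# Command used to invoke docker; set DOCKER=echo to print instead of run.
DOCKER=${DOCKER:-sudo docker}

# start_stack [TELEGRAF_CONF]
# Starts influxdb, telegraf and grafana with host networking and a restart
# policy, so the Docker daemon brings them back up after a reboot.
start_stack() {
    conf=${1:-$PWD/telegraf.conf}

    $DOCKER run -d --restart unless-stopped --net=host --name influxdb \
        --volume=/var/influxdb:/data hypriot/rpi-influxdb

    $DOCKER run -d --restart unless-stopped --net=host --name telegraf \
        -v "$conf":/etc/telegraf/telegraf.conf:ro arm32v7/telegraf

    # Reuses the grafana-storage data container created above.
    $DOCKER run -d --restart unless-stopped --net=host --name grafana \
        --volumes-from grafana-storage fg2it/grafana-armhf:v4.1.2
}
```

Running DOCKER=echo start_stack first prints the three commands without executing them, which is a cheap way to check the paths.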

Verify the install

Before we get to the monitoring we should verify the install is working.

First we can tail all the docker logs

Influx

pi@raspberrypi:~ $ docker logs influxdb
=> Starting InfluxDB ...
=> No database need to be pre-created
exec influxd -config=${CONFIG_FILE}
<omitted>
[I] 2017-12-28T17:43:00Z Listening for signals
[I] 2017-12-28T17:43:00Z Sending usage statistics to usage.influxdata.com
[httpd] 127.0.0.1 - - [28/Dec/2017:17:43:00 +0000] "POST /write?consistency=any&db=telegraf HTTP/1.1" 204 0 "-" "-" 8c1abea5-ebf6-11e7-8001-000000000000 15528
[httpd] 127.0.0.1 - - [28/Dec/2017:17:43:00 +0000] "POST /write?consistency=any&db=telegraf HTTP/1.1" 204 0 "-" "-" 8c1d763e-ebf6-11e7-8002-000000000000 18108

Telegraf

pi@raspberrypi:~ $ docker logs telegraf
2017/12/28 17:44:20 I! Using config file: /etc/telegraf/telegraf.conf
2017-12-28T17:44:20Z I! Starting Telegraf v1.5.0
2017-12-28T17:44:20Z I! Loaded outputs: influxdb
2017-12-28T17:44:20Z I! Loaded inputs: inputs.kernel inputs.mem inputs.processes inputs.swap inputs.system inputs.cpu inputs.disk inputs.diskio
2017-12-28T17:44:20Z I! Tags enabled: host=raspberrypi
2017-12-28T17:44:20Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"raspberrypi", Flush Interval:10s

Grafana

pi@raspberrypi:~ $ docker logs grafana
t=2017-12-28T17:42:45+0000 lvl=info msg="Starting Grafana" logger=main version=4.1.2 commit=v4.1.2 compiled=2017-02-13T12:13:31+0000
t=2017-12-28T17:42:45+0000 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2017-12-28T17:42:45+0000 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
t=2017-12-28T17:42:45+0000 lvl=info msg="Config overriden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
t=2017-12-28T17:42:45+0000 lvl=info msg="Config overriden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
t=2017-12-28T17:42:45+0000 lvl=info msg="Config overriden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"
t=2017-12-28T17:42:45+0000 lvl=info msg="Path Home" logger=settings path=/usr/share/grafana
t=2017-12-28T17:42:45+0000 lvl=info msg="Path Data" logger=settings path=/var/lib/grafana
t=2017-12-28T17:42:45+0000 lvl=info msg="Path Logs" logger=settings path=/var/log/grafana
t=2017-12-28T17:42:45+0000 lvl=info msg="Path Plugins" logger=settings path=/var/lib/grafana/plugins
t=2017-12-28T17:42:45+0000 lvl=info msg="Initializing DB" logger=sqlstore dbtype=sqlite3
t=2017-12-28T17:42:45+0000 lvl=info msg="Starting DB migration" logger=migrator
t=2017-12-28T17:42:45+0000 lvl=info msg="Executing migration" logger=migrator id="copy data account to org"
t=2017-12-28T17:42:45+0000 lvl=info msg="Skipping migration condition not fulfilled" logger=migrator id="copy data account to org"
t=2017-12-28T17:42:45+0000 lvl=info msg="Executing migration" logger=migrator id="copy data account_user to org_user"
t=2017-12-28T17:42:45+0000 lvl=info msg="Skipping migration condition not fulfilled" logger=migrator id="copy data account_user to org_user"
t=2017-12-28T17:42:45+0000 lvl=info msg="Starting plugin search" logger=plugins
t=2017-12-28T17:42:45+0000 lvl=info msg="Initializing Alerting" logger=alerting.engine
t=2017-12-28T17:42:45+0000 lvl=info msg="Initializing CleanUpService" logger=cleanup
t=2017-12-28T17:42:45+0000 lvl=info msg="Initializing HTTP Server" logger=server address=0.0.0.0:3000 protocol=http subUrl=

If all is well we can then set up Grafana. There are other tests we can run here to verify and I might add them later.
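As a sketch of the kind of extra check that could go here: InfluxDB 1.x answers a GET on /ping with a 204, and Grafana serves its login page with a 200, so a small script can confirm both are listening. The function names and the CURL override are mine, added only so the logic can be exercised without the services running.

```shell
# http_status URL
# Prints the HTTP status code for URL (curl prints 000 if it cannot connect).
http_status() {
    "${CURL:-curl}" -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# check_endpoint NAME URL EXPECTED_CODE
# Prints OK or FAIL for the endpoint and returns non-zero on a mismatch.
check_endpoint() {
    got=$(http_status "$2")
    if [ "$got" = "$3" ]; then
        echo "OK $1"
    else
        echo "FAIL $1 (got $got, expected $3)"
        return 1
    fi
}
```

On the Pi this would be run as:

check_endpoint influxdb http://localhost:8086/ping 204
check_endpoint grafana http://localhost:3000/login 200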

Setup Grafana Datasource

When loading up grafana at http://<ip>:3000 we are presented with the front page again and can move on to configuring a basic InfluxDB data source.
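The data source can also be created without clicking through the UI. This is a sketch using Grafana’s HTTP API (a POST to /api/datasources) with the default admin:admin login. The data source name is a placeholder of mine. The database name telegraf is the one Telegraf writes to by default, which matches the db=telegraf seen in the InfluxDB log above.

```shell
# datasource_json NAME DATABASE
# Prints the JSON body describing an InfluxDB data source on localhost.
datasource_json() {
    printf '{"name":"%s","type":"influxdb","access":"proxy","url":"http://localhost:8086","database":"%s"}\n' "$1" "$2"
}

# add_datasource NAME DATABASE [GRAFANA_URL]
# Posts the JSON body to Grafana; assumes the default admin:admin credentials.
add_datasource() {
    datasource_json "$1" "$2" |
        "${CURL:-curl}" -s -u admin:admin \
            -H 'Content-Type: application/json' \
            -X POST --data @- "${3:-http://localhost:3000}/api/datasources"
}
```

For example: add_datasource influx-telegraf telegraf. If you have changed the admin password, adjust the -u argument accordingly.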

With the data source set up we can now set up a new dashboard. To test things out we will graph performance of the Pi itself, then add our network monitoring graphs.

Hitting the edit button on the Panel Title allows us to edit it.

And we can get a simple graph together of average CPU as a test.

Adding basic monitoring

With the above we should now have a working monitoring setup; we just need to add some checks to it.

With influxdb and telegraf this is pretty easy right out of the box. In the telegraf.conf file (initially mapped right in the pi user home dir) we are going to change a few of the inputs.

Scrolling through the default file you will see settings for the agent, outputs, aggregators and inputs. You’ll also notice that most of the items are commented out except our existing host metrics like [[inputs.cpu]], which we used for the graph above.

We want to start by modifying [[inputs.dns_query]]

# Query given DNS server and gives statistics
[[inputs.dns_query]]
  ## servers to query
  servers = ["8.8.8.8", "4.2.2.1"] # required
  ## Domains or subdomains to query. "."(root) is default
  domains = ["www.google.com"] # optional
  ## Query record type. Default is "A"
  ## Possible values: A, AAAA, CNAME, MX, NS, PTR, TXT, SOA, SPF, SRV.
  record_type = "A" # optional
  ## Dns server port. 53 is default
  port = 53 # optional
  ## Query timeout in seconds. Default is 2 seconds
  timeout = 2 # optional

Here we are setting up telegraf to do a DNS query for www.google.com against two common DNS servers. You should also add the DNS server you use on your local network. With this we will get graphs for DNS response time.
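To sanity-check what the plugin will measure, the same lookup can be timed by hand with dig (part of the dnsutils package on Raspbian, which may need installing). This is only a sketch: the function names are mine, and it relies on dig’s usual ";; Query time: N msec" summary line.

```shell
# parse_query_time
# Reads dig output on stdin and prints the query time in milliseconds.
parse_query_time() {
    awk '/Query time:/ { print $4 }'
}

# dns_time SERVER DOMAIN
# Runs a single A-record lookup with the same 2 second timeout as the plugin.
dns_time() {
    dig "@$1" "$2" A +time=2 +tries=1 | parse_query_time
}
```

For example: dns_time 8.8.8.8 www.google.com. Numbers that differ wildly from what ends up in the graphs are a hint that something other than the DNS server is slow.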

Next we want to modify [[inputs.ping]]

# Ping given url(s) and return statistics
[[inputs.ping]]
  ## NOTE: this plugin forks the ping command. You may need to set capabilities
  ## via setcap cap_net_raw+p /bin/ping
  #
  ## urls to ping
  urls = ["www.github.com","www.amazon.com","8.8.8.8","4.2.2.1","172.16.1.1"]
  ## number of pings to send per collection (ping -c <COUNT>)
  count = 3
  ## interval, in s, at which to ping. 0 == default (ping -i <PING_INTERVAL>)
  ping_interval = 15.0
  ## per-ping timeout, in s. 0 == no timeout (ping -W <TIMEOUT>)
  timeout = 10.0
  ## interface to send ping from (ping -I <INTERFACE>)
  interface = "wlan0"

Here we set up ping monitoring to some popular sites, our DNS servers and something on our local network. If you are using ethernet instead of wifi (standard on a Pi 2) then you’ll want to change the interface to “eth0”.
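Since the plugin forks the ping command and reads its output, it can be useful to run the same ping by hand and pull out the loss figure. This is a rough sketch, assuming the Linux ping summary format ("3 packets transmitted, 3 received, 0% packet loss, ..."). The function names are mine and the flags mirror the config above.

```shell
# parse_packet_loss
# Reads ping output on stdin and prints the packet loss percentage.
parse_packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# ping_loss HOST [INTERFACE]
# Sends 3 pings with a 10 second timeout from INTERFACE (default wlan0).
ping_loss() {
    ping -c 3 -W 10 -I "${2:-wlan0}" "$1" | parse_packet_loss
}
```

For example: ping_loss 8.8.8.8, or ping_loss 172.16.1.1 eth0 on a wired Pi.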

As the note in the ping plugin comments says, we also need to allow pings:

pi@raspberrypi:~ $ sudo setcap cap_net_raw+p /bin/ping

Otherwise we will see something like this in our telegraf log:

2017-12-28T18:21:50Z E! Error in plugin [inputs.ping]: Fatal error processing ping output: 8.8.8.8
2017-12-28T18:21:50Z E! Error in plugin [inputs.ping]: Fatal error processing ping output: 172.16.1.1
2017-12-28T18:21:50Z E! Error in plugin [inputs.ping]: Fatal error processing ping output: 4.2.2.1

There is also a pretty silly issue in the standard telegraf docker container we used. The version of ping that is installed isn’t compatible with its own ping plugin. There is an open issue for this, and you will need the workaround described there until it is fixed: https://github.com/influxdata/telegraf/issues/3295

Throughout the config file there are many other inputs we could enable and graph. [[inputs.http_response]] is one you may want to enable.

After changing the configuration we just restart telegraf.

pi@raspberrypi:~ $ sudo docker restart telegraf
telegraf
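Before restarting, the edited file can be checked with telegraf’s --test mode, which gathers metrics once, prints them to stdout and exits without writing to InfluxDB. This is a sketch; the DOCKER variable and the function name are mine, and --input-filter limits the run to the plugin being checked.

```shell
# Command used to invoke docker; set DOCKER=echo to print instead of run.
DOCKER=${DOCKER:-sudo docker}

# test_telegraf_input PLUGIN [TELEGRAF_CONF]
# Runs a throwaway telegraf container against the local config and prints
# the metrics PLUGIN would have sent.
test_telegraf_input() {
    conf=${2:-$PWD/telegraf.conf}
    $DOCKER run --rm --net=host \
        -v "$conf":/etc/telegraf/telegraf.conf:ro arm32v7/telegraf \
        telegraf --config /etc/telegraf/telegraf.conf --input-filter "$1" --test
}
```

For example: test_telegraf_input dns_query. A typo in the file or a failing plugin shows up here immediately instead of later in the container log.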

Setting up the network dashboard

We can set up a new dashboard for ping response time.

We can also add one for packet loss

And finally DNS
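The panels are built in the Grafana query editor, but as a rough guide to what sits behind them, here is a sketch that runs equivalent InfluxQL queries directly against InfluxDB’s /query endpoint. The measurement and field names (average_response_ms and percent_packet_loss in ping, query_time_ms in dns_query) are the ones the Telegraf plugins emit. The one-hour window, one-minute buckets and function names are arbitrary choices of mine.

```shell
# panel_query FIELD MEASUREMENT TAG
# Prints an InfluxQL query averaging FIELD per minute over the last hour,
# with one series per value of TAG.
panel_query() {
    printf 'SELECT mean("%s") FROM "%s" WHERE time > now() - 1h GROUP BY time(1m), "%s"\n' "$1" "$2" "$3"
}

# run_query QUERY [DATABASE]
# Sends QUERY to the local InfluxDB and prints the JSON response.
run_query() {
    "${CURL:-curl}" -s -G http://localhost:8086/query \
        --data-urlencode "db=${2:-telegraf}" --data-urlencode "q=$1"
}
```

The three dashboards then correspond to:

run_query "$(panel_query average_response_ms ping url)"
run_query "$(panel_query percent_packet_loss ping url)"
run_query "$(panel_query query_time_ms dns_query server)"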

Summary

Now we’ve got monitoring in place and a dashboard set up to review it as needed. For about $50 we’ve now got a flexible monitoring solution that can be plugged in pretty much anywhere on your network to give you another view of network performance.

Now whenever I have a networking issue I can load up these graphs to get the Pi’s view of what is happening. It turns out I had several different issues going on: sometimes it was the ISP, sometimes the wireless and sometimes my local device.

