Send your systemd journal logs to Graylog

Stephen Horsfield
The Startup · Oct 8, 2019

How do you get your system logs to Graylog? The Graylog sidecar is compatible with Elastic Beats, but Filebeat does not support the systemd journal yet.

Well, here’s a simple implementation of a basic log exporter. It doesn’t have support for some of the features you may want, such as back-pressure, but it does work. You can run it inside a container, on a host, and it even works on those nice A1 ARM64 instances in AWS.

Unlike most file-based logs, the systemd journal is not directly human-readable and uses a cursor-based API for access. The journalctl command can be used to access the journal from the shell.
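
For example, this prints the three most recent entries as JSON, followed by a cursor line that can be used to resume reading later:

journalctl -o json -n 3 --show-cursor --quiet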

My implementation of this is based on another implementation from a friend of mine, Steve Jansen. This code is entirely my own, however.

Introducing GELF

The Graylog Extended Log Format (GELF) is a message exchange format used to push messages into Graylog. It can be carried over TCP or UDP.

Despite the concerns around TCP mentioned in the documentation, I’ve found TCP to be acceptable in real-world scenarios, especially when fronted by an effective load balancer.

Each message is a JSON payload, and over TCP each message must be terminated with a NUL (\0) byte. Note that Bash cannot store NUL bytes in its variables.
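
For illustration, a minimal message (field values invented) looks like this before the trailing NUL byte:

{"version":"1.1","host":"node-1","short_message":"Hello Graylog","timestamp":1570500000.123}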

My send-logs function pushes newline-delimited messages to Graylog from stdin and translates newline characters to NUL bytes:

send-logs() {
  # Have to inject NUL bytes here as bash can't store them
  cat - \
    | tr '\n' '\0' \
    | nc \
        -q 1 \
        -w 10 \
        "${GRAYLOG_GELF_ADDRESS}" "${GRAYLOG_GELF_PORT}"
}
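
As a quick smoke test, assuming GRAYLOG_GELF_ADDRESS and GRAYLOG_GELF_PORT are exported, a single message can be pushed by hand:

echo '{"version":"1.1","host":"test","short_message":"hello"}' | send-logs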

The simple version

At its heart, the logic here is trivial:

#!/bin/bash
set -e
set -o pipefail

BATCH_SIZE=1000
HOSTNAME="$(hostname)"

echo "Shipping logs in batches of $BATCH_SIZE from $HOSTNAME (${GRAYLOG_ENVIRONMENT}) to ${GRAYLOG_GELF_ADDRESS}:${GRAYLOG_GELF_PORT}"

while [[ 1 -eq 1 ]]
do
  get-records
  RECORD_COUNT="$(wc -l <<< "$RECORDS")"
  if [[ "$RECORDS" == "" ]]; then
    RECORD_COUNT=0
  fi
  if [[ $RECORD_COUNT -eq 0 ]]; then
    # Logs caught up
    CURRENT_BATCH_SIZE=$BATCH_SIZE
    sleep 10
    continue
  fi
  log-now
done
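
The get-records helper isn't listed here; a minimal sketch (assuming journalctl's JSON output, the cursor file named by JOURNAL_CURSOR_TRACKING, and the CURRENT_BATCH_SIZE limit) might look like this:

get-records() {
  CURRENT_BATCH_SIZE="${CURRENT_BATCH_SIZE:-$BATCH_SIZE}"
  # Resume after the saved cursor if we have one, otherwise read from the start
  local cursor_args=()
  if [[ -s "${JOURNAL_CURSOR_TRACKING}" ]]; then
    cursor_args=(--after-cursor="$(cat "${JOURNAL_CURSOR_TRACKING}")")
  fi
  # journalctl emits one JSON object per line; take at most one batch.
  # head may close the pipe early, so ignore the resulting SIGPIPE status.
  RECORDS="$(journalctl \
    --directory "${JOURNAL_LOCATION}" \
    --output json --no-pager --quiet \
    "${cursor_args[@]}" \
    | head -n "$CURRENT_BATCH_SIZE" || true)"
}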

The loop runs continuously, calling get-records (sketched above) to pull the next batch of log messages to send. If there are no messages pending, it sleeps for ten seconds; otherwise it sends the logs using log-now:

log-now() {
  WIRE_DATA="$(format-logs)"
  send-logs <<< "$WIRE_DATA"
  write-cursor
}

format-logs translates the journal logs into GELF JSON records. We’ve already seen how send-logs sends them to Graylog. write-cursor tracks the current location of the systemd journal so that we only read newer records:

write-cursor() {
  echo -n "$CUR_POS_NEW" > "${JOURNAL_CURSOR_TRACKING}"
}

Of course, this only makes sense once we see how CUR_POS_NEW is set, but I'm going to revisit that a bit later for reasons that will become clear.

Formatting GELF messages

format-logs is responsible for producing GELF JSON content. It’s also a place where we can add custom filtering, if we wish. In my case, I wanted to filter out protokube.service as it was very noisy and low-value. In general, I do not recommend filtering out log content as it provides an opportunity for an attacker to go unnoticed. At the same time, you have to be pragmatic and acknowledge the cost of doing so.

The function relies on jq to transform the message content from the journal. jq is a fantastically useful program for DevOps scripting.

The final step in the jq program defines the output structure:

  | {
      "version": "1.1",
      "host": "${HOSTNAME}",
      "short_message": ( ._MESSAGE | split("\n") | .[0] ),
      "full_message": ._MESSAGE,
      "timestamp": (.__REALTIME_TIMESTAMP | tonumber | . / 1000.0 | floor | . / 1000.0),
      "level": "1",
      "_environment": "${GRAYLOG_ENVIRONMENT}"
    }
  + .

Each of these fields, except _environment, is required by Graylog. You can choose what values you provide, but omitting a required field will cause the message to be dropped. The most important field is timestamp, which must be in exactly this format: you can hit problems if you supply the wrong number of decimal places or use the wrong scale.
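
For example, the journal's __REALTIME_TIMESTAMP of 1570500000123456 (microseconds) must become 1570500000.123 (seconds with a millisecond component); sending 1570500000123 instead would be read as a date thousands of years in the future.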

The _environment value sets the Graylog custom field environment. All custom fields must be prefixed with a single underscore. Additional underscores will become part of the field name.
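
For example, _environment becomes the field environment, while a field sent as __raw would surface as _raw.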

Finally, the + . at the end adds the additional custom fields coming from the previous stage of the jq program:

  select(.MESSAGE != null and .MESSAGE != "")
| select(._SYSTEMD_UNIT != "protokube.service")
| select( ( (._SYSTEMD_UNIT == "kubelet.service")
            and
            (.MESSAGE | contains("GET /metrics")) )
          | not )
| with_entries( .key |= ( ltrimstr("_") | "_" + . ) )
| del(._ID) | del(._id) | del(.__CURSOR) | del(._MACHINE_ID)

Here we see some basic filtering and name mangling to give us nice field names in Graylog. You can adapt this for your needs, of course. I also drop several fields that have no real meaning for the log platform.

This is all wrapped together in the format-logs function:

format-logs() {
  # Removing noisy/unhelpful fields: cursor, machine id; and fields that conflict with Graylog: id
  # Skipping protokube.service as it is very noisy
  # Timestamp converted from microseconds to seconds with a millisecond component
  JQ_CMD="$(cat - <<EOF
... // above program
EOF
)"
  jq -Mr "$JQ_CMD" <<< "$RECORDS"
}
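
To make the transformation concrete, here is a hypothetical journal record and the GELF message the program produces from it (all values invented for illustration, with GRAYLOG_ENVIRONMENT=production and hostname node-1):

# Input: one line from the journal batch (abbreviated):
#   {"MESSAGE":"Started nginx.service","_SYSTEMD_UNIT":"init.scope",
#    "__REALTIME_TIMESTAMP":"1570500000123456", ...}
#
# Output: the corresponding GELF record (abbreviated):
#   {"version":"1.1","host":"node-1","short_message":"Started nginx.service",
#    "full_message":"Started nginx.service","timestamp":1570500000.123,
#    "level":"1","_environment":"production",
#    "_MESSAGE":"Started nginx.service","_SYSTEMD_UNIT":"init.scope", ...}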

The complicated bit

Things get a little more complicated in three areas: resilience and recovery, error handling, and poison-message handling (due to corrupt logs or other issues).

First, we want the system to be able to pick up from where it left off and to ensure that all logs are delivered, even if there's a temporary network outage. Secondly, we don't want a bad journal message to prevent us from continuing to send logs. Thirdly, both of these mean that we have to start handling error codes from processes, and that means set -e will start to cause us problems. This makes the code much more complicated and less readable.

For this content, I consider a data error (and thus a form of poison message) to be any message that the program is unable to transform into the GELF JSON format via jq.

log-now() {
  set +e
  WIRE_DATA="$(format-logs)"
  LOCAL_EXIT_CODE=$?
  if [[ $LOCAL_EXIT_CODE -ne 0 ]]; then
    # Note: errexit remains disabled on this error path
    DATA_ERROR=1
    return $LOCAL_EXIT_CODE
  fi
  DATA_ERROR=0
  set -e
  if [[ "$WIRE_DATA" != "" ]]; then
    send-logs <<< "$WIRE_DATA"
  fi
  write-cursor
}

If a poison message is encountered in a batch, the program attempts to process messages one at a time until it is past the poison message batch:

process-poison-messages() {
  # We can have multiple poison messages sequentially, so we need
  # to see where we can resume. If this still fails, we can skip to
  # the current journal position, but then we have lost an unknown
  # number of records
  echo 'Skipping over poison records'
  CURRENT_BATCH_SIZE=$BATCH_SIZE
  get-records
  poison_counter=0
  poison_record_count="$RECORD_COUNT"
  POISON_RECORDS="$RECORDS"
  RECORD_COUNT=1
  if [[ $poison_record_count -lt 2 ]]; then
    echo "No newer messages after poison record: $POISON_RECORDS"

    CUR_POS_NEW="$( \
      journalctl -n 0 --show-cursor --quiet \
      | sed 's/^-- cursor: *//' \
    )"
    write-cursor
    return
  fi
  while [[ $poison_counter -lt $poison_record_count ]]; do
    RECORDS="$(head -n 1 <<< "$POISON_RECORDS")"
    POISON_RECORDS="$(tail -n +2 <<< "$POISON_RECORDS")"
    poison_counter=$((poison_counter + 1))
    set +e
    CUR_POS_NEW="$(jq -rM .__CURSOR <<< "$RECORDS")"
    LOCAL_EXIT_CODE=$?
    set -e
    if [[ $LOCAL_EXIT_CODE -ne 0 ]]; then
      echo "Poison record found, skipping: $RECORDS."
      continue
    fi
    set +e
    log-now
    LOCAL_EXIT_CODE=$?
    set -e
    if [[ $LOCAL_EXIT_CODE -ne 0 ]]; then
      if [[ $DATA_ERROR -eq 1 ]]; then
        echo "Poison record found, skipping: $RECORDS."
        write-cursor
        # ^^ We can do this because it is only for one record
      else
        exit $LOCAL_EXIT_CODE # Don't trap network or other issues
      fi
    else
      echo 'Resuming normal processing'
      return 0 # Can resume normal processing
    fi
  done
  # It is possible that we've got to this point because we haven't
  # been able to read a valid cursor. In this case, we are losing
  # data but it is still better to resume normal processing
  echo 'Unable to recover valid cursor, skipping to head'
  CUR_POS_NEW="$( \
    journalctl -n 0 --show-cursor --quiet \
    | sed 's/^-- cursor: *//' \
  )"
  write-cursor
  return 0
}
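
Putting it all together, the revised main loop wires these pieces up along these lines (a sketch; the full code linked below is the reference):

while [[ 1 -eq 1 ]]
do
  get-records
  RECORD_COUNT="$(wc -l <<< "$RECORDS")"
  if [[ "$RECORDS" == "" ]]; then
    RECORD_COUNT=0
  fi
  if [[ $RECORD_COUNT -eq 0 ]]; then
    # Logs caught up
    CURRENT_BATCH_SIZE=$BATCH_SIZE
    sleep 10
    continue
  fi
  set +e
  log-now
  LOCAL_EXIT_CODE=$?
  set -e
  if [[ $LOCAL_EXIT_CODE -ne 0 ]]; then
    if [[ $DATA_ERROR -eq 1 ]]; then
      # A record in this batch failed to transform; isolate and skip it
      process-poison-messages
    else
      exit $LOCAL_EXIT_CODE # Don't mask network or other failures
    fi
  fi
done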

See the link at the end of this story for the whole code.

Resilience is achieved by capturing and updating the __CURSOR from the journal itself, which is persisted by write-cursor, as we've already seen.

Running in Docker

This approach can also be used from inside a container. Here’s my Dockerfile:

FROM ubuntu:18.04
LABEL maintainer="Steve Horsfield <stevehorsfield@users.noreply.github.com>"
USER root
WORKDIR /
ENV TINI_VERSION=v0.18.0
RUN apt-get update \
 && apt-get install \
      -y --no-install-recommends \
      curl jq systemd ca-certificates netcat \
 && rm -rf /var/lib/apt/lists/* \
 && curl -L https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini -o /tini --silent --fail \
 && chmod u+x /tini
ENV JOURNAL_LOCATION="/var/log/journal"
ENV JOURNAL_CURSOR_TRACKING="/var/log/journal-shipper-cursor.pos"
ENV GRAYLOG_GELF_ADDRESS=graylog-gelf.myorg.org
ENV GRAYLOG_GELF_PORT=1504
ENV GRAYLOG_ENVIRONMENT=missing
ADD send.sh /send.sh
ENTRYPOINT ["/tini", "/bin/bash", "--", "/send.sh"]
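
tini runs as PID 1 so that signals are forwarded to the script and exited child processes (such as the nc invocations) are reaped.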

If you update the tini URL to point to an ARM64 one, this also works.

I launch this from a systemd unit, of course:

[Unit]
Description=Send journald logs to Graylog
After=network-online.target docker.service
Wants=docker.service

[Service]
Type=simple
ExecStartPre=-/usr/bin/docker stop %n
ExecStartPre=-/usr/bin/docker rm --force %n
# File must exist for Docker volume mount to work
ExecStartPre=/bin/touch /var/log/journald-graylog-shipper.pos
ExecStartPre=/usr/bin/docker pull \
some-repo/journald-graylog-shipper:TAG
# CPU quota limits to 30% of a single CPU at max
ExecStart=/usr/bin/docker run \
-t --rm --name %n \
--volume /var/log/journal:/var/log/journal:ro \
--volume /var/log/journald-graylog-shipper.pos:/var/log/journald-graylog-shipper.pos:rw \
--env "GRAYLOG_GELF_ADDRESS=some-fqdn" \
--env "GRAYLOG_GELF_PORT=1514" \
--env "GRAYLOG_ENVIRONMENT=my-environment" \
--env "JOURNAL_CURSOR_TRACKING=/var/log/journald-graylog-shipper.pos" \
--env "JOURNAL_LOCATION=/var/log/journal" \
--network=host \
--privileged \
--memory=200M \
--memory-reservation=100M \
--cpu-quota=30000 \
some-repo/journald-graylog-shipper:TAG
Restart=always
RestartSec=15
RemainAfterExit=no
TimeoutStartSec=60

[Install]
WantedBy=multi-user.target
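
Assuming the unit is installed as /etc/systemd/system/journald-graylog-shipper.service, it can then be enabled and started with:

sudo systemctl daemon-reload
sudo systemctl enable --now journald-graylog-shipper.service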
