Building a Homelab SIEM — Graylog on Raspberry Pi

Hi, thanks for stopping by. And hope you’re doing well.

A few lines about me to get this thing primed: I've been in the tech domain for just under two decades now, but, call it "I was not ready" or just procrastination at its worst, this is my first public technical blog post.

I have had some experience posting solutions on Experts Exchange, but that was about 10 years back;

and though I have proud accomplishments in Facebook posts on all sorts of stuff under the sun, none of it counts as even remotely technical.

So here I am, trying to rouse my “writing cells” (and whatever’s left of my gray cells) and pen this article which I hope does not turn out to be a waste of your time.

But if you do feel at some point while reading that my mumbling is going haywire, please let me know by email or in the comments below; I'd appreciate any constructive feedback.

Whoami >> I'm an infant in the Raspberry Pi world, having started down this path barely two months back.

And this is what I have built using it, which is what I'll write about in this post.

  • Raspberry Pi 4B 8 GB model, running Raspberry Pi OS 64-bit (I started with 32-bit since 64-bit is still in beta, but there are reasons I moved to 64-bit).
  • 4 TB USB 3 hard drive attached, acting as network storage, served through Samba configured on the Pi (formatted as ext4; again, I started with NTFS and then switched to ext4)
  • Pi-hole acting as both DNS and DHCP server >> since my router doesn't support setting a DNS suffix in DHCP options
  • A separate machine hosting Windows/Linux VMs on the network.
  • Graylog 4.1 arm64 installed as a Docker image on the Pi, receiving logs from these VMs.
  • Logs are saved on the attached USB hard drive, and MongoDB runs off the same drive as well

If you're someone who is a bit conversant with systems and networking, especially Linux, this shouldn't be too difficult to do.

Even if your skills in Linux are as stellar as mine, that shouldn't be a showstopper, since there are tons of help articles and documentation available online.

So, if there is already plenty of stuff available online, how is this post any different?

- Well, the information below is more of a summarized guide to material that is scattered across multiple sources,

- and, as a newbie, I'm just trying to consolidate it into one post, to help someone who is probably at the same starting point as I was.

To start off with,

  1. You can get 64-bit Raspberry Pi OS images here: https://downloads.raspberrypi.org/raspios_arm64/images/

As of this writing, the latest is the May 28 release.

But before you install, please be aware that the 64-bit OS is still in beta, so there are still active bugs/issues you should be cognizant of (though the devs are working through them, and the list is shorter than before).

You can spend some time there, just to be sure there is no missing/reduced functionality in 64-bit versus 32-bit that could be a potential blocker.

A very legit question is: why even bother with the 64-bit OS if it's beta, while 32-bit is available with none of these "known, but do your homework" bugs/issues?

I had the same question, and what I found was

  • Elasticsearch (the core component of both the ELK stack and Graylog) ships its ARM images as 64-bit only, whether for a native OS install or through Docker.
  • This essentially restricts you to the 64-bit Pi OS (https://www.elastic.co/blog/elasticsearch-on-arm)

- Another reason was that my Pi 4B is the 8 GB model (I would recommend it over the 4 GB model if you're planning to deploy any SIEM on it). The extra RAM comes in handy for the MongoDB caches, the JVM heap (Elasticsearch is RAM-intensive), and the OS cache.

2) Now that the OS install is done and the Pi is finally booting to its glorious desktop, or the not-so-glamorous SSH PuTTY session

  • this second step is mainly for those who want to attach a USB hard drive
  • and use it as the log location for the to-be-deployed SIEM, or as a network share, or both (as I'm using it)

The rationale is simple: a hard drive can take the constant IOPS of SIEM logging a lot better than an SD card can (even one of those high-endurance SD cards), with the additional benefit that the raw logs are externally accessible (should you ever need them).

Imp: Please remember to format the disk as ext4/ext3 and NOT NTFS (I haven't tested FAT).

- That's because MongoDB (a core dependency of Graylog) does not start if its logs/files reside on an NTFS-formatted volume.

There are essentially 2 steps to get the USB drive mounting configured

  • Verify the existing mountpoint, query additional info about the attached disk, and edit /etc/fstab to put in a static entry for automount at boot

This article documents detailed steps to do this

For reference, this is how my fstab entry looks for ext4 vs NTFS (note the filesystem field)

sudo nano /etc/fstab

NTFS: UUID=708EC8A88EC86864 /mnt/Mydisk1 ntfs defaults,auto,users,rw,nofail,umask=000,x-systemd.device-timeout=30 0 0

Ext4: UUID=320f034f-1305-4b30-aff7-27b63c4cbda3 /mnt/Mydisk1 ext4 defaults,auto,users,rw,nofail,x-systemd.device-timeout=1,noatime 0 0

  • The next step is applicable ONLY if you want to use this USB-attached disk as a network share as well (a poor man's NAS)

This can be done by installing and configuring Samba, or you could use OpenMediaVault (OMV).

I went the OMV route at first, only to burn my hands and end up with an unbootable Pi. After starting afresh with an SD card wipe followed by a fresh Pi OS install, I had my painful learning: OMV wasn't even needed for a single-disk home-use scenario; it's more suited to heavier, RAID-type use cases.

But if storage as a domain excites you, then OMV is something you might want to explore as a very capable NAS solution.

For this use case, we'll restrict ourselves to configuring Samba; I've been using it to make my 4 TB WD USB disk available on the network.

And it has handled literally everything I've thrown at it (I've gone up to 70 GB of data transfer in one go, which chugged along nicely at around 10–14 MB/s).

Here are 2 articles you can refer to for step-by-step guides on

- installing Samba

- creating the Samba/Pi OS user

For reference, here’s my smb.conf file

sudo nano /etc/samba/smb.conf

First custom entry: security = user

Then, defining the share
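Since the exact lines matter here, a minimal sketch of what these custom smb.conf entries can look like; the share name DataShare, the path and the user pi are assumptions based on my setup, so adjust them to your environment:

```ini
# /etc/samba/smb.conf -- illustrative entries; share name, path and
# user are assumptions, adjust to your own mountpoint and user.

[global]
   security = user

[DataShare]
   path = /mnt/Mydisk1
   browseable = yes
   writeable = yes
   valid users = pi
   create mask = 0775
   directory mask = 0775
```

After saving, you can run testparm to validate the syntax, then restart the smbd service to apply.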

Here is the share, as accessed from my Windows 10 machine

\\raspberrypi.lan\DataShare\graylog

So now that the platform itself is ready, with the prerequisite configuration in place, we can move on to the actual Graylog install process.

These are the further steps, listed in sequence

3) Install Docker on Raspberry Pi OS

We'll do it through the convenience script for arm64

sudo apt-get update && sudo apt-get upgrade

curl -fsSL https://get.docker.com -o get-docker.sh

sudo sh get-docker.sh

Sanity Check

docker version

docker info

docker run hello-world

You can follow the detailed instructions, and access further documentation, at this URL

https://docs.docker.com/engine/install/debian

4) Install docker-compose, to be able to run the deployment of Graylog based on the specified config parameters for MongoDB and Elasticsearch, as well as the parameters for the actual Graylog instance

There are 2 ways to do it: through the curl command, or using the Python package installer pip.

a) I installed it using the standard curl command (just use the latest release number vx.x.x available at that point, from https://github.com/docker/compose/releases/)

sudo curl -L https://github.com/docker/compose/releases/download/v2.0.1/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose

sudo chmod +x /usr/local/bin/docker-compose

Verify by running docker-compose --help

Read more about this here https://hometechhacker.com/how-to-create-a-graylog-container-in-docker/ (Installing through curl)

b) The alternate procedure to get docker-compose on your Pi is to use pip3, the package installer for Python.

I haven't tested it myself, but going by the amount of online documentation available, a lot of people have used this procedure, and it apparently works on Pi OS arm64.

sudo apt-get install libffi-dev libssl-dev

sudo apt install python3-dev

sudo apt-get install -y python3 python3-pip

sudo pip3 install docker-compose

5) Craft a custom configuration file for Graylog defining the parameters, settings and environment variables; this is then used as input for docker-compose to pull the images and install/initialize the containerized app instance

a) Create a folder (I'm using one named 'graylog') to house the yml config file

mkdir graylog

cd graylog

nano docker-compose.yml
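Before we get to the individual sections, it may help to see the overall shape of such a file. This is a heavily trimmed sketch along the lines of the sample files in the Graylog docs; the image tags, the password secret and the hash are placeholders, and the volume and port sections discussed further below are left out:

```yaml
# docker-compose.yml -- trimmed sketch, not my full file.
version: '3'
services:
  mongo:
    image: mongo:4.2
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
  graylog:
    image: graylog/graylog-enterprise:4.1-arm64
    environment:
      # at least 16 characters; used to salt/encrypt stored credentials
      - GRAYLOG_PASSWORD_SECRET=ReplaceWithALongRandomString
      # SHA-256 hash of your chosen admin password
      - GRAYLOG_ROOT_PASSWORD_SHA2=ReplaceWithYourPasswordHash
      - GRAYLOG_HTTP_EXTERNAL_URI=http://127.0.0.1:9000/
    depends_on:
      - mongo
      - elasticsearch
    ports:
      - "9000:9000"
```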

b) Now, running apps within Docker is a relatively new concept, and if you are not very familiar with installing app instances using docker-compose through custom yml files (I wasn't either, but now I think I can hold a conversation without looking like a complete moron),

I would highly recommend reading these 2 articles first (in the listed sequence), just to be familiar with what we're doing next.

>> Steps 1 and 2 are already done, so you can read from Step 3 onwards

>> This will give you a rough list of all the configurable parameters, plus the 3 sample yml files directly from Graylog

You can use them as-is, or as a template to customize the Graylog installation to your environment/requirements

My Graylog install in my home lab environment is heavily influenced by, and based on, the following article and an install demo on YouTube, both done by Graylog engineers.

The above video walks us through the overall install process.

The other is a Reddit post, which was a good reference point for me when crafting my custom yml file.

As you will have seen by now, the yml file can run as long as 94 rows, so pasting the entire file here would just mean a scrolling exercise;

so I'll post snippets of the important parts instead, and attach the entire yml file with the document (if you want to take a look).

Based on my understanding, the 3 most important sections in this yml file are

  • The version of the image for each app: mongo, elastic and, most importantly, the graylog image

e.g.: the mongodb image version,

In the example I'm using the enterprise Graylog image, which I further registered for the free Enterprise license (1 year); that gives me 5 GB/day of ingestion, which is more than adequate for home lab usage.

The difference between the example and my yml file is in the line

image: graylog/graylog-enterprise:4.1-arm64

That's because a newer release of Graylog came along, and I edited my yml to reflect the latest image version (this is also how Graylog is upgraded to the latest release when installed in Docker)

  • Defining the log location as the mounted path of the USB HDD, for all 3 components: MongoDB, Elasticsearch and Graylog

In my environment, the 4 TB USB hard disk is mounted at /mnt/Mydisk1 on the Raspberry Pi, so the bind volumes in the yml reference the relative folder paths as log locations

For MongoDB

For Elastic Search

And this is for Graylog
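In yml terms, those three locations come down to bind volumes along these lines; the sub-folder names under /mnt/Mydisk1 are my own choice, so treat them as illustrative:

```yaml
# Bind the data directories of all 3 services onto the USB disk.
# Host-side sub-folder names are illustrative.
services:
  mongo:
    volumes:
      - /mnt/Mydisk1/graylog/mongo_data:/data/db
  elasticsearch:
    volumes:
      - /mnt/Mydisk1/graylog/es_data:/usr/share/elasticsearch/data
  graylog:
    volumes:
      - /mnt/Mydisk1/graylog/graylog_journal:/usr/share/graylog/data/journal
```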

What you can also do, prior to firing up the Graylog Docker instance through this yml file, is create the referenced folder paths on the USB HDD

Looks something like this
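Those folder-creation steps can be sketched as below. On the Pi the base path would sit under the USB mountpoint (/mnt/Mydisk1 in my case, likely needing sudo); a relative path is used here so the commands are safe to dry-run anywhere, and the sub-folder names are illustrative:

```shell
# Pre-create the host folders referenced by the yml bind volumes.
BASE="./Mydisk1/graylog"
mkdir -p "$BASE/mongo_data" "$BASE/es_data" "$BASE/graylog_journal"
ls "$BASE"
```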

  • The third section is the port-number mappings between Docker and the respective hosted apps (mongodb, elastic and graylog);

this is essential, since an app cannot send/receive any data on a port until a definition for it exists within the yml file.

Now, the default yml file (version 2/3) already covers all the port mappings, except an entry for Winlogbeat on port 5044 (which needs to be created),

which would then act as the recipient for all Windows event logs forwarded by the respective winlogbeat instances
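Put together, the graylog service's port section then looks something like this; the exact set of default mappings depends on which sample yml you started from, so only the 5044 line is the real addition:

```yaml
services:
  graylog:
    ports:
      - "9000:9000"       # web interface / REST API
      - "1514:1514"       # syslog TCP
      - "1514:1514/udp"   # syslog UDP
      - "12201:12201"     # GELF TCP
      - "12201:12201/udp" # GELF UDP
      - "5044:5044"       # Beats input for winlogbeat (added)
```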

6) With the yml file ready, the following command downloads the 3 images (mongodb, elastic, graylog), installs them as an app instance within Docker, and brings them up (initializes them)

Please note that the "pull images, then install" sequence ONLY occurs the first time, or when the image version changes, triggering a fresh download

Navigate to the graylog folder, and run the command

sudo docker-compose up -d

The status of the services (starting/started/failed) can be checked with this command

watch docker-compose ps

Keep it open (watch re-runs the command for you) until the status shows up as running/healthy

Fire up the browser, and access the Graylog web interface on port 9000 (as defined in the yml file).

I like the home page rendered in grayscale, though it might be a bit drab for some (at my age, though, grayscale is super exciting).

The default creds are username: admin & password: admin (unless you changed the SHA256 password hash in the yml file)
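If you do want to change it, the yml expects the SHA-256 hash of your chosen password in GRAYLOG_ROOT_PASSWORD_SHA2. One way to generate it on the Pi, using nothing beyond coreutils:

```shell
# Print the SHA-256 hash of the desired admin password, then paste the
# result into GRAYLOG_ROOT_PASSWORD_SHA2 in docker-compose.yml.
echo -n "admin" | sha256sum | cut -d " " -f 1
```

Replace "admin" with your own password; the -n matters, since a trailing newline would change the hash.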

And that would open up the landing page.

I would suggest you head over to the System/Overview page, take note of the Cluster ID, and then generate a license for your install from the Enterprise/Licenses page

7) Configuring Data input for Graylog (Covering event log ingestion from Windows systems)

Up to this point, we have Graylog installed and running as a Docker container.

The next step is to test whether Graylog can accept data input and display it within the UI, and the way we do that is to:

  • configure a listener within Graylog to accept data input (the technical name for it is an "Input"); this test involves configuring a Raw/Plaintext TCP input on port 5555
  • send a plaintext message on port 5555, and verify that the UI displays it

Imp: As already mentioned in the previous section, the prerequisite for configuring any type of "Input" within Graylog is to first define the corresponding port within the docker-compose.yml file.

This ensures that the port is mapped to the Docker container, after which the configured input can accept actual data.

So for this test to work, the port definitions for Graylog should look like below (Lines 12–13 added)
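For reference, those added lines amount to two extra entries under the graylog service's ports section, shown here next to the existing web-interface mapping:

```yaml
services:
  graylog:
    ports:
      - "9000:9000"      # web interface (already present)
      - "5555:5555"      # Raw/Plaintext TCP input (added)
      - "5555:5555/udp"  # in case you also want a UDP input (added)
```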

The detailed instructions on how to go about performing this input test are documented here

Assuming this test input came through, the rest is easy.

My homelab setup involves sending over Windows event and Sysmon logs using winlogbeat on port 5044, while Graylog has a "Beats input" configured to listen for and accept this data on 5044.

As I understand it, there are 3 ways to ingest Windows event logs in Graylog (https://docs.graylog.org/en/4.1/pages/sending/windows.html)

a) Use NXLog Community Edition on the Windows endpoints, which parses Windows logs and sends them over to Graylog via GELF (https://nxlog.co/products/nxlog-community-edition)

b) Use the SolarWinds Event Log Forwarder, which parses Windows logs and sends them over in syslog format (https://www.solarwinds.com/free-tools/event-log-forwarder-for-windows)

c) Use Winlogbeat, which preserves the Windows event format and sends it over as a Beats input >> this is the recommended way, since it offers a very granular level of control and maximum flexibility in handling different log types
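For option (c), the piece that ties the Windows endpoint to Graylog is the output section of winlogbeat.yml, which has to point at the Beats input instead of the usual Elasticsearch/Logstash targets. A minimal sketch, assuming my Pi's hostname from earlier; the selection of event logs is illustrative:

```yaml
# winlogbeat.yml (on the Windows endpoint) -- minimal sketch
winlogbeat.event_logs:
  - name: Application
  - name: Security
  - name: System
  - name: Microsoft-Windows-Sysmon/Operational

# Send to Graylog's Beats input rather than directly to Elasticsearch.
output.logstash:
  hosts: ["raspberrypi.lan:5044"]
```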

Here is a very good guide to a configuration-management feature called "Graylog Sidecar", which automates the winlogbeat configuration on these endpoints quite a bit.

The official page is

But I would recommend watching this video

Here is how I have configured the inputs, for your reference.

a) This is the Beats input

b) And these are my 2 collectors, depending on the role of the source Windows host (whether or not it's a domain controller)

  • Note the different names, and the difference in terms of the ingested logs

And once you reach the screen below, that is the end of this install procedure, and the start of uncovering the stories within these logs.

Happy hunting.

Trying to stay a student forever and keep alive the “learning”