Pet-Project

R-Stat

Konstantin Makarov
Scum-Gazeta
Dec 20, 2023


Today I want to tell you about my pet project, which I've been working on for several years, and, more broadly, about the benefits of such hobbies.

You can probably guess that such projects bring real benefits to developers. In a project of your own, you can take on the role of any member of a regular development team:

  • Product Owner
  • Software Architect
  • SRE/DevOps
  • QA
  • Frontend dev
  • Backend dev

And if your product turns out to be genuinely useful, you are lucky enough to work with real clients: their needs, requests, and complaints.

The most challenging part, of course, is coming up with an idea that solves someone's problems. If nothing sensible comes to mind, start by solving your own. The world already has enough to-do lists and calculators, so list your own problems and routines and try to solve or automate them.

The second benefit is technology. Our field is full of fascinating and useful things worth seeing or trying. You won't manage all of them, but you should use something outside your standard working stack. Experience means a broad outlook: you need room to choose and experiment, and a pet project is the best place for experiments.

A bit about my project. I love sports; I have been involved in various sports and still am, so my interests naturally found their way into my product. I wrote a system that helps you organize competitions for team sports (football, hockey, etc.). It helps you create a competition structure according to the regulations, register all participants, and keep track of game statistics. It can also fill your website with sports-related content: news, videos, photos, and so on. In short, it's a find for anyone who wants to organize competitions of any size.

But since we are engineers, I would like to delve a bit more into the technical aspects of my project.

Business Logic

Well, my solution is capable not only of organizing competitions from scratch but also of storing existing, publicly available ones. For this purpose, I use parsers that crawl popular sports websites and gather all the necessary information: teams, players, matches, and their calendars. And, most importantly, the protocols of completed games.

Later, once I had gathered enough data, I wanted to process it with AI and make predictions for upcoming games. The main purpose, however, is to let you create and run competitions from scratch: you upload players and teams, create a calendar, and start playing. That's it! Just don't forget to enter scores and protocols, and you will have statistics that I can easily serve to your website through an API.
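To give a flavor of the calendar step, here is a sketch of how a single round-robin schedule can be generated with the classic circle method. This is illustrative only, not the project's actual code, and all names are my own:

```go
package main

import "fmt"

// roundRobin builds a single round-robin schedule with the circle
// method: team 0 stays fixed while the rest rotate one position
// per round. With an odd team count, a dummy "BYE" opponent is added.
func roundRobin(teams []string) [][][2]string {
	n := len(teams)
	if n%2 == 1 {
		teams = append(teams, "BYE")
		n++
	}
	rounds := make([][][2]string, 0, n-1)
	for r := 0; r < n-1; r++ {
		var round [][2]string
		for i := 0; i < n/2; i++ {
			home, away := teams[i], teams[n-1-i]
			if home != "BYE" && away != "BYE" {
				round = append(round, [2]string{home, away})
			}
		}
		// Rotate every team except the first one.
		last := teams[n-1]
		copy(teams[2:], teams[1:n-1])
		teams[1] = last
		rounds = append(rounds, round)
	}
	return rounds
}

func main() {
	rounds := roundRobin([]string{"CSKA", "Spartak", "Dynamo", "Zenit"})
	for i, r := range rounds {
		fmt.Printf("Round %d: %v\n", i+1, r)
	}
}
```

For four teams this yields three rounds of two matches each, every pair meeting exactly once.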

This is implemented with the help of another cool product of mine, Wal-Listener. As soon as new information about played matches appears in the database, the heart of the project, the Stat Processor, receives a message about it and starts computing statistics. This is my secret Coca-Cola formula :)
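To illustrate the idea, here is a minimal sketch of how a finished-match event might be folded into a standings table. The `MatchFinished` and `Standing` types are my own illustrative assumptions, not the real Stat Processor code:

```go
package main

import "fmt"

// MatchFinished is an illustrative event, similar in spirit to what
// the Stat Processor could receive when a game's protocol is saved.
type MatchFinished struct {
	Home, Away           string
	HomeGoals, AwayGoals int
}

// Standing accumulates per-team statistics.
type Standing struct {
	Played, Won, Drawn, Lost, Points int
}

// Apply folds one finished match into the table,
// using football scoring: 3 for a win, 1 for a draw.
func Apply(table map[string]*Standing, m MatchFinished) {
	for _, name := range []string{m.Home, m.Away} {
		if table[name] == nil {
			table[name] = &Standing{}
		}
		table[name].Played++
	}
	switch {
	case m.HomeGoals > m.AwayGoals:
		table[m.Home].Won++
		table[m.Home].Points += 3
		table[m.Away].Lost++
	case m.HomeGoals < m.AwayGoals:
		table[m.Away].Won++
		table[m.Away].Points += 3
		table[m.Home].Lost++
	default:
		table[m.Home].Drawn++
		table[m.Away].Drawn++
		table[m.Home].Points++
		table[m.Away].Points++
	}
}

func main() {
	table := map[string]*Standing{}
	Apply(table, MatchFinished{Home: "CSKA", Away: "Zenit", HomeGoals: 2, AwayGoals: 1})
	Apply(table, MatchFinished{Home: "Zenit", Away: "CSKA", HomeGoals: 0, AwayGoals: 0})
	fmt.Printf("CSKA: %+v\n", *table["CSKA"])
}
```

The real processor is of course incremental and event-driven, but the core fold looks much like this.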

On top of all this, I have Telegram notifications connected. For example, I can point my parsers at a World Cup website, and as soon as a score changes or regular time ends, I receive a notification.
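A notification like this goes through the Telegram Bot API's `sendMessage` method. The sketch below formats a score-change message and shows how the HTTP call would be built; the function names, token, and chat ID are illustrative, and the real call is only sketched:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// formatScoreChange renders the notification text for a score update.
func formatScoreChange(home, away string, homeGoals, awayGoals int) string {
	return fmt.Sprintf("⚽ %s %d:%d %s", home, homeGoals, awayGoals, away)
}

// sendTelegram posts a message via the Bot API's sendMessage method.
// token and chatID come from @BotFather and your chat, respectively.
func sendTelegram(token, chatID, text string) error {
	api := fmt.Sprintf("https://api.telegram.org/bot%s/sendMessage", token)
	_, err := http.PostForm(api, url.Values{
		"chat_id": {chatID},
		"text":    {text},
	})
	return err
}

func main() {
	msg := formatScoreChange("France", "Argentina", 3, 3)
	fmt.Println(msg)
	// With real credentials this would be:
	// sendTelegram("123456:ABC-DEF...", "@mychannel", msg)
}
```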

Currently, I support only football and hockey, but implementing any other team sport would not be difficult: clean architecture and clean code will let me do it in the shortest possible time. Feel free to reach out!

Infrastructure

I chose Hetzner Cloud — no managed solutions, let’s do it ourselves. Only servers, networks, volumes, and a firewall. I used infrastructure as code — Terraform is a fantastic tool for this.
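As a taste of what this looks like, here is a minimal Terraform sketch using the official `hetznercloud/hcloud` provider. The server name, size, location, and firewall rule are illustrative assumptions, not my real configuration:

```hcl
terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

variable "hcloud_token" {
  sensitive = true
}

provider "hcloud" {
  token = var.hcloud_token
}

# One of the cluster nodes; name and size are illustrative.
resource "hcloud_server" "node" {
  name        = "k8s-node-1"
  image       = "ubuntu-22.04"
  server_type = "cx31"
  location    = "nbg1"
}

# Only standard ports are open to the world.
resource "hcloud_firewall" "default" {
  name = "cluster-fw"
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "443"
    source_ips = ["0.0.0.0/0", "::/0"]
  }
}
```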

As the orchestrator for my applications, I use Kubernetes. I'm a big DevOps fan, and for me there is no alternative here. I'm even considering writing an Operator for my project at the application level and a Terraform provider at the infrastructure level. It's a bare-metal cluster that I deployed using Kubespray; I have a separate article about that. The Ingress controller and load balancer are implemented with Traefik, an excellent modern tool.

My private repositories are on GitLab, so I can use their powerful GitLab CI for my CI/CD processes, though perhaps it's worth getting closer to the people and moving to the lighter GitHub Actions. I mainly use plain manifests, but I'm gradually moving to Helm. Secrets should soon move from regular Kubernetes Secrets to Vault, though perhaps Bitnami's Sealed Secrets would suit GitOps better.
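For reference, a minimal `.gitlab-ci.yml` for a Go service might look like the sketch below; the stages and image tags are illustrative, while the `CI_*` variables are GitLab's predefined ones:

```yaml
stages: [test, build]

test:
  stage: test
  image: golang:1.21
  script:
    - go vet ./...
    - go test ./...

build:
  stage: build
  image: docker:24
  services: [docker:24-dind]
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
```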

For logs, I use the Loki stack, Prometheus for metrics, and a configured Alertmanager with notifications in Telegram. For error aggregation, I use Sentry.

As you know, GitLab has its own container registry, but at one point the runners deployed in my Kubernetes cluster were blocked (I gather their IPs were blacklisted), and I lost the ability to store my service images in GitLab's centralized registry. It was the perfect moment to switch to Amazon ECR. I had long wanted to use their services like a grown-up, but due to the cost I used only the cheapest tier. I also use S3 to store database backups, which a K8s CronJob kindly prepares.
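Such a backup job can be expressed as a CronJob manifest. The sketch below is illustrative: the image is assumed to contain both `pg_dump` and the AWS CLI, and the secret and bucket names are placeholders:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pg-backup
spec:
  schedule: "0 3 * * *"   # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              # Illustrative image: must bundle pg_dump and the AWS CLI.
              image: my-backup-image:latest
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump "$DATABASE_URL" | gzip > /tmp/db.sql.gz &&
                  aws s3 cp /tmp/db.sql.gz "s3://$BUCKET/backups/$(date +%F).sql.gz"
              envFrom:
                - secretRef:
                    name: backup-credentials
```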

Within the GitOps framework, I also use ArgoCD, but until I have large clients, I can neglect it and sometimes deploy manifests straight from the JetBrains plugin.

For network access, I use Traefik instead of the classic Ingress controller, and not just because it's written in Go: its settings are also very flexible.

Architecture

What are the key points we can see here?

My programming language is Golang. I used to have several parser services in Python (Beautiful Soup), but I've long since rewritten them in Go. I plan to implement remote configuration in etcd for all services, and I think I'll do it soon.

The services are interconnected by the NATS data bus, a powerful and easy-to-configure message broker. It simplifies scaling and helps ensure the availability of the service as a whole. I might have used Kafka (I love its scalability model), but Java-based solutions currently put me off, while NATS is just a single binary, written in Go, you understand. For the same reason, we have already moved away from the ELK stack.

Another architectural decision is the use of two gateways for two different purposes: an API for client websites and an API for the management system. The goal is to split traffic between two gateways, each with its own settings, access levels, and purpose.

The idea behind Kafka Connect also gave rise to two senders in my system: one for Telegram messages, the other for WebSocket notifications. All a service needs to do is publish a message to the appropriate topic.

Security Considerations

Perhaps discussing this opens a new attack vector, but let's consider it my personal honeypot. There is certainly a lot of work left, but some things are already in place.

The first and most important is, of course, TLS. I don't buy certificates for my domains; I use Let's Encrypt, and obtaining certificates is automated with cert-manager. All HTTP traffic is redirected to HTTPS.
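With cert-manager, the Let's Encrypt integration boils down to a `ClusterIssuer` resource. The sketch below uses the HTTP-01 solver with Traefik; the email and secret name are placeholders:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com        # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod-key    # where the ACME account key is stored
    solvers:
      - http01:
          ingress:
            class: traefik
```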

As for authentication and authorization, I use the two API Gateways (KrakenD) for this: one for external clients and one for internal use. They have different access levels and, accordingly, different settings, but the general principle is the same. I implemented an IdentityProvider service that issues JWT tokens to registered users, signed with an RSA private key. The API Gateway then verifies these tokens using the public keys, provided as a JWK set.

Network security is handled by the firewall: all ports except the standard ones are closed. Sometimes I need remote access to my infrastructure, for example to the database; in those cases I set up temporary port forwarding, but under normal conditions all such ports are closed to the world.

Of course, I have several endpoints needed for observing my system: Grafana dashboards, the Prometheus admin interface and its Alertmanager, the load balancer dashboard, and ArgoCD. All of these need protecting, especially those that don't require a password. Traefik's mutual TLS with my self-signed certificates works well here.

User action auditing is not yet implemented, but I have long wanted to build it on the familiar ClickHouse. However, for the sake of exploring new technologies, I want to try Cassandra: as an Apache project with a distributed model reminiscent of Kafka's, it is once again the more appealing choice for me.

Frontend

I have a fairly complex frontend written in React/Redux, using a Material Design library as a foundation. After getting used to Go modules, working with npm was quite frustrating. I remember how, after long breaks, I would try to update dependencies, say, switch to a new version of React, and everything else would refuse to work.

Maybe when I have time, I’ll try transitioning to Vue.

I used to dabble a bit in Qt, and I dreamed of creating a desktop version of my site, but now I'm not sure I'll ever have time for that.

But overall, this whole full-stack vibe gives me the feeling that you control everything, from updating the cluster and the number of backups to the color of your buttons. However, you fully pay for it with your time — a pet project is a black hole that swallows everything :)

Bonus

For those who have read to the end, a small bonus: count how many products and technologies written in Go, aside from my own services, I use in this project.

Impressive, isn’t it?
