The evolution of nxs-backup pt.1

Nixys
9 min read · May 8, 2024


It would seem the topic is trite: a lot has already been said and written about backups, so there is no need to reinvent the wheel, just take a tool and do what you need to do. Yet every time a system administrator faces the task of setting up backups, the same big questions hang in the air. How do I properly back up my data? Where do I store the backups? How do I unify the backup process for a whole zoo of different software?

We first solved this problem in 2011, when we sat down and wrote our own backup scripts. For many years they were all we used, and they reliably collected and synchronized backups of our clients' web projects. Backups were stored in our own or some other external storage, with the option of tuning the setup for a specific project.

To be fair, these scripts served us well. But the more we grew, the more projects we had with software and external storage that our scripts did not support. For example, support for Redis and MySQL/PostgreSQL backups was missing at first and only came later. The backup process itself was not monitored; there were only email alerts.

Another problem was maintenance. Over the years, our once compact scripts grew into a huge, awkward monster. And whenever we got together and released a new version, rolling the update out to the customers who ran the previous version with customizations was a whole separate story.

As a result, at the beginning of last year, we decided to replace our old backup scripts with something more modern. So first we sat down and wrote down all our wishes for the new solution. They came out roughly as follows:

  • Back up data of the most commonly used software: files (discrete and incremental), MySQL, PostgreSQL, MongoDB, Redis
  • Store backups in popular storages: FTP, SSH, SMB, NFS, WebDAV, S3
  • Receive alerts in case of problems during the backup process
  • Have a unified configuration file to manage backups centrally
  • Add support for new software by connecting external modules
  • Specify extra options for collecting dumps
  • Be able to restore backups with standard tools
  • Ease of initial configuration

All of these requirements were based on our needs about five years ago. Unfortunately, not all of them were implemented. And that brings us to the main part of this article.

Birth of nxs-backup

Initially, Python was chosen as the implementation language: it is easy to write and maintain, flexible and convenient. Configuration files were described in YAML format.

To make it easier to maintain the tool and add support for new software, a modular architecture was chosen: the process of collecting backups for each specific piece of software (for example, MySQL) is described in a separate module.

After a few years of use, the tool had accumulated problems and shortcomings while our demands kept growing. The choice was not easy: on the one hand, we had a working Python utility; on the other, it no longer suited us in several respects.

After a heated debate in the development team, we decided that we would have to rewrite everything, because the architecture of the old application did not allow us to implement our needs easily and required radical changes. And since we had to change the architecture and write almost everything from scratch anyway, the question arose: stay with Python or switch to Go?

Of course, we chose Go!

First, Go compiles ahead of time out of the box and lets you build a universal, dependency-free binary that runs on any distribution without problems caused by differing library versions. Built-in cross-compilation for other processor architectures was a nice bonus.

Second, Go was designed with concurrency in mind from the start and should be significantly faster.

And third, we have more people who write Go than Python.

That is why we decided to rewrite the tool completely in Go, adding, of course, the features we needed. Below we describe each of them in more detail.
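To make the concurrency argument a bit more concrete, here is a toy sketch, not nxs-backup's actual code, of pushing one backup archive to several storages in parallel with goroutines; the storage names and the upload function are placeholders:

```go
// A toy illustration of Go's concurrency model (not nxs-backup's actual
// code): push one backup archive to several storages in parallel.
package main

import (
	"fmt"
	"sync"
	"time"
)

// upload simulates sending an archive to one storage backend.
func upload(storage, archive string) error {
	time.Sleep(100 * time.Millisecond) // stand-in for real network I/O
	fmt.Printf("uploaded %s to %s\n", archive, storage)
	return nil
}

func main() {
	storages := []string{"s3", "ssh", "nfs"}
	archive := "mysql-2024-05-08.sql.gz"

	var wg sync.WaitGroup
	for _, s := range storages {
		wg.Add(1)
		go func(s string) { // one goroutine per storage target
			defer wg.Done()
			if err := upload(s, archive); err != nil {
				fmt.Printf("upload to %s failed: %v\n", s, err)
			}
		}(s)
	}
	wg.Wait()
}
```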

Analyzing existing solutions

Even before creating the first version of nxs-backup, we had looked at the open-source solutions that already existed, but they all had their flaws. For example, Bacula is overloaded with functions we do not need, its initial configuration is rather laborious because of a lot of manual work (for instance, writing or hunting for database backup scripts), and restoring copies requires special utilities, and so on.

Unsurprisingly, we faced the same problem when we set out to rewrite our tool. The chance that something had changed in four years and new tools had appeared was not that high, but we checked anyway.

We did an audit and studied a couple of new tools we had not considered before. But, like the earlier ones, they did not suit us either, because they did not fully meet our requirements.

We finally came to two important conclusions:

  1. None of the existing solutions was fully suitable for us;
  2. We had enough experience (and craziness) to write our own solution the first time, and we could basically do it again.

So that’s what we did.

Before exploring the new version, let’s take a look at what we had before and why it was not enough for us.

The old version supported the following databases, file backup types, and remote storages:

DBs: MySQL; PostgreSQL; Redis; MongoDB

Files: Discrete and Incremental copying

Remote Storage: S3; SMB; NFS; FTP; SSH; WebDAV

It also had features such as:

  • Backup rotation
  • Logging
  • E-mail notifications
  • External modules
  • Support and Update

Now, more on what we were concerned about.

Running a binary on any Linux without shipping the source code

Over time, the list of systems we work with has grown considerably. Now we serve projects that use distributions beyond the standard deb- and rpm-compatible ones, such as Arch, SUSE, ALT, and others.

Those systems had difficulty running nxs-backup because we built only deb and rpm packages and supported a limited list of system versions. On some systems we repackaged the whole thing, on some we shipped just the binary, and on some we simply had to run the source code.

Working with the old version was very inconvenient for engineers because of the need to deal with the source. Not to mention that installation and updates in that mode take more time: instead of setting up 10 servers per hour, you could spend an hour on a single server.

We have known for a long time that it is much better to have a binary without system dependencies that you can run on any distribution without running into problems with library versions or architectural differences between systems. We wanted this tool to be like that.

Minimizing the Docker image with nxs-backup and supporting ENV in configuration files

Lately, many projects run in containerized environments. These projects also need backups, so we run nxs-backup in containers as well. For containerized environments, it is very important to minimize the image size and to be able to work with environment variables.

The old version could not work with environment variables at all. The main problem was that passwords had to be stored directly in the config. Because of this, instead of a set of variables containing only passwords, you had to put the entire config into a variable. Editing large environment variables requires more concentration from engineers and makes troubleshooting a bit harder.
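The approach we wanted looks roughly like the sketch below. It is only an illustration of the idea, not the actual mechanism used by go-nxs-appctx: references such as ${MYSQL_PASSWORD} are expanded from the environment before the config is parsed, so only the secrets have to live in variables. The config path is hypothetical.

```go
// A rough illustration (not nxs-backup's actual code) of substituting
// environment variables into a YAML config before parsing it, so that
// only secrets such as passwords need to live in the environment.
package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	raw, err := os.ReadFile("nxs-backup.conf") // hypothetical config path
	if err != nil {
		log.Fatal(err)
	}
	// Expand ${MYSQL_PASSWORD}-style references using the process environment.
	expanded := os.ExpandEnv(string(raw))
	fmt.Println(expanded) // in a real tool, this string would go to a YAML parser
}
```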

Also, with the old version we had to start from an already large Debian image and add several libraries and applications to it for backups to work correctly.

Even using the slim variant of the image, the minimum size we got was about 250 MB, which is quite a lot for one small utility. In some cases this delayed the start of a backup run because of how long it took to pull the image onto the node. We wanted an image no larger than 50 MB.

Working with remote storage without FUSE

Another problem for container environments is using FUSE to mount remote storage.

While you are running backups on the host, this is still acceptable: you install the right packages, enable FUSE in the kernel, and it works.

Things get interesting when you need FUSE in a container. The problem cannot be solved without elevated privileges and direct access to the host kernel, and that is a significant drop in the security level.

This has to be coordinated with the customer, and not all customers agree to weaken their security policies. That is why we had to build an awful number of workarounds we would rather not recall. Moreover, the extra layer increases the probability of failure and requires additional monitoring of the state of the mounted resources. It is safer and more stable to work with remote storage through its API directly.
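As a rough sketch of the direct-API approach (not nxs-backup's actual implementation), here is how a backup archive could be uploaded to S3-compatible storage with the minio-go client instead of writing to a FUSE mount point; the endpoint, bucket, paths, and credentials are placeholders:

```go
// A minimal sketch of uploading a backup archive to S3-compatible storage
// via its API instead of a FUSE mount. Endpoint, bucket, paths, and
// credentials are placeholders.
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("s3.example.com", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Upload a local dump straight to the bucket; no mount point involved.
	info, err := client.FPutObject(context.Background(),
		"backups", "mysql/2024-05-08.sql.gz", "/var/backups/2024-05-08.sql.gz",
		minio.PutObjectOptions{ContentType: "application/gzip"})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("uploaded %d bytes", info.Size)
}
```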

Monitoring status and sending notifications not only by email

Today, teams use email less and less in their daily work, which is understandable: it is much faster to discuss an issue in a group chat or on a group call. This is why Telegram, Slack, Mattermost, MS Teams, and other similar products are so widespread.

We also have a bot that sends us various alerts. And of course, we would like to see reports of failed backups in a workspace like Telegram rather than buried among hundreds of other emails. By the way, some customers also want to see failure information in their Slack or another messenger.

In addition, we had long wanted to be able to track the status and see the details of a run in real time. To do that, the format of the application has to change, turning it into a daemon.
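For a sense of what a messenger notification can look like, here is a rough sketch, not the notification code nxs-backup actually ships, of pushing a failure report to a Telegram chat via the Bot API; the token, chat ID, and message text are placeholders:

```go
// A rough sketch of sending a backup failure report to a Telegram chat via
// the Bot API. The token, chat ID, and message are placeholders; this is an
// illustration, not nxs-backup's built-in notification code.
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
)

func notifyTelegram(token, chatID, text string) error {
	api := fmt.Sprintf("https://api.telegram.org/bot%s/sendMessage", token)
	resp, err := http.PostForm(api, url.Values{
		"chat_id": {chatID},
		"text":    {text},
	})
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("telegram API returned %s", resp.Status)
	}
	return nil
}

func main() {
	if err := notifyTelegram("BOT_TOKEN", "CHAT_ID",
		"nxs-backup: job 'mysql-daily' failed"); err != nil {
		log.Fatal(err)
	}
}
```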

Insufficient performance

Another acute pain was insufficient performance in certain scenarios.

One of our clients has a huge file dump of almost a terabyte, and all of it is small files: text, pictures, and so on. We collect incremental copies of this data and ran into the following problem: a yearly copy takes THREE days. The old version simply cannot digest that volume in less than a day.

Under those circumstances we were, in fact, unable to restore data for a specific date, which we did not like at all.

Finding a solution

All of the above problems, to a greater or lesser extent, caused quite palpable pain for the IT department, forcing it to spend precious time on things that, however important, could have been avoided. Moreover, in some situations they created risks for business owners: the probability of being left without data for a given day was extremely low, but not zero. We refused to accept that state of affairs.

Nxs-backup 3.0


The result of our work was the new version, nxs-backup 3.0, which was recently updated to 3.6.0 along with a license change to Apache-2.0.

Key features of the new version:

  • All storage types and all backup types implement the corresponding interfaces. Jobs and storages are initialized at startup rather than during the run;
  • Remote storages are accessed directly through their APIs, using various client libraries;
  • Environment variables can be used in configs, thanks to the go-nxs-appctx mini application framework we use in our projects;
  • Log events are sent via hooks. You can configure different levels and receive only errors or only events of the desired level;
  • Retention can be specified not only as a time period but also as a specific number of backups to keep (a minimal sketch of this idea follows the list);
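As a minimal sketch of the count-based retention idea, here is a hypothetical helper, not nxs-backup's actual rotation code, that keeps only the newest N archives in a directory:

```go
// A minimal sketch of count-based retention: keep only the newest N backup
// archives in a directory and delete the rest. This is a hypothetical
// helper for illustration, not nxs-backup's actual rotation code.
package main

import (
	"log"
	"os"
	"path/filepath"
	"sort"
)

func keepNewest(dir string, n int) error {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	type backup struct {
		path string
		mod  int64
	}
	var backups []backup
	for _, e := range entries {
		if e.IsDir() {
			continue
		}
		info, err := e.Info()
		if err != nil {
			return err
		}
		backups = append(backups, backup{filepath.Join(dir, e.Name()), info.ModTime().UnixNano()})
	}
	// Sort newest first, then drop everything past the first n copies.
	sort.Slice(backups, func(i, j int) bool { return backups[i].mod > backups[j].mod })
	for i := n; i < len(backups); i++ {
		if err := os.Remove(backups[i].path); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Keep the seven most recent archives in a hypothetical backup directory.
	if err := keepNewest("/var/backups/mysql", 7); err != nil {
		log.Fatal(err)
	}
}
```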

Backups now simply run on any Linux with kernel 2.6 or newer. This made it much easier to work with non-standard systems and faster to build Docker images. The image itself shrank to 23 MB (with additional MySQL and SQL clients included).

To summarize the work done

We successfully moved nxs-backup from Python to Go and overcame compatibility and performance challenges along the way. In practice the new version has already shown itself at its best on client projects, so we will not only keep using it actively but also keep improving it with useful features.

But we have something else up our sleeve! Stay tuned for the second part of this article, where we will dig deeper into the features and optimization results of the new version. You can check out the latest release on our GitHub!

We also want to share our work with the community to make nxs-backup even better and more convenient. If you have new ideas or suggestions for improving it, we will be glad to hear them.

You can also join our Telegram channel, where we run polls to prioritize upcoming features, and the chat, where you can keep track of updates or ask any questions you might have.

See you soon!
