Trendyol Tech
Published in

Trendyol Tech

How we archive and compress Couchbase data?

You don’t need cold data in the production! Archive it.

Hi everyone,
I would like to share one of my scripts that moving Couchbase data to SQL Server. It’s totally free and open source :)

Before the script, We should talk about “why we need archiving data.”
Maybe you don’t know, I’m working at an e-commerce company. We get lots of orders daily. That means lots of data :)
We use Couchbase for the order database. Every order is a large JSON document because it stores lots of detail. After a while, the database will be too large. That time you cannot manage this database even if it is sharded. You also experience performance issues. As a result, your business is affected.

Large databases are a troublemaker all the time :) Imagine that; your database has a high transaction volume. You have an incident at peak time. You analyzed it and decided on creating a new index. You have no time. Your manager is asking when will finish it?

So, if your database is large, the index creation takes too long. If your database is small, the index creation takes no time or quick.
Which one are you prefer?

With this PowerShell script, you can compress and move your Couchbase data to SQL Server. You just need to define the ruleset for archiving.

How does the script work?

1- Create staging objects on SQL Server if not exists
2- Get sample data from Couchbase
3- Compress and insert to SQL Server with Status = 0
4- Delete inserted data from Couchbase
5- Update deleted data on SQL Server with Status = 1
Your data has been compressed and moved :)

Here is example usage;

  • You should add META().id to the query. Then, the script parses and processes it.
  • There should be a limit clause in the query. My recommendation is that the script should be work chunk by chunk.

Some requirements you should know:

1- PowerShell 5.1 or above (You can also use it on Linux with PowerShell Core)
2- dbatools PowerShell module for SQL Server processes
3- SQL Server 2016 (13.x) or above (for the older SQL Server version, you should remove the COMPRESSION function from the script)
4- Query service on Couchbase

Why I prefer SQL Server for storing archived data?

First of all, you can use any database engine you want. You just need to customize the script for your environment.
I preferred SQL Server because it has a built-in compression feature and more flexible for me. This function compresses the data up to 80% (sometimes more), using the GZIP algorithm. SQL Server supports JSON value as well.

As I always say, the script doesn’t matter. I try to share my experiences and approach. Please get inspired :)

You can download the full script from my GitHub Profile.

If you need any help with this script, please feel free to contact me.

Thanks for reading,
Ahmet.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store