Database sharding with Active Record

Karen
5 min readFeb 15, 2019

--

What is Database Sharding?

Database Sharding is a highly scalable approach for improving the throughput of large database-centric business applications. It is a type of database partitioning that separates very large databases into smaller, faster, easier managed parts called data shards what it means “a small part of a whole”.

Exists two ways of database partitioning, Vertical and Horizontal.

Vertical: “Involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.”

Horizontal: “Involves putting different rows into different tables. For example, customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers.”

A difference with Horizontal partitioning. Database sharding partitions the problematic table(s) in the same way, but it does this across potentially multiple instances of the schema. The obvious advantage would be that search load for the large partitioned table can now be split across multiple servers (logical or physical), not just multiple indexes on the same logical server.

Reasons to implement database sharding.

  • The growth in database transactions and large volumes of data.
  • If you need to move data along databases
  • Applications that have a multi-tenancy approach and want to horizontal scalability.

¿Why did I implement Database Sharding?

We needed an API where each client can access their database information. ¿What it means?, We have some enterprise clients, each client has its own database, where they are saving all its information, then there is an API that needs to consume that information, but as you can see there are multiple database instances. After research, I found that database sharding is a good approach for these cases, where the representation of each enterprise client is as a Company and each Company has its own information (database shard). LET’S DO IT.

How to implement database sharding on Active Record.

First, take a look at this gem ar-octopus

Octopus is a better way to do Database Sharding in ActiveRecord. Sharding allows multiple databases in the same rails application.

The API is designed to be as simple as possible. Octopus focuses on the end user, giving the power of multiple databases but with reliable code and flexibility. Octopus is compatible with Rails 4 and Rails 5.

Octopus supports:

  • Sharding (with multiple shards, and grouped shards).
  • Replication (Master/slave support, with multiple slaves).
  • Moving data between shards with migrations.

There are 2 ways to declare your databases shards in your application.

  1. Declare your database instances inside of config/shards.yml
  2. Dynamic shard configuration. Octopus allows us to set shards using ruby code.

Advantages of dynamic shard configuration:

  1. configuring the open connections on demand. NOTE: those connections will be handled by Active Record connection pool.
  2. flexible to use ruby code.

Setting Up

Install the gem in your gemfile.

Gemfile

Add a migration to create Databases table to save shards information.

The API will have the databases credentials in its database, the information needs to be safe then I used this gem attr_encrypted to encrypt the credentials.

To generate the migration run: bin/rails generate migration CreateDatabases

We need to add the Database model, here how it looks like:

app/models/database.rb

In this table we are going to have all our shards credentials, then when we want to create a new shards connection we are going to take the credentials from here.

To save the company information we need to add the Company model.

app/models/company.rb

This table is going to have the company name and have a relationship with a database.

The service to create the shards configuration dynamically is the next:

Explaining the service, we need to pass the company object that is requesting information in the constructor, then we are going to get their databases credentials. On the call method, we set up the connection in case it does not exist.

The connection has an identifier, in this case, is the name of the company, with this identifier we can ask for an existing connection or create a new one.

Now we can create our connection:

I added information before to the Company table.

With the connection established we can get the information that we want, in my case I needed the Members information, then I added a Member model, this is important because Active Record needs to manage that information through the model.

The member model class is next:

app/models/member.rb

To access the correct shard, you need to use the established connection by the company key, that we set up before.

Now we can create as many connections as we need and get information by different shards, in case you still do not have shards hosted on a server and you want to test it locally there is a way to create more than one database in your rails app, see my next post where I am going to explain how to do it.

In conclusion, database sharding is an option for horizontal scalability and Octupus is a complete gem to implement database sharding with Ruby on Rails and also the gem has good documentation.

--

--

Karen

Software Engineer at michelada.io, lover of Ghibli Studio movies, metal and rock music fan.