Redirecting queries to Slave Database

Ishan Joshi
Feb 5, 2020 · 3 min read

Master-Slave Databases are a common configuration of databases considering it allows replication making sure you don’t lose data in case of system failure. It also allows you to remove the burden from one database, and share it among many, making it an essential part of a good system. This feature can be especially useful for data analysis purposes.

Like any good system, we at Practo always have a database with a slave attached to it, and thanks to AWS RDS, we don’t have to worry about the data consistency across the master and slave database. This article aims to explain how to run a Query on the Slave Database with minimal code changes.

Before I dive into explaining the code, the application that we made this for is a Flask application with SQLAlchemy (https://www.sqlalchemy.org) as an ORM.

SQLAlchemy is a great library, that allows developers leverage the full power and flexibility of SQL. One of the most important features of SQLAlchemy is the Query and the Session API. In short, the Queryclass is a SQL construction object, where you can generate a SELECT query programmatically. The Query object is also attached to the Session, which acts like an agent that helps you communicate with your database.

The Flask application we work on has a convention where all the Model Objects return Query Objects instead of actual data. This is useful for two main reason — firstly, this allows you to add filters in the later stages of the code, and, you can lazy load data, since, the Query object acts as a generator. In simple terms, the query is only executed when you request for data.

Although you could just execute the query string on the slave database, it would not provide us with lazy loading and options like adding filtering in the later stages of the code. So we needed to code something where we can still return a Query object, except instead of running the query on the Master Database, it should run it on the Slave Database. For this, we made use of Python decorators.

So lets go through the code:

  1. Create Session Object attached to the Slave Database:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
slave_db_engine = create_engine(<URI for Slave Database>)
slave_session = sessionmaker(bind=slave_db_engine)()

2. Create the Decorator:

from functools import wrapsfrom app import slave_session


def convert_to_slave_query(query_function):

@wraps(query_function)
def create_slave_query(*args, **kwargs):
return slave_session.query(query_function(*args,
**kwargs).subquery)
return create_slave_query

3. Use it in existing code (in this case, get_rick_characters):

from util import convert_to_slave_query@convert_to_slave_query
def get_rick_characters():
return Rick.query()

You can now easily use this decorator as a plugin and run queries on your slave database without having to go and edit existing code.

Generally, slave databases are used to run read-only queries. To avoid data manipulation to run on the slave database, you can use the following code:

DATA_MODIFICATION_LITERALS = [
'update',
'delete',
'create',
'copy',
'insert',
'drop',
'alter'
]
def _check_query_read_only(query):
query_literals = str(query).split(' ')
intersection = [
literal.lower() for literal in query_literals
if literal in DATA_MODIFICATION_LITERALS
]
return len(intersection) == 0

You can then change convert_to_slave_query decorator to:

from app import slave_session

DATA_MODIFICATION_LITERALS = [
'update',
'delete',
'create',
'copy',
'insert',
'drop',
'alter'
]
def _check_query_read_only(query):
query_literals = str(query).split(' ')
intersection = [
literal.lower() for literal in query_literals
if literal in DATA_MODIFICATION_LITERALS
]
return len(intersection) == 0

def convert_to_slave_query(query_function):
@wraps(query_function)
def create_slave_query(*args, **kwargs):
read_query = slave_session.query(query_function(*args,
**kwargs).subquery)
if not _check_query_read_only(read_query):
raise Exception("Invalid Operation on Slave Database")
return slave_session.query()
return create_slave_query

By doing so, you can be sure that only read queries run on your slave instance.

Although, you can now run queries on one slave database, this code does not allow you to run the query on multiple slave database, choosing the database using the Round Robin algorithm.

One more thing to keep in mind is that there is always some Replication Lag between Master and Slave instances of the Database, so it would be a bad idea to use this on any task that relies on latest data.

What’s Next?

Right now, the above code snippet supports reads only from one slave database. In future, this should be updated so that it supports n different slave database reads. Things like ReplicaLag, network proximity etc. can be used to select the correct database to run the query on.

Practo Engineering

How we build @Practo

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store