Upgrade performance on your data base

Daniele
Data Tech Tips
Published in
4 min readJul 6, 2018

This is that kind of article in which i try to explain some best practises (some acquired from my experience) to upgrade your database.
Today we are talking about: PERFORMANCES (Read/write).

Talking about database performance is a serious topic, because there are many ways to improve its overall performances.
Those improvements could start from configuring properly the database (Eg. Starting modifying the config file) but here i want to describe how to choose the best way to design the database.

How to create a good database

  • First you need to know what’s the purpose of you database
    If it used exclusively to store data or to make simple calculations.
  • Second you MUST need to know how big the database could potentially be
    If you need to store some simple data like the items stored in your shop, or gather huge amount of data from Twitter to analyse sentiments.
  • Third will your database be an OLAP or OLTP ?

Once you have those information you can create your ER schema using some good tools.
Don’t you know what an ER schema? Here’s a link for you, oh here it is a good tool to design one.

This is an example of ER schema model

Ok now we have created the skeleton of our database, but now we have to choose which database to use.

There are so many database to use from MySQL to PostgreSQL and even Oracle or SQLServer but which of them suit my needs?

Wikipedia has always the answer but IMHO PostgreSQL or MySQL should fit the 90% of your needs, they’re simple to use and configure/install.

Once you’ve install the database on your server, or locally, here’s the most difficult part.
You have to create every single table with the IDs and everything else, i’ll give you an example:

Note that:

  • You have to put the right (or maximum potential) amount of character. If you data inserted in that column is between 1 and 20 using too much character like VARCHAR(255) could slow the reading process ’cause your database will allocate the amount of space in the creation process.
  • For the IDs use ALWAYS integers, they are really fast to read and the system will thank you always!
  • For date columns use always date or int (Eg. 20180101 for int 2018–01–01 for date format) this will improve the reading by a lot because the system will automatically optimize the column for the need.

Ok now you have created your tables and inserted some data like this:

Note that for inserting date you need to specify by calling the function TO_DATE this is a common function in every database W3C explains better this.

Ok now we need to create some index to speed up the reading of our database, why?
Well if your database contains few columns and rows this could not be done, but image to query a table where there are millions of rows, this could start to be really long to read!
So, here comes the index what are they? Here’s the answer and here an example in GIF format:

How btree index is used to look for numbers

There are many searching algorithms B-Tree is the most common but in another article i’ll explain different algorithms.
To create an index usually you need to write this king of code:

This simple row of code could improve a lot the speed in querying your database, StackOverflow like wikipedia has always an answer for everything.

As the least thing to do you need to improve your query, a query with a lot of useless JOINS, GROUP BY or WHERE condition could slow seriously the reading process of data even if you have a great machine and have optimized everything, keep everything simple stupid (even the query) [KISS philosophy]

So wrapping up everything:

  • Check the requirements of your database
  • Use the right software for your needs (read a lot please)
  • Create tables suited for the kind of data you have to insert
  • Create the right number of indexes (use always the ID or tuple ID-Other useful column)
  • Keep your query as simple as stupid!

Thanks for your attentions ;)

--

--