Partition in Databases (Part-1) RANGE partitioning
Hello, This article is not related to my previous posts (py-spark) however it is an important topic for any efficient database design. We will look into the methods of doing partitions, parameters and other information for creating a good partition designs.
Before we start, We will be using postgresql. I will be using docker to download and run the postgres in my machine as it will reduce my time and will be easier to work.If you want to run in my particular configuration you can find the code. Create an docker-compose.yml file and paste it.
version: '3.8'
services:
db:
image: postgres:13.0-alpine
restart: always
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
- POSTGRES_DATABASE=database
ports:
- '5432:5432'
volumes:
- db:/var/lib/postgresql/data
volumes:
db:
driver: local
Run this command to download and test your stuffs
docker-compose up -d
Now setup is done. if you are using mac and windows you will have docker-desktop. for others use terminal and exec command to your container. usually it should look like
#to see the available containers run
docker ps
# you will see the container_id in list
docker exec -it container_id bash
you should be able to go to terminal of the postgres container.
If you have docker-desktop follow these steps:
- Open Docker desktop
2. You will see a folder name and inside that you will see your container name in my case its db. once you click on it, you will see something like this.
After that, go to terminal tab and done. you can access the terminal of psql.
It should look something like this.
run these command to login as postgres user which we had created before on our docker-compose.yml file.
psql -U postgres;
Now create a database, you can name it any but i am going with the most creative name that is practice_pratitation. To create the database paste and replace with your preferred database name.
create database practice_pratitation;
we are done with creating database. Let’s learn some theories now,
Types of Partitions
- RANGE partitioning. This type of partitioning assigns rows to partitions based on column values falling within a given range. e.g. 10–20, 20–30 so on.
- LIST partitioning. Similar to partitioning by
RANGE
, except that the partition is selected based on columns matching one of a set of discrete values. - HASH partitioning. With this type of partitioning, a partition is selected based on the value returned by a user-defined expression that operates on column values in rows to be inserted into the table.
We will first look into the Range partitioning, for that we will create a table named range_partition.
# change the database
\c practice_pratitation;
create table range_partition (id integer, name text, age numeric) partition by range ( age);
What we did here? We are saying create a table named as practice_pratitation with id integer, name text, age numeric columns(attributes) and partition by the range and this range is based on the age.
Now, we are done with creating the basic structure for our partition method. We need to create a partitions for the table where we will specify the ranges and data that we want to store based on the age. let’s look into it.
create table minor partition of range_partition for values from(MINVALUE) to (18);
we have create a table called minor where we are saying that minor table is the partition of range_partition and this table holds the value from 1 as if we don’t pass any value inside MINVALUE we will get 1.
again, Let’s create a table called adult where the age is separated from 18 to 60.
create table adult partition of range_partition for values from(18) to (60);
Note: partition must be equal sized where data should be equally distributed. If one table has 1000 rows of data and another has 5000 rows of data then It is not considered as good approach for doing partation. Partition should be equally distributed.
let’s do same for the last time to store senior citizens, to do that..
create table senior partition of range_partition for values from(60) to (MAXVALUE)
We are creating a table where the age group are 60 above. If you describe the table.
\d+ range_partition;
you can see all the informations in details.
practice_pratitation=# \d+ range_partition;
Partitioned table "public.range_partition"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
name | text | | | | extended | |
age | numeric | | | | main | |
Partition key: RANGE (age)
Partitions: adult FOR VALUES FROM ('18') TO ('60'),
minor FOR VALUES FROM (MINVALUE) TO ('18'),
senior FOR VALUES FROM ('60') TO (MAXVALUE)
You can now insert data into your table normally.
practice_pratitation=# insert into range_partition values (1, 'arjun', 22),(2, 'prajwal', 25), (3, 'aashish', 26);
INSERT 0 3
practice_pratitation=# insert into range_partition values (4, 'sita', 2),(5, 'sima', 18), (6, 'aashika',13 );
INSERT 0 3
practice_pratitation=# insert into range_partition values (7, 'gita', 62),(8, 'kabita', 65), (9, 'sabitri',73);
INSERT 0 3
practice_pratitation=#
let’s see if data was stored or not.
practice_pratitation=# select * from range_partition;
id | name | age
----+---------+-----
4 | sita | 2
6 | aashika | 13
1 | arjun | 22
2 | prajwal | 25
3 | aashish | 26
5 | sima | 18
7 | gita | 62
8 | kabita | 65
9 | sabitri | 73
Here I have inserted some data, let’s see if we succeed to store data according to age category.
select tableoid::regclass, * from range_partition;
tableoid | id | name | age
----------+----+---------+-----
minor | 4 | sita | 2
minor | 6 | aashika | 13
adult | 1 | arjun | 22
adult | 2 | prajwal | 25
adult | 3 | aashish | 26
adult | 5 | sima | 18
senior | 7 | gita | 62
senior | 8 | kabita | 65
senior | 9 | sabitri | 73
(9 rows)
We have successfully implemented the range partition.
After doing these things you might have question, why do we need this… see when your database or data grows we tend to update delete and do other stuffs… doing these things the whole table needs to be scanned atleast once for each of these operations and doing these things over a 100K rows will not be efficient so we just create small chunks so that we can now scan all the rows efficiently and faster.
We will continue these article until we implement remaning types of partition, do follow if you haven't.