SQL | PYTHON | RELATIONAL ALGEBRA

All You Need to Know About Databases — A to Z Guide

Understand SQL, Relational Algebra, Relational Databases, and SQlite3 for Python

Khelifi Ahmed Aziz
Mar 24 · 11 min read
Photo by Isaac Smith on Unsplash

Plan

  1. Basic concepts and terminology.
  2. Creation of a relational schema with SQL (DDL).
  3. Relational algebra.
  4. Aggregation functions.
  5. DML (Data Manipulation Language).
  6. DQL (Data Query Language) (SELECT query).
  7. Access to databases with Python (sqlite3 module).

Basic Concepts and Terminology

planing
Photo by medbadrc on Unsplash

A database management system (DBMS) is a high-level software that allows different users/software (generally competitors) to manipulate the information stored in a database (independently of their physical representations (file location). the user of DBMS describes what he wants to get (not how to get it) using descriptive language (SQL) ≠ imperative languages (Python). The DBMS is responsible for sequencing the operations necessary for the user’s query.

A database is a voluminous set of persistent (years) data, structured and interdependent (interrelated information), managed by a DBMS.

An attribute (field or column) represents information specific to an entity. For example, a customer can have the attribute “name” containing for example the value ‘Aziz’. For a book, for example, we can consider the following attributes {reference, title, topic, year, language, price, quantity}

A domain is a set of instances of an elementary type (integers, reals, texts …). The notion of “elementary type” is opposed to the structured type or collections of values. Each DBMS has a fixed type system that represents the domains of attributes. For example Domain(Name) = Text, Domain (Quantity) = Integer.

By abuse of language, we use the term relation to denote the schema and/or the instance of a relation.

The diagram of a relation is the name of this relation followed by the list of its attributes, each attribute is associated with its domain. The diagram of a relation noted as follows: relation_name(A1 : D1, A2 : D2, …, An : Dn) where Ai is the attribute name and Di is his domain.
For example: Customer(Id : Integer, name : Text, birthday : Date)

The key of a relation is a minimal subset of attributes (ideally a singleton) that allows each row of the relation to be identified in a unique way.

A foreign key is an attribute that refers to a row in another table. In other words, it is a copy of the primary key of a record located in another table. Thus, a foreign key establishes a link between two tables of the database. the DBMS is responsible for checking the referential constraints (i.e the value assigned to the foreign key does indeed correspond to an existing row in the referenced table).

A tuple (row or also a record) of a relation R(A1 : D1, A2 : D2, …, An : Dn) is a tuple of values e = (V1, V2, …, Vn) where each Vi ∈ Di or vi = NULL
For example: (1509, ‘Sam’, ‘1970–01–31’)
is a record of the Customer table (relation)

SQL “Structured Query Language” is a query language used to communicate with a DBMS. It is descriptive language (the “what” not the “how”). This language has 3 parts:

  1. DDL: Data Definition Language, it allows to manipulate the schema (the shape) of the data: description of attributes, constraints, and the different relations of the database.
  2. DML: Data Manipulation Language, is used to describe the operations of inserting, updating, deleting, and selecting data.
  3. DCL: Data Control Language, it allows to control access to data: who can do what and on which data.

Data Definition Language DDL:

This language is a subset of SQL, it allows us to describe (create and modify) the schemas of the tables that will form our database.

The general syntax of the table creation query is illustrated below:

CREATE TABLE table_name
(
attribute1 Domain Constraint,
.
.
.
attribute_k Domain Constraint
Constraint_1
Constraint_k
);
  • Domains: The domains supported by the DBMS (sqlite) are:
    - INTEGER
    - REAL
    - TEXT
    - DATE: for dates represented as “YYYY-MM-DD”.
    - TIME: for times represented as “HH: MM: SS.SSS”
  • Particular Value NULL: It is a value that expresses the absence of value. A field is assigned to this value when it is not possible to know its value so a cell containing NULLis considered as an empty cell.
  • Contraintes: The constraints that it is possible to express are as described below:
    - PRIMARY KEY: indicates the attribute(s) that form the primary key of a table.
    - FOREIGN KEY: indicates the attribute (s) that reference the primary keys of external records.
    -CHECK: allows you to force a condition on the attribute.
    -NOT NULL: the value of the attribute must be known.
    -DEFAULT: indicate a default value for the attribute (if its value is not known the default value replaces NULL).
    -AUTOINCREMENT: allows a unique number to be generated automatically when a new record is inserted into a table (generally used for the primary key)

Example of table creation query:

CREATE TABLE Customer
(
Id INTEGER AUTOINCREMNT PRIMARY KEY,
Name TEXT NOT NULL,
Birthday DATE
);
/* or */CREATE TABLE Customer
(
Id INTEGER AUTOINCREMNT,
Name TEXT,
Birthday DATE,
PRIMARY KEY(Id),
CHECK(Name IS NOT NULL)
);
  • Adding a new column: the following query allows you to add a new column to the schema of a table:
ALTER TABLE relation_name
ADD COLUMN column_name column_domain
  • Rename a table:
ALTER TABLE relation_name
RENAME TO new_relation_name
  • Delete a table:
DROP TABLE relation_name

Relational algebra

Relational algebra offers a set of operations allowing to query a database. The expressions formulated by these operators are translated into SQL with the data manipulation language.
The relational algebra offers 9 operators which are classified into 4 families according to the following criteria:

  • Unary operators: Relation → Relation
  • Binary operators: Relation × Relation → Relation
  • Homogeneous operators: operands and result must have the same scheme
  • Heterogeneous operators: operands and/or result may have different schemes.

It is possible to use notation R1 = R2 in relational algebra, this indicates that the relation R1 is a copy of the relation R2. For example, if we want to designate the Customer relationship by C we can write: C = Customer, in SQL this translates to:

SELECT * FROM Customer AS C;

It is a unary and heterogeneous operator. It is used for the reduction of columns (attributes) of a relation.

  1. Projection on a single attribute:

For a relation R(A1: D1, …, An : Dn) The Projection of R on a single attribute Ai is defined as follows:

Source by Author

The resulting relation contains only one column in this case. In SQL this operation is translated by:

SELECT DISTINCT Ai FROM R;

Note: when implemented in SQL, projection eliminate duplicate data which is obtained by the addition of the DISTINCT keyword.

2. Projection on a family of attributes:

Source by Author
Source by Author

For example:

Source by Author

In SQL this operation is translated by:

SELECT DISTINCT id, name FROM Customer;

It is a unary and homogeneous operator which makes it possible to apply a selection of rows (tuples) of a relation R according to a condition c expressed on the columns of the latter. The condition is evaluated for each row of the relation, the rows for which the condition gives FALSE are discarded from the result. The selection from a relation R according to a condition c is defined as follows:

Source by Author

in SQL this operation is translated as follows:

SELECT * FROM R WHERE C;

in SQL it is possible to use the following operators:

  • Arithmetic operators:
    Addition (+), Subtraction (-), Multiplication (*), Modulus (%), Division (/)
    - Integer division if both operands are INTEGER.
    - Real division if one of the operands is of type REAL.
    -CAST(column_name AS REAL)to change the type of a column.
  • Relational Operators:
    -
    Equality (=or ==), Lesser (<), Greater (>), Less than or equal (<=), Greater or equal (>=), Different (<> or !=).
    -BETWEEN V1 AND V2: check if a value is between the two terminals V1 and V2.
    -IN (V1, V2, ..., Vk): check if a value is in the sequence (V1, …, Vk).
    -EXISTS(query): returns true if the specified query produces at least one row
    - It is possible to precede these operators by NOT:
    NOT BETWEEN
    NOT IN
    NOT EXISTS
    -LIKE: to compare a text with a text pattern (case-insensitive)
    %: replaces a sequence of zero or more characters.
    _: replaces a single character.
    - IS NULLand IS NOT NULLto check if a column is NULLor not NULL.
  • Logical operators:
    - OR
    - AND
    - NOT
  • Conditional operator:
    Used to formulateif ... else ...conditions, it has two forms:
CASE attribute_name
WHEN val_1 THEN res_1
...
WHEN val_n THEN res_n
ELSE other_res
END
/* or */CASE
WHEN cond_1 THEN res_1
...
WHEN cond_n THEN res_n
ELSE other_res
END

It is a unary and heterogeneous operator, it allows one or more attributes of a relation to be renamed
For example, a relation
Startups(id: Integer, name: Text, website :Text, country :Text)

Source by Author

In SQL this operator is translated by the keyword AS:

SELECT name, website AS URL, country AS location
FROM Startups

It is a binary and heterogeneous operator. The result of the Cartesian product R1 × R2 is a relation R whose diagram results from the concatenation of the diagrams of R1 and R2 formally where R1 (A1, …, An) and R2 (B1,…, Bm):

Source by Author

In SQL this operator translates as follows:

SELECT * FROM R1, R2;
/* or */
SELECT * FROM R1 CROSS JOIN R2;

It is a binary and heterogeneous operator which makes it possible to merge the rows of two relations having common values (generally a foreign key or a primary key. Let c be a join condition which connects a subset of the columns of R1 to a subset column of R2 then:

Source by Author

It is a heterogeneous binary operator (it is the reciprocal of the Cartesian product) i.e:

Source by Author

All the attributes of the diagram of the divisor relation (to the right of the operator ÷ must be included in the diagram of the dividend relation (the one to the left of the operator ÷). The resulting relation R ÷R1 contains the attributes that appear in R but not in R1, Formally:

Source by Author

This operator makes it possible to bring the elements of R2 that appeared with all the lines of R1. It is generally used to bring entities that have participated in a link with all entities of a set V.

It is a homogeneous binary operator. For R1 and R2 two relation of the same diagram then:

Source by Author

In SQL it translates to:

SELECT * FROM R1
UNION
SELECT * FROM R2;

It is a homogeneous binary operator. For R1 and R2 two relation of the same diagram then:

Source by Author

In SQL it translates to:

SELECT * FROM R1
INTERSECT
SELECT * FROM R2;

It is a homogeneous binary operator. For R1 and R2 two relation of the same diagram then:

Source by Author

In SQL it translates to:

SELECT * FROM R1
EXCEPT
SELECT * FROM R2;

Data Manipulation Language DML

It is a subset of SQL allowing to query, insert, update or delete data.

The general syntax of the data query request is described below:
Note: the [number] at the end of each line is the order of execution.

SELECT (optionnellement DISTINCT) attribute(s) or (*) /* projection attributes [8]*/
FROM table(s) /* a single table, a Cartesian product, or a join of a sequence of tables * [1]*/
WHERE condition /* selection criteria [2]*/
GROUP BY attribute(s) /* grouping attribute(s)[3]*/
HAVING group_condition /* group filtering (attention, it must be preceded by group by)[4]*/
ORDER BY Attribute1 (ASC or DESC),..., AttributK (ASC ou DESC) /* Order rows[5]*/
LIMIT N /* limit to N lines [7]*/
OFFSET v; /* start from line v [6] */

Aggregation functions allow you to perform vertical calculations (by aggregating rows) of an entire table or a group formed by the (GROUP BY) clause of SQL. The most famous aggregation functions are:
- The total number of elements: COUNT(attribute(s)) or COUNT(*).
- The number of distinct elements: COUNT(DISTINCT attribute(s)) or COUNT(DISTINCT *).
- Sum: SUM(attribute) or SUM(DISTINCT attribute).
- Average: AVG(attribute) or AVG(DISTINCT attribute).
- Maximum: MAX(attribute).
- Minimum: MIN(attribute).

The aggregations are carried out in two phases:
1. Group the lines with common values for a subset of attributes.
2. Apply aggregation functions to perform vertical calculations on the resulting groups.

This query allows the insertion of a new row in a table, it can be done in 3 ways:

INSERT INTO table_name VALUES(V_1, V_2, ..., V_K);

The values must match according to the order and the number of attributes declared in the Table (CREATE TABLE).

INSERT INTO table_name(attribute(s)) VALUES(V_1, ..., V_k);

The order of the values must match the names of the columns indicated after the table name

INSERT INTO table_name SELECT attribute(s) FROM table(s) ...;

This query is used to update the attributes of rows checking a criterion:

UPDATE table_name SET attr1 = v1,
attr2 = v2,
attrN = vN
WHERE criterion

Delete all the lines that meet a deletion criterion:

DELETE FROM table_name WHERE criterion;

Accessing data from python (sqlite3)

Sqlite3 is a module (developed in C language) that allows an application to imitate a DBMS (no server process).
import sqlite3

The use of this module is focused on two main classes:

  1. Connection: allows connection to a sqlite database
    con = sqlite3.connect(database_path)
    If the file indicated by path does not exist it will be created, then an instance of the Connection class which represents a connection to the database indicated by path is instantiated and then returned.
  2. Cursor: used to send query and receive results
    c = con.cursor()
    Returns an instance of the Cursor class used to handle the connected database (via the instance con).
  • Send SQL queries (as str): two methods:
  1. cursor.execute(…)
    - c.execute(SQLquery)
    example 1: c.execute("CREATE TABLE Customer(id INTEGER PRIMARY KEY, name TEXT, birthday DATE")
    example 2:c.execute("INSERT INTO Customer(id, name) VALUES(5,'Sam')")
    - c.execute(SQLquery, iterable)
    we use this method if we would use variables, for example: c.execute("INSERT INTO Customer(id, name) VALUES(?,?)",[5,'Sam'])
  2. cursor.executemany(…)
    -
    c.executemany(SQLquery, iterable_of_iteables)
  • Retrieve the result of a query (SELECT):
  1. curseur.fetchone()
    returns a tuple containing a row from the last SELECT sent at the cursor None if the query does not return rows.
  2. cursor.fetchmany(number_of_lines)
    returns a list of tuples of size (≤ number of rows).
  3. cursor.fetchall()
    returns a list of tuples containing all the rows returned by the query.
  4. cursor.description
    contains the titles of returned columns.
  • Confirm the current transaction:
connetion.commit()
  • Close connection:
connetion.close()

Summary

In this article, we explored everything you need to know from the relational databases to relational algebra, to SQL, and finally sqlite3 for python. Now, using the logic of relational algebra, SQL query and sqlite3 methods, you can create, manipulate, and implement a database into python.

Thanks For Reading! 😄

Check out my other articles and follow me on Medium

Khelifi Ahmed Aziz

Geek Culture

Proud to geek out. Follow to join our +500K monthly readers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store