Programming with Databases in Python using SQLite

Photo Credit: Pixabay
If you are aspiring to be a data scientist you are going to be working with a lot of Data. Much of the data resides in Databases and hence you should be comfortable accessing data from databases through queries and then working on them to find key insights.

Data forms an integral part of the lives of Data Scientists. From the number of passengers in an airport to the count of stationary in a bookshop, everything is recorded today in form of digital files called databases. Databases are nothing more than electronic lists of information. Some databases are simple, and designed for smaller tasks while others are powerful, and designed for big data. All of them, however, have the same commonalities and perform a similar function. Different database tools store that information in unique ways. Flat files use a table, SQL databases use a relational model and NoSQL databases use a key-value model.

In this article, we will focus only on the Relational Databases and accessing them in Python. We will begin by having a quick overview of the Relational databases and their important constituents.

Relational Database: A Quick Overview

A Relational database consists of one or more tables of information. The rows in the table are called records and the columns in the table are called fields or attributes. A database that contains two or more related tables is called a relational database i.e interrelated data.

The main idea behind a relational database is that your data gets broken down into common themes, with one table dedicated to describing the records of each theme.

i) Database tables

Each table in a relational database has one or more columns, and each column is assigned a specific data type, such as an integer number, a sequence of characters (for text), or a date. Each row in the table has a value for each column.

A typical fragment of a table containing employee information may look as follows:

The tables of a relational database have some important characteristics:

  • There is no significance to the order of the columns or rows.
  • Each row contains one and only one value for each column.
  • Each value for a given column has the same type.
  • Each table in the database should hold information about a specific thing only, such as employees, products, or customers.

By designing a database this way, it helps to eliminate redundancy and inconsistencies. For example, both the sales and accounts payable departments may look up information about customers. In a relational database, the information about customers is entered only once, in a table that both departments can access.

A relational database is a set of related tables. You use primary and foreign keys to describe relationships between the information in different tables.

ii) Primary and Foreign Keys

Primary and foreign keys define the relational structure of a database. These keys enable each row in the database tables to be identified and define the relationships between the tables.

  • Primary Key

The primary key of a relational table uniquely identifies each record in the table. It is a column, or set of columns, that allows each row in the table to be uniquely identified. No two rows in a table with a primary key can have the same primary key value.

Imagine you have a CUSTOMERS table that contains a record for each customer visiting a shop. The customer’s unique number is a good choice for a primary key. The customer’s first and last name are not good choices because there is always the chance that more than one customer might have the same name.

  • Foreign Key

A foreign key is a field in a relational table that matches the primary key column of another table.

The example above gives a good idea of the primary and foreign keys.

Database Management Systems

The Database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze data. The DBMS used for Relational databases is called Relational Database Management Systems(RDBMS). Most commercial RDBMSes use Structured Query Language (SQL), a declarative language for manipulating data, to access the database. The major RDBMS are Oracle, MySQL , Microsoft SQL Server, PostgreSQL , Microsoft Access, and SQLite .

We have barely scratched the surface regarding databases here. The details are beyond the scope of this article.However, you are encouraged to explore the database ecosystem since they form an essential part of a data scientist’s toolkit.

This article will focus on using python to access relational Databases. We will be working with a very easy to use database engine called SQLite.

SQLite

SQLite is a relational database management system based on the SQL language but optimized for use in small environments such as mobile phones or small applications. It is self-contained, serverless, zero-configuration and transactional. It is very fast and lightweight, and the entire database is stored in a single disk file. SQLite is built for simplicity and speed compared to a hosted client-server relational database such as MySQL. It sacrifices sophistication for utility and complexity for size. Queries in SQLite are almost identical to other SQL calls.

Python sqlite3 module

SQLite can be integrated with Python using a Python module called sqlite3. You do not need to install this module separately because it comes bundled with Python version 2.5.x onwards. This article will show you, step by step, how to work with an SQLite database using Python.

Before starting I would highly recommend you all to install DB Browser for SQLite. The browser can be downloaded from their official page easily. DB Browser for SQLite is a high quality, visual, open source tool to create, design, and edit database files compatible with SQLite. It will help us to see the databases being created and edited in real time.

DB Browsers’ view

Since everything is in place, let us get to work.

Contents:

  • CONNECTING to a Database
  • CREATING a Table
  • INSERTING records in a TABLE
  • SELECTING records from the TABLE
  • UPDATING Records in the TABLE
  • DELETE Operation
  • Example walktrough

Connecting to a Database

  1. Open any Python IDE of your choice and type in the following commands. You can even use Jupyter Notebook for the same. In general, the only thing that needs to be done before we can perform any operation on a SQLite database via Python’s sqlite3 module is to open a connection to an SQLite database file:
import sqlite3
conn = sqlite3.connect('my_database.sqlite')
cursor = conn.cursor()
print("Opened database successfully")

The above Python code enables us to connect to an existing database using connection objectconn.If the database does not exist, then it will be created and finally, a database object will be returned. Acursor object is our interface to the database, that allows running anySQL query on our database.

If everything goes well, the following lines will be returned on running the script:

Opened database successfully

Let us now open and view the newly created database in the DB browser.

Indeed a new database named my_database.sqlite is created which is currently empty.

Before going further, there are two more things that are worth mentioning. If we are finished with our operations on the database file, we have to close the connection via the .close() method:

conn.close()

And if we performed an operation on the database other than sending queries, we need to commit those changes via the .commit() method before we close the connection:

conn.commit()
conn.close()

We should always remember to commit the current transaction. Since by default Connector/Python does not autocommit, it is important to call this method after every transaction that modifies data for tables that use transactional storage engines. If you don’t call this method, anything you did since the last call to commit() will not be visible from other database connections.

2. Creating a Table

Now we will create a table in the previously created database. Type the following code in the IDE.

cursor.execute('''CREATE TABLE SCHOOL
(ID INT PRIMARY KEY NOT NULL,
NAME TEXT NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR(50),
MARKS INT);''')
cursor.close()

The routineconn.execute executes the SQL statement. Here we create a table called with SCHOOL the fields: ID, NAME, AGE, ADDRESS and MARKS We also designate asID Primary Key and then close the connection. Let us see these details in the DB Browser.

3. INSERTING records in the TABLE

Let us now INSERT records of students in the tableSCHOOL created in the above example.

import sqlite3
conn = sqlite3.connect('my_database.sqlite')
cursor = conn.cursor()
cursor.execute("INSERT INTO SCHOOL (ID,NAME,AGE,ADDRESS,MARKS) \
VALUES (1, 'Rohan', 14, 'Delhi', 200)");
cursor.execute("INSERT INTO SCHOOL (ID,NAME,AGE,ADDRESS,MARKS) \
VALUES (2, 'Allen', 14, 'Bangalore', 150 )");
cursor.execute("INSERT INTO SCHOOL (ID,NAME,AGE,ADDRESS,MARKS) \
VALUES (3, 'Martha', 15, 'Hyderabad', 200 )");
cursor.execute("INSERT INTO SCHOOL (ID,NAME,AGE,ADDRESS,MARKS) \
VALUES (4, 'Palak', 15, 'Kolkata', 650)");
conn.commit()
conn.close()

When the above program is executed, it will create the given records in the tableSCHOOL.

4. SELECTING records from the TABLE

Let us say we want to select some particular records from the table i.e only,ID, NAME and MARKS. We can do this easily with the commandSELECT.

import sqlite3
conn = sqlite3.connect('my_database.sqlite')
cursor = conn.cursor()
for row in cursor.execute("SELECT id, name, marks from SCHOOL"):
print("ID = ", row[0])
print("NAME = ", row[1])
print("MARKS = ", row[2], "\n")
conn.commit()
conn.close()

When the above program is executed, it will produce the following result. We can see only that the address and age have not been returned.

ID =  1
NAME = Rohan
MARKS = 200
ID =  2
NAME = Allen
MARKS = 150
ID =  3
NAME = Martha
MARKS = 200
ID =  4
NAME = Palak
MARKS = 650

5. UPDATING Records in the TABLE

Let us see how to use the command UPDATE to update any record and then fetch and display the updated records from the table SCHOOL. Here we will update Martha’s marks from 200 to 250 and will again fetch the records.

import sqlite3
conn = sqlite3.connect('my_database.sqlite')
cursor = conn.cursor()
conn.execute("UPDATE SCHOOL set MARKS = 250 where ID = 3")
conn.commit()
for row in cursor.execute("SELECT id, name, address, marks from SCHOOL"):
print("ID = ", row[0])
print("NAME = ", row[1])
print("MARKS = ", row[2], "\n")
conn.commit()
conn.close()

When the above program is executed, marks for Martha would change from 200 to 250.

ID =  1
NAME = Rohan
MARKS = 200
ID =  2
NAME = Allen
MARKS = 150
ID =  3
NAME = Martha
MARKS = 250
ID =  4
NAME = Palak
MARKS = 650

6. DELETE Operation

We can use the operationDELETE to delete any record from the table SCHOOL. Let us say Allen has left the school permanently and we want to delete his records from the database. Now let’s fetch the details of all other students from the records.

import sqlite3
conn = sqlite3.connect('my_database.sqlite')
cursor = conn.cursor()
conn.execute("DELETE from  SCHOOL where ID = 2")
conn.commit()
for row in cursor.execute("SELECT id, name, address, marks from SCHOOL"):
print("ID = ", row[0])
print("NAME = ", row[1])
print("ADDRESS = ", row[2])
print("MARKS = ", row[3], "\n")
conn.commit()
conn.close()

When the above program is executed, it will produce the following result.

ID =  1
NAME = Rohan
ADDRESS = Delhi
MARKS = 200
ID =  3
NAME = Martha
ADDRESS = Hyderabad
MARKS = 250
ID =  4
NAME = Palak
ADDRESS = Kolkata
MARKS = 650

The same can be seen in the DB Browser. The second record has been deleted.


In the above section, we learned how to create a database and perform various operations on it. In this section, let’s work with a real database example to see how we can incorporate the basics we have just learned.

Example Walkthrough: Soccer database

We will be working with the soccer database from Kaggle. It is the ultimate Soccer database for data analysis and machine learning and the entire details can be accessed from kaggle. The database contains 8 tables.

Pre-requisites: A basic knowledge of Python and libraries like pandas will come in handy.

Download the dataset in SQLite format from Kaggle and save it in the same directory as your jupyter notebook.

Importing basic libraries

import sqlite3
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Accessing the Database with the sqlite3 Package

# Creating the connection to database
con = sqlite3.connect('soccer.sqlite')
cursor = con.cursor()

Getting a list of all the tables saved into the database

for row in cursor.execute("SELECT name FROM sqlite_master WHERE type='table';"):
print(row)
('sqlite_sequence',)
('Player_Attributes',)
('Player',)
('Match',)
('League',)
('Country',)
('Team',)
('Team_Attributes',)

Reading all the TABLES with pandas library

country_table = pd.read_sql_query("SELECT * FROM Country", con)
league_table = pd.read_sql_query("SELECT * FROM League", con)
match_table = pd.read_sql_query("SELECT * FROM Match", con)
player_table = pd.read_sql_query("SELECT * FROM Player", con)
player_att_table = pd.read_sql_query("SELECT * FROM Player_Attributes", con)
team_table = pd.read_sql_query("SELECT * FROM Team", con)
team_att_table = pd.read_sql_query("SELECT * FROM Team_Attributes", con)

Exploratory Data Analysis

We will only analyse Player table here. But feel free to analyse all the remaining tables as well.

Player Table

# Dimensions
player_table.shape
(11060, 7)
# player_table.info()
Data columns (total 7 columns):
id 11060 non-null int64
player_api_id 11060 non-null int64
player_name 11060 non-null object
player_fifa_api_id 11060 non-null int64
birthday 11060 non-null object
height 11060 non-null float64
weight 11060 non-null int64
dtypes: float64(1), int64(4), object(2)

Accessing the first 5 records of Player table

player_table.head()

Now we have a pandas dataframe, and we can easily work with this to get desired information e.g:

Finding all the players with height > 150 cm.

height_150 = pd.read_sql_query("SELECT * FROM Player WHERE height >= 150 ", con)

Similarly you can explore all the other tables further to get other meaningful insights. Please find the code in the Jupyter notebook below. The code is self explanatory.

Conclusion

In this tutorial, we have seen how easy it is to to get started with SQLite database operations via Python. The module sqlite3 is very simple to use and comes in very handy when dealing with large data systems. I hope you found this article useful. Let me know if you have any doubt or suggestion in the comment section below.