Member preview

Python HOW: Connect to, and Manage a Database

If you’ve been trying to connect to a database on-premise or on a local disk, and found ambiguous online resources and inconsistent (or even inaccurate) terminology, then you will enjoy this article

I’ve put my best effort to provide you with a clear, concise, and detailed description on how to connect to, and manage any database from Python

I- Introduction

II- PyODBC: Open DataBase Connectivity for Python

III- SQLite3: SQLite for Python

IV- Executing SQL statements

V- Reading SQL statements into Pandas DataFrame


I- Introduction

1.1 Database model

A database model determines the logical structure of a database (a database is an organized collection of data). This in turn determines how data can be stored, organized and manipulated. The Relational Model (RM) is the most popular database model since the 1980s. RM uses a table-based format, where tables are related by common columns

1.2 Database management system (DBMS)

DBMS is the software that you -or applications- use to interact with the database to create, read, update and manage data. The Relational DBMS (RDBMS) is the DBMS based on RM. According to DB-Engines, the most widely used RDBMS are: Oracle, MySQL, Microsoft SQL Server, PostgreSQL, IBM DB2, Microsoft Access, and SQLite

A database is often referred to by the DBMS used to manipulate it. For example, a database that is manipulated using Microsoft SQL Server is referred to as: Microsoft SQL Server-Database

Although not required, all RDBMS use SQL as a standard data access language

1.3 Database connectivity interface

A database connectivity interface allows an application to access data from a variety of DBMSs, using a specific driver for a specific DBMS and operating system. This means that the application can be written without depending on a specific DBMS or the operating system

Open DataBase Connectivity (ODBC) is a standard Microsoft Windows interface that enables applications (typically written in C or C++) to connect to DBMSs. While, Java DataBase Connectivity (JDBC) is a standard Oracle interface that enables applications written in Java to connect to DBMSs


II- PyODBC: Open DataBase Connectivity for Python

PyODBC is the Python package for ODBC. Through pyodbc, you can easily connect Python applications to a DBMS-database, using the required ODBC driver

2.1 Installation
Step 1: install pyodbc through pip

pip install pyodbc

Step 2: install the required driver for the DBMS-database you want to connect to. For example, if you want to connect to a Microsoft SQL Server-Database, you need to download and install the driver from Microsoft, after choosing your operating system

2.2 Connecting to a database

To make a connection to a database, we need to pass a connection string to the connect() function of pyodbc. The most important thing to remember is that: pyodbc passes the connection string directly to the DBMS-database driver unmodified. Therefore, connection strings are driver-specific

For example, to connect to a Microsoft SQL Server-Database, we provide the following connection string:

cnxn = pyodbc.connect('driver={SQL Server};'
'server=serverName;'
'database=databaseName;'
'trusted_connection=yes')
Note: you can find information about general connection string for most databases here. Make sure to follow the connection string formatting rules found here

The connection string can also be passed as arguments, which are then concatenated into a connection string:

cnxn = pyodbc.connect(driver='{SQL Server}',
server='serverName',
database='databaseName',
trusted_connection='yes')

In the connection string above, we assume that the SQL Server Authentication mode is set to Windows Authentication Mode, so we did not have to provide credentials to get access permissions to the server.

However, if it was a Mixed Authentication Mode (i.e. SQL Server Authentication + Windows Authentication), we can still use the above connection string, or the following one where we provide our User ID and Password to get access permissions to the server:

cnxn = pyodbc.connect(driver='{SQL Server}',
server='serverName',
database='databaseName',
uid='UserID',
pwd='password')

Pyodbc also supports Data Source Name connections. A DSN contains information about a specific database that an ODBC driver needs in order to connect to it (information such as the database driver, the server and database names, etc):

cnxn = pyodbc.connect('DSN=DsnName;'
'pwd=password')

Lastly, depending on the database you are trying to access, and the version of Python you are using, you might need to set the connection encoding/decoding settings, which you can find here

For Microsoft SQL Server-Database and Python 3.x, using pyodbc default encoding/decoding settings is recommended (i.e. no need to do anything)


III- SQLite3: SQLite for Python

SQLite3 is the Python package for SQLite, which is the most widely deployed RDBMS in the world (arguably). The main features of SQLite are:

  • An embedded database. SQLite does not have a separate server process (i.e. server-less). It reads and writes directly to ordinary disk files
  • Free for use for any purpose, commercial or private

3.1 Installation

SQLite3 is included with Python by default. So even if you have created a new Anaconda environment, it will be installed for you

3.2 Connecting to a database

To create a connection to a disk-based database, we just need to pass the name of the database to the connect() function of sqlite3:

cnxn = sqlite3.connect(r'Databases/database.db')

If the database does not exist, an empty one will be created for us


IV- Executing SQL statements

Once we have a connection to the database, either from pyodbc or sqlite3, we can then create a Cursor object which represents a database cursor. We can do this using the cursor() method:

cursor = cnxn.cursor()

Now we have a Cursor object, we can execute any valid SQL query string using the Cursor execute() method:

cursor.execute(sql_query_string)

Let’s assume that the database we are connected to (database.db), has 2 tables:

  • T_CUSTOMERS
                 | id |    fname     |   lname    |
|----|--------------|------------|
| 1 | 'Maria' | 'Anders' |
| 2 | 'Antonio' | 'Moreno' |
| 3 | 'Thomas' | 'Hardy' |
| 4 | 'Ana' | 'Trujillo' |
| 5 | 'Christina' | 'Berglund' |
  • T_ADDRESSES
                | id | country |    city     |   code   |
|----|---------|-------------|----------|
| 1 | Germany | Berlin | 12209 |
| 2 | Mexico | Mexico D.F. | 05023 |
| 3 | UK | London | WA1 1DP |
| 4 | Mexico | Mexico D.F. | 05021 |
| 5 | Sweden | Lulea | S-958 22 |

4.1 SELECT statement

To select the id, fname, and lname columns from T_CUSTOMERS, we use SQL SELECT:

cursor.execute("SELECT id, fname, lname FROM T_CUSTOMERS")

The executed SELECT statement returns rows, which we can retrieve using one of the Cursor fetch functions:

  • fetchone(): retrieves one row only, and move the Curser to the next row
  • fetchall(): retrieves all rows, and move the Curser to the end
  • fetchmany(size): retrieves a number of rows, and move the Curser to the next row
Note: if there are no rows left, fetchone() will return None, whereas fetchall() and fetchmany() will both return empty lists

For example, to retrieve one row only:

row = cursor.fetchone()

row will be a Tuple of values, one for each of the columns in the SELECT statement:

print(row)
Out[1]:
(1, 'Maria', 'Anders')

To get each column’s description, the Cursor description attribute can be used:

descriptions = cursor.description

This description will be a list of Tuples, one for each column. Each Tuple has 7 items to describe the column, the one that matters to us is the first item, which is the column’s name (or alias, if specified in the SQL SELECT):

for description in descriptions:
print(description[0])
Out[2]:
id
fname
lname

To retrieve all the remaining rows as a list, we can use fetchall():

rows = cursor.fetchall()
for row in rows:
print(row)
Out[3]:
(2, 'Antonio', 'Moreno')
(3, 'Thomas', 'Hardy')
(4, 'Ana', 'Trujillo')
(5, 'Christina', 'Berglund')

4.2 UPDATE statement

To update an existing record in T_CUSTOMERS, we use SQL UPDATE with a WHERE clause

Note: if you do not use a WHERE clause in the UPDATE statement to specify which record(s) should be updated, all records will be updated!

Let’s update lname=’Trujillo’ to ‘Taqueria’:

cursor.execute("UPDATE T_CUSTOMERS SET lname = 'Taqueria' WHERE lname='Trujillo'")

If we want to know how many records were modified by the last SQL statement, we can use the Cursor rowcount attribute. This will return -1 if no SQL has been executed or if the number of rows is unknown:

print(cursor.rowcount)
Out[4]:
1

Let’s check if that record has been updated successfully:

rows = cursor.execute("SELECT id, fname, lname FROM T_CUSTOMERS").fetchall()
for row in rows:
print(row)
Out[5]:
(1, 'Maria', 'Anders')
(2, 'Antonio', 'Moreno')
(3, 'Thomas', 'Hardy')
(4, 'Ana', 'Taqueria')
(5, 'Christina', 'Berglund')

4.3 DELETE statement

To delete a customer’s record from T_CUSTOMERS, we use SQL DELETE with a WHERE clause

Note: if you do not use a WHERE clause in the DELETE statement to specify which record(s) should be deleted, all records will be deleted!

Let’s delete the record with id=4:

cursor.execute("DELETE FROM T_CUSTOMERS WHERE id=4")

Let’s check how many records were modified:

print(cursor.rowcount)
Out[6]:
1

Let’s check if that record has been deleted successfully:

rows = cursor.execute("SELECT id, fname, lname FROM T_CUSTOMERS").fetchall()
for row in rows:
print(row)
Out[7]:
(1, 'Maria', 'Anders')
(2, 'Antonio', 'Moreno')
(3, 'Thomas', 'Hardy')
(5, 'Christina', 'Berglund')

4.4 INSERT INTO statement

To insert a new record into T_CUSTOMERS, we use SQL INSERT INTO

Let’s insert the same record we have deleted:

cursor.execute("INSERT INTO T_CUSTOMERS(id, fname, lname) VALUES (4, 'Ana', 'Trujillo')")

Let’s check how many records were modified:

print(cursor.rowcount)
Out[8]:
1

Let’s check if that record has been inserted successfully:

rows = cursor.execute("SELECT id, fname, lname FROM T_CUSTOMERS").fetchall()
for row in rows:
print(row)
Out[9]:
(1, 'Maria', 'Anders')
(2, 'Antonio', 'Moreno')
(3, 'Thomas', 'Hardy')
(4, 'Ana', 'Trujillo')
(5, 'Christina', 'Berglund')

4.5 Long SQL statement

Let’s say that we want to select the full name, and city and postal code of a customer with the last name ‘Trujillo’. We (or someone else) has written the following SQL statement to do this:

SELECT customer.lname,
customer.fname,
address.city,
address.code
FROM T_CUSTOMERS AS customer
LEFT JOIN T_ADDRESSES AS address ON address.id = customer.id
WHERE customer.lname = 'Trujillo'

The easiest way to pass this long SQL statement to excute() is to use the triple-quote string format to encapsulate the statement. White-spaces (including tabs and newlines) should be ignored by the SQL database engine, but if we want to remove them from the left, we can use the dedent() function from the built-in textwrap module:

import textwrap
sql_query_string = textwrap.dedent("""
SELECT customer.lname,
customer.fname,
address.city,
address.code
FROM T_CUSTOMERS AS customer
LEFT JOIN T_ADDRESSES AS address ON address.id = customer.id
WHERE customer.lname = 'Trujillo'
""")
print(cursor.execute(sql_query_string).fetchall())
Out[10]:
[('Trujillo', 'Ana', 'Mexico D.F.', '05021')]

Finally, we must call commit() on the connection that created this cursor, otherwise our changes will be lost if we close the Script:

cnxn.commit()

V- Reading data into Pandas DataFrame

Pandas can read an SQL statement directly into a dataframe without using a Cursor. This can be done using the read_sql(sql_string, connection) function

Let’s read the last SQL statement into a DataFrame:

import textwrap
import pandas as pd
sql_query_string = textwrap.dedent("""
SELECT customer.lname,
customer.fname,
address.city,
address.code
FROM T_CUSTOMERS AS customer
LEFT JOIN T_ADDRESSES AS address ON address.id = customer.id
WHERE customer.lname = 'Trujillo'
""")
df = pd.read_sql(sql_query_string, cnxn)
print(df)
Out[11]:
lname fname city code
0 Trujillo Ana Mexico D.F. 05021

Hope this has been helpful. Please leave a comment for any improvements or corrections 😃