Using SQL in Python — SQLAlchemy and pymysql

Published in

MCD-UNISON

7 min readDec 1, 2022

Objective:

On this article I will describe how to use some basic commands to interact with a database on MySQL through Python, and a simple analysis using pandas and plotly.

I separate the document on these steps:

Download a database
Import database to MySQL server
Using Mysql Workbench (Load .dump files and entity model)
Crete simple View, Function and Store Procedure
Query the database and Pandas

Requirements:

Have an SQL server and a login
MySQL Workbench

Information:

First we will need some components, install them.

# Install and restart kernel
!pip install mysql.connector
!pip install mysql.connector.python
!pip install sqlalchemy
!pip install pymysql

1 — Downloading the database

We will use as example a database provided by MySQL documentation -https://github.com/datacharmer/test_db/archive/refs/heads/master.zip

We will create a folder to separate the information, download the file and unzip it.

# 1 -  Descarga de los archivos necesarios - url https://github.com/datacharmer/test_db/archive/refs/heads/master.zip
# https://github.com/datacharmer/test_db
import os
import zipfile
import platform
import requests
from os.path import exists
import mysql.connector
from mysql.connector import errorcode
from sqlalchemy import create_engine
import pymysql
import pandas as pd
import plotly.express as px

!mkdir Files_DB    

folder_main = os.getcwd()
folder_Files = os.path.join(folder_main, "Files_DB")
file_zip_db = os.path.join(folder_Files, "test_db.zip")
folder_unziped = os.path.join(folder_Files, "test_db-master")

urlString_db = "https://github.com/datacharmer/test_db/archive/refs/heads/master.zip"

# Dowload file and store it on "Files_DB" (Folder created)
if not (exists(file_zip_db)):    
    r = requests.get(urlString_db, stream = True)
    with open(file_zip_db, "wb") as file: 
        for block in r.iter_content(chunk_size = 1024):
            if block: 
                file.write(block)
    if not (exists(folder_unziped)):
        with zipfile.ZipFile(file_zip_db, 'r') as zip_ref:
            zip_ref.extractall(folder_Files)

2 — Import to database

For import database we need to connect to the database, so we will define some functions using SQL-Alchemy.

For connect to a database through python you can use several libriaries, in this case we will use SQLAclhemy, I already test myslqconnector offered MySQL documentation, which on my personal point of view, it’s a little hard to use rather than SQLAlchemy.

Now, for connect to the database you must create an “Engine” where you can pass several parameters (Check documentation), in this case, I created the engine, once created you must connect prior to use it.

So first we will declare the engine to connect to the database and some functions that we will require to execute queries.

Note that the database is not declared in the “Connection String” because we have not created yet the database.

# Create engine to connecto to the database, pool_recycle will close the connection after 1 hour of innactivity
sqlEngine       = create_engine('mysql+pymysql://mysql_user:passwordUser@127.0.0.1/', pool_recycle=3600, pool_size=50, max_overflow=50)
dbConnection    = sqlEngine.connect()

def Sql_Get_Table(sql_query):    
  result_dataFrame = pd.read_sql(sql_query,dbConnection)
  return result_dataFrame

# Set up and end date of the employee
def SQL_FN_Employee_IsActive(employeeID):
  result = dbConnection.execute("select Employee_IsActive(" + str(employeeID) + ") ")
  for row in result:
    # Take note that in order to read the value this way, you must specify the name of the column
    # print(row["Employee_IsActive(" + str(employeeID) + ")"])
    if str(row["Employee_IsActive(" + str(employeeID) + ")"]) == "0":
      print("Employee is not longer active :S")
    else:
      print("Employee is still working with us :D")

def SQL_SP_Employee_EndDate(employeeID, endDate):
  dbConnection.execute("call Employee_ENDS(" + str(employeeID) + ", '" + endDate +"') ")

def SQL_Run_Query(query):
  dbConnection.execute(query)
  # dbConnection.execute("commit")

Here we will read the SQL file, line by line to only, to execute query by query.

This code will avoid comments and only execute SQL commands.

file_zip_db = os.path.join(folder_unziped, "employees.sql")
file_sql = open(file_zip_db, "r")
sql_query_emp = file_sql.readlines()

sql_command = ''
# Iterate over all lines in the sql file
for line in sql_query_emp:
    # Ignore commented lines    
    if not line.startswith('--') and line.strip('\n') and not line.startswith('#'):             
        # Append line to the command string
        sql_command += line.strip('\n')

        # If ends with space
        if sql_command.endswith(' '):
            sql_command = sql_command[0:len(sql_command)-1]
        # If the command string ends with ';', it is a full statement
        if sql_command.endswith(';'):
            # Try to execute statement and commit it
            try:
                SQL_Run_Query(sql_command)

            # Assert in case of error
            except:
                print('Oops')

            # Finally, clear command string
            finally:
                sql_command = ''

3 — Using MySQL Workbench — Load Dump files

Once executed the previous code, we will proceed to add values to the tables using the “.dump” files provided by the example.

1. First we must open our software and go to the administration tab
2. Select “Data Import/Restore”
3. On the tab “Import from disk”
3.1 Select “employees” database
3.2 Select “Import from self-Contained file”
3.3 Put the path to the first dump
3.4 Click on “Start Import”
3.5. Repeat the point 2–4 for each dump file in the right order. (There are tables with columns dependency of other tables)

Dump files order

\load_departments.dump
\load_employees.dump
\load_dept_emp.dump
\load_dept_manager.dump
\load_titles.dump
\load_salaries1.dump
\load_salaries2.dump
\load_salaries3.dump

3 — Using MySQL Workbench — Crate entity model

Let’s go to create a entity model diagram using MySQL workbench, for archive this we must:

Click on the menu: Database -> Reverse engineer
Type our credentials and database we want to access
Select the database we want to generate the diagram (employees)
Next and Execute
Next and Finish

This Entity Model Diagram allows us to rapidly identify the database structure, which columns have each table and the relation between them.

Here we can make some actions, I will just enumerate some I considerer important.

Save an image of this diagram: Menu: File -> Export -> Export as PNG
Create new tables: With the button on the toolbox on the left panel (Also you can press letter “T” inside of the diagram)
Add/Edit columns

For more information of the Entity-Relation Diagram, you can consult the next links:

4 — Creating View, Function and Store Procedure

You do not know what those are?, helpful links

Now, we will create a view, function and Store Procedure, for that I create a 3 variables with the SQL Syntax. For create your owns, you can use MySQL workbench for easly get the syntax.

Run this queries on MySQL workbench.

In case of some error creating Functions or Store Procedure you can run:

``` SQL_Run_Query(“SET GLOBAL log_bin_trust_function_creators = 1;”) ```

Function

```

DELIMITER $$

USE `employees`$$

CREATE FUNCTION `Employee_IsActive` (idEmployee int)

RETURNS INTEGER

BEGIN

declare isActive integer;

SELECT count(*) into isActive FROM salaries where to_date = ‘9999–01–01’ and emp_no = idEmployee;

IF isActive = 0 then

return 0;

END IF;

RETURN 1;

END$$

DELIMITER ;

```

Store Procedure

```

DELIMITER $$

USE `employees`$$

CREATE PROCEDURE `Employee_ENDS` (p_emp_no int, end_date date)

BEGIN

declare existEmp int;

SELECT count(*) into existEmp FROM employees where emp_no = p_emp_no;

IF existEmp = 1 THEN

UPDATE salaries set to_date = end_date where emp_no = p_emp_no and to_date = ‘9999–01–01’;

UPDATE titles set to_date = end_date where emp_no = p_emp_no and to_date = ‘9999–01–01’;

UPDATE dept_emp set to_date = end_date where emp_no = p_emp_no and to_date = ‘9999–01–01’;

UPDATE dept_manager set to_date = end_date where emp_no = p_emp_no and to_date = ‘9999–01–01’;

END IF;

END$$

DELIMITER ;

```

# View - Query
sql_Query_View = """
CREATE  OR REPLACE VIEW v_All_Employee_Data AS
SELECT e.emp_no, birth_date,first_name, last_name, e.gender, e.hire_date, d.dept_no, d.dept_name, de.from_date as dept_from_Date, de.to_date as dept_to_date, t.title, t.from_date, t.to_date
FROM employees e
inner join dept_emp de on de.emp_no = e.emp_no
inner join departments d on d.dept_no = de.dept_no
inner join titles t on t.emp_no = e.emp_no;
"""
# Update engine now that the database is created

sqlEngine       = create_engine('mysql+pymysql://mysql_user:passwordUser@127.0.0.1/employees', pool_recycle=3600, pool_size=50, max_overflow=50)
dbConnection    = sqlEngine.connect()
SQL_Run_Query(sql_Query_View)

5 — Query to Database and Pandas

Now that we have crated successfully our database and his values, we can proceed to run some queries.

Run Function — Check if the employee is still working (by ID)

SQL_FN_Employee_IsActive(10011)

How to add a new table?

Well there is a function in pandas that allow us to easly add a new table, just requieres have table in a dataframe and a “driver” to connect with the database, in this case we will use our engine of sqlAlchemy that we are created.

urlCountries = "https://raw.githubusercontent.com/umpirsky/country-list/master/data/en_US/country.csv"
df_countries = pd.read_csv(urlCountries, encoding = 'utf8')
df_countries.to_sql("countries", sqlEngine, if_exists='append', index=False)

Using pandas from SQL

Now that we have created some useful functions, now we can use them to extract useful information using pandas.

We will use an example problematic…

On the company there is a complaint about woman are less paid for the same job than man.

So, for this complaint we will make a chart to display visualy if this it is true.

# Get the salaries of all employees active in the company
query_salaries = '''
SELECT ve.emp_no, ve.first_name, ve.last_name, gender, hire_date, dept_no, dept_name, ve.to_date, title, salary FROM  v_all_employee_data ve
INNER JOIN salaries s on s.emp_no = ve.emp_no WHERE s.to_date = '9999-01-01' AND dept_to_date = '9999-01-01' AND ve.to_date = '9999-01-01'  ORDER BY emp_no;
'''
df_Salaries = Sql_Get_Table(query_salaries)
# Create a chart 
fig = px.scatter(df_Salaries.head(5000), x="title", y="salary", color="gender", opacity=0.3)
fig.show()

Well that’s all from part, I hope you have found this article useful. :D