Simple guide for SQL in Python

Michael Mejia
Analytics Vidhya
Published in
3 min readApr 10, 2020

A simple guide for creating SQL queries on your home PC using the local server. Many of you might wonder why a local server. The only real changes would be the hostname input and password if you want to connect to a remote server. This is more to help with understanding the process before moving on to other platforms where security would be something that would be put into consideration a well. One of the first steps is to import pymysql. If you haven’t already you can install pymysql with:

$ python3 -m pip install PyMySQL

Pymysql uses a MySQL client library that is based on pep 249. “This API has been defined to encourage similarity between the Python modules that are used to access databases. By doing this, we hope to achieve a consistency leading to more easily understood modules, code that is generally more portable across databases, and a broader reach of database connectivity from Python. (python.org)”

prod_conn = pymysql.connect(
host='localhost',
user='root',
password='Mike4568!',
port=3306,
database='FORMULA1')
with prod_conn:
cur = prod_conn.cursor()

The above code is an example of what your Python code would look like to connect to your server. If you are using your pc as the server, then host=’localhost’; if not, then you would copy the hostname instead. You can name the variables prod_conn or cur anything you want, but they will be used for the rest of your code, so make it simple. Everything will now use my “cur” variable to send SQL queries.

cur.execute(“SHOW DATABASES;”)
4

Using the .execute() will send your queries to the local or remote server. In this case, I have four databases, so it returned four, but that is not what this SQL query should return. There is one more step to retrieve the information from the query.

rows = cur.fetchall()
print(rows)
(('FORMULA1',), ('information_schema',), ('mysql',), ('performance_schema',))

If you are only sending data to your server or doing anything where you don’t need anything returned, then the .execute should be enough. If your query is multi-lined, its best to use the three quotations and save it to a variable, then execute the string variable like shown.

circuit_ = """
CREATE TABLE circuits
(
circuit_id INT NOT NULL,
circuit_ref VARCHAR(30) NOT NULL,
name VARCHAR(45) NOT NULL,
location VARCHAR(30) NOT NULL,
county VARCHAR(30) NOT NULL,
lat DECIMAL(10,6) NOT NULL,
lng DECIMAL(10,6) NOT NULL,
url VARCHAR(150),
PRIMARY KEY ( circuit_id )
);
"""
cur.execute(circuit_)

I would then recommend writing some functions to make it easier for yourself on queries you know you will use often.

def drop_table_(name):
cur.execute(f"DROP TABLE {name};")
cur.execute("COMMIT;")

Works very well when inserting 1000’s of rows of data into your tables.

def insert_values(table_, name):
insert_into = f"INSERT INTO {name} VALUES "

for i in range(0,table_.shape[0]):
a =table_.loc[i].values
r =[j for j in a]
r= str(r).replace('[','(').replace(']',')')
insert_into+=r
if i != (table_.shape[0]-1):
insert_into+=", "
return insert_into
circuit_insert = insert_values(data, "circuits")
cur.execute(circuit_insert)

After inserting all your data, you will be able to query all of your tables or rows now, and depending on how you want to format the information, it could be returned as a data frame. In my example, I’m returning the results as a data frame.

print(rows)

def data_retr(input_str):
cur.execute(input_str)
table = cur.fetchall()
a = list(cur.description)
column_names = [str(a[x][0]) for x in range(len(a))]
return pd.DataFrame(list(table),columns=column_names)

What this function is doing is executing and fetching the results of the query sent to the server. Then using .description, you can retrieve the names of each column. Lastly, just an example of what it would all look like together.

another_one = """
SELECT *
FROM (
SELECT lt.raceId, lt.driverId, lt.lap AS lt_lap, lt.position, lt.milliseconds, lt.lap, rs.circuitId, rs.date
FROM lap_times lt
INNER JOIN races rs on lt.raceId = rs.raceId
) a
LEFT JOIN(
SELECT ps.raceId AS rd, ps.lap AS ps_lap, ps.driverId AS di, ps.millisec as pit_time
FROM pit_stops ps
) b
ON (a.raceId = b.rd) AND (a.driverid = b.di) AND (a.lap = b.ps_lap);
"""
res1 = data_retr(another_one)
res1.head()

--

--