Create SQL table using Python for loading data from Pandas DataFrame

Mayank Gangrade
2 min read · May 6, 2020


Some operations, like df.merge, do not preserve the order of the columns in the resulting dataframe, and other transformations can produce a new dataframe with extra columns or changed data types. So when we want to load the dataframe into a SQL table, one of the challenges we face is creating an appropriate table in the database to receive the data.

Here I present a small example of how to create a table whose column names match the dataframe's columns, and how to select appropriate data types for those columns.

Let’s say we have data about investors and their investments in two separate CSV files, and we want to merge the two files and load the resulting data into a database. (The dummy data was generated with mockaroo.com.)

Step 1: Read the CSV files into data-frames

import pandas as pd
investorDF = pd.read_csv('investor.csv')
investmentDF = pd.read_csv('investment.csv')

Step 2: Perform the merge/join operation or any other calculation on data-frames

resultDF = investorDF.merge(investmentDF)
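As a minimal illustration of this step (using made-up investor data in place of the CSV files), merge with no `on` argument joins on the columns the two frames share:

```python
import pandas as pd

# hypothetical sample data standing in for the two CSV files
investorDF = pd.DataFrame({'investor_id': [1, 2], 'name': ['Ann', 'Bob']})
investmentDF = pd.DataFrame({'investor_id': [1, 2], 'amount': [500.0, 750.0]})

# with no `on` argument, merge joins on the shared column(s)
resultDF = investorDF.merge(investmentDF)
print(list(resultDF.columns))  # ['investor_id', 'name', 'amount']
```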

Step 3: Collect column names into a python list

columnName = list(resultDF.columns.values)

Step 4: Deciding the data types for the SQL table is a bit tricky, since pandas dtypes and DBMS data types are completely different. To collect the data types into a list, we can write a small function:

def getColumnDtypes(dataTypes):
    dataList = []
    for x in dataTypes:
        if x == 'int64':
            dataList.append('int')
        elif x == 'float64':
            dataList.append('float')
        elif x == 'bool':
            dataList.append('boolean')
        else:
            dataList.append('varchar')
    return dataList

columnDataType = getColumnDtypes(resultDF.dtypes)
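To see the mapping in action, here is the function applied to a small dataframe (column names are illustrative) that covers each branch:

```python
import pandas as pd

def getColumnDtypes(dataTypes):
    # map pandas dtypes to rough SQL equivalents
    dataList = []
    for x in dataTypes:
        if x == 'int64':
            dataList.append('int')
        elif x == 'float64':
            dataList.append('float')
        elif x == 'bool':
            dataList.append('boolean')
        else:
            dataList.append('varchar')
    return dataList

# hypothetical frame exercising each branch of the mapping
df = pd.DataFrame({'id': [1], 'amount': [9.5], 'active': [True], 'name': ['Ann']})
print(getColumnDtypes(df.dtypes))  # ['int', 'float', 'boolean', 'varchar']
```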

Step 5: Now we have lists of all the column names and their data types. We need one more small piece of code to build the CREATE TABLE statement:

createTableStatement = 'CREATE TABLE IF NOT EXISTS investordetails ('
for i in range(len(columnDataType)):
    createTableStatement = createTableStatement + '\n' + columnName[i] + ' ' + columnDataType[i] + ','
createTableStatement = createTableStatement[:-1] + ' );'

Step 5 gives us the required CREATE TABLE statement.
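An equivalent, slightly more compact way to build the same statement uses zip and join (the column lists below are hypothetical stand-ins for what Steps 3 and 4 produce):

```python
# hypothetical column metadata as produced in Steps 3 and 4
columnName = ['investor_id', 'name', 'amount']
columnDataType = ['int', 'varchar', 'float']

# pair each column name with its type and join them into one definition list
columns = ',\n'.join(n + ' ' + t for n, t in zip(columnName, columnDataType))
createTableStatement = 'CREATE TABLE IF NOT EXISTS investordetails (\n' + columns + ' );'
print(createTableStatement)
```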

Step 6: Now we can use a Python library such as psycopg2 (or whichever driver matches our database system) to execute the statement and create the table:

import psycopg2

conn = psycopg2.connect(dbname='testdb', host='abc.xyz.com', port='xxxx', user='mydbuser', password='xxxxxxxx')
cur = conn.cursor()
cur.execute(createTableStatement)
conn.commit()
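Once the table exists, the same cursor can load the dataframe rows with a parameterized INSERT. A sketch of that follow-up step, with a hypothetical merged dataframe standing in for resultDF (the %s placeholders are psycopg2 style):

```python
import pandas as pd

# hypothetical merged result standing in for resultDF
resultDF = pd.DataFrame({'investor_id': [1, 2], 'name': ['Ann', 'Bob'], 'amount': [500.0, 750.0]})

# one %s placeholder per column, matching the generated table
placeholders = ', '.join(['%s'] * len(resultDF.columns))
insertStatement = (
    'INSERT INTO investordetails (' + ', '.join(resultDF.columns) + ') '
    'VALUES (' + placeholders + ')'
)

# convert each dataframe row to a plain tuple of values
rows = [tuple(r) for r in resultDF.itertuples(index=False, name=None)]
# cur.executemany(insertStatement, rows)  # run on the open psycopg2 cursor, then conn.commit()
```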

By following the steps above, you should be able to create a table in your database for loading data from a Pandas dataframe.

Thank you for going through this article. I hope you will find this useful.

You can find complete code here as well.
