Python/Automation/1 → Extract data from Excel sheet and prepare oracle SQL queries for Bulk Database load

5 min readAug 24, 2023

This is a post on how to read the data in sheets of an excel workbook and prepare a SQL query from the data, which can be used to do a bulk Database table insert and update.

(1) Problem:

Yesterday we got a excel with set of data which we were asked to load into Oracle Database table. While loading the excel data directly into Oracle Database we were facing issue, as only specific column data was needed to be updated to the table.

So, here we came up with the following piece of python code to aid in the bulk load of data in Oracle DB table.

Let’s first look into the type of sheet we got to better understand the problem area.

An excel (.xlsx extension) having 2 sheets one for records to be inserted and another for delta records to be updated in database table. Please refer the screenshots pasted in next section.

(2) Solution:

i. For example, taking the table name as “TEST_TABLE” residing in “SCM” schema where the data is to be loaded.

ii. For “insert” sheet, it is like a complete load of all rows & columns data.

ii. For “update” sheet “Column_1” and “Column_2” are 2 columns which are representing the composite keys of the table. Using those 2 columns we are going to form query and update the table.

As per the ask the yellow highlighted fields need to be updated only. Other columns were not needed to be updated. They were marked as yellow.

So, we have made them bold also for easy picking of records from python script using styleframe library without relying on the color highlighted.

Using styleframe we can sense the colors also using color hex codes, but that can be risky during picking records.

iii. Also, the “Column_2" and “Column_5" are integer fields, which we need to take into account during SQL script preparation.

iv. Properties file is used to pass the directory name where input excel is kept, excel file name, DB schema and Table name, name of integer fields in Database table.

config.properties
..........................................................................
# Excel & DB Table Details as properties ;
# To Modify as per path, file name & DB schema/table name
workbook_file_dir=C:\\TEST_DIR\\Excel2Sql
workbook_file_name=DB_Bulk_Load.v100.1.1.xlsx
db_schema=SCM
db_table_name=TEST_TABLE
insert_integer_fields=Column_2,Column_5
update_integer_fields=Column_2,Column_5

v. Only filter column names in Update SQL commands are kept as hard coded in python script. This can be improved upon to make it dynamic.

So, to convert the data in the 2 sheets in the excel workbook into a SQL command, we have used the following python script. Before running the script, need to import following libraries in python IDE.

pip install openpyxl
pip install styleframe
pip install jproperties

Excel2SqlGenerator.py   (windows)
..........................................................................

import openpyxl as xl
import os
from styleframe import StyleFrame
from jproperties import Properties

'''comment : TO MODIFY as per path & file name'''
cfg_dir = "C:\\TEST_DIR\\Excel2Sql"
cfg_file_name = "config.properties"
cfg_file = cfg_dir + "\\" + cfg_file_name

prop_dict = {}
configs = Properties()

with open(cfg_file, 'rb') as cfg_prop:
    configs.load(cfg_prop)

prop_items = configs.items()

for prop_key_val in prop_items:
    prop_key = prop_key_val[0]
    prop_val = prop_key_val[1].data
    prop_dict[prop_key] = prop_val

excel_file_dir = prop_dict.get('workbook_file_dir')
excel_file_name = prop_dict.get('workbook_file_name')

if prop_dict.get('db_schema') != "":
    table_name = prop_dict.get('db_schema') + "." + prop_dict.get('db_table_name')
else:
    table_name = prop_dict.get('db_table_name')

insert_sheet_integer_fields = []
update_sheet_integer_fields = []

if prop_dict.get('insert_integer_fields') != "":
    insert_sheet_integer_fields = list(prop_dict.get('insert_integer_fields').split(","))

if prop_dict.get('update_integer_fields') != "":
    update_sheet_integer_fields = list(prop_dict.get('update_integer_fields').split(","))

excel_file = excel_file_dir + "\\" + excel_file_name
sql_file = excel_file_dir + "\\" + excel_file_name.rsplit(".",1)[0] + ".sql"

if os.path.isfile(sql_file):
    os.remove(sql_file)
    print("File removed : " + sql_file)

wb_obj = xl.load_workbook(excel_file)
sheet_list = wb_obj.sheetnames

sqlOutFile = open(sql_file, "a")

for sheet in sheet_list:

    sf = StyleFrame.read_excel(excel_file, read_style=True, sheet_name=sheet, use_openpyxl_styles=False)

    df = sf.data_df
    col_names_str = str(df.columns)

    col_names = col_names_str[col_names_str.find("[") + len("["):col_names_str.rfind("]")]
    col_names = col_names.replace("\n", '').replace(" ","")
    col_names_list = col_names.split(",")
    colCnt = len(col_names_list)

    if sheet.upper() == "INSERT":

        colNamesStr = ','.join(col_names_list)

        for indInsRow, insRow in df.iterrows():

            insertValStr = ""
            ins_cell_data = ""
            insertStr = "INSERT INTO " + table_name + "(" + colNamesStr + ") VALUES({insStr});"

            for insCol in col_names_list:

                ins_cell_data = str(insRow[insCol])

                if ins_cell_data == "nan":
                    ins_cell_data = "''"

                elif insCol in insert_sheet_integer_fields:
                    ins_cell_data = ins_cell_data

                else:
                    ins_cell_data = "'" + ins_cell_data + "'"

                if col_names_list.index(insCol) != (colCnt - 1):
                    insertValStr = insertValStr + ins_cell_data + ","

                elif col_names_list.index(insCol) == (colCnt - 1):
                    insertValStr = insertValStr + ins_cell_data
                    insertStr = insertStr.format(insStr=insertValStr)

                ins_cell_data = ""
            '''comment : writing for the whole row as a insert sql command'''
            sqlOutFile.write(insertStr + "\n")

    sqlOutFile.write("\n")

    if sheet.upper() == "UPDATE":

        for indUpRow, upRow in df.iterrows():

            updateStr = "UPDATE " + table_name + " SET"
            filterStr = " WHERE Column_1={repl_1} AND Column_2={repl_2};"
            sqlUpStr = ""
            sqlUpStrList = []

            '''comment : need to configure the Update DB command filter columns here'''
            Column_1 = ""
            Column_2 = ""

            cur_cell_style = "False"
            boldFlag = "N"
            appendFlag = "N"
            up_cell_data = ""

            for upCol in col_names_list:

                cur_cell_style = str(sf.at[indUpRow, upCol].style.bold)

                up_cell_data = str(upRow[upCol])

                if upCol.upper() == "COLUMN_1" and upCol in update_sheet_integer_fields:
                    Column_1 = "" + up_cell_data + ""

                elif upCol.upper() == "COLUMN_1" and upCol not in update_sheet_integer_fields:
                    Column_1 = "'" + up_cell_data + "'"

                elif upCol.upper() == "COLUMN_2" and upCol in update_sheet_integer_fields:
                    Column_2 = "" + up_cell_data + ""

                elif upCol.upper() == "COLUMN_2" and upCol not in update_sheet_integer_fields:
                    Column_2 = "'" + up_cell_data + "'"

                if cur_cell_style == "True":

                    boldFlag = "Y"

                    if up_cell_data == "nan":
                        up_cell_data = "''"

                    elif upCol in update_sheet_integer_fields:
                        up_cell_data = up_cell_data

                    else:
                        up_cell_data = "'" + up_cell_data + "'"

                    if appendFlag == "N":
                        updateStr = updateStr + " " + upCol + "=" + up_cell_data
                        appendFlag = "Y"

                    elif appendFlag == "Y":
                        updateStr = updateStr + "," + upCol + "=" + up_cell_data

                up_cell_data = ""

                '''comment : only hardcoded part is the DB column names replaced using repl_1 & repl_2 placeholders'''
                if (col_names_list.index(upCol) == colCnt - 1) and (boldFlag == "Y"):
                    filterStr = filterStr.format(repl_1=Column_1, repl_2=Column_2)
                    sqlUpStr = updateStr + filterStr
                    '''comment : writing for the whole row as a update sql command'''
                    sqlOutFile.write(sqlUpStr + "\n")

sqlOutFile.close()
print("File Written : " + sql_file)

When Python script is executed, the below SQL file is obtained as output. This can be executed to do a bulk load in Database table.

DB_Bulk_Load.v100.1.1.sql
..........................................................................

INSERT INTO SCM.TEST_TABLE(Column_1,Column_2,Column_3,Column_4,Column_5,Column_6) VALUES('Row1_Column1',101,'Row1_Column3','Test_Data_1',201,'Test_Data_11');
INSERT INTO SCM.TEST_TABLE(Column_1,Column_2,Column_3,Column_4,Column_5,Column_6) VALUES('Row2_Column1',102,'Row2_Column3','Test_Data_2',202,'Test_Data_12');
INSERT INTO SCM.TEST_TABLE(Column_1,Column_2,Column_3,Column_4,Column_5,Column_6) VALUES('Row3_Column1',103,'Row3_Column3','Test_Data_3',203,'Test_Data_13');

UPDATE SCM.TEST_TABLE SET Column_5=401 WHERE Column_1='Row4_Column1' AND Column_2=301;
UPDATE SCM.TEST_TABLE SET Column_5=402,Column_6='Test_Data_55' WHERE Column_1='Row5_Column1' AND Column_2=302;
UPDATE SCM.TEST_TABLE SET Column_6='Test_Data_66' WHERE Column_1='Row6_Column1' AND Column_2=303;

………………………………………………………………………………………………

Python/Automation/1 → Extract data from Excel sheet and prepare oracle SQL queries for Bulk Database load

Written by integratio