Generate Excel using Pandas and Xlsxwriter
Python is a popular high-level programming language known for its simplicity, ease of learning, and extensive support for modules and packages. It is widely used a lot in the field of data, due to its powerful libraries that facilitate data processing. In this article, I will discuss two Python libraries that are useful in the data industry: Pandas and Xlsxwriter.
In data engineering, it’s often necessary to export data from a database into formats that are accessible to non-technical users, such as Excel. With Python, we can automate the generation of Excel files from databases using Pandas and Xlsxwriter. Pandas allows us to manipulate the data, while Xlsxwriter enables us to format the data in Excel, including setting fonts, colors, positions, and more.
In the following section, I’ll demonstrate how to generate an Excel file using Pandas and Xlsxwriter. For this example, I’ll use a public dataset from Kaggle. It’s a Coffee Shop Sales dataset, which will be stored in Google BigQuery. I will transform the data as shown in the image, which generates a report to calculate total sales for each product from January to June, identify any total sales that have decreased from the previous month, and calculate the total monthly sales for all products.
Install Pandas and Xlsxwriter Library
First, ensure that you have installed the pandas
and xlsxwriter
library. If you haven’t, you can install it using the following commands.
pip install pandas
pip install xlsxwriter
Authenticate & Connect to BigQuery
Since I stored the data in BigQuery, I need to authenticate and connect to BigQuery to access it. If you do so, you can follow these commands.
from google.colab import auth
auth.authenticate_user()
from google.cloud import bigquery
project_id = 'sal-bq-learning'
client = bigquery.Client(project=project_id)
Get Data
Since I want to get the monthly values for each product, I’m using the following query to retrieve the data I need. I also converted the data into a dictionary.
query = client.query('''
SELECT
store_location,
product_category,
product_type,
FORMAT_DATE('%B', transaction_date) AS transaction_month,
sum(unit_price) unit_price
FROM `sal-bq-learning.bq_demo.coffee_shop_sales`
group by 1,2,3,4
order by 1,2,3
''')
data = [dict(i) for i in query.result()]
Import Pandas and convert the data into a DataFrame. In this case, I want to view the sales data from the Astoria
store, so I will filter the data to include only the Astoria
data.
import pandas as pd
df = pd.DataFrame(data)
store_location = 'Astoria'
df = df[df['store_location'] == store_location]
Pivot Table
With Pandas, we can pivot the table, and I’ll be using this feature in this case. I’ll set store_location
, product_category
, and product_type
as the indexes, transaction_month
as the column, and unit_price
as the value.
Since the data is already sorted as needed, I’ll set sort=False
. However, you can set it to True
if you want Pandas to sort the columns for you.
df_pivot = df.pivot_table(
index=['store_location', 'product_category', 'product_type'],
columns='transaction_month',
values='unit_price',
sort=False)
Generate Excel using Xlsxwriter
The data is all set. Let’s move on to the next step: producing an Excel file using xlsxwriter
.
First, make sure to import the xlsxwriter
library. Next, define the workbook by calling xlsxwriter.Workbook(‘file_name.xlsx’)
and add a worksheet to the workbook. You can customize the workbook and worksheet names as needed.
import xlsxwriter
workbook = xlsxwriter.Workbook('coffee_shop_sales.xlsx')
worksheet = workbook.add_worksheet('Astoria')
I will separate the header, product type, and values, as I will write them separately in the script. We can use these syntaxes to extract each part.
header = df_pivot.columns.tolist() # Get the columns (transaction month)
indexes = df_pivot.index.tolist() # Get the indexes (store_location, product_category, product_type)
values = df_pivot.values.tolist() # Get the values
# Get the product type
product_type = [index[2] for index in indexes]
Set the start column and start row. Both the column and row notation are zero based, so the first column and row started from zero. I want to write the data from the first column and the first row.
start_col = 0
start_row = 0
Formatting using Xlsxwriter
With xlsxwriter
, we can define the format of the columns, such as defining the font type, font size, bold, italic, alignment, background color, and more. These formats are defined as a dictionary. I’ll assign these formats to variables so they can be used later in the script.
default_format = {'border': True}
header_format = {**default_format, 'bold': True, 'align': 'center', 'valign': 'vcenter'}
Write Data
Let’s write the data into the sheet. There are several ways to write data using xlsxwriter
, but here are some syntax that I use in this case.
# To write data in a single column
worksheet.write(row, column, data)
worksheet.write('A1', data)
# To write data in a merged column
worksheet.merge_range(first_row, first_col, second_row, second_col, data)
worksheet.merge_range('A1:B2', data)
Here’s the example code to write the title, header, product type, and values. In those codes, I use workbook.add_format
to apply formatting to the columns.
Title
# Title
## Write A1 merge A2
worksheet.merge_range(start_row, start_col, start_row + 1, start_col, store_location, workbook.add_format(header_format))
Header
# Header
for col_num, header_name in enumerate(header):
worksheet.merge_range(start_row, col_num + 1, start_row + 1, col_num + 1, header_name, workbook.add_format(header_format))
Product Type
# Set start_row +2 because product type will be written after the title
start_row += 2
# Product Type
for i, name in enumerate(product_type):
worksheet.write(start_row + i, start_col, name, workbook.add_format(default_format))
Values
Note: You can also add additional formatting options as needed.
# Values
for row, row_data in enumerate(values):
row = start_row + row
for col, col_data in enumerate(row_data):
col = start_col + 1 + col
worksheet.write(row, col, col_data, workbook.add_format({**default_format, "num_format": "#,##0.0"}))
Using Formula
We can also use formulas with xlsxwriter
, just like in Excel. In this case, I want to calculate the total values for each month, and apply a background color to highlight them.
# Define format for total and write `Total`
total_format = {**default_format, 'bold': True, 'bg_color': '#C3F0F3', "num_format": "#,##0.0"}
worksheet.write(row + 1, start_col, "Total", workbook.add_format(total_format))
To write the data, I’m using xl_rowcol_to_cell
to convert (row, col)
into A1
notation. For the formula, we can use worksheet.write_formula(target_column, formula)
to write it to the worksheet.
# Syntax to write data with formula
worksheet.write_formula(row, col, formula)
# Example
worksheet.write_formula('A4', '=IF(A3>1,"Yes", "No")')
# start_row = 2
# row = 30
for i in range(1, len(header) + 1):
start_cell = xlsxwriter.utility.xl_rowcol_to_cell(start_row, i)
end_cell = xlsxwriter.utility.xl_rowcol_to_cell(row, i)
target = xlsxwriter.utility.xl_rowcol_to_cell(row + 1, i)
formula = f'=SUM({start_cell}:{end_cell})'
worksheet.write_formula(target, formula, workbook.add_format(total_format))
Add Conditional Formatting
We can also add conditional formatting based on specific needs. In this example, I want to highlight the values of each product that are lower than the values from the previous month.
# Syntax for conditional formatting
worksheet.conditional_format(0, 0, 2, 1, {'type': 'cell',
'criteria': '>=',
'value': 50,
'format': format1})
# This is equivalent to the following:
worksheet.conditional_format('A1:B3', {'type': 'cell',
'criteria': '>=',
'value': 50,
'format': format1})
for col in range(2, len(header) + 1):
worksheet.conditional_format(start_row, col, row, col, {
'type': 'formula',
'criteria': f'${chr(65 + col)}{start_row + 1} < ${chr(65 + col - 1)}{start_row + 1}',
'format': workbook.add_format({'bg_color': '#f01e2c', 'font_color': 'white'})
})
Set Zoom, Set Column, Freeze Panes, Hide Gridlines
There are more formatting options available in Xlsxwriter for customizing Excel files. Here are some of them
- Set zoom: To adjust the zoom level of the Excel view.
- Set column: To set the column size.
- Freeze panes: To freeze columns or/and rows.
- Hide gridlines: To hide the gridlines in the Excel file.
# Set Zoom
worksheet.set_zoom(150)
# Set Column
worksheet.set_column(0, 0, 30) # Resize column A to 30
worksheet.set_column(1, col, 12) # Resize all value columns to 12
# Freeze Panes
# Freeze the second row (header) and the first column (product type)
worksheet.freeze_panes(2, 1)
# Hide Gridlines
## 0 = Don't hide gridlines
## 1 = Hide printed gridlines
## 2 = Hide screen and printed gridlines
worksheet.hide_gridlines(2)
Close Workbook
Don’t forget to close your workbook as you finish your code.
workbook.close()