In this article, I’ll introduce you to the OpenPyxl package in Python and show you how to carry out some basic Excel functionalities using this powerful tool.
What is OpenPyxl?
OpenPyxl is a powerful and versatile Python library designed for reading, writing, and manipulating Excel files (with the .xlsx extension). It is widely used in data analysis, automation, and reporting tasks, making it an essential tool for anyone who works with Excel in a Python environment. OpenPyxl is ideal for automating repetitive tasks, generating reports, or integrating Excel functionalities into larger Python applications.
Key Features of OpenPyxl:
- Create and Edit Workbooks: Easily create new Excel workbooks or edit existing ones.
- Read Data: Extract data from Excel files for analysis or processing.
- Write Data: Insert data into Excel workbooks, including complex data structures like pandas DataFrames.
- Formatting: Apply various styles, such as fonts, colors, and borders, to cells for better presentation.
- Merging Cells: Combine multiple cells into one for clearer data organization.
- Formulas and Functions: Write Excel formulas and functions into cells from your Python scripts; Excel evaluates them when the file is opened.
- Charts and Graphs: Create visual representations of your data directly in Excel.
Let’s Dive Right In!
Installation
You can install OpenPyxl using pip, the Python package manager, with the command below:
pip install openpyxl
Basic Utilities
In this section, I will go through some basic utilities of OpenPyxl to help you get started with the package.
Importing OpenPyxl
- How to import the library in your Python script:
import openpyxl
Creating a Workbook and Worksheets
- Creating a Workbook:
# How to create a new workbook.
from openpyxl import Workbook
wb = Workbook()
- Creating and Naming Worksheets:
# How to get the default worksheet (created with the workbook) and set its title.
ws = wb.active
ws.title = "ExampleSheet"
- Adding More Worksheets:
# How to add more worksheets.
ws2 = wb.create_sheet(title="AnotherSheet")
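To see how the sheets end up ordered, here is a minimal sketch. It assumes you also want a sheet placed at the front of the workbook; the name "FirstSheet" is made up for illustration and is not part of the example above.

```python
from openpyxl import Workbook

wb = Workbook()                      # a new workbook starts with one sheet
ws = wb.active
ws.title = "ExampleSheet"

# create_sheet adds the new sheet at the end by default
ws2 = wb.create_sheet(title="AnotherSheet")

# Passing an index inserts the sheet at that position instead
# ("FirstSheet" is a hypothetical name used only for this sketch)
ws0 = wb.create_sheet(title="FirstSheet", index=0)

sheet_names = wb.sheetnames          # sheet titles in tab order
same_ws = wb["ExampleSheet"]         # look a sheet up by its title
print(sheet_names)                   # ['FirstSheet', 'ExampleSheet', 'AnotherSheet']
```

Looking a sheet up by title is handy later on, when you no longer have the original variable around.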
Inserting Data into a Worksheet
To insert data into a cell using OpenPyxl, you just need to make a reference to any of the worksheets you created above, specify the cell you want to write to, and assign a value to it. An example is shown below:
# Write to a cell:
ws['A1'] = 'Hello, Bwhiz here!' # Write text to cell A1
ws['B1'] = 42 # Write a number to cell B1
ws['C1'] = 3.14 # Write a float to cell C1
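If you prefer row and column numbers over cell names, the same writes can be sketched with the worksheet's cell method. The formula in D1 is my own addition for illustration; OpenPyxl only stores it as text, and Excel computes the result when the file is opened.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

ws["A1"] = "Hello, Bwhiz here!"          # write by cell name
ws.cell(row=1, column=2, value=42)       # same effect as ws["B1"] = 42
ws["C1"] = 3.14
ws["D1"] = "=SUM(B1:C1)"                 # illustrative formula, stored as a string

# Read the first row back as plain values
first_row = list(next(ws.iter_rows(min_row=1, max_row=1, values_only=True)))
print(first_row)
```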
You can also insert a pandas DataFrame into a worksheet.
- Using Pandas to Create a DataFrame:
import pandas as pd
# Creating a sample DataFrame:
data = {
    'Name': ['Bwhiz', 'Anna', 'Peter', 'Linda'],
    'Age': [90, 2, 35, 32]
}
df = pd.DataFrame(data)
- Inserting DataFrame into Worksheet:
To insert a DataFrame directly into a worksheet, you can use the dataframe_to_rows function from the openpyxl.utils.dataframe module. An example is shown below:
from openpyxl.utils.dataframe import dataframe_to_rows
# loop through the dataframe using 'dataframe_to_rows' and append it to the
# active worksheet you have selected, in this case it is 'ws'
for r in dataframe_to_rows(df, index=False, header=True):
    ws.append(r)
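One detail worth knowing is where the appended rows land: append writes each row below the last row that already holds data. The sketch below illustrates this without pandas; the rows list is a hand-written stand-in for the header and data rows produced for the sample DataFrame, not output captured from dataframe_to_rows.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "Hello, Bwhiz here!"      # row 1 is already in use

# Stand-in for the header row followed by the data rows of the sample DataFrame
rows = [
    ["Name", "Age"],
    ["Bwhiz", 90],
    ["Anna", 2],
    ["Peter", 35],
    ["Linda", 32],
]
for r in rows:
    ws.append(r)                     # each row goes below the last used row

header_cell = ws["A2"].value         # the header starts in row 2, not row 1
last_row = ws.max_row
print(header_cell, last_row)         # Name 6
```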
An example of the worksheet ‘ws’ that we created with the data we’ve written to it is shown below.
We can also merge cells as well as color the cells using OpenPyxl.
Merging Cells in a Worksheet
- Merging Cells
# To merge cells using OpenPyxl, call the 'merge_cells' method on the
# worksheet with the range of cells you want to merge, then write your
# text to the first (top-left) cell of the merged range.
# An example is shown below:
ws.merge_cells('B1:D1')
ws['B1'] = "Merged Cell"
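If you want to check which ranges are merged, or undo a merge, a minimal sketch could look like the one below. It uses a fresh workbook so it runs on its own.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

ws["B1"] = "Merged Cell"             # the top-left cell holds the value
ws.merge_cells("B1:D1")

# List the merged ranges currently defined on the sheet
merged = [r.coord for r in ws.merged_cells.ranges]
print(merged)                        # ['B1:D1']

# unmerge_cells reverses the merge; the value stays in B1
ws.unmerge_cells("B1:D1")
after_unmerge = [r.coord for r in ws.merged_cells.ranges]
print(after_unmerge)                 # []
```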
Coloring Cells in a Worksheet
Coloring cells in OpenPyxl involves using the PatternFill class from the openpyxl.styles module.
- Import the PatternFill Class:
# The PatternFill class is used to apply color patterns to cells.
# Import it from the openpyxl.styles module.
from openpyxl.styles import PatternFill
- Create a PatternFill Object:
A PatternFill object defines the fill pattern for a cell. In the example shown below, it is used to fill a cell with a solid color.
The PatternFill class takes several parameters, but the main ones are:
- start_color: The starting color (or the only color for a solid fill).
- end_color: The ending color (which is the same as the starting color for a solid fill).
- fill_type: The type of fill pattern. Common values are ‘solid’, ‘darkGrid’, ‘lightGrid’, etc.
fill = PatternFill(start_color="FF224677", end_color="FF224677", fill_type="solid")
In this example, FF224677 is a color in hexadecimal aRGB notation: FF is the alpha channel and 224677 is a dark blue. fill_type="solid" specifies a solid fill pattern.
- Apply the Fill to a Cell:
To apply the fill to a specific cell, assign the PatternFill object to the fill attribute of the cell.
ws['A1'].fill = fill
The line of code above sets the fill of cell A1 in the worksheet ws to the blue color defined by the PatternFill object.
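A fill is applied one cell at a time, so coloring a whole header row means looping over its cells. Here is a minimal sketch, assuming a two-column header like the one in the sample DataFrame:

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

wb = Workbook()
ws = wb.active
ws.append(["Name", "Age"])           # header row used for this sketch

fill = PatternFill(start_color="FF224677", end_color="FF224677", fill_type="solid")

# ws[1] gives the cells of row 1; assign the same fill to each of them
for cell in ws[1]:
    cell.fill = fill

# Read the colors back to confirm the fill was applied
colors = [cell.fill.start_color.rgb for cell in ws[1]]
print(colors)                        # ['FF224677', 'FF224677']
```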
Saving the Workbook
- Saving the Workbook:
# When you are done inserting data into the worksheets and styling them,
# you can save the whole workbook using the command below:
wb.save('example.xlsx')
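To confirm that the file was written correctly, you can load it back with load_workbook. The sketch below saves into a temporary directory so it does not leave files behind; in your own script you would simply use your real file path.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws.title = "ExampleSheet"
ws["A1"] = "Hello, Bwhiz here!"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.xlsx")   # temporary location for this sketch
    wb.save(path)

    # Load the saved file and read the data back
    wb2 = load_workbook(path)
    reloaded_title = wb2.active.title
    reloaded_value = wb2["ExampleSheet"]["A1"].value
    wb2.close()

print(reloaded_title, "|", reloaded_value)
```

Note that save overwrites an existing file with the same name without asking, so double-check the path before running it on real reports.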
Putting it all together
It’s time to put everything we have looked at together into a single Python script and see the output:
# Importing necessary libraries
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.styles import PatternFill, Alignment
# Creating a new workbook and worksheets
wb = Workbook()
ws1 = wb.active
ws1.title = "ExampleSheet"
# Creating a DataFrame
data = {
    'Name': ['Bwhiz', 'Anna', 'Peter', 'Linda'],
    'Age': [90, 2, 35, 32]
}
df = pd.DataFrame(data)
# Merging Cells
ws1.merge_cells('B1:D1')
ws1['B1'] = "Merged Cell"
# the line below centers the text in the Merged Cell
ws1['B1'].alignment = Alignment(horizontal='center', vertical='center')
# Coloring cells
fill = PatternFill(start_color="FF224677", end_color="FF224677", fill_type="solid")
ws1['B1'].fill = fill
# Inserting DataFrame into Worksheet
for r in dataframe_to_rows(df, index=False, header=True):
    ws1.append(r)
# Saving the workbook
wb.save('mediumfinal_example.xlsx')
The output of the code above is shown below:
Conclusion
In this article, we’ve explored the basic utilities of the OpenPyxl library for working with Excel files in Python. We’ve learned how to create a new workbook, insert data from a DataFrame, merge cells, and apply formatting such as coloring and text alignment.
While we’ve covered fundamental functionalities that are commonly used in everyday tasks, OpenPyxl offers a wide range of advanced features and options for even more sophisticated Excel manipulation. To further expand your knowledge and take advantage of these advanced functionalities, I highly recommend exploring the official API documentation of OpenPyxl.
If you want to see how OpenPyxl can be integrated into a codebase that runs on an automated schedule, be sure to check out this repository:
https://github.com/Bwhiz/Auto-Excel-Reports
By examining real-world examples and implementations, you’ll gain practical insights into leveraging OpenPyxl effectively within automated workflows.
Happy coding!