How to Read Excel File in Python using Openpyxl

Vidvatek
5 min readJan 4, 2024

--

As a data enthusiast, I understand the ubiquity of spreadsheets in the world of data, and Excel files are often the go-to format for storing structured information.

Fortunately, Python, with its arsenal of powerful libraries, simplifies working with Excel files.

In this article, I’ll delve into the process of reading Excel files in Python using the openpyxl library.

Whether I'm handling financial data, business reports, or any other Excel-based information, this guide equips me with the skills needed to effortlessly extract and manipulate data from Excel files.

Why Openpyxl?

Before we dive into the specifics, let’s briefly discuss why we’ve chosen openpyxl for this tutorial. openpyxl is a popular Python library for working with Excel files.

It allows us to read and write Excel files with ease and provides extensive functionality for managing Excel sheets, rows, columns, and cell values.

Prerequisites

To follow along with this tutorial, you’ll need to have Python installed on your system. Additionally, you should install the openpyxl library if you haven't already.

You can install it using pip:

pip install openpyxl

Reading Excel Files

Reading an Excel file using openpyxl is a straightforward process. Here's a step-by-step guide:

Step 1: Import the openpyxl Library

Start by importing the openpyxl library in your Python script.

import openpyxl

Step 2: Load the Excel File

Use openpyxl to load the Excel file you want to read.

# Load the Excel file
workbook = openpyxl.load_workbook('your_excel_file.xlsx')

Replace 'your_excel_file.xlsx' with the path to your Excel file.

Step 3: Access the Worksheet

Access the specific worksheet within the Excel file that you want to work with.

# Select the worksheet by name
worksheet = workbook['Sheet1']

Replace 'Sheet1' with the name of the worksheet you want to read.

Step 4: Retrieve Data

You can now retrieve data from the worksheet using openpyxl functions. For example, to get the value of a specific cell, use the cell() method.

# Get the value of a specific cell
cell_value = worksheet.cell(row=1, column=1).value
print(cell_value)

This code retrieves the value of the cell in the first row and first column (A1).

Example: Total number of rows

To determine the total number of rows in an Excel worksheet using the openpyxl library in Python, you can use the max_row attribute of the worksheet object. Here's how you can do it.

import openpyxl

# Load the Excel file
workbook = openpyxl.load_workbook('your_excel_file.xlsx')
# Select the worksheet by name
worksheet = workbook['Sheet1'] # Replace 'Sheet1' with the name of your worksheet
# Get the total number of rows in the worksheet
total_rows = worksheet.max_row
print(f'Total number of rows: {total_rows}')

Replace 'your_excel_file.xlsx' with the path to your Excel file and 'Sheet1' with the name of the worksheet you want to count rows in.

The worksheet.max_row attribute returns the maximum row number that contains data in the worksheet, effectively giving you the total number of rows.

Example: Total number of columns

To determine the total number of columns in an Excel worksheet using the openpyxl library in Python, you can use the max_column attribute of the worksheet object. Here's how you can do it.

import openpyxl

# Load the Excel file
workbook = openpyxl.load_workbook('your_excel_file.xlsx')
# Select the worksheet by name
worksheet = workbook['Sheet1'] # Replace 'Sheet1' with the name of your worksheet
# Get the total number of columns in the worksheet
total_columns = worksheet.max_column
print(f'Total number of columns: {total_columns}')

The worksheet.max_column attribute returns the maximum column number that contains data in the worksheet, effectively giving you the total number of columns.

Example: Display all columns name

Display all the column names (header values) in an Excel worksheet using the openpyxl library in Python, you can iterate through the header row and print each cell's value. Here's how you can do it.

import openpyxl

# Load the Excel file
workbook = openpyxl.load_workbook('your_excel_file.xlsx')
# Select the worksheet by name
worksheet = workbook['Sheet1'] # Replace 'Sheet1' with the name of your worksheet
# Assuming the headers are in the first row (row 1)
header_row = worksheet[1]
# Iterate through the cells in the header row and print their values
for cell in header_row:
print(cell.value)

Example: Display first column value

To display all the values in the first column of an Excel worksheet using the openpyxl library in Python, you can iterate through the rows and print the value of the cell in the first column (column 'A'). Here's how you can do it.

import openpyxl
# Load the Excel file
workbook = openpyxl.load_workbook('your_excel_file.xlsx')
# Select the worksheet by name
worksheet = workbook['Sheet1'] # Replace 'Sheet1' with the name of your worksheet
# Iterate through the rows and print the value in the first column (column 'A')
for row in worksheet.iter_rows(min_row=1, max_row=worksheet.max_row, min_col=1, max_col=1, values_only=True):
print(row[0])
// ORpath = "add_your_file_path_here"

# workbook object is created
wb_obj = openpyxl.load_workbook(path)

sheet_obj = wb_obj.active
m_row = sheet_obj.max_row

# Loop will print all values
# of first column
for i in range(1, m_row + 1):
cell_obj = sheet_obj.cell(row = i, column = 1)
print(cell_obj.value)

Example: Display particular row value

To display the values of a specific row in an Excel worksheet using the openpyxl library in Python, you can access that row directly and then iterate through the cells to print their values. Here's how you can do it.

import openpyxl
# Load the Excel file
workbook = openpyxl.load_workbook('your_excel_file.xlsx')
# Select the worksheet by name
worksheet = workbook['Sheet1'] # Replace 'Sheet1' with the name of your worksheet
# Specify the row number you want to display (e.g., row 3)
row_number = 3 # Replace with the desired row number
# Access the specific row by its row number
selected_row = worksheet[row_number]
# Iterate through the cells in the selected row and print their values
for cell in selected_row:
print(cell.value)
// OR# Give the location of the file
path = "enter_your_file_path_here"

# workbook object is created
wb_obj = openpyxl.load_workbook(path)

sheet_obj = wb_obj.active

max_col = sheet_obj.max_column

# Will print a particular row value
for i in range(1, max_col + 1):
cell_obj = sheet_obj.cell(row = 2, column = i)
print(cell_obj.value, end = " ")

Conclusion

In this tutorial, we’ve embarked on a journey through the world of reading Excel files in Python using the versatile openpyxl library. We began by recognizing the ubiquity of Excel spreadsheets as a data storage format and the increasing need to seamlessly integrate them into our Python workflows.

By the end of this guide, you’ve acquired the essential skills to navigate and extract data from Excel files with confidence.

We discussed the importance of openpyxl as a powerful tool for this purpose. Its robust functionality allows us to not only read data but also manipulate Excel sheets, rows, columns, and cell values. Armed with this knowledge, you can effortlessly work with Excel files for a variety of tasks, from data analysis to reporting and automation.

--

--

Vidvatek

Vidvatek is a comprehensive platform dedicated to assisting beginner programmers in launching their careers in web development.