Chief Exec(utive) Officer: Read in Multiple DataFrames at Once

Aneesh Kodali
3 min readSep 25, 2019

--

You can find the code and data for the example here.

I did a project that involved reading in many csv files and converting them into Pandas dataframes. The usual way is to write multiple pd.read_csv() statements, which is very redundant and time consuming. I will admit that it took me WAY longer to come up with this solution versus writing out statements, but it was still a good thought exercise. I found an alternative way with the exec() function.

NOTE: there are articles out there (including python documentation) outlying the scarce use of exec() (among other ‘compilation’ functions) and potential risks, but those risks involve concepts outside the scope of this post. For the time being, I am only referring to using exec() as a means of reading in files and variables within a single python script that 1) doesn’t have any dependencies with other python files 2) only you possess/access.

If you haven’t heard of exec(), think of Microsoft Excel’s INDIRECT() function. See the snippet below for a quick refresher/lesson. In cells A1 through A4, I enter numbers and then calculate the sum in cell A6 with =SUM(A1:A4). INDIRECT() takes whatever I pass in the parentheses and tries to interpret it as Excel jargon. I can recreate the sum with =SUM(INDIRECT()) and pass the string “A1:A4” (yes, with quotes) inside INDIRECT() (see cell B6). Excel will think “hey, that string looks like a cell range!” and treat it as such for any computations.

Example of Excel’s INDIRECT() function

Exec() does something similar in Python in that I can pass in a ‘code-like’ string and Python will attempt to execute it. The general steps are:

1) Create a list of variable names: variableList
2) Create a list of value names: valueList
3) Write a for loop that iterates through both lists:
for variable, value in zip(variableList, valueList):
exec(f"{variable}= {value}")

Note: In some cases you have to add ‘quotes’ to the right hand side of the equal sign (‘{value}’) because you may still have to designate part of the right hand side as strings (like a text value or the file name portion of pd.read_csv). Confused? Consider the simple illustration:

pet = "dog"

If you removed the quotes from ‘dog’, Python will think you want to set the ‘variable called pet’ to the ‘variable called dog’ instead of the ‘value called dog’ and may throw an error if ‘variable called dog’ is not defined yet.

Similarly, when reading in a dataframe:

df = pd.read_csv('file_name')

You still need quotes around the file name or else Python will think you are trying to read in a variable called ‘file_name’.

Example: Calculate SumOne’s Name

For fun, let’s say you want to calculate the value of someone’s name. To do so, assign a value to each letter in the name and add up all values.

import string# Create list of letters
letterList = list(string.ascii_lowercase)
# Create list of values
numberList = list(range(1,27))
# Iterate through both lists to assign values to letters
for letter, number in zip(letterList, numberList):
exec(f"{letter} = {number}")
name='aneesh'
nameSum=0
# Iterate through each letter
for letter in name:
# Add the variable called [letter] to nameSum
exec(f"nameSum+={letter}")
nameSum # 52

MAIN EXAMPLE: Create Multiple DataFrames

When creating dataframes from files or tables, the general steps are:

import pandas as pd# Create list of file names
fileList = os.listdir('data')
# Using file names to create dataframe names
# Removing file extension and replacing periods with underscores
dfList = [x.replace('.'+x.split('.')[-1], '').replace('.','_') for x in fileList]
# Iterate through both lists to create dataframes
for df, file in zip(dfList, fileList):
exec(f"{df} = pd.read_csv('data/{file}')")
# Verify that this worked
whos DataFrame

--

--