How to load data into a Pandas DataFrame

Alexandre Stamm
3 min read · Jun 16, 2022


Photo by Philipp Katzenberger on Unsplash

In my next articles, I’m going to write some tips & tricks about Pandas, from basic to advanced!

I’m going to keep it short and refer to the documentation if you need a more detailed explanation of how things work!

Today we’re going to start with the basics: How to load data into a Pandas DataFrame!

Loading data into a Pandas DataFrame:

In Pandas, you can create DataFrames using many different ways, like importing data from CSV, JSON, Excel, HTML, XML, and many other file types, or loading data from a SQL or Google BigQuery database.

You can check all the functions for loading DataFrames on this page of the Pandas documentation.

In this tip, I’m going to focus on the 2 main ways I load data into Pandas DataFrames: reading from CSV or connecting directly to an SQL database.

Importing data from a CSV file:

To import data from a CSV file, use pd.read_csv():

If your file is in the same folder as your notebook, the easiest way to use pd.read_csv() is just to pass the filename directly to the function:

df = pd.read_csv('file_name.csv')

If your file is not in the same folder as your notebook, pass the file path (absolute or relative) to the function:

# Absolute path:
df = pd.read_csv('C:/path/to/your/file/file_name.csv')
# Relative path:
df = pd.read_csv('../file_name.csv')

When passing just the file name to the function, Pandas will infer most of the information it needs, like separators, column data types, and indexes. But you can also use the function's parameters to better deal with the data you're going to import.

There are a series of parameters you can pass to read_csv, but the ones I use the most are:

header: Row number(s) to use as the column names, and the start of the data. The default behavior is to infer the column names.

sep: Delimiter to use, default is a comma.

index_col: Column(s) to use as the row labels of the DataFrame.

dtype: Data type for data or columns, when you want to explicitly set the dtype of columns when importing, e.g. {'a': np.float64}.

na_values: Additional strings to recognize as NA/NaN. Say you are importing a CSV that contains "None" values: you can pass na_values=['None'] and Pandas will convert them to NaN when importing.

parse_dates and infer_datetime_format: Useful when dealing with date columns. To keep this short I suggest looking at the documentation if you have to use these parameters.
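The parameters above can be combined in a single call. Here is a minimal sketch that reads from an in-memory CSV (io.StringIO) so the example is self-contained; the column names and values are made up for illustration, and in practice you would pass a file path instead:

```python
import io
import pandas as pd

# Hypothetical CSV: ";" as separator, literal "None" strings for missing data
csv_data = "id;name;score\n1;alice;91.5\n2;bob;None\n3;carol;88.0\n"

df = pd.read_csv(
    io.StringIO(csv_data),    # stands in for a real file path
    sep=";",                  # non-default delimiter
    index_col="id",           # use the "id" column as the row labels
    dtype={"name": "string"}, # explicit dtype for the "name" column
    na_values=["None"],       # treat the literal string "None" as NaN
)

print(df)
print(df["score"].isna().sum())  # one missing value after na_values conversion
```

Note that because of na_values, the "score" column comes out as a numeric column with a NaN, rather than a column of mixed strings.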

Importing data from an SQL Database:

The two main ways I import data from SQL databases are:

Importing from PostgreSQL databases:

Follow my guide for importing data from a PostgreSQL database to a Pandas DataFrame.

Importing data from SQLite databases:

To import data from an SQLite database, I use a really helpful library called sqlite3 (docs), which ships with Python's standard library.

You can read the documentation and also use the guide I made for importing data from a PostgreSQL database as a reference, since the functions are very similar.
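As a minimal sketch of the sqlite3 route: open a connection and hand it to pd.read_sql_query along with a query. The example below uses an in-memory database and a made-up "users" table so it runs on its own; with a real database you would pass the path to your .db file to sqlite3.connect instead:

```python
import sqlite3
import pandas as pd

# In-memory database as a stand-in for a real .db file path
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.commit()

# read_sql_query runs the query and returns the result set as a DataFrame
df = pd.read_sql_query("SELECT id, name FROM users", conn)
conn.close()

print(df)
```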

Aside from importing data, you can also create DataFrames using these functions:

Using pd.DataFrame.from_dict() and passing a dictionary:

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
pd.DataFrame.from_dict(data)
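By default, from_dict treats the dictionary keys as column names. It also accepts an orient parameter: with orient='index', the keys become row labels instead. A quick sketch of the difference:

```python
import pandas as pd

data = {"col_1": [3, 2, 1, 0], "col_2": ["a", "b", "c", "d"]}

# Default (orient="columns"): keys become column names
df_cols = pd.DataFrame.from_dict(data)

# orient="index": keys become row labels instead
df_rows = pd.DataFrame.from_dict(data, orient="index")

print(df_cols.shape)  # (4, 2)
print(df_rows.shape)  # (2, 4)
```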

If you are not running your notebook in the cloud (e.g., on Google Colab), you can also read data directly from your clipboard using pd.read_clipboard().

You can also create a DataFrame from a list or NumPy array using pd.DataFrame():

pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# or
pd.DataFrame(my_list)
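When building a DataFrame this way, it's often useful to name the columns yourself, since Pandas otherwise defaults to integer labels. A small self-contained sketch (the column names here are made up for illustration):

```python
import numpy as np
import pandas as pd

# From a NumPy array, with explicit column names
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df_arr = pd.DataFrame(arr, columns=["a", "b", "c"])

# From a list of lists
df_list = pd.DataFrame([[1, 2], [3, 4]], columns=["x", "y"])

print(df_arr)
print(df_list)
```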

That’s all for today!

Follow me to read my upcoming tips for working with Pandas!


Alexandre Stamm — Data Engineer & Enthusiast. Data Analytics | Data Engineering | Data Science