How to use different parameters to read data correctly from CSV files into a Pandas dataframe

Saurabh Ghosh
Published in Predict · 4 min read · Dec 28, 2022

Let’s learn to read data from CSV files using the right parameters for the right situation.

Photo by Maksym Kaharlytskyi on Unsplash

What to expect in this blog?

You’ll often read data from CSV files. As you know, data in a CSV can come in various shapes and forms. Python and Pandas provide the necessary controls to read the required data effectively.

So, in this blog, you’ll learn how to use the parameters of the Pandas read_csv function in different scenarios.

The key use cases you’ll explore and learn today are -

  1. Normal usage
  2. Specify the header row
  3. Skip rows between the header and the data rows
  4. Read without the header row
  5. Specify the missing values
  6. Specify the index column
  7. Read only specific columns
  8. Specify delimiter
  9. Specify column type

Let’s plan for the work

You’ll be using JupyterLab for this coding. To simulate the different scenarios, you’ll create sample CSV files and read each one with the parameters that apply to that scenario.

Let’s code

You’ll only need the Pandas library for this exercise.

import pandas as pd

1. Normal usage

This is the default way of reading a CSV file: pass only the file path and leave every other parameter at its default.

Sample data -

Code -
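A minimal sketch of the default call. The column names and values are hypothetical, and io.StringIO stands in for a file on disk so the example runs on its own; a path such as "students.csv" (also an assumed name) would be passed the same way.

```python
import io
import pandas as pd

# Hypothetical sample data; the first line is the header row.
csv_text = """Roll No,Name,Marks
1,Asha,82
2,Ben,75
3,Chen,91
"""

# With no extra parameters, the first line becomes the column names
# and the column types are inferred from the values.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```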

2. Specify the header row

This is useful when descriptive text appears above the header row. The header parameter takes the zero-based row number of the header row.

Sample data -

Code -
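A minimal sketch using the header parameter, assuming a hypothetical file with two descriptive lines above the header. The data is made up, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample: two lines of description, then the header, then the data.
csv_text = """Student report
Generated for class 5A
Roll No,Name,Marks
1,Asha,82
2,Ben,75
"""

# header=2 means the third line (zero-based index 2) holds the column names;
# the lines above it are ignored.
df = pd.read_csv(io.StringIO(csv_text), header=2)
print(df)
```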

3. Skip rows between the header and the data rows

This is useful when there are extra rows between the header and the data rows (often containing descriptions or the format of the fields).

Sample data -

Code -
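A minimal sketch using the skiprows parameter with a list of zero-based line numbers. The two descriptive rows below are an assumption about what such a file might contain, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample: the header is followed by a description row
# and a field-format row before the real data starts.
csv_text = """Roll No,Name,Marks
Unique id,Full name,Score out of 100
integer,text,integer
1,Asha,82
2,Ben,75
"""

# Skip lines 1 and 2 (zero-based); line 0 is still used as the header.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])
print(df)
```

Because the skipped lines are never parsed, the Marks column is still inferred as an integer column.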

4. Read without the header row

This is useful when data is split across multiple files and only the first file has a header row. The remaining files must be read without a header before everything is concatenated into one dataframe.

Sample files -

Code -
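A minimal sketch with header=None, assuming two hypothetical split files where only the first one carries the header. The contents are made up, and io.StringIO stands in for the files on disk.

```python
import io
import pandas as pd

# Hypothetical split files: only the first part has a header row.
part_1 = """Roll No,Name,Marks
1,Asha,82
2,Ben,75
"""
part_2 = """3,Chen,91
4,Dina,68
"""

df1 = pd.read_csv(io.StringIO(part_1))

# header=None tells Pandas that the first line is data, not column names.
# names reuses the columns from the first part so both frames line up.
df2 = pd.read_csv(io.StringIO(part_2), header=None, names=list(df1.columns))

# Stack the two parts and rebuild a continuous index.
df = pd.concat([df1, df2], ignore_index=True)
print(df)
```

Without names, the second frame would get the default integer column labels 0, 1, 2, and the concatenation would not align the columns.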

5. Specify the missing values

This is useful when certain values in the source are known to represent missing data.
For example, ‘-’ indicates a missing value in the data sample below.

Sample data -

Code -
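A minimal sketch using the na_values parameter, assuming a hypothetical file where ‘-’ marks a missing mark. The data is made up, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample: Ben's mark is recorded as '-' because it is missing.
csv_text = """Roll No,Name,Marks
1,Asha,82
2,Ben,-
3,Chen,91
"""

# na_values lists the extra strings that should be read as NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values=["-"])
print(df)

# Count the missing entries per column.
print(df.isna().sum())
```

Without na_values, the ‘-’ would be kept as text and the whole Marks column would be read as strings instead of numbers.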

6. Specify the index column

This is useful when a column (e.g. ‘Roll No’) from the data needs to be used as the index.

Sample data -

Code -
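A minimal sketch using the index_col parameter with the ‘Roll No’ column. The data is hypothetical, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample data.
csv_text = """Roll No,Name,Marks
1,Asha,82
2,Ben,75
3,Chen,91
"""

# index_col accepts a column name (or a zero-based column position).
df = pd.read_csv(io.StringIO(csv_text), index_col="Roll No")
print(df)

# Rows can now be looked up by roll number.
print(df.loc[2, "Name"])
```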

7. Read only specific columns

This is useful when the source file contains many redundant columns and not everything needs to be loaded into the dataframe.
For example, suppose only the columns highlighted in yellow in the sample are required.

Sample data -

Code -
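A minimal sketch using the usecols parameter. The column names, and the choice of which ones count as required, are assumptions made for illustration; io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample with more columns than are needed.
csv_text = """Roll No,Name,Section,Marks,Remarks
1,Asha,A,82,Good
2,Ben,B,75,Fair
"""

# Only the listed columns are parsed and loaded.
df = pd.read_csv(io.StringIO(csv_text), usecols=["Roll No", "Name", "Marks"])
print(df)
```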

8. Specify delimiter

This is useful when the separator or delimiter used in the source data file is different from the default comma.

Sample data -

Code -
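A minimal sketch using the sep parameter, assuming a hypothetical file that uses a semicolon as the separator. io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample: fields are separated by ';' instead of ','.
csv_text = """Roll No;Name;Marks
1;Asha;82
2;Ben;75
"""

# sep sets the field separator; delimiter is an alias for the same parameter.
df = pd.read_csv(io.StringIO(csv_text), sep=";")
print(df)
```

If sep were left at its default here, each line would be read as a single column, because no commas would be found.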

9. Specify column type

This is useful when a particular column needs to be stored in the dataframe as a specific type.
For example, source data files about employees may contain all salaries in whole numbers. However, the salary needs to be retrieved as a float.

Sample data -

Code -
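A minimal sketch using the dtype parameter for the salary example. The employee data and column names are hypothetical, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample: salaries are whole numbers in the source.
csv_text = """Emp Id,Name,Salary
101,Asha,50000
102,Ben,62000
"""

# dtype maps column names to the type each column should be stored as.
# Columns not listed are still inferred automatically.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Salary": float})
print(df.dtypes)
```

Without dtype, the Salary column would be inferred as an integer column.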

That’s all the coding for this exercise.

Download

GitHub — https://github.com/SaurabhGhosh/read_csv_parameters.git

References

  1. There is no better way of deep diving into Pandas than the Pandas documentation itself: pandas.read_csv
  2. I wanted to cover simple use cases and this writeup by Deepanshu was really helpful.

Conclusion

In this blog, I hope you got some ideas about the following scenarios of reading data from CSV files -

  1. Normal usage
  2. Specify the header row
  3. Skip rows between the header and the data rows
  4. Read without the header row
  5. Specify the missing values
  6. Specify the index column
  7. Read only specific columns
  8. Specify delimiter
  9. Specify column type

In my next blog, I’ll explore another area and learn more Python concepts.

If you have any questions related to this program, please feel free to post your comments.

Please like, comment and follow me! Keep Learning!
