Pandas Tutorial: Importing CSV and TSV Files Into DataFrames

Data Science Delight
4 min read · May 15, 2024

Do you know what the first step in data analysis is? It’s importing files.

You might be wondering what’s hard about that. We get a CSV file, and pandas’ read_csv() imports it and converts it into a DataFrame.

But that is not always the case. You won’t get a CSV file every time. Sometimes you might be asked to pull data from an API or extract it from a database.

Table of Contents:

1. CSV Files
Q) What is the role of read_csv in Pandas?
2. TSV Files
Q) Why is the data from a TSV file loaded into a single column instead of being separated into multiple columns in a DataFrame?
Q) How do you load a TSV file into a proper DataFrame?
Q) How do you name the columns in the DataFrame?
That’s it for today!
Link for Code:

Let’s explore!

1. CSV Files

CSV (Comma-Separated Values) files are like Excel tables, but simpler: plain text in which each line is a row and the values in a row are separated by commas. Let’s see how we can bring them into Pandas.

Before that, you should know that we can load a .csv file in two different ways:

1.1 Opening a local CSV file

Here we will be using the Titanic dataset from Kaggle.

# Import Pandas
import pandas as pd

# Read CSV file into a DataFrame
df = pd.read_csv('titanic.csv')
df

1.2 Opening a CSV file using a URL

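import requests
import pandas as pd
from io import StringIO

url = "https://raw.githubusercontent.com/Data-Science-Delight/titanic.csv/main/train%20(1).csv"

# Download the file and fail early on an HTTP error
response = requests.get(url)
response.raise_for_status()

# Wrap the downloaded text in a file-like object for read_csv
data = StringIO(response.text)

pd.read_csv(data)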
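As a side note, read_csv also accepts a URL string directly, so the requests and StringIO step is optional for plain public files. The sketch below avoids the network by writing a tiny, made-up CSV to a temporary file and reading it back through a file:// URL; the file name and values are purely illustrative assumptions. The same call shape should work with an https URL like the one above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, not the real Titanic file
sample_csv = "PassengerId,Name,Age\n1,Alice,29\n2,Bob,41\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "sample.csv"
    csv_path.write_text(sample_csv)

    # read_csv accepts URL strings (http, https, ftp, file, ...) directly
    url_df = pd.read_csv(csv_path.as_uri())

print(url_df)
```

Going through requests is still useful when you need custom headers, authentication, or explicit error handling.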

Q) What is the role of read_csv in Pandas?

The read_csv function in Pandas reads a CSV file and converts it into a DataFrame, making the data easy to read and interpret in tabular form.
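To make this concrete, here is a minimal sketch that feeds a small, made-up CSV string to read_csv through StringIO. The column names and values are invented for illustration; the point is that the first line becomes the column names and each column gets an inferred dtype.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV content: one header line followed by two data rows
csv_text = "name,age,fare\nAlice,29,7.25\nBob,41,71.28\n"

# read_csv accepts any file-like object, not only file paths
people_df = pd.read_csv(StringIO(csv_text))

print(people_df)
print(people_df.dtypes)  # age is parsed as an integer, fare as a float
```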

2. TSV Files

A TSV (Tab-Separated Values) file is a text-based file format used to store structured data in a tabular format.

Unlike CSV (Comma-Separated Values) files, TSV files use tabs as the delimiter to separate individual data fields within each row.

2.1 Loading TSV File to a DataFrame

Here we have used the movie_titles_metadata dataset from Kaggle.

All the files (CSV, TSV) are available in our GitHub repo, so you can also download them from there.

df2 = pd.read_csv('movie_titles_metadata.tsv')
df2.head()

Output:

Image by Author

Q) Why is the data from a TSV file loaded into a single column instead of being separated into multiple columns in a DataFrame?

This is because read_csv splits on commas by default. The fields here are separated by tabs, not commas, so read_csv treats each line as one field and everything lands in a single column.
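The effect is easy to reproduce with a small in-memory example. The sketch below uses made-up movie rows (not the Kaggle file) and reads tab-separated text with the default settings; because no commas are found, each line ends up as one field.

```python
from io import StringIO

import pandas as pd

# Hypothetical tab-separated rows, standing in for the real TSV file
tsv_text = "m0\tfirst movie\t1999\t6.9\nm1\tsecond movie\t1992\t6.2\n"

# Default sep=',' finds no commas, so each line is read as a single field
one_col_df = pd.read_csv(StringIO(tsv_text))

# Number of columns pandas detected
print(one_col_df.shape[1])
```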

2.2 Delimiter

Q) How do you load a TSV file into a proper DataFrame?

Using the sep parameter:

movies_df = pd.read_csv('movie_titles_metadata.tsv', sep = '\t')
movies_df.head()

Output:

Image by Author
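An equivalent option, shown here as a small sketch, is pd.read_table, which uses a tab as its default separator. The rows below are made up and, for simplicity, include a header line (an assumption that does not match the Kaggle file). Both calls should produce the same DataFrame.

```python
from io import StringIO

import pandas as pd

# Hypothetical tab-separated data with a header line
tsv_text = (
    "movie_id\ttitle\tyear\trating\n"
    "m0\tfirst movie\t1999\t6.9\n"
    "m1\tsecond movie\t1992\t6.2\n"
)

# Explicit tab separator with read_csv
with_sep = pd.read_csv(StringIO(tsv_text), sep="\t")

# read_table splits on tabs by default
with_read_table = pd.read_table(StringIO(tsv_text))

print(with_sep)
print(with_sep.equals(with_read_table))
```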

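NOTE:

You can see that the column labels hold values from the first row instead of column names. That happens because this file has no header row, and read_csv treats the first line as the header by default. The fix is the header parameter: passing header = None tells pandas that the file has no header row.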

2.3 Header

movies_df = pd.read_csv('movie_titles_metadata.tsv', sep = '\t', header = None)
movies_df.head()

Output:

Image by Author
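A small sketch on made-up, header-less rows shows the difference. By default the first line is consumed as column names; with header=None every line is kept as data and the columns are numbered from 0.

```python
from io import StringIO

import pandas as pd

# Hypothetical header-less rows, standing in for the real TSV file
tsv_text = "m0\tfirst movie\t1999\t6.9\nm1\tsecond movie\t1992\t6.2\n"

# Default behaviour: the first line is used as the header
default_df = pd.read_csv(StringIO(tsv_text), sep="\t")

# header=None: no line is treated as a header
no_header_df = pd.read_csv(StringIO(tsv_text), sep="\t", header=None)

print(len(default_df), list(default_df.columns))
print(len(no_header_df), list(no_header_df.columns))
```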

2.4 Names

Q) How do you name the columns in the DataFrame?

column_names = ['Sl.No', 'Movie Title', 'Year', 'Ratings', 'Votes', 'Genre']
movies_df = pd.read_csv('movie_titles_metadata.tsv', sep='\t', header=None, names=column_names)
movies_df.head()

Output:

Image by Author

Now it looks like a proper DataFrame.
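One detail worth knowing: when names is passed and header is left out, pandas treats the file as header-less, so the first row is still kept as data. The sketch below checks this on made-up rows; the column labels are illustrative assumptions, not the ones from the Kaggle file.

```python
from io import StringIO

import pandas as pd

# Hypothetical header-less rows, standing in for the real TSV file
tsv_text = "m0\tfirst movie\t1999\t6.9\nm1\tsecond movie\t1992\t6.2\n"
column_names = ["movie_id", "title", "year", "rating"]

# header=None stated explicitly
explicit_df = pd.read_csv(
    StringIO(tsv_text), sep="\t", header=None, names=column_names
)

# names alone: pandas behaves as if header=None had been passed
implicit_df = pd.read_csv(StringIO(tsv_text), sep="\t", names=column_names)

print(explicit_df)
print(explicit_df.equals(implicit_df))
```

Spelling out header=None, as the tutorial does, keeps the intent obvious to readers. If a file does have a header row that you want to replace, pass header=0 together with names.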

That’s it for today!

Stay tuned for Part 2. Please like, share, and follow Data Science Delight for more!

Link for Code:

For the complete code, you can visit our GitHub here.

Data Science Delight

Content Creator | Sharing insights & tips on data science | Instagram: @datasciencedelight | YouTube: https://www.youtube.com/channel/UCpz2054mp5xfcBKUIctnhlw