Exploring Pandas : Reading CSV Files into DataFrames

Punyakeerthi BL
3 min read · Jun 23, 2024

This article continues from the previous one in the series, so please read that one first:

Pandas Library for Data Analysis in Python

This article is part two of a Pandas series and covers how to read data from CSV files into a Pandas DataFrame.

What is a DataFrame?

A DataFrame in Pandas is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). Think of it as a spreadsheet or SQL table, or a dictionary of Series objects.

Here’s a simple breakdown:

- Rows and Columns: DataFrames consist of rows and columns, where each column can be of a different data type (integer, float, string, etc.).
- Indexing: Rows and columns are labeled, allowing for easy access and manipulation of data.
- Size-Mutable: DataFrames can be expanded or shrunk, meaning you can add or remove rows and columns as needed.
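To make the "dictionary of Series" view and the size-mutable property concrete, here is a minimal sketch. The column names and values (including the "City" column) are made up for illustration.

```python
import pandas as pd

# Illustrative data: each Series becomes one column of the DataFrame
names = pd.Series(["Alice", "Bob", "Charlie"])
ages = pd.Series([25, 30, 28])
df = pd.DataFrame({"Name": names, "Age": ages})

# Size-mutable: add a column, then remove it again
df["City"] = ["Paris", "Berlin", "Madrid"]
print(df.shape)  # three rows, three columns

df = df.drop(columns=["City"])
print(df.shape)  # back to three rows, two columns
```

Note that drop() returns a new DataFrame by default, which is why the result is assigned back to df.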

Key Features of DataFrames:

1. Heterogeneous Data: Different columns can contain different data types.
2. Alignment: DataFrames support automatic and explicit data alignment by label.
3. Data Manipulation: Provides various functions for data manipulation, aggregation, and transformation.
4. Integration: Easily integrates with NumPy, making it powerful for numerical operations.
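Two of these features, alignment and NumPy integration, are easier to see in code. The following is a small sketch with made-up labels and values, not an exhaustive demonstration.

```python
import numpy as np
import pandas as pd

# Two Series with partly different row labels (illustrative values)
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Automatic alignment: values are matched by label, not by position.
# Labels present in only one Series produce NaN in the result.
total = a + b
print(total)

# NumPy integration: NumPy functions accept DataFrame columns directly,
# and to_numpy() returns the underlying values as a NumPy array
df = pd.DataFrame({"Age": [25, 30, 28]})
print(np.mean(df["Age"]))
print(df["Age"].to_numpy())
```

Only the labels "y" and "z" appear in both Series, so only those two rows of the sum hold numbers.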

Creating a DataFrame

Pandas DataFrames are powerful tools for data manipulation and analysis. They are two-dimensional, tabular data structures with labeled axes.

One way to create a DataFrame is to pass a dictionary that maps column names to lists of values; Pandas infers each column's data type from the values. Here’s an example:

import pandas as pd
# Define column names and their values
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]}
# Create DataFrame
df = pd.DataFrame(data)
# Print DataFrame
print(df)

This code creates a DataFrame with two columns, “Name” (string type) and “Age” (integer type), and populates it with three rows of data.
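If you want to check what Pandas actually inferred, the shape and dtypes attributes are a quick way to do so. This is a minimal sketch using the same data; the exact type reported for the text column can differ between Pandas versions, so treat the printed types as something to inspect rather than rely on.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]}
df = pd.DataFrame(data)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type Pandas inferred for each column
```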

Reading CSV Files

The more common way to get data into a DataFrame is by reading it from a CSV file. The pd.read_csv() function is used for this purpose.

import pandas as pd
# Specify file path
filepath = "data.csv" # Replace with your actual file path
df = pd.read_csv(filepath)
# Print DataFrame
print(df)

This code reads data from a CSV file named “data.csv” (replace with your actual file path) and creates a DataFrame. By default, pd.read_csv() assumes a comma (",") as the delimiter, the character that separates the values in each row.
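When a file uses a different delimiter, the sep parameter of pd.read_csv() lets you say so. The sketch below assumes semicolon-separated data and uses an in-memory io.StringIO object as a stand-in for a real file, so it runs without any file on disk; the contents are made up.

```python
import io
import pandas as pd

# In-memory stand-in for a semicolon-separated file (illustrative content)
raw = io.StringIO("Name;Age\nAlice;25\nBob;30\n")

# sep tells read_csv which character separates the values
df = pd.read_csv(raw, sep=";")
print(df)
```

With a real file, you would pass the file path instead of the StringIO object and keep the sep argument.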

Working with the DataFrame

Once you have a DataFrame, you can access specific data points using indexing and selection. You can retrieve data by column name or by row number.

For example:

# Access data by column name
name = df['Name'][0]  # Name in the first row
age = df['Age'][1]    # Age in the second row
# Access data by row position (zero-indexed)
first_row = df.iloc[0] # Get all data in the first row
# Print results
print(f"Name: {name}, Age: {age}")
print(first_row)

You can also add new columns to a DataFrame and save the DataFrame back to a CSV file using the to_csv() function.
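Here is a short sketch of both steps. The column name "AgeNextYear" is hypothetical, and an in-memory buffer stands in for an output path such as "output.csv" so the example leaves no file behind.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]})

# Add a new column computed from an existing one (hypothetical column name)
df['AgeNextYear'] = df['Age'] + 1

# Write the DataFrame as CSV; index=False leaves out the row labels
buffer = io.StringIO()  # stand-in for a file path such as "output.csv"
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

To write to disk instead, pass a file path as the first argument to to_csv().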

In summary, this Pandas article provides a basic understanding of working with CSV files. You learned how to:

  • Create a DataFrame from scratch.
  • Read data from a CSV file into a DataFrame.
  • Access data points using indexing and selection.
  • Add new columns to a DataFrame.
  • Save a DataFrame to a CSV file.

If you liked this post, please follow me on LinkedIn: Punyakeerthi BL