Exploring Fundamental Pandas Concepts with Examples

Brian
Sep 6, 2023

Pandas is a free, open-source Python library for working with tabular data. It’s used for tasks like cleaning data, exploring and summarizing it, and preparing it for analysis. Its two core data structures, the DataFrame and the Series, do most of that work. Whether you’re a data scientist or just someone who works with data in Python, Pandas is a valuable tool to have. Some of the key ideas in Pandas include:

  1. DataFrame: A DataFrame is a two-dimensional table, much like a spreadsheet. Imagine a table with columns like ‘Name,’ ‘Age,’ and ‘City.’ Each column holds one type of data, but different columns can hold different types: text for names and cities, integers for ages. We can easily perform operations on this table, like finding the average age.
  2. Series: A Series is a one-dimensional labeled array, like a single column in our table. For example, the ‘Name’ column is a Series containing names like ‘John,’ ‘Alice,’ and ‘Bob.’ We can work with Series individually or as part of a DataFrame.
  3. Index: The index is the set of labels for the rows (column labels live in a similar object, df.columns). It helps us identify specific data points. For instance, if we have a list of students, the index could be their student IDs.
  4. Selection and Slicing: To get specific data, we select rows and columns by label (.loc), by position (.iloc), or by a condition. We can say, “Give me all students from New York,” and Pandas will fetch them.
  5. Data Cleaning: Pandas helps us tidy up data. If there are missing values (like empty cells), we can either fill them or remove them. We can also drop columns we don’t need.
  6. Data Transformation: Sometimes, we need to reshape or merge data to make it suitable for analysis. For instance, if we have data on sales per month and want to see the total sales for the year, Pandas makes that transformation easy.
  7. Filtering and Sorting: If we want to find all students older than 20, Pandas can do it. It can also sort data by age, name, or any other criteria.
  8. Aggregation and Grouping: Let’s say we want to find the average age of students in each city. We can group data by city and calculate the averages.
  9. Data Input/Output: We can load data from files (like CSV or Excel) and save our work for later. This way, we can share our analyses with others.
  10. Visualization: Pandas is not a full charting library, but its built-in .plot() method draws quick charts using Matplotlib, and it works with visualization libraries like Matplotlib to show our data in graphs and plots.
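Several of these ideas can be seen together in a small sketch. The student table below (names, ages, cities, and the IDs 101 to 104) is made up for illustration; it uses the student IDs as the index, pulls out one column as a Series, selects and filters rows, sorts them, and computes the average age per city:

```python
import pandas as pd

# A made-up student table; the IDs, names, ages and cities are illustrative only
students = pd.DataFrame(
    {
        "Name": ["John", "Alice", "Bob", "Maria"],
        "Age": [21, 19, 23, 20],
        "City": ["New York", "Boston", "New York", "Boston"],
    },
    index=[101, 102, 103, 104],  # student IDs used as the row index
)
students.index.name = "StudentID"

# Series: a single column, which keeps the same index as the DataFrame
names = students["Name"]
print(type(names))  # <class 'pandas.core.series.Series'>

# Index: look up one value by its row label (student ID 102) and column name
print(students.loc[102, "Name"])  # Alice

# Selection: all students from New York
from_ny = students[students["City"] == "New York"]

# Filtering and sorting: students older than 20, youngest first
older = students[students["Age"] > 20].sort_values(by="Age")

# Aggregation and grouping: average age per city
avg_age_by_city = students.groupby("City")["Age"].mean()
print(avg_age_by_city)

# An operation on the whole column: the overall average age
print(students["Age"].mean())  # 20.75
```

Using the IDs as the index is a design choice for this sketch; without the index= argument, Pandas would label the rows 0, 1, 2, 3.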
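Data cleaning and transformation can be sketched the same way. The monthly sales figures below are invented for illustration, and filling gaps with 0 is only one possible choice (dropping those rows, or filling with an average, may suit real data better):

```python
import numpy as np
import pandas as pd

# Invented monthly sales (Oct 2023 to Mar 2024) with two missing values and an unneeded column
sales = pd.DataFrame(
    {
        "Month": pd.date_range("2023-10-01", periods=6, freq="MS"),  # month-start dates
        "Sales": [100.0, np.nan, 120.0, 130.0, np.nan, 150.0],
        "Notes": ["", "", "promo", "", "", ""],
    }
)

# Data cleaning, option 1: remove rows where "Sales" is missing
dropped = sales.dropna(subset=["Sales"])

# Data cleaning, option 2: fill the missing values instead (here with 0)
filled = sales.fillna({"Sales": 0})

# Drop a column we don't need; drop() returns a new DataFrame
filled = filled.drop(columns=["Notes"])

# Transformation: turn the monthly rows into total sales per year
yearly = filled.groupby(filled["Month"].dt.year)["Sales"].sum()
print(yearly)
```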

Here’s an example of how to read a CSV file using Python’s Pandas library and then filter the data based on a condition.

Suppose we have a CSV file named “data.csv” with the following content:

Name,Age,Salary
Alice,28,50000
Bob,22,45000
Charlie,35,60000
David,30,55000
Eva,25,48000

We can read and filter this data as follows:

import pandas as pd

# Read the CSV file into a Pandas DataFrame
df = pd.read_csv('data.csv')

# Filter the data to select individuals with salaries greater than or equal to 50000
filtered_df = df[df['Salary'] >= 50000]

# Display the filtered data
print("\nFiltered Data (Salary >= 50000):")
print(filtered_df)
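To try this without creating data.csv on disk, the sketch below reads the same rows from an in-memory string through io.StringIO (a convenience assumption; with a real file, pass the path as above). It also shows how to combine conditions with & and |, wrapping each condition in parentheses, and the equivalent DataFrame.query form:

```python
import io

import pandas as pd

# The same rows as data.csv above, kept in memory so the example runs without a file
csv_text = """Name,Age,Salary
Alice,28,50000
Bob,22,45000
Charlie,35,60000
David,30,55000
Eva,25,48000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Single condition, as in the example above
high_earners = df[df["Salary"] >= 50000]
print(high_earners["Name"].tolist())  # ['Alice', 'Charlie', 'David']

# Two conditions combined with & (and); use | for "or"
young_high_earners = df[(df["Salary"] >= 50000) & (df["Age"] < 30)]
print(young_high_earners["Name"].tolist())  # ['Alice']

# The same filter written as a query string
same_result = df.query("Salary >= 50000 and Age < 30")
```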

We can also create a DataFrame directly from data we already have in Python. The right method depends on the data source; these examples build the same table from three different sources:

  • Example 1: Creating a DataFrame from a List of Dictionaries
import pandas as pd

# Create a list of dictionaries
data = [
{'Name': 'Alice', 'Age': 28, 'Salary': 50000},
{'Name': 'Bob', 'Age': 22, 'Salary': 45000},
{'Name': 'Charlie', 'Age': 35, 'Salary': 60000},
{'Name': 'David', 'Age': 30, 'Salary': 55000},
{'Name': 'Eva', 'Age': 25, 'Salary': 48000}
]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)

# Display the DataFrame
print(df)
  • Example 2: Creating a DataFrame from a List of Lists
import pandas as pd

# Create a list of lists
data = [
['Alice', 28, 50000],
['Bob', 22, 45000],
['Charlie', 35, 60000],
['David', 30, 55000],
['Eva', 25, 48000]
]

# Create a DataFrame from the list of lists with column names
df = pd.DataFrame(data, columns=['Name', 'Age', 'Salary'])

# Display the DataFrame
print(df)
  • Example 3: Creating a DataFrame from a NumPy Array
import pandas as pd
import numpy as np

# Create a NumPy array
data = np.array([
['Alice', 28, 50000],
['Bob', 22, 45000],
['Charlie', 35, 60000],
['David', 30, 55000],
['Eva', 25, 48000]
])

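# Create a DataFrame from the NumPy array with column names
df = pd.DataFrame(data, columns=['Name', 'Age', 'Salary'])

# A NumPy array holds a single data type, so the mixed values above were all stored as strings;
# convert the numeric columns back to integers
df = df.astype({'Age': 'int64', 'Salary': 'int64'})

# Display the DataFrame
print(df)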
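Another common source is a dictionary that maps each column name to a list of values. A minimal sketch with the same sample data follows; checking df.dtypes after construction is a quick way to confirm that numeric columns really are numeric, since a NumPy array built from mixed values, as in Example 3, stores everything as strings:

```python
import numpy as np
import pandas as pd

# A dictionary of lists: keys become column names, lists become column values
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [28, 22, 35, 30, 25],
    "Salary": [50000, 45000, 60000, 55000, 48000],
}
df = pd.DataFrame(data)

# Numeric columns keep an integer dtype here
print(df.dtypes)

# For comparison: mixing strings and numbers in one NumPy array yields a string array
arr = np.array([["Alice", 28, 50000], ["Bob", 22, 45000]])
print(arr.dtype.kind)  # U (Unicode string)
```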

Below are additional frequently used DataFrame methods, with examples based on the sample data above.

  • Viewing Data

df.head(): View the first rows of the DataFrame (5 by default; pass a number, e.g. df.head(3)).

df.tail(): View the last rows of the DataFrame (5 by default).

df.sample(): Randomly sample rows from the DataFrame (1 row by default; use n= or frac= for more).

  • Data Summary

df.describe(): Get a statistical summary (count, mean, standard deviation, min, quartiles, max) of the numeric columns.

df.info(): Print information about the DataFrame, including column data types and non-null counts.

  • Selecting Columns

df['Column_Name']: Select a single column.

df[['Column1', 'Column2']]: Select multiple columns.

  • Sorting Data

df.sort_values(by='Column_Name'): Sort the DataFrame by a specific column.

df.sort_values(by='Age', ascending=False)
  • Aggregating Data

df.groupby('Grouping_Column').agg({'Aggregated_Column': 'Aggregation_Function'}): Perform aggregation operations.

df.groupby('Age').agg({'Salary': 'mean'})
  • Adding and Modifying Data

df['New_Column'] = ...: Add a new column.

df['Age_Group'] = df['Age'].apply(lambda x: 'Young' if x < 30 else 'Old')

df.loc[row_index, 'Column_Name'] = new_value: Modify data.

df.loc[df['Name'] == 'Alice', 'Salary'] = 52000
  • Removing Data

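df.drop('Column_Name', axis=1): Remove a column. This returns a new DataFrame, so assign the result (df = df.drop('Column_Name', axis=1)) to keep the change.

df.drop(index_label): Remove rows by their index label.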

  • Exporting Data

df.to_csv('file.csv', index=False): Save the DataFrame to a CSV file.
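The methods above can be run end to end on the sample data. In this sketch, two choices are assumptions made for illustration: it groups by Age_Group rather than Age (every age in the sample is unique, so grouping by Age would give one row per group), and it writes the CSV to an in-memory io.StringIO buffer instead of file.csv:

```python
import io

import pandas as pd

# The sample data used throughout this post
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
        "Age": [28, 22, 35, 30, 25],
        "Salary": [50000, 45000, 60000, 55000, 48000],
    }
)

# Viewing data
print(df.head(3))                       # first 3 rows
print(df.tail(2))                       # last 2 rows
print(df.sample(n=2, random_state=0))   # 2 random rows; random_state makes it repeatable

# Data summary
print(df.describe())                    # summary statistics of the numeric columns
df.info()                               # prints dtypes and non-null counts, returns None

# Selecting columns
names = df["Name"]                      # a Series
subset = df[["Name", "Salary"]]         # a DataFrame

# Sorting (returns a new DataFrame)
by_age_desc = df.sort_values(by="Age", ascending=False)

# Adding and modifying data
df["Age_Group"] = df["Age"].apply(lambda x: "Young" if x < 30 else "Old")
df.loc[df["Name"] == "Alice", "Salary"] = 52000

# Aggregating: each Age_Group contains several rows, so the mean is meaningful
avg_salary = df.groupby("Age_Group").agg({"Salary": "mean"})
print(avg_salary)

# Removing data: drop() returns a copy, so keep the result in a variable
without_group = df.drop("Age_Group", axis=1)
without_first_row = df.drop(0)          # removes the row whose index label is 0

# Exporting: write CSV text to an in-memory buffer instead of a file
buffer = io.StringIO()
without_group.to_csv(buffer, index=False)
print(buffer.getvalue().splitlines()[0])  # Name,Age,Salary
```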

In conclusion, we’ve covered the fundamental concepts of Pandas along with common methods for viewing, selecting, filtering, sorting, aggregating, modifying, and exporting data. Together they make Pandas an indispensable tool for efficiently handling and analyzing structured data in Python.

Brian

Software engineer interested in full stack, Golang, JavaScript, Python, Node.js, React, Nest.js & Next.js. Sharing knowledge through blogs. Follow for updates!