Python Learning Series Part-3
Complete Python Topics for Data Analysis: https://t.me/sqlspecialist/548
3. Pandas:
Pandas is a powerful library for data manipulation and analysis. It provides data structures like Series and DataFrame, making it easy to handle and analyze structured data.
1. Series and DataFrame Basics:
- Series: A one-dimensional array with labels, akin to a column in a spreadsheet.
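import numpy as np  # Needed for np.nan, NumPy's marker for a missing value
import pandas as pd
series_data = pd.Series([1, 3, 5, np.nan, 6, 8])  # dtype becomes float64 because of the NaN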
- DataFrame: A two-dimensional table, similar to a spreadsheet or SQL table.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})
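To show what "labels" means in practice, here is a minimal sketch that gives a Series explicit labels and pulls a column out of a DataFrame. The `temps` values and weekday labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index
temps = pd.Series([21.5, 19.0, np.nan], index=['Mon', 'Tue', 'Wed'])
print(temps['Tue'])    # Label-based lookup
print(temps.dtype)     # float64

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

ages = df['Age']       # Selecting a single column returns a Series
print(type(ages).__name__)
print(df.shape)        # (rows, columns)
print(df.dtypes)       # One dtype per column
```

Each DataFrame column is itself a Series that shares the DataFrame's row index, which is why column selection and Series methods fit together so naturally.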
2. Data Cleaning and Manipulation:
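- Handling Missing Data: Pandas provides methods such as dropna() and fillna() for dealing with missing values. Both return a new object by default and leave the original unchanged.
df_clean = df.dropna()  # New DataFrame without the rows that contain missing values
df_filled = df.fillna(0)  # New DataFrame with missing values replaced by 0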
- Filtering and Selection: Select the rows that meet a condition (boolean indexing) or pick specific columns.
over_25 = df[df['Age'] > 25]  # Rows where Age is greater than 25 (Bob and Charlie)
- Adding and Removing Columns:
df['Salary'] = [50000, 60000, 75000]  # Adding a new column (one value per row)
df_without_city = df.drop(columns='City')  # Returns a copy without City; df keeps it for the grouping below
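A slightly fuller sketch of cleaning and selection, assuming a small made-up dataset with one missing Age and one missing City. It shows counting missing values, the difference between dropping and filling, combining conditions, and adding then removing a column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob has no Age, Charlie has no City
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, np.nan, 35, 28],
    'City': ['New York', 'San Francisco', None, 'New York'],
})

print(df.isna().sum())  # Number of missing values in each column

dropped = df.dropna()   # Keeps only rows with no missing values at all
# A dict gives each column its own fill value
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
ny_over_25 = df[(df['Age'] > 25) & (df['City'] == 'New York')]

# .loc filters rows by a condition and selects columns by label in one step
names_over_25 = df.loc[df['Age'] > 25, 'Name']

# Add a derived column, then remove it by name
df['Age_next_year'] = df['Age'] + 1
df = df.drop(columns=['Age_next_year'])
```

Filling with the column mean is only one option; whether to drop, fill with a constant, or fill with a statistic depends on the data and the analysis.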
3. Grouping and Aggregation:
- GroupBy: Split the rows into groups that share the same value in one or more columns.
grouped_data = df.groupby('City')
- Aggregation Functions: Computing summary statistics for each group.
average_age = grouped_data['Age'].mean()
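In the three-row example above every city appears once, so each group holds a single row. The sketch below uses a made-up dataset where cities repeat, and shows both a single statistic per group and several statistics at once with named aggregation in agg().

```python
import pandas as pd

# Hypothetical data where cities repeat, so groups contain several rows
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana', 'Eve'],
    'City': ['New York', 'Boston', 'New York', 'Boston', 'Chicago'],
    'Age': [25, 30, 35, 40, 28],
    'Salary': [50000, 60000, 75000, 80000, 55000],
})

# One statistic per group: a Series indexed by City (keys sorted by default)
average_age = df.groupby('City')['Age'].mean()

# Several statistics at once: each keyword names an output column
summary = df.groupby('City').agg(
    avg_age=('Age', 'mean'),
    max_salary=('Salary', 'max'),
    headcount=('Name', 'count'),
)
print(summary)
```

Here average_age is 35.0 for Boston, 28.0 for Chicago and 30.0 for New York, since each value is the mean of that city's rows.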
4. Pandas in Data Analysis:
- Pandas is extensively used for data preparation, cleaning, and exploratory data analysis (EDA).
- It works closely with other libraries: DataFrame columns are typically backed by NumPy arrays, and the built-in .plot() methods draw with Matplotlib by default.
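A small sketch of the NumPy side of that integration, using made-up Age and Salary values: NumPy functions apply element-wise to columns, np.where builds a new column from a condition, and to_numpy() hands the data to code that expects a plain array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35], 'Salary': [50000, 60000, 75000]})

# NumPy ufuncs work element-wise on a column and return a Series
log_salary = np.log(df['Salary'])

# np.where picks a value per row based on a condition
df['Band'] = np.where(df['Salary'] >= 60000, 'high', 'standard')

# .to_numpy() converts selected columns to a NumPy ndarray
arr = df[['Age', 'Salary']].to_numpy()
print(arr.shape)  # (3, 2)
```

Plotting is left out so the sketch runs without Matplotlib installed; with Matplotlib available, a call such as df.plot(x='Age', y='Salary') draws through it.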
Here you can access a free Pandas cheatsheet
Share with credits: https://t.me/sqlspecialist
Hope it helps :)