Python Data Science

buzonliao
Published in Python 101
2 min read · Oct 20, 2023

Introduction

This post gives a brief introduction to Python's strengths in data science, using the FIFA 19 player dataset (FIFA19_official_data.csv) as a sample for analysis.

1. Begin by importing the necessary libraries.

import pandas as pd

2. Load the dataset into a pandas DataFrame and get initial insights.

data_frame = pd.read_csv('FIFA19_official_data.csv')
print(data_frame.shape)
print(data_frame.describe())
print(data_frame.values)
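
To see what these inspection calls return without the CSV at hand, here is a minimal sketch on a tiny DataFrame. The rows and the name `sample` are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only (not from the FIFA 19 CSV).
sample = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Age": [23, 31, 41],
    "Wage": ["€10K", "€250K", "€1K"],
})

shape = sample.shape          # (number of rows, number of columns)
summary = sample.describe()   # summary statistics for the numeric columns only
first_rows = sample.head(2)   # the first two rows, handy for a quick look

print(shape)
print(summary)
print(first_rows)
```

Note that `describe()` skips the text columns by default, which is one reason the ‘Wage’ and ‘Value’ strings need converting before they can be analyzed.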

3. Filter and display players based on a specific age criterion.

print(data_frame[data_frame["Age"] > 40])
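
The filter above relies on a boolean mask: the comparison yields a Series of True/False values, and indexing with it keeps only the True rows. A minimal sketch, assuming a small made-up DataFrame (`players` and its rows are hypothetical):

```python
import pandas as pd

# Hypothetical rows, invented for illustration only.
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Age": [19, 33, 41, 44],
})

# The comparison produces one boolean per row.
mask = players["Age"] > 40
over_40 = players[mask]

# Conditions combine with & (and) or | (or); each one needs its own parentheses.
in_thirties = players[(players["Age"] >= 30) & (players["Age"] < 40)]

print(over_40)
print(in_thirties)
```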

4. Introduce a utility function to convert string representations of numbers with ‘K,’ ‘M,’ and ‘B’ suffixes to their actual float values. This transformation is vital for numeric computations. For instance, “1.5K” becomes 1500.0, “2M” becomes 2000000.0, etc.

def value_to_float(x):
    # Numbers (including NaN for missing values) pass through unchanged
    if isinstance(x, (int, float)):
        return x
    if 'K' in x:
        if len(x) > 1:
            return float(x.replace('K', '')) * 1000
        return 1000.0
    if 'M' in x:
        if len(x) > 1:
            return float(x.replace('M', '')) * 1000000
        return 1000000.0
    if 'B' in x:
        return float(x.replace('B', '')) * 1000000000
    # No recognized suffix: fall back to 0.0 (a plain string such as "500" also lands here)
    return 0.0
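
As an alternative sketch (not the function used in this post), the same conversion can be driven by a lookup table of suffixes. The names `SUFFIX_MULTIPLIERS` and `parse_suffixed_number` are hypothetical, and unlike the function above this variant assumes a plain numeric string such as "500" should be parsed as 500.0 rather than mapped to 0.0.

```python
# Hypothetical helper names; a table-driven variant of the conversion.
SUFFIX_MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_suffixed_number(text):
    """Convert strings such as '1.5K' or '2M' to floats."""
    # Values that are already numeric are returned as floats.
    if isinstance(text, (int, float)):
        return float(text)
    text = text.strip()
    # Assumption: an empty string counts as zero.
    if not text:
        return 0.0
    # If the last character is a known suffix, scale the leading number.
    if text[-1] in SUFFIX_MULTIPLIERS:
        return float(text[:-1]) * SUFFIX_MULTIPLIERS[text[-1]]
    # No suffix: parse the string as a plain number.
    return float(text)


print(parse_suffixed_number("1.5K"))  # 1500.0
print(parse_suffixed_number("2M"))    # 2000000.0
```

Keeping the multipliers in a dictionary means a new suffix can be supported by adding one entry instead of another `if` branch.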

5. Extract the “Name,” “Wage,” and “Value” columns, strip the ‘€’ symbol from ‘Wage’ and ‘Value,’ and convert the cleaned strings to floats with the value_to_float function.

# Keep only the columns needed for the analysis; copy so data_frame is left untouched
df1 = data_frame[["Name", "Wage", "Value"]].copy()
# Strip the euro sign, then convert suffixed strings such as "565K" to floats
df1["Wage"] = df1["Wage"].replace("€", "", regex=True).apply(value_to_float)
df1["Value"] = df1["Value"].replace("€", "", regex=True).apply(value_to_float)
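
For larger files, the same cleaning can also be done with vectorized string methods instead of a row-by-row `apply`. This is a sketch under assumptions, not the approach used in this post: the sample strings are invented, and the pattern assumes every value looks like an optional ‘€’, a number, and an optional K/M/B suffix.

```python
import pandas as pd

# Hypothetical values, invented for illustration only.
wages = pd.Series(["€565K", "€1.5M", "€0", "€900"])

# Split each string into a numeric part and an optional K/M/B suffix.
parts = wages.str.extract(r"^€?(?P<number>\d+(?:\.\d+)?)(?P<suffix>[KMB])?$")

# Map each suffix to its multiplier; values without a suffix get 1.0.
multiplier = parts["suffix"].map({"K": 1e3, "M": 1e6, "B": 1e9}).fillna(1.0)

wages_numeric = parts["number"].astype(float) * multiplier
print(wages_numeric)
```

Strings that do not match the pattern come out as NaN, which makes unexpected formats easy to spot.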

6. Perform a simple analysis to compute the difference between a player’s ‘Value’ and ‘Wage.’ Then, sort the data frame based on this difference.

df1["difference"] = df1["Value"] - df1["Wage"]
print(df1.sort_values('difference', ascending=False))
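
When only the top few rows matter, `nlargest` is a compact alternative to sorting the whole frame. A minimal sketch on made-up numbers (the DataFrame `df` and its rows are hypothetical):

```python
import pandas as pd

# Hypothetical rows, invented for illustration only.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Value": [110_500_000.0, 77_000_000.0, 600_000.0],
    "Wage": [565_000.0, 405_000.0, 1_000.0],
})

# Column arithmetic is element-wise: one subtraction per row.
df["difference"] = df["Value"] - df["Wage"]

# Keep the two rows with the largest difference, already in descending order.
top2 = df.nlargest(2, "difference")
print(top2)
```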

Conclusion

This post walked through loading the dataset, filtering rows, converting currency strings to numbers, and sorting by a computed column, showing how easily Python and pandas let users manipulate, clean, and analyze data.
