Prediction of Pokémon's Rarity and a Prediction Flask App -1-

Ozan ERTEK
3 min read · Jan 24, 2022
Gotta catch ’em all. The Pokémon Company.

In this project, I used a complete Pokémon dataset from Kaggle. The aim of the project is to predict a Pokémon's rarity (legendary or common) from its features (for instance, its attack, defense, and the element it has), and to create a Flask app for prediction. I will explain everything I did, step by step. You can visit the project repository (HERE). The methodology of the project is given below.

  • Adding all the data to SQL (this is only for my project, not necessary) (for the dataset).
  • Defining the data features.
  • Identifying the Pokémon features that are most significant for machine learning, and using hyperparameter optimization (Bayesian).

Second Part (-2-) (Link HERE)

  • Flask app
  • Deploying to the cloud (pythonanywhere.com) (my link)

Let's go catch 'em all!

1) Getting Data

I used a Pokémon dataset from Kaggle that has a legendary column. The dataset includes each Pokémon's types (type1, type2), hp, attack, defense, and the legendary column. (I added it to my GitHub repository.)
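As a starting point, here is a minimal sketch of loading the CSV, together with the optional SQL step from the methodology list. The file names pokemon.csv and pokemon.db and the table name pokemon are my placeholders, not the project's actual names.

import sqlite3

import pandas as pd

# Load the Kaggle CSV (assumed filename) into a DataFrame
df = pd.read_csv('pokemon.csv')

# Optional: store it in a local SQLite database, as in step one of the methodology
# ('pokemon.db' and the table name 'pokemon' are assumed names)
conn = sqlite3.connect('pokemon.db')
df.to_sql('pokemon', conn, if_exists='replace', index=False)
conn.close()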

2) Defining Features for ML

We have to identify the features that are most powerful at separating the target values (0 or 1). Therefore, we have to look at pairplot graphs.

import seaborn as sns

# Pairwise feature plots, colored by the legendary label
g = sns.PairGrid(df, hue="legendary")
g.map_diag(sns.histplot)        # histograms on the diagonal
g.map_offdiag(sns.scatterplot)  # scatter plots elsewhere
g.add_legend()

and a violin plot for the 'total' feature.

import matplotlib.pyplot as plt

sns.set(font_scale=2)
# catplot is figure-level, so the size is set via height/aspect
sns.catplot(x="type1", y="total", hue="legendary",
            kind="violin", height=8.27, aspect=11.7/8.27, split=True, data=df)
plt.xticks(rotation=45)

and the one below is the most powerful (you can see a bigger version of the graph).

sns.set(font_scale=2)
plt.figure(figsize=(15, 10))
sns.histplot(x='total', hue='legendary', data=df)
plt.xlabel('TOTAL')

So that means we should use the 'total' column for prediction. The 'total' column is the sum of all of a Pokémon's stats (for instance, hp + attack + defense, and so on). This feature is also convenient because we can add it to the Flask app very simply. So we use this feature.
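If your copy of the dataset does not include a precomputed 'total', it can be derived from the base stats. A minimal sketch, assuming the stat columns are named hp, attack, defense, sp_attack, sp_defense, and speed:

# Assumed stat column names; adjust them to match your dataset's headers
stat_cols = ['hp', 'attack', 'defense', 'sp_attack', 'sp_defense', 'speed']
df['total'] = df[stat_cols].sum(axis=1)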

Other important Pokémon features are type1 (the Pokémon's primary element) and type2 (its secondary element). For example, Pikachu is an electric Pokémon, so Pikachu's type1 is electric.
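One practical note: XGBoost works on numeric inputs, so string columns like type1 and type2 need to be encoded before training. A minimal sketch using one-hot encoding (the exact encoding used in the repository may differ):

import pandas as pd

# One-hot encode the categorical element columns so XGBoost can consume them
df = pd.get_dummies(df, columns=['type1', 'type2'])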

3) Using the Best Model and Optimizing with Bayesian Optimization

We already identified our features in the second section, so now we are going to try some powerful classification models. I had previously tried LightGBM, XGBoost, and GaussianNB; the best option was XGBoost, so I will continue with XGBoost.

from sklearn.model_selection import train_test_split

# Drop identifier-like columns and split features from the target
cla1_X = df.drop(columns=['legendary', 'name', 'generation', 'number'])
cla_y = df['legendary']
X_train, X_test, y_train, y_test = train_test_split(cla1_X, cla_y,
                                                    test_size=0.20,
                                                    random_state=5)

and the modeling:

from xgboost import XGBClassifier

modelxgb3 = XGBClassifier()

and then we have to use cross-validation, too.

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

# 10-fold cross-validation on the full feature matrix
kfold = KFold(n_splits=10)
results = cross_val_score(modelxgb3, cla1_X, cla_y, cv=kfold)
print(results.mean())

The next step is to check our accuracy, recall, and precision scores. Those scores were already high enough (I previously got around 96%), but we have to give it our best shot. We already selected good features for prediction, but Bayesian optimization is our final best shot.
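As a reference point, here is a minimal sketch of computing those baseline scores with scikit-learn, using the model and split defined above:

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Fit the baseline model and score it on the held-out test set
modelxgb3.fit(X_train, y_train)
y_pred = modelxgb3.predict(X_test)
print('accuracy :', accuracy_score(y_test, y_pred))
print('recall   :', recall_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred))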

import xgboost as xgb
from bayes_opt import BayesianOptimization
from sklearn.metrics import accuracy_score

seed = 112

def xgbc_cv(max_depth, learning_rate, n_estimators, reg_alpha):
    # Build an XGBoost classifier with the sampled hyperparameters
    estimator_function = xgb.XGBClassifier(max_depth=int(max_depth),
                                           learning_rate=learning_rate,
                                           n_estimators=int(n_estimators),
                                           reg_alpha=reg_alpha,
                                           nthread=-1,
                                           seed=seed)
    # Fit the estimator
    estimator_function.fit(X_train, y_train)
    # Return the score that Bayesian optimization will maximize
    return accuracy_score(y_test, estimator_function.predict(X_test))

gp_params = {"alpha": 1e-10}

hyperparameter_space = {
    'max_depth': (1, 150),
    'learning_rate': (0, 1),
    'n_estimators': (20, 300),
    'reg_alpha': (0, 1)
}

xgbcBO = BayesianOptimization(f=xgbc_cv,
                              pbounds=hyperparameter_space,
                              random_state=16,
                              verbose=10)

# Note: the acq/kappa arguments to maximize() apply to older bayes_opt versions
xgbcBO.maximize(init_points=2, n_iter=50, acq='ucb', kappa=3, **gp_params)

You can change the number of iterations at this point. This code gives us the 'max_depth', 'learning_rate', 'n_estimators', and 'reg_alpha' parameters.

optimum_parameter=xgbcBO.max
optimum_parameter["params"]

We can get those parameters by using the code above.

And then we build a new XGBoost model using the optimized parameters.

# Plug in the optimized values found above
params = optimum_parameter["params"]
xg_reg1 = xgb.XGBClassifier(learning_rate=params['learning_rate'],
                            max_depth=int(params['max_depth']),
                            n_estimators=int(params['n_estimators']),
                            reg_alpha=params['reg_alpha'])
xg_reg1.fit(X_train, y_train)

The code below gives the confusion matrix:

from sklearn.metrics import confusion_matrix

cm = print_confusion_matrix(confusion_matrix(cla_y, xg_reg1.predict(cla1_X)),
                            ['Not Legendary', 'Legendary'])
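Note that print_confusion_matrix is not a scikit-learn function; it is a small helper from the project. A minimal sketch of what such a helper might look like, drawn as a seaborn heatmap:

import matplotlib.pyplot as plt
import seaborn as sns

def print_confusion_matrix(cm, class_names):
    # Render the confusion matrix as an annotated heatmap
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=class_names, yticklabels=class_names, ax=ax)
    ax.set_xlabel('Predicted label')
    ax.set_ylabel('True label')
    return fig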

Our accuracy is now 98.1%. The Bayesian optimization method increased it by nearly 2%, and the recall and precision scores are around 83-85%.

Our model is ready for the Flask app. In the second part (-2-), we are going to create a pkl file for Flask and build the Flask application.
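As a quick preview of that step, a minimal sketch of saving the trained model to a pkl file with pickle; the filename model.pkl is my placeholder:

import pickle

# Serialize the tuned model so the Flask app can load and reuse it
with open('model.pkl', 'wb') as f:
    pickle.dump(xg_reg1, f)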

Thanks for reading my article :)

Hope to see you again in my next article…

Ozan ERTEK
