DecisionTree Classifier — Working on Moons Dataset using GridSearchCV to find best hyperparameters

Rohit Madan · Published in Analytics Vidhya · Nov 18, 2019

Decision Trees are an excellent way to classify data. Unlike a Random Forest, a Decision Tree is a transparent, or white-box, classifier, which means we can actually trace the logic behind its classifications.

I will also show you a quick way to create a .png image of a decision tree from a trained model, which demonstrates the transparency I was talking about. It will look something like this (code below) -

DecisionTree for Moons Dataset

How did we do this?

We’ll get to that, but first let’s read up a little on our dataset. make_moons generates a toy 2-D dataset of two interleaving half-circles, one per class, and the noise parameter controls how far points scatter from those circles.

Now we’re going to load the dataset into objects and then split it into training and test sets -

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Generate 10,000 noisy moon-shaped samples and hold out 20% for testing
X, y = make_moons(n_samples=10000, shuffle=True, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
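
Before we train anything, it helps to see what we are working with. This plotting snippet is my own illustration, not part of the original pipeline (it assumes matplotlib is installed) - it scatters the two noisy, interleaving half-moons that give the dataset its name -

import matplotlib.pyplot as plt

# Colour each point by its class label to show the two interleaving half-moons
plt.scatter(X[:, 0], X[:, 1], c=y, s=5, cmap="bwr")
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("Moons dataset (noise=0.4)")
plt.show()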

Next we’re going to initialise our classifier and GridSearchCV, the main component that will find the best hyperparameters for us.

We simply create a dictionary of the hyperparameter values we want the machine to try, and save it as params.

We then pass the classifier and the params dictionary to GridSearchCV.

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Try max_leaf_nodes from 2 to 99 against three min_samples_split values,
# scoring every combination with 3-fold cross-validation
params = {'max_leaf_nodes': list(range(2, 100)), 'min_samples_split': [2, 3, 4]}
grid_search_cv = GridSearchCV(DecisionTreeClassifier(random_state=42), params, verbose=1, cv=3)
grid_search_cv.fit(X_train, y_train)
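
For a sense of the search cost: the grid has 98 max_leaf_nodes values times 3 min_samples_split values, i.e. 294 candidate combinations, and with cv=3 each candidate is fitted three times, for 882 fits in total. If you want to confirm the grid size yourself, here is a small check (my addition, using scikit-learn's ParameterGrid helper) -

from sklearn.model_selection import ParameterGrid

# 98 values of max_leaf_nodes x 3 values of min_samples_split = 294 candidates
print(len(ParameterGrid(params)))  # prints 294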

Once we have fit the grid search model on the training data, we simply ask it what worked best, and it answers -

grid_search_cv.best_estimator_

And we get an answer. The printout below shows the fitted grid search configuration; the winning hyperparameter values themselves are stored on grid_search_cv.best_estimator_ and in grid_search_cv.best_params_ -

GridSearchCV(cv=3, error_score='raise-deprecating',
             estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                                              max_features=None, max_leaf_nodes=None,
                                              min_impurity_decrease=0.0, min_impurity_split=None,
                                              min_samples_leaf=1, min_samples_split=2,
                                              min_weight_fraction_leaf=0.0, presort=False, random_state=42,
                                              splitter='best'),
             fit_params=None, iid='warn', n_jobs=None,
             param_grid={'max_leaf_nodes': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99], 'min_samples_split': [2, 3, 4]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
             scoring=None, verbose=1)
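
A step worth adding as a sanity check (this is my sketch, not part of the original walkthrough): score the tuned tree on the held-out test set. Because refit=True by default, GridSearchCV refits the best estimator on the full training set after the search, so the grid search object can predict directly -

from sklearn.metrics import accuracy_score

# With refit=True (the default), predict() goes through the best estimator
y_pred = grid_search_cv.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))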

Now, going back to making a tree image out of this model, simply copy the code below -

from sklearn.tree import export_graphviz

# Export the tuned tree in Graphviz .dot format
export_graphviz(
    grid_search_cv.best_estimator_,
    out_file="moons_tree.dot",
    feature_names=None,
    class_names=None,
    filled=True,
)

Once you run this, your code directory will contain a new file, moons_tree.dot. Run this command in a terminal to convert it into a .png -

$ dot -Tpng moons_tree.dot -o moons.png

And voilà, you have a tree image in your folder that explains how the model works.
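
If you don’t have the Graphviz dot tool installed, a rough alternative (assuming scikit-learn 0.21 or newer, where plot_tree was added) is to draw the same tree with matplotlib and save it straight to a .png -

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Render the tuned tree with matplotlib instead of Graphviz
plt.figure(figsize=(14, 8))
plot_tree(grid_search_cv.best_estimator_, filled=True)
plt.savefig("moons.png", dpi=150)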

That’s the beauty of decision trees. To check out my code, you can go to >
