# How I predicted the survival rate on Titanic

This was my first step toward a competition on **Kaggle**. Here I share my approach to solving the Titanic dataset on Kaggle, which earned me an accuracy of 79.90.

Aim: to find out what sorts of people were likely to survive. In particular, we predict which passengers survived the tragedy with the help of machine learning. Shape: 891 rows, 9 features.

In this post I emphasise tuning the parameters and how I reached this accuracy without much feature engineering, just by fine-tuning the model's hyperparameters. My initial steps:

In my baseline model I initially used DecisionTreeClassifier. The reason for choosing it is that it is fast and easy to understand: it has a layered splitting process, where at each layer we try to split the population or sample into two or more groups. Looking at the classifier graph makes it clear how this works (`clf = DecisionTreeClassifier()`).
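The baseline step above can be sketched roughly as follows. This is a minimal sketch, not the post's actual code: the tiny DataFrame is a synthetic stand-in for the Kaggle `train.csv` after basic encoding, and the column names are assumptions.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Titanic training data after encoding
df = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2, 3, 1],
    "Sex":    [0, 1, 0, 1, 0, 1, 1, 0],   # 0 = female, 1 = male
    "Age":    [29, 22, 35, 4, 58, 30, 19, 40],
    "Survived": [1, 0, 1, 1, 1, 0, 0, 1],
})
X, y = df.drop(columns="Survived"), df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: a single decision tree, layered splits at each level
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

On the real dataset, `sklearn.tree.plot_tree(clf)` renders the parent-to-child splitting graph described next.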

Here we can see how the branches divide from parent to child branches.

After these initial trials my score was 65.55 on **Kaggle**.

In my further trials I tried RandomForestClassifier:

`classifier = RandomForestClassifier(oob_score=True, random_state=0)`
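In full, the forest step might look like the sketch below. The features here are randomly generated stand-ins, not the Titanic data; the point is that `oob_score=True` gives a free out-of-bag accuracy estimate without a separate validation split.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (200 samples, 4 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in binary target

classifier = RandomForestClassifier(oob_score=True, random_state=0)
classifier.fit(X, y)
print(classifier.oob_score_)  # out-of-bag accuracy estimate
```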

With some fine-tuning of the hyperparameters by plotting them on graphs (train_data vs target_data), the parameters involved are:

{'**n_estimators**': [ ],
'**max_features**': [ ],
'**criterion**': [ ],
'**max_depth**': [ ],
'**min_samples_split**': [ ],
'**min_samples_leaf**': [ ]}
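A grid like this is typically searched with `GridSearchCV`. The post leaves the candidate lists empty, so the values below are illustrative assumptions (as is the synthetic data), not the author's actual grid.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# Candidate values are assumptions for illustration only
param_grid = {
    "n_estimators": [10, 50],
    "max_features": ["sqrt", None],
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4],
    "min_samples_split": [2, 4],
    "min_samples_leaf": [1, 2],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)
```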

The graph above shows how the train (red line) and target (green line) accuracy varies for different values of **n_estimators**; we can spot any overfitting by observing how the target curve behaves with respect to the train curve at any given point.
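A train-vs-validation curve like the one described can be produced with `validation_curve`; this is a hedged reconstruction on synthetic stand-in data, since the post's plotting code isn't shown.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for saving to file
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic stand-in data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

values = [10, 25, 50, 100]
train_scores, valid_scores = validation_curve(
    RandomForestClassifier(random_state=0), X, y,
    param_name="n_estimators", param_range=values, cv=3)

# Red = train accuracy, green = validation accuracy, as in the post
plt.plot(values, train_scores.mean(axis=1), "r-", label="train")
plt.plot(values, valid_scores.mean(axis=1), "g-", label="validation")
plt.xlabel("n_estimators"); plt.ylabel("accuracy"); plt.legend()
plt.savefig("n_estimators_curve.png")
```

The same loop, swapping `param_name`, produces the per-parameter graphs discussed below.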

There is no great change or variation across values of **max_features**.

Here we can clearly observe that the model overfits: it becomes misleading after 2, and at 3 the overfitting shows clearly.

Here, for **min_samples_leaf**, we can see that after 1 the graph makes it intuitively clear that something has gone wrong, so **1** is the best-fit value for min_samples_leaf.

After finding the parameters that fit best with the help of the graphs, I used the best-fit estimator; the final results are attached.

Then I finally predicted the target with the above parameters and succeeded in getting that accuracy on **Kaggle**.
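The final step might look like this sketch: fit the tuned forest on the full training data, predict on the test set, and write a Kaggle-style submission file. The parameter values, tiny DataFrames, and column names (beyond the competition's `PassengerId`/`Survived`) are assumptions for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical tuned parameters, for illustration only
best = RandomForestClassifier(n_estimators=100, max_depth=4,
                              min_samples_leaf=1, random_state=0)

# Tiny stand-ins for the real train/test CSVs
train = pd.DataFrame({"Pclass": [1, 3, 2, 3, 1, 2],
                      "Sex": [0, 1, 0, 1, 0, 1],
                      "Survived": [1, 0, 1, 0, 1, 0]})
test = pd.DataFrame({"PassengerId": [892, 893],
                     "Pclass": [3, 1], "Sex": [1, 0]})

best.fit(train[["Pclass", "Sex"]], train["Survived"])
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": best.predict(test[["Pclass", "Sex"]]),
})
submission.to_csv("submission.csv", index=False)
```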

I hope this helps you understand how the parameters can be tuned to increase the accuracy of the model.