AI Blueprint Engine in Action: Purpose-Built Deep Learning for Predicting Survival of the Titanic Disaster

A walk-through of using the AI Blueprint Engine to generate code for an end-to-end machine learning pipeline, including data loading, pre-processing, and a deep neural network, for the Titanic passenger survival prediction project.

Starting the project

Retrieving and preparing data

Output of the code excerpt above: the top rows provide information about the columns in the train.csv file, the bottom rows about the test.csv file.
Filling the missing values in the Age and Fare feature columns.
Filling the missing values in the Embarked feature.
Encoding the Sex and Embarked features with categorical indices.
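The preprocessing steps described in the captions above can be sketched in pandas roughly as follows. The four example rows are hypothetical stand-ins for the Kaggle data, and the fill strategies (column median for continuous features, most frequent value for Embarked) are common choices, not necessarily the ones the Engine's generated code uses:

```python
import pandas as pd

# Tiny hypothetical stand-in for Kaggle's train.csv.
df = pd.DataFrame({
    "Age":      [22.0, None, 26.0, 35.0],
    "Fare":     [7.25, 71.28, None, 8.05],
    "Embarked": ["S", None, "Q", "S"],
    "Sex":      ["male", "female", "female", "male"],
})

# Fill missing continuous values with the column median.
for col in ["Age", "Fare"]:
    df[col] = df[col].fillna(df[col].median())

# Fill missing Embarked values with the most frequent port.
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Encode Sex and Embarked as categorical integer indices.
for col in ["Sex", "Embarked"]:
    df[col] = df[col].astype("category").cat.codes
```

After these steps every column is numeric and free of missing values, which is what the downstream model expects.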

Building the model with the AI Blueprint Engine

  • loads the continuous features Age and Fare from train.csv and normalizes them to have zero mean and unit variance,
  • loads the count features SibSp and Parch from train.csv and log-transforms them,
  • loads the categorical features Pclass and Embarked from train.csv and embeds them into learned distributional representations,
  • loads the binary feature Sex from train.csv,
  • concatenates all features (or their distributional representations where applicable) and passes them through 3 alternating batch normalization and dense layers, learning to predict the target Survived by minimizing the average binary cross-entropy between prediction and target over all training examples.
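The input transformations in the list above can be illustrated with a small NumPy sketch. The passenger values, embedding dimensions, and random embedding tables below are hypothetical stand-ins for the learned parameters of the generated model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch of 4 passengers (hypothetical values).
age_fare = np.array([[22.0, 7.25], [38.0, 71.28], [26.0, 7.93], [35.0, 53.10]])
sibsp_parch = np.array([[1, 0], [1, 0], [0, 0], [1, 0]], dtype=float)
pclass = np.array([2, 0, 2, 0])      # categorical indices (Pclass - 1)
embarked = np.array([1, 0, 1, 1])    # categorical indices
sex = np.array([[1.0], [0.0], [0.0], [0.0]])  # binary feature

# Continuous features: normalize to zero mean and unit variance.
cont = (age_fare - age_fare.mean(axis=0)) / age_fare.std(axis=0)

# Count features: log-transform (log1p keeps zeros finite).
counts = np.log1p(sibsp_parch)

# Categorical features: look up embeddings; random matrices
# stand in here for the learned tables.
emb_pclass = rng.normal(size=(3, 2))[pclass]      # 3 classes -> 2 dims
emb_embarked = rng.normal(size=(3, 2))[embarked]  # 3 ports   -> 2 dims

# Concatenate everything into the input of the dense stack.
x = np.concatenate([cont, counts, emb_pclass, emb_embarked, sex], axis=1)
```

The concatenated matrix `x` is what the alternating batch-normalization and dense layers would then consume.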

Setting up the environment

# Create a new conda environment named "venv"
# and activate it.
$ conda create --name venv python=2.7 && source activate venv
# Install packages.
$ pip install -r requirements.txt

Training the model

Running training.py with the default configuration.

Predicting survival of passengers from the test set

  1. the column order in data/test_preprocessed.csv does not match that of data/train_preprocessed.csv, since the test set does not contain the column Survived, and
  2. the output of the inference script does not match the competition’s submission format.
$ cp titanic_ml_from_disaster/inference.py \
    titanic_ml_from_disaster/inference_kaggle.py
The modified data loading function. We explicitly subtract one from the column indices to highlight the changes.
PassengerId,Survived
892,0
893,1
...
$ python titanic_ml_from_disaster/predict_for_submission.py \
--source_file_1 ./data/test_preprocessed.csv
PassengerId,Survived
892,0
893,0
894,0
...
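The conversion into the competition's submission format can be sketched like this. The passenger ids and predicted probabilities are hypothetical stand-ins for the model's actual outputs, and predict_for_submission.py's internals may differ:

```python
import csv

# Hypothetical predicted survival probabilities for the first
# test passengers; Kaggle's test-set PassengerId starts at 892.
passenger_ids = [892, 893, 894]
probs = [0.12, 0.34, 0.08]

# Write the two-column CSV the competition expects,
# thresholding probabilities at 0.5.
with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["PassengerId", "Survived"])
    for pid, p in zip(passenger_ids, probs):
        writer.writerow([pid, int(p >= 0.5)])
```

The resulting file starts with the header row `PassengerId,Survived` followed by one binary prediction per passenger, matching the sample output above.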


Bridging the gap between ease of use and flexibility in artificial intelligence development — https://creaidAI.com
