ELI5 (Explain Like I'm 5): Model Explainability in Python
Understanding machine learning models can often feel like deciphering an ancient script. However, with tools like ELI5, the process of making sense of these models becomes much more approachable.
Setting the Stage: The Titanic Dataset
The Titanic dataset is a popular choice for demonstrating machine learning techniques. It’s relatively small, has a manageable number of features, yet remains intriguing due to the human story it tells. We’ll start by loading the dataset:
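import csv
import numpy as np
# Read every row of the training CSV into a list of dicts keyed by column name
with open('titanic-train.csv', 'rt') as f:
    data = list(csv.DictReader(f))
# Peek at the first record
data[:1]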
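This code loads the dataset and gives us a glimpse of the first entry, which includes features like age, fare, and survival status. Because csv.DictReader does no type conversion, every value arrives as a string, including the numeric ones.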
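To make the shape of these records concrete, here is a minimal sketch that parses a tiny in-memory CSV instead of the real file. The two sample rows and the `to_float` helper are illustrative assumptions, not part of the original walkthrough; they only show that `csv.DictReader` yields strings and that a missing value shows up as an empty string.

```python
import csv
import io

# Hypothetical two-row sample; the real file has more columns and rows.
sample_csv = (
    "PassengerId,Survived,Pclass,Age,Fare\n"
    "1,0,3,22,7.25\n"
    "2,1,1,,71.2833\n"
)

# DictReader maps each row to a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(sample_csv)))


def to_float(value, default=None):
    """Convert a CSV field to float, returning `default` for empty fields."""
    return float(value) if value else default


# Every raw field is a string until we convert it ourselves.
ages = [to_float(row['Age']) for row in rows]
fares = [to_float(row['Fare']) for row in rows]

print(rows[0])
print(ages, fares)
```

Keeping the raw strings and converting only where needed is one reasonable choice; another is to load the file with a library that infers column types.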
Preparing the Data
Before diving into model building, it’s essential to preprocess the data. We first separate the features from the Survived label, then shuffle the rows and split them into training and validation sets:
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
_all_xs = [{k: v for k, v in row.items() if k != 'Survived'} for row in data]
_all_ys = np.array([int(row['Survived']) for row in data])
all_xs, all_ys = shuffle(_all_xs, _all_ys…
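As a rough illustration of the shuffle-and-split step, the sketch below runs it on toy data rather than on the Titanic rows. The toy records, `random_state=0`, and `test_size=0.25` are all assumptions chosen for the example, not the values used in the original code.

```python
import numpy as np
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

# Toy stand-ins for the feature dicts and labels (hypothetical values).
toy_xs = [{'Age': str(20 + i), 'Fare': str(7.25 + i)} for i in range(10)]
toy_ys = np.array([i % 2 for i in range(10)])

# shuffle() permutes both sequences with the same ordering,
# so each feature dict stays paired with its label.
shuffled_xs, shuffled_ys = shuffle(toy_xs, toy_ys, random_state=0)

# Hold out a quarter of the rows for validation (assumed ratio).
train_xs, valid_xs, train_ys, valid_ys = train_test_split(
    shuffled_xs, shuffled_ys, test_size=0.25, random_state=0)

print(len(train_xs), len(valid_xs))
```

Fixing the random seed makes the split reproducible, which helps when comparing explanations across runs.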