Hands-On LIME: Practical Implementation for Image, Text, and Tabular Data

SHREERAJ
7 min read · Jul 3, 2024

Welcome to my Fourth Article in this series on Explainable AI.


Brief Recap of the Third Article on Explainable AI:

In my Third Article on Explainable AI, we explored LIME (Local Interpretable Model-agnostic Explanations) and its application across text, image, and tabular data. LIME enhances transparency by explaining individual predictions of complex AI models: it perturbs the input data and fits a simplified surrogate model to reveal why a specific decision was made.


In this article, we’ll implement LIME practically on these data types to gain deeper insights into model predictions.

1. Setting Up the Environment in Google Colab:
  • Open a Google Colab notebook
  • Install all required libraries
# Install required libraries
!pip install lime scikit-learn numpy pandas matplotlib tensorflow pillow

# Note: After installation, restart the runtime to ensure all libraries are properly loaded.
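
After restarting the runtime, you can optionally confirm that the key packages load before continuing (a small sanity check, not strictly required for the walkthrough):

# Optional sanity check: these imports should succeed after the restart
import lime
import sklearn
import tensorflow as tf
print("scikit-learn:", sklearn.__version__)
print("TensorFlow:", tf.__version__)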

2. LIME Implementation for Image Data:

Detailed Algorithm Flow:

1. Image Preparation:
— Load the image file
— Resize the image to match the input size of the pre-trained model (299x299 for InceptionV3)
— Convert the image to a numpy array
— Add a batch dimension to the array
— Preprocess the input using the model’s specific preprocessing function

2. Model Setup and Prediction:
— Load the pre-trained InceptionV3 model with ImageNet weights
— Use the model to make a prediction on the preprocessed image
— Decode the prediction to get human-readable class labels and probabilities

3. LIME Explainer Setup: Initialize the LimeImageExplainer

4. Explanation Generation:
Call the explain_instance method with the following parameters:
— Input image
— Model’s prediction function
— Number of top labels to explain
— Color to use for hiding superpixels
— Number of perturbed samples to generate

5. Explanation Visualization:
— Get the image and mask for the top predicted label
— Overlay the mask on the original image to highlight important regions
— Display the explanation using matplotlib

Code Implementation:

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from lime import lime_image
from skimage.segmentation import mark_boundaries

# Download a sample image
!wget https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Felis_catus-cat_on_snow.jpg/640px-Felis_catus-cat_on_snow.jpg -O cat.jpg

# Load and preprocess the image
img_path = 'cat.jpg'
img = load_img(img_path, target_size=(299, 299)) # Resize image to match InceptionV3 input size
img_array = img_to_array(img) # Convert image to numpy array
img_array = np.expand_dims(img_array, axis=0) # Add batch dimension
img_array = preprocess_input(img_array) # Preprocess input for InceptionV3

# Display the original image
plt.imshow(img)
plt.axis('off')
plt.show()

# Load pre-trained InceptionV3 model
model = InceptionV3(weights='imagenet')

# Make prediction
preds = model.predict(img_array)
decoded_preds = decode_predictions(preds, top=3)[0] # Get top 3 predictions

# Print top 3 predictions
print("Top 3 predictions:")
for i, (imagenet_id, label, score) in enumerate(decoded_preds):
    print(f"{i + 1}: {label} ({score:.2f})")

# Create LIME image explainer
explainer = lime_image.LimeImageExplainer()

# Generate explanation
explanation = explainer.explain_instance(img_array[0], model.predict, top_labels=5, hide_color=0, num_samples=1000)

# Visualize the explanation
temp, mask = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=True)
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))
plt.axis('off')
plt.title("LIME Explanation")
plt.show()

Outputs:

Top 3 predictions:
1: Egyptian_cat (0.53)
2: tiger_cat (0.16)
3: tabby (0.14)

Input image (left) and LIME explanation (right)
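
The explanation above shows only the superpixels that support the top label. If you also want to see regions that count against it, a small optional variant (reusing the explanation object from the code above) is sketched below; treat it as an extra experiment rather than part of the original walkthrough:

# Optional variant: show superpixels that support and oppose the top label
# Reuses `explanation` from the code above
temp, mask = explanation.get_image_and_mask(
    explanation.top_labels[0],
    positive_only=False,  # include superpixels with negative weights as well
    num_features=10,
    hide_rest=False       # keep the full image visible for context
)
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))
plt.axis('off')
plt.title("LIME: supporting and opposing superpixels")
plt.show()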

3. LIME Implementation for Text Data:

Detailed Algorithm Flow:

1. Dataset Preparation:
— Load the 20 newsgroups dataset, selecting specific categories
— Initialize TfidfVectorizer for text feature extraction

2. Text Preprocessing:
— Apply TfidfVectorizer to convert raw text data into TF-IDF features
— Split the dataset into features (X) and labels (y)

3. Model Training:
— Initialize a Logistic Regression classifier
— Fit the classifier on the TF-IDF features and corresponding labels

4. LIME Explainer Setup: Initialize LimeTextExplainer with the category names

5. Instance Selection: Choose a specific text instance from the dataset to explain

6. Explanation Generation:
Call the explain_instance method with:
— Selected text instance
— Model’s predict_proba function
— Number of features to include in the explanation

7. Results Interpretation:
— Print the document ID, predicted class, and true class
— Display the explanation showing important words and their impact

8. Visualization: Generate a plot of the explanation using LIME’s built-in visualization

Code Implementation:

# Import necessary libraries
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from lime.lime_text import LimeTextExplainer

# Prepare dataset
categories = ['alt.atheism', 'soc.religion.christian']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(newsgroups_train.data) # Convert text to TF-IDF features
y_train = newsgroups_train.target

# Train text classification model
clf = LogisticRegression(random_state=0)
clf.fit(X_train, y_train)

# Create LIME text explainer
explainer = LimeTextExplainer(class_names=categories)

# Choose a text instance to explain
text_instance = newsgroups_train.data[0]

# Generate explanation
# Transform the text instance to the appropriate format before passing it to predict_proba
exp = explainer.explain_instance(
    text_instance,
    classifier_fn=lambda x: clf.predict_proba(vectorizer.transform(x)),  # transform raw text into TF-IDF features before predicting
    num_features=6
)

# Print explanation details
print('Document id: %d' % 0)
print('Predicted class: %s' % categories[clf.predict(X_train[0])[0]])
print('True class: %s' % categories[y_train[0]])
print('\nExplanation:')
print('\n'.join(map(str, exp.as_list())))

# Visualize explanation
exp.as_pyplot_figure()
plt.show()

Outputs:

Document id: 0
Predicted class: soc.religion.christian
True class: soc.religion.christian
Explanation:
('the', 0.030118101294126762)
('of', 0.029927988501438)
('and', 0.021238042795362604)
('The', 0.016587489639242923)
('will', 0.015860295413525877)
('Library', -0.014720361184831086)
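
Note that the most influential tokens above are mostly stopwords ('the', 'of', 'and'), which reflects the plain TF-IDF setup more than the topics themselves. As an optional experiment (not part of the original pipeline), you could drop English stopwords and re-explain the same document; a minimal sketch reusing the objects defined above:

# Optional experiment: remove English stopwords so LIME highlights topical words
# Reuses newsgroups_train, y_train, and explainer from the code above
vectorizer_sw = TfidfVectorizer(stop_words='english')
X_train_sw = vectorizer_sw.fit_transform(newsgroups_train.data)

clf_sw = LogisticRegression(random_state=0)
clf_sw.fit(X_train_sw, y_train)

exp_sw = explainer.explain_instance(
    newsgroups_train.data[0],
    classifier_fn=lambda x: clf_sw.predict_proba(vectorizer_sw.transform(x)),
    num_features=6
)
print(exp_sw.as_list())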


4. LIME Implementation for Tabular Data:

Detailed Algorithm Flow:

1. Dataset Loading and Preparation:
— Load the Iris dataset
— Convert to a pandas DataFrame for easier manipulation
— Split features (X) and target variable (y)

2. Data Splitting and Scaling:
— Split data into training and testing sets
— Initialize StandardScaler
— Fit StandardScaler on training data and transform both training and testing data

3. Model Training:
— Initialize a Random Forest Classifier
— Fit the classifier on the scaled training data

4. LIME Explainer Setup:
Initialize LimeTabularExplainer with:
— Scaled training data
— Feature names
— Class names
— Mode (classification)

5. Instance Selection: Choose a specific instance from the test set to explain

6. Explanation Generation:
Call the explain_instance method with:
— Selected instance
— Model’s predict_proba function
— Number of features to include in the explanation

7. Results Visualization:
— Generate a plot of the explanation using LIME’s built-in visualization
— Display the plot showing feature importance and impact

8. Feature Importance Analysis: Print the list of features and their importance scores

Code Implementation:

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from lime import lime_tabular

# Load dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Prepare data
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train_scaled, y_train)

# Create LIME tabular explainer
explainer = lime_tabular.LimeTabularExplainer(
    X_train_scaled,
    feature_names=X.columns,
    class_names=iris.target_names,
    mode='classification'
)

# Choose an instance to explain
instance = X_test_scaled[0]

# Generate explanation
exp = explainer.explain_instance(instance, rf_model.predict_proba, num_features=4)

# Visualize explanation
exp.as_pyplot_figure()
plt.show()

# Print feature importance
print(exp.as_list())

Outputs:

[('0.30 < petal length (cm) <= 0.79', 0.2152934376349697),
('-1.18 < petal width (cm) <= 0.16', 0.1752256765106721),
('-0.07 < sepal length (cm) <= 0.72', 0.02534779672363008),
('sepal width (cm) <= -0.59', -0.020655190394031558)]
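
To sanity-check these local weights, it can help to set them next to the Random Forest’s own global feature importances; the sketch below reuses rf_model, X, and exp from the code above. Keep in mind that global importances and LIME’s local weights answer different questions, so the rankings will be related but not identical.

# Compare LIME's local weights with the model's global feature importances
# Reuses rf_model, X, and exp from the code above
print("Global feature importances (Random Forest):")
for name, importance in zip(X.columns, rf_model.feature_importances_):
    print(f"  {name}: {importance:.3f}")

print("\nLIME local weights for this instance:")
for feature, weight in exp.as_list():
    print(f"  {feature}: {weight:+.3f}")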

Suggestions for Further Exploration:


1. Apply LIME to different models and datasets in your field
2. Experiment with LIME parameters and their effects
3. Combine LIME with other interpretability techniques
4. Use LIME to improve model performance or detect biases
5. Investigate LIME’s explanation stability across different runs (see the sketch after this list)
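
As a starting point for item 5, a rough check is to explain the same instance several times and watch how much the top features move between runs; the sketch below reuses the tabular example’s explainer, instance, and rf_model, and its output will vary because LIME samples perturbations randomly.

# Rough stability check: explain the same instance several times and compare the top features
# Reuses explainer, instance, and rf_model from the tabular example above
for run in range(3):
    exp_run = explainer.explain_instance(instance, rf_model.predict_proba, num_features=4)
    top_features = [feature for feature, _ in exp_run.as_list()]
    print(f"Run {run + 1}: {top_features}")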

Conclusion:


We’ve explored LIME’s practical implementation for image, text, and tabular data, demonstrating its ability to provide local explanations for complex models. Key takeaways:

- LIME creates interpretable local models around specific predictions
- It’s adaptable to various data types and model architectures
- Explanation quality can vary based on parameters and data characteristics
- LIME should be used alongside other evaluation methods for comprehensive understanding

Link for the Fifth Article on Explainable AI: SHAP Unveiled: A Deep Dive into Explaining AI Models for Machine Learning

In our next article, we’ll examine SHAP (SHapley Additive exPlanations), comparing it to LIME and exploring its game theory-based approach to model interpretation.
