Arabic NLP — Explain semantic predictions with LIME

4 min read · May 24, 2024


LIME (Local Interpretable Model-agnostic Explanations) is a tool that helps explain how machine learning models make predictions. For a single input, it approximates the complex model with a simpler, interpretable one and highlights which parts of the input most influenced the model’s decision. This makes it easier for people to understand and trust the model’s predictions.
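
To make the idea concrete, here is a simplified sketch of what happens behind the scenes. It is not the lime library's implementation: the scoring function toy_positive_score is a hypothetical stand-in for a real model, the distance measure (fraction of removed words) is a simplification, and the local model is a small weighted ridge regression written with NumPy.

```python
import numpy as np

def toy_positive_score(text: str) -> float:
    """Hypothetical black-box model: a probability-like score for 'positive'."""
    words = text.split()
    score = 0.5
    if "رائعة" in words:
        score += 0.4
    if "سيئة" in words:
        score -= 0.4
    return score

def lime_sketch(text, predict, num_samples=500, kernel_width=0.75, seed=0):
    """Estimate how much each word contributes to predict(text)."""
    rng = np.random.default_rng(seed)
    words = text.split()
    n = len(words)

    # Binary masks: 1 keeps a word, 0 removes it. Row 0 is the original sentence.
    masks = rng.integers(0, 2, size=(num_samples, n))
    masks[0] = 1

    # Query the black-box model on every perturbed sentence.
    preds = np.array([
        predict(" ".join(w for w, keep in zip(words, mask) if keep))
        for mask in masks
    ])

    # Perturbed sentences closer to the original get a larger weight.
    distance = 1.0 - masks.sum(axis=1) / n
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted ridge regression: one coefficient per word plus an intercept.
    X = np.hstack([masks, np.ones((num_samples, 1))])
    Xw = X * weights[:, None]
    coef = np.linalg.solve(Xw.T @ X + 1e-3 * np.eye(n + 1), Xw.T @ preds)

    # Most influential words first.
    return sorted(zip(words, coef[:n]), key=lambda pair: -abs(pair[1]))

result = lime_sketch("كانت تجربة رائعة بالفعل", toy_positive_score)
for word, weight in result:
    print(word, round(float(weight), 3))
```

The word the toy model reacts to should come out on top with a positive weight, while the other words stay close to zero. The real library follows the same pattern of perturbing, predicting, weighting and fitting, with its own distance kernel and feature selection.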

After preprocessing the data and training the model, we use LIME's text explainer (LimeTextExplainer) to explain the model’s prediction for each sentence, highlighting the words that contribute most to the sentiment classification. This helps us understand how the model interprets sentiment in Arabic text and gives insight into its decision-making.
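
The preprocessing step itself is not shown in this post. As one example of what it might include, the sketch below strips Arabic diacritics (tashkeel) using only the standard library. This matters for LIME because LimeTextExplainer splits text on the regular expression \W+ by default, and Python does not treat combining marks as word characters, so a diacritic in the middle of a word splits it into two tokens. The helper names strip_diacritics and split_words are assumptions of this sketch, not part of any library.

```python
import re
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks (Unicode category Mn), which include Arabic tashkeel."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

def split_words(text: str) -> list[str]:
    """Split on non-word characters, mirroring LIME's default split expression."""
    return [token for token in re.split(r"\W+", text) if token]

# "\u064f" is the damma diacritic, written as an escape so it stays visible
word = "ت\u064fنسى"

print(split_words(word))                    # the damma splits the word in two
print(split_words(strip_diacritics(word)))  # the word stays whole
```

Whether to strip diacritics is a design choice: it keeps words intact for both LIME and the vectorizer, at the cost of losing the occasional distinction that the diacritics carry.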

First, let’s create a sentiment analysis model for simple Arabic text using this small dataset:

tex = ["الحياة جميلة جداً في فصل الربيع.", 
    "لقد استمتعت بقراءة هذه الرواية، كانت ملهمة حقًا.",
    "شكراً لكم على الضيافة الرائعة، كانت تجربة رائعة بالفعل.",
    "أحب أن أقضي وقتي مع أصدقائي، فهم يجلبون لي الفرح والسرور.",
    "تمنيت أن تدوم العطلة لفترة أطول، لقد كانت أياماً سعيدة جداً.",
    "تحققت أخيراً من حلمي بالسفر إلى اليابان، لقد كانت تجربة لا تُنسى.",
    "لقد أخذتني الحياة في مسار غير متوقع تماماً، أشعر بالإحباط الشديد.",
    "تعبت من المشاكل المتكررة في العمل، لا أعرف كيف أتعامل معها بعد الآن.",
    "لا يمكنني النوم بسبب الضغوطات الكبيرة في الحياة اليومية.",
    "لقد خيبت آمالي تماماً، لم أتوقع أن تنتهي الأمور بهذه الطريقة.",
    "تأخرت الطائرتين وضاعت حقيبتي في الرحلة، لقد كان يوماً سيئاً للغاية.",
    "لم يعد لدي الرغبة في مواصلة هذا العمل بعد كل هذه الصعوبات.",
    "الطقس اليوم يبدو معتدلاً، لا يوجد أي تقارير عن أحوال جوية غير عادية.",
    "بدأت بالتدريب على العزف على البيانو، الرحلة في بدايتها مثيرة للاهتمام.",
    "تناولت وجبة خفيفة في مطعم قريب، الطعم كان جيداً دون أن يكون رائعاً.",
    "تلقيت دعوة لحفل زفاف صديق، سأحضر بالطبع لأنه لا يمكنني تفويت هذه الفرصة.",
    "قمت بتنظيف المنزل وترتيبه بشكل جيد، الآن أشعر بالارتياح داخل بيتي.",
    "لا يوجد شيء مهم على جدول أعمالي اليوم، سأستمتع ببعض الوقت الهادئ في المنزل."]

label = ["positive", "positive", "positive","positive", "positive","positive", "negative","negative","negative", "negative",
         "negative","negative", "neutral","neutral", "neutral","neutral", "neutral", "neutral"]


# Dictionary with 'tex' and 'label' columns; it is converted to a DataFrame below
data = {
    'tex': tex,
    'label': label
}

The Classifier Code:

Our sentiment classifier combines three components. TfidfVectorizer converts each Arabic sentence into a vector of TF-IDF weights, which capture how important each word is to that sentence relative to the rest of the dataset. LinearSVC learns to separate the three sentiment classes from these vectors. Because LinearSVC does not output probabilities, which LIME needs, we wrap it in CalibratedClassifierCV to turn its decision scores into calibrated class probabilities. Together, these components form the backbone of our sentiment analysis.
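
To make the TF-IDF step concrete, here is a minimal sketch, assuming scikit-learn's TfidfVectorizer with its default settings. The two short sentences are illustrative and are not part of the dataset above. The word that appears in both sentences should receive a lower weight than a word that appears in only one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two illustrative sentences (not from the dataset) that share the word "الحياة"
docs = ["الحياة جميلة", "الحياة صعبة"]

demo_vectorizer = TfidfVectorizer()
matrix = demo_vectorizer.fit_transform(docs).toarray()

# vocabulary_ maps each word to its column index in the matrix
shared_col = demo_vectorizer.vocabulary_["الحياة"]
unique_col = demo_vectorizer.vocabulary_["جميلة"]

# Weights of the two words inside the first sentence
shared_weight = matrix[0, shared_col]
unique_weight = matrix[0, unique_col]
print(f"shared word weight: {shared_weight:.3f}")
print(f"unique word weight: {unique_weight:.3f}")
```

A word that occurs in every sentence says little about any one of them, so its weight is scaled down. This is the signal the classifier later uses to tell sentences apart.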

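import pandas as pd
import arabic_reshaper
from bidi.algorithm import get_display
from lime.lime_text import LimeTextExplainer
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC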



# Create and shuffle DataFrame
df = pd.DataFrame(data)
shuffled_df = df.sample(frac=1).reset_index(drop=True)

# Initialize vectorizer and fit it with the text data
vectorizer = TfidfVectorizer()
vectorizer.fit(shuffled_df['tex'])

# Transform text data
X = vectorizer.transform(shuffled_df['tex'])
y = shuffled_df['label']

# Initialize and train the model
svc = LinearSVC()
calibrated_svc = CalibratedClassifierCV(svc)
calibrated_svc.fit(X, y)

# Initialize the label encoder and fit it with the three sentiment labels
labels_encoder = LabelEncoder()
labels_encoder.fit(y)

# Helper that reshapes Arabic text and reorders it for renderers without right-to-left support (e.g. matplotlib)
def process_arabic_text(text):
    reshaped_text = arabic_reshaper.reshape(text)
    display_text = get_display(reshaped_text)
    return display_text

# Define predict_proba function for LIME
def predict_proba(texts):
    texts_transformed = vectorizer.transform(texts)
    return calibrated_svc.predict_proba(texts_transformed)

# Initialize LimeTextExplainer
explainer = LimeTextExplainer(class_names=labels_encoder.classes_.astype(str))

# Get all sentences and their sentiment labels from the shuffled DataFrame
sentences = shuffled_df['tex'].tolist()
true_labels = shuffled_df['label'].tolist()

# Iterate over the sentences and explain them
for sentence, true_label in zip(sentences, true_labels):
    print(f"Original Sentence: {sentence}")
    print(f"True Label: {true_label}")

    # Explain the class the model considers most likely (top_labels=1);
    # by default LIME explains only the class at index 1
    explanation = explainer.explain_instance(
        sentence, predict_proba, num_features=10, top_labels=1
    )
    predicted = explanation.available_labels()[0]
    print(f"Predicted Label: {labels_encoder.classes_[predicted]}")

    # Show the highlighted explanation (renders only inside a Jupyter notebook)
    explanation.show_in_notebook(text=True)

    # Print each word and its weight for the predicted class
    for feature, weight in explanation.as_list(label=predicted):
        # Reshape Arabic text for correct display
        display_text = arabic_reshaper.reshape(feature)
        print(display_text, weight)

    print()
    print(20 * "#")
    print()
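
LIME calls the classifier function with a list of perturbed strings and expects an array of class probabilities with one row per string. The sketch below checks that contract on a six-sentence subset of the dataset, assuming scikit-learn is installed. The names demo_vectorizer, demo_model and demo_predict_proba are specific to this sketch, and cv=3 is chosen only because the subset is tiny; it is not the setting used in the code above.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Six sentences taken from the dataset above: three positive, three negative
texts = [
    "الحياة جميلة جداً في فصل الربيع.",
    "لقد استمتعت بقراءة هذه الرواية، كانت ملهمة حقًا.",
    "شكراً لكم على الضيافة الرائعة، كانت تجربة رائعة بالفعل.",
    "لا يمكنني النوم بسبب الضغوطات الكبيرة في الحياة اليومية.",
    "لقد خيبت آمالي تماماً، لم أتوقع أن تنتهي الأمور بهذه الطريقة.",
    "لم يعد لدي الرغبة في مواصلة هذا العمل بعد كل هذه الصعوبات.",
]
labels = ["positive"] * 3 + ["negative"] * 3

demo_vectorizer = TfidfVectorizer().fit(texts)
demo_model = CalibratedClassifierCV(LinearSVC(), cv=3)
demo_model.fit(demo_vectorizer.transform(texts), labels)

def demo_predict_proba(batch):
    """Take a list of raw strings, return an array of shape (len(batch), n_classes)."""
    return demo_model.predict_proba(demo_vectorizer.transform(batch))

# LinearSVC alone exposes decision scores but no predict_proba method
print(hasattr(LinearSVC(), "predict_proba"))

probs = demo_predict_proba(["تجربة رائعة", "يوم صعب"])
print(probs.shape)        # one row per input string, one column per class
print(probs.sum(axis=1))  # each row sums to 1, up to floating-point error
```

The function must accept raw text rather than vectors, because LIME perturbs the sentence itself by removing words. That is why the vectorizer call lives inside the wrapper.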

LIME offers key insights into our sentiment analysis model by revealing which words drive its predictions for Arabic text. By examining individual sentences, we can see which words push a prediction toward positive, negative, or neutral. This transparency helps us check whether the model relies on sensible cues and promotes a deeper understanding of sentiment analysis in Arabic text. Because the dataset is tiny and the explained sentences are the same ones the model was trained on, the explanations here illustrate the workflow rather than measure the model’s accuracy.

Example of neutral prediction:

Example of negative prediction:

Example of positive prediction:


Written by ALSHARGI