Sentiment Analysis with Machine Learning ML.NET

Published in

CodeNx

8 min read · Dec 6, 2023

Sentiment analysis offers insights into the public’s feelings towards products, brands, or topics by analyzing customer feedback. Microsoft’s machine learning framework, ML.NET, makes this task accessible without requiring deep knowledge of machine learning.

What is Sentiment Analysis?

Imagine you’re sifting through thousands of tweets, customer reviews, or emails. You want to gauge the general sentiment: are people happy, frustrated, or neutral?

Doing this manually is tedious and time-consuming. This is where sentiment analysis comes in. It’s like having a smart assistant that reads between the lines, understands the emotions, and classifies customer feedback.

Data Preparation

Before diving into the code, it’s crucial to prepare your data correctly. Sentiment analysis relies on having a well-structured dataset. Let’s explore how to set up your data for effective analysis.

Structuring Your Data

Your dataset should be in a CSV (Comma-Separated Values) format. Each row in this file will represent a different customer feedback instance, with two main columns:

Sentiment Text: The actual text of the customer feedback.
Sentiment Label: The sentiment of the feedback, which in our case can be “Happy”, “Neutral”, or “Frustrated”.

Here’s a simple example of how your CSV file might look:

SentimentText,Sentiment
Love the speedy service and friendly staff!,Happy
The product is fine but nothing extraordinary.,Neutral
Waited too long for my order quite frustrating experience.,Frustrated
Extremely pleased with the product quality. It is outstanding,Happy
Service was okay but I had higher expectations.,Neutral
Disappointed with the delayed delivery,Frustrated
The customer service was excellent!,Happy
The product works as expected nothing remarkable though.,Neutral
Disappointed by the lack of features in the product.,Frustrated
Very satisfied with the quick delivery.,Happy
The product is alright but I've seen better.,Neutral
Unhappy with the product's poor performance.,Frustrated
Thrilled with the high quality of the product.,Happy
Found the user interface to be quite average.,Neutral
Experiencing repeated issues with the service is very bad.,Frustrated
Overjoyed with the prompt customer support response.,Happy
Product is okay but lacks exciting features.,Neutral
Long wait times for support are becoming a norm.,Frustrated
Delighted with the seamless and user-friendly experience.,Happy
The product does its job but it's nothing special.,Neutral
Very frustrated with the constant technical glitches.,Frustrated
Completely satisfied with the fast and efficient service.,Happy
The product's performance is mediocre at best.,Neutral
Very frustrated due to the incorrect billing charges.,Frustrated
Extremely happy with the product's innovative features.,Happy
Neutral about the product it meets basic requirements.,Neutral
Encountering poor customer service has been a frustrating experience.,Frustrated

Guidelines for Data Preparation

Text Quality: Clean the text data, removing irrelevant characters and ensuring it’s relevant for sentiment analysis.
Class Balance: Aim for a balanced distribution among “Happy”, “Neutral”, and “Frustrated” samples. This helps prevent model bias toward a specific sentiment.
Consistent Formatting: Ensure the CSV is properly formatted, with commas used consistently as separators and quotes around text fields that contain commas.
Representative Samples: Include a diverse range of feedback examples in each category to capture the variability in customer sentiments.
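The class-balance guideline is easy to check before training. The following is a minimal sketch in plain C#, not part of ML.NET: the names LabelBalance and CountLabels are made up for illustration, and it assumes the label is the last comma-separated field on each row and that no label contains a comma, which holds for the sample file above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelBalance
{
    // Counts rows per label. Assumes the label is the last comma-separated
    // field and that labels themselves never contain a comma.
    public static Dictionary<string, int> CountLabels(IEnumerable<string> csvLines)
    {
        return csvLines
            .Skip(1)                                           // skip the header row
            .Where(line => !string.IsNullOrWhiteSpace(line))   // ignore blank lines
            .Select(line => line.Substring(line.LastIndexOf(',') + 1).Trim())
            .GroupBy(label => label)
            .ToDictionary(group => group.Key, group => group.Count());
    }
}
```

To check a real file, pass in its lines, for example LabelBalance.CountLabels(File.ReadLines("path_to_your_data.csv")), and compare the counts for “Happy”, “Neutral”, and “Frustrated”. Because only the text after the last comma is read, commas inside the feedback text do not affect the counts.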

Getting Started with ML.NET

First things first, you need to set up your .NET environment and install the ML.NET NuGet package. You can do this using the NuGet Package Manager or the Package Manager Console.

Install-Package Microsoft.ML

Data Models — The Blueprint

Before we start building, we need a blueprint: our data model. In ML.NET, we define classes to represent the input data and the predictions:

using Microsoft.ML.Data;

public class SentimentData
{
    [LoadColumn(0)]
    public string SentimentText;

    [LoadColumn(1), ColumnName("Label")]
    public string Sentiment;
}

public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public string Prediction { get; set; }  
}

The SentimentData class plays a crucial role in both training the ML.NET model and feeding new data into it for predictions. It’s a container holding each piece of text/customer feedback and its corresponding sentiment label (“Happy”, “Neutral”, or “Frustrated”).

[LoadColumn(0)] public string SentimentText;: This field holds the text content you want to analyze. The [LoadColumn(0)] attribute tells ML.NET that this data comes from the first column (column index 0) of our dataset.
[LoadColumn(1), ColumnName("Label")] public string Sentiment;: This field holds the sentiment label for each piece of text/customer feedback. Because it is a string, it can represent multiple sentiment categories, here Happy, Neutral, and Frustrated. The [LoadColumn(1)] attribute indicates that this data is drawn from the second column of our dataset.
By applying the ColumnName("Label") attribute, this property is renamed to "Label" within the ML.NET pipeline. This renaming aligns with the common machine learning practice where the target variable (the outcome you're trying to predict) is labeled as "Label".

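The SentimentPrediction class holds the model’s output. Its Prediction property is bound to the pipeline’s "PredictedLabel" column, which contains the predicted sentiment label.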

The Analysis Method — Machine Learning Pipeline

Initialization and Data Loading

var mlContext = new MLContext();
var dataView = mlContext.Data.LoadFromTextFile<SentimentData>(
  "path_to_your_data.csv", separatorChar: ',', hasHeader: true);

// Split the data into train and test sets
var trainTestSplit = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
var trainSet = trainTestSplit.TrainSet;
var testSet = trainTestSplit.TestSet;

var mlContext = new MLContext();: This line initializes a new instance of MLContext, which is a starting point for all ML.NET operations. It represents the environment where machine learning models are created, trained, and evaluated.
var dataView = mlContext.Data.LoadFromTextFile<SentimentData>("path_to_your_data.csv", separatorChar: ',', hasHeader: true);: Here, the data is loaded from a CSV file into an IDataView, a flexible, efficient way of describing tabular data (numeric and text). separatorChar: ',' is needed because the loader’s default separator is a tab, and hasHeader: true indicates that the first line of the CSV file contains column headers. If any text field is wrapped in quotes because it contains commas, also pass allowQuoting: true.
var trainTestSplit = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);: Here, we split the data into a training set and a test set. With testFraction: 0.2, roughly 20% of the rows are held out in testSet for evaluation, and the remaining rows in trainSet are used to train the model.

Data Processing Pipeline

// Data processing pipeline
var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
    .Append(mlContext.Transforms.Text.FeaturizeText(
      outputColumnName: "Features", 
      inputColumnName: nameof(SentimentData.SentimentText)));

mlContext.Transforms.Conversion.MapValueToKey("Label"): This line of code is transforming the 'Label' column of our dataset. The transformation maps each distinct value in the 'Label' column (which are string values representing sentiment categories in our case) to a key (a numeric value) that ML.NET algorithms can work with more efficiently.
This step is essential for multiclass classification as it helps in converting string labels into a numeric format that the machine learning algorithm can process.
.Append(mlContext.Transforms.Text.FeaturizeText(outputColumnName: "Features", inputColumnName: nameof(SentimentData.SentimentText))): This line is appending a text featurizer to your pipeline.
FeaturizeText is a transformation that converts the text data (SentimentText) into numerical features that the machine learning model can understand. This process is known as feature extraction or text vectorization. The output of this transformation is a new column named ‘Features’, which contains the numeric vector representation of the input text.
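To make the label-mapping step more concrete, here is a small sketch in plain C# of the idea behind it. It is an illustration only, not ML.NET’s implementation: the names LabelKeySketch, BuildKeyMap, and Invert are hypothetical, and the 1-based, order-of-first-appearance numbering is an assumption chosen to resemble the transform’s default behavior.

```csharp
using System;
using System.Collections.Generic;

public static class LabelKeySketch
{
    // Assigns each distinct label a 1-based key in order of first appearance.
    // Illustrative only; this is not how ML.NET implements the transform.
    public static Dictionary<string, uint> BuildKeyMap(IEnumerable<string> labels)
    {
        var keyMap = new Dictionary<string, uint>();
        foreach (var label in labels)
        {
            if (!keyMap.ContainsKey(label))
                keyMap[label] = (uint)(keyMap.Count + 1);
        }
        return keyMap;
    }

    // Inverts the map so that a numeric key can be turned back into its label.
    public static Dictionary<uint, string> Invert(Dictionary<string, uint> keyMap)
    {
        var inverse = new Dictionary<uint, string>();
        foreach (var pair in keyMap)
            inverse[pair.Value] = pair.Key;
        return inverse;
    }
}
```

With the labels “Happy”, “Neutral”, “Frustrated”, “Happy”, this sketch assigns Happy the key 1, Neutral the key 2, and Frustrated the key 3. The trainer works with these numbers, and the inverse lookup is what allows a numeric prediction to be shown as readable text again.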

Training Algorithm

var trainer = mlContext.MulticlassClassification
  .Trainers
  .SdcaMaximumEntropy(
    labelColumnName: "Label", featureColumnName: "Features");

This code sets up the training algorithm for your model, which is a multiclass classification algorithm.
SdcaMaximumEntropy is the trainer used here. It stands for Stochastic Dual Coordinate Ascent with Maximum Entropy, a popular algorithm for multiclass classification tasks.
labelColumnName: "Label" specifies that the 'Label' column (transformed earlier) is the target column you're trying to predict.
featureColumnName: "Features" indicates that the features produced by the text featurization (the 'Features' column) will be used as the input features for the model.

var trainingPipeline = dataProcessPipeline
  .Append(trainer)
  .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

The final training pipeline is constructed by appending the trainer to the data processing pipeline.
This creates a sequence where the input data will first undergo the transformations (label mapping and text featurization) and then be fed into the training algorithm.
.Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel")): After the model makes predictions, this transformation maps the predicted numeric keys back to their original string values (like "Happy", "Neutral", "Frustrated"). This step is crucial for interpretability, as it converts the model’s predictions from numeric keys back to understandable sentiment labels.

Model Training

var trainedModel = trainingPipeline.Fit(trainSet);

The Fit method trains the model on the training set (trainSet). This is where the algorithm learns from the data, forming the trained model.

Model Evaluation

var predictions = trainedModel.Transform(testSet);

var metrics = mlContext.MulticlassClassification.Evaluate(predictions, "Label");

After training, the model is used to make predictions on the held-out test set (testSet), which it did not see during training. This gives a more honest picture of the model’s performance than evaluating on the training data.
The Evaluate method computes metrics such as micro and macro accuracy, log loss, and a confusion matrix, which are essential for assessing the performance of the multiclass classification model.

Displaying Evaluation Metrics

Console.WriteLine($"Macro accuracy: {metrics.MacroAccuracy:P2}");
Console.WriteLine($"Micro accuracy: {metrics.MicroAccuracy:P2}");
Console.WriteLine($"Log loss: {metrics.LogLoss:F4}");

metrics.MacroAccuracy: Macro accuracy computes the accuracy independently for each class and then takes the average, so every class counts equally regardless of how many samples it has. This matters in datasets where class imbalances are present.
metrics.MicroAccuracy: Micro accuracy pools the predictions of all classes and reports the fraction of samples classified correctly. In a multiclass classification problem, it gives the overall effectiveness of the classifier.
metrics.LogLoss: Logarithmic Loss (or Log Loss) measures the performance of a classification model whose predictions are probability values between 0 and 1. Log loss increases as the predicted probability diverges from the actual label, so a model with perfect predictions has a log loss of 0. It is not a percentage and has no upper bound.
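The difference between these three numbers is easier to see with a small hand calculation. The sketch below recomputes them in plain C# from lists of actual and predicted labels. It is a simplified illustration of the definitions above, not the code ML.NET runs, and the names MetricsSketch, MicroAccuracy, MacroAccuracy, and LogLoss are chosen here for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MetricsSketch
{
    // Micro accuracy: fraction of all samples that were predicted correctly.
    public static double MicroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
    {
        int correct = 0;
        for (int i = 0; i < actual.Count; i++)
        {
            if (actual[i] == predicted[i]) correct++;
        }
        return (double)correct / actual.Count;
    }

    // Macro accuracy: accuracy computed per actual class, then averaged with equal weight.
    public static double MacroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
    {
        return Enumerable.Range(0, actual.Count)
            .GroupBy(i => actual[i])
            .Select(g => g.Count(i => predicted[i] == actual[i]) / (double)g.Count())
            .Average();
    }

    // Log loss: mean of -ln(p), where p is the probability given to the true class.
    public static double LogLoss(IReadOnlyList<double> probabilityOfTrueClass)
    {
        return probabilityOfTrueClass.Average(p => -Math.Log(p));
    }
}
```

Suppose the actual labels are four Happy, one Neutral, and one Frustrated, and the model predicts Happy for the Neutral sample but gets everything else right. Micro accuracy is 5/6 (about 0.83), while macro accuracy is (1 + 0 + 1) / 3 (about 0.67), because the single Neutral mistake wipes out that class entirely. For log loss, a model that gives the true class a probability of 0.5 on every sample scores ln 2, roughly 0.69.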

Saving the Trained Model

Once the model is trained and evaluated, it’s essential to save it for future use. This way, you don’t have to retrain the model every time you want to make predictions. Here’s how you can save your trained model in ML.NET:

// Save the trained model to a file
var modelPath = "path_to_save_your_model.zip";
mlContext.Model.Save(trainedModel, dataView.Schema, modelPath);

This code snippet uses the Save method of the MLContext.Model property.
trainedModel is the model you've trained.
dataView.Schema provides the schema of the data the model was trained on, ensuring compatibility when loading the model for predictions.
"path_to_save_your_model.zip" is the file path where the model will be saved. You can specify any path and file name, typically with a .zip extension.

Making Predictions with the Saved Model

After saving your model, you can load it to make predictions on new data. This is particularly useful for deploying your model in a production environment or integrating it into an application.

// Load the trained model
var loadedModel = mlContext.Model.Load("path_to_your_saved_model.zip", out var modelInputSchema);

// Create prediction engine
var predEngine = mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(loadedModel);

// Sample text for prediction
var sampleText = new SentimentData
{
    SentimentText = "The product was outstanding and the service excellent."
};

// Make a prediction
var predictionResult = predEngine.Predict(sampleText);

// Display the prediction
Console.WriteLine($"Predicted Sentiment: {predictionResult.Prediction}");

mlContext.Model.Load loads the saved model from the specified path.
CreatePredictionEngine creates a prediction engine for your model. This engine takes an input of SentimentData and outputs SentimentPrediction.
A new SentimentData instance, sampleText, is created with the text you want to analyze.
predEngine.Predict(sampleText) makes the prediction based on the loaded model.
The predicted sentiment label is then displayed in the console.

Predicted Sentiment: Happy
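If you also want to know how sure the model is, one option is to read the trainer’s "Score" column alongside the predicted label. The sketch below is an assumption-laden variation on SentimentPrediction, not code from this walkthrough: the class names SentimentPredictionWithScore and PredictionConfidence are hypothetical, and it assumes the trainer writes one probability per class into "Score", as maximum entropy trainers typically do.

```csharp
using System.Linq;
using Microsoft.ML.Data;

public class SentimentPredictionWithScore
{
    [ColumnName("PredictedLabel")]
    public string Prediction { get; set; }

    // Assumption: one probability per class, taken from the "Score" column.
    [ColumnName("Score")]
    public float[] Score { get; set; }
}

public static class PredictionConfidence
{
    // Returns the highest class probability as a rough confidence value,
    // or 0 when no scores are available.
    public static float Highest(float[] scores)
    {
        return scores == null || scores.Length == 0 ? 0f : scores.Max();
    }
}
```

To try it, you would create the prediction engine with SentimentPredictionWithScore as the output type and pass the result’s Score array to PredictionConfidence.Highest. A value close to 1 suggests a confident prediction, while a value near 1/3 for three classes suggests the model is close to guessing.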

Figure: Sentiment text analysis using ML.NET to classify feedback as happy, frustrated, or neutral. Image source: created by author.

Scaling Sentiment Analysis: Insights from Cloud Computing and Azure

As we delve into sentiment analysis with ML.NET, it’s crucial to consider scalability and the potential of cloud computing in enhancing these capabilities. For those interested in exploring how sentiment analysis can be scaled using cloud services, I recommend reading this detailed Medium post, which delves into automated customer review analysis for businesses.

🎯Automated Customer Review Sentiment Analysis for Businesses

I trust this information has been valuable to you. 🌟 Wishing you an enjoyable and enriching learning journey!

📚 For more insights like these, feel free to 👏 follow 👉 Merwan Chinta