How to evaluate AI and machine learning solutions

Published in Panaseer Labs Engineering & Data Science, Mar 19, 2020 (6 min read)

Have you ever found yourself struggling to tell whether an AI or machine learning solution is worth it? Or understand what it does? If so, you are not alone.

The availability of AI and machine learning products has never before been so far reaching. Some incredible advancements have been made possible using these techniques. This has been followed by a lot of hype around the topic, where coverage can make it sound like AI and machine learning are magic tools that will solve any problem they are presented with.

If you think such coverage sounds too good to be true, it’s because it is.

In reality, these solutions range from groundbreaking new discoveries to completely useless or even disadvantageous.

So, how can you tell where on the spectrum a given solution is?

Buzzwords like “data-driven” and “AI-powered” are frequently used to persuade while the details are swept neatly under the carpet. To make the topic less mysterious and make it easier to see past such buzzwords, I have written this blog to shed light on what distinguishes a good AI or machine learning solution from a bad one. The coverage is centred around security products, but the same principles apply across industries and types of solutions.

It’s worth noting that Panaseer does not feature any AI or machine learning capabilities. Instead, our tool relies on a process called ‘entity resolution’ which we will discuss in more detail in an upcoming post.

How do AI and ML differ?

We can look at artificial intelligence as an extension of machine learning. Solutions that are available today all fall into the machine learning subset of artificial intelligence.

ML is a subset of AI, which in turn is a subset of Advanced Analytics

An example of a machine learning solution is an algorithm that was trained on a dataset for malware detections and, based on the patterns it observed in that data, provides logic that we can use to make predictions about future malware detections.

If this solution also learned to determine when users of the computers in question were not acting in line with their user awareness training (without a human formulating that task), it would step out of the machine learning range of artificial intelligence and into the territory of true artificial intelligence. The key distinction here is that the algorithm formulated a task without human assistance, an action that resembles intelligence. Computers have always been good at completing clearly defined tasks efficiently at large scale, but this does not require intelligence.

I generally refer to ‘machine learning’ when discussing solutions that we have available today, as I think there is less confusion around this term. Therefore, I will use this term instead of the slightly awkward “AI and machine learning” for the rest of this blog.

Now, onto the real question. How can you tell a good machine learning solution from a bad one?

To make this scenario realistic, let’s imagine you are speaking to a security vendor who is trying to convince you that their product will solve a key security problem for you. What can you ask them to determine whether this product is worth your time and effort?

1. What do you mean by [insert buzzword here]?

Can they explain what their product does without relying too heavily on buzzwords?

Buzzwords are not necessarily a problem, but they reveal very little, so further information is usually required. “Trust me, it’s AI” is never a valid response.

If the vendor is unable to explain in simple terms what their product does, this is a big warning sign.

2. How specifically do you use machine learning?

There are usually various tasks within an application that can be carried out by machine learning.

Let’s say that the product you’re examining is an extension to a vulnerability scanner that helps you prioritise findings for remediation. In that case, machine learning could be used to enrich the vulnerability information the scanner provides, to estimate which finding will be easiest to remediate, to estimate which finding has the largest potential to cause harm if it is left open, or even just to make the scanning process run more smoothly.

The more business-critical the decision made by the machine learning part of the product, the more assurance you want that it is correct, adds value and performs in line with business priorities.

3. Why did you choose to use machine learning?

Due to the hype, it can be tempting to add machine learning to products primarily to meet demands from the market. It may come as a surprise, but developing a machine learning solution can be easily achieved in a few lines of code.

See the code snippet below for a demonstration. Using a programming language called Python, we have created a machine learning model (this type of model is called a logistic regression model) with the help of the popular open source libraries pandas, numpy and sklearn:

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression

# Read in data and split it into labels and everything else
df = pd.read_csv('training_data.csv')
features = df.iloc[:, :-1]  # every column except the last
labels = df.iloc[:, -1]     # the last column holds the labels

# This one line below is where the model gets created:
clf = LogisticRegression(random_state=0, solver='liblinear').fit(features, labels)

Developing a machine learning solution is easy — the tricky bit is developing a good solution. This includes making sure that:

The research question is clearly formed. “What is the optimal frequency of vulnerability scanning?” is a lot better than “How can I improve my vulnerability program?”.
Training data is of sufficient quality. A machine learning solution will only ever be as good as the data it was trained on. The data needs to be accurate, complete, up-to-date and not biased (unless there’s an explicit reason why biased data is OK for the use case). For example, if 80% of the records in the training data concern laptops, the solution may not perform well for mobile devices.
Requirements are properly defined. For example, a solution that overwhelms a security analyst with alerts may not add any value, despite being “correctly implemented”.
Potential to cause harm has been considered. Are false negatives more important than false positives? Could adversarial parties take advantage of this solution? Does the solution treat sensitive data appropriately? Is the solution discriminatory? These are all important questions to consider.

The list goes on and on. For this reason it’s good to understand whether machine learning was used because it makes the product better, or for some other less desirable reason.
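As a rough illustration of the training data point above, here is a minimal sketch of two basic checks: how much of each column is missing, and how the records are split across device types. The column names and values are made up for the example, and quality_report is a hypothetical helper, not part of any library.

```python
import pandas as pd

# Hypothetical training records; column names and values are made up for illustration.
records = pd.DataFrame({
    "device_type": ["laptop"] * 8 + ["mobile"] * 2,
    "os_version": ["10.1", None, "10.2", "10.1", "10.2",
                   "10.1", None, "10.2", "13.0", "13.1"],
    "is_malware": [0, 1, 0, 0, 1, 0, 0, 1, 0, 1],
})


def quality_report(df, group_col):
    """Return the share of missing values per column and the share of records per group."""
    missing_share = df.isna().mean()
    group_share = df[group_col].value_counts(normalize=True)
    return missing_share, group_share


missing_share, group_share = quality_report(records, "device_type")
print(missing_share)  # completeness: fraction of missing values in each column
print(group_share)    # balance: fraction of records for each device type
```

A skewed split like this does not prove the model will fail on mobile devices, but it is a prompt to ask the vendor how performance was measured for the under-represented group.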

4. How was the model trained?

It may be difficult to get detailed answers to this one, but what matters most here is the sort of data the model was trained on. Generally, the more extensive the training data is, the better the model is likely to perform.

Diversity of data is also very important, so aggregating (and normalising) data from multiple sources can lead to improved performance.
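To give a feel for what aggregating and normalising involves, here is a minimal sketch that combines findings from two sources with different naming and severity conventions into one table. The source names, columns and severity cut-offs are all assumptions made for the example.

```python
import pandas as pd

# Two hypothetical sources describing the same kind of asset with different conventions.
scanner_a = pd.DataFrame({
    "Hostname": ["LAPTOP-01", "laptop-02"],
    "Severity": ["High", "Low"],
})
scanner_b = pd.DataFrame({
    "host": ["laptop-02 ", "SERVER-07"],
    "sev": [9.1, 4.0],  # numeric score on an assumed 0-10 scale
})


def score_to_label(score):
    """Map a numeric score to a coarse label; the cut-offs are illustrative assumptions."""
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    return "low"


# Bring both sources onto the same column names, casing and severity scale.
a = pd.DataFrame({
    "hostname": scanner_a["Hostname"].str.strip().str.lower(),
    "severity": scanner_a["Severity"].str.lower(),
    "source": "scanner_a",
})
b = pd.DataFrame({
    "hostname": scanner_b["host"].str.strip().str.lower(),
    "severity": scanner_b["sev"].map(score_to_label),
    "source": "scanner_b",
})

combined = pd.concat([a, b], ignore_index=True)
print(combined)
```

Keeping the source column makes it possible to check later whether the model behaves differently on data from one source than on data from another.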

Lastly, the more widely a dataset has been used, the more likely it is that its limitations are known. A small bespoke dataset is therefore generally not as good news as a well understood, extensive dataset that has been used successfully for other similar purposes.

Another thing to consider is whether the model was trained to provide value straight out of the box, or whether it will require some period of training on your data before it becomes useful. For some tasks, a period of adjusting the model to your data may be unavoidable, but if that period is long, the usefulness of the model generally decreases, because data tends to change with time too.

5. Can I validate the decisions being made?

If the product is being used to make important security decisions, you want to have an overview of what decisions are being made. A good machine learning product that is designed to make critical decisions will be explainable.
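One way to picture this kind of validation is sketched below, assuming a simple logistic regression model trained on synthetic stand-in data. The feature names are hypothetical and the data is randomly generated, so the numbers carry no real-world meaning. The sketch counts false positives and false negatives on held-out data and lists the weight the model gives each feature.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 400
features = pd.DataFrame({
    "failed_logins": rng.poisson(2, n),
    "days_since_patch": rng.integers(0, 90, n),
})
# Labels are driven by failed_logins plus noise, so there is a known signal to recover.
labels = (features["failed_logins"] + rng.normal(0, 1, n) > 3).astype(int)

# Hold back a quarter of the data that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

clf = LogisticRegression(random_state=0, solver="liblinear").fit(X_train, y_train)

# Count the two kinds of mistakes separately on the held-out data.
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1]).ravel()
print(f"false positives: {fp}, false negatives: {fn}")

# For a linear model, the sign of each coefficient shows which way a feature pushes the decision.
coefficients = pd.Series(clf.coef_[0], index=features.columns)
print(coefficients)
```

Because the features here are not scaled, the coefficient magnitudes are not directly comparable; only the signs are easy to read. More complex models need other explanation techniques, which is exactly why it is worth asking a vendor how their product's decisions can be inspected.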

Last, but not least

Ask about anything that is not clear. Simple questions are usually key in uncovering whether a solution is worth the hype or not.

The key lesson to take away from this blog is that machine learning is a tool that can be used to achieve great things when applied correctly, but when caveats are overlooked, such products quickly cease to be worth the glossy paper they are advertised on.

This blog was based on a talk that I gave during BSides London 2019 (you can see a recording of this talk below) and a more detailed talk that I gave at DeepSec 2019.