Confusion Matrix From a Cybersecurity Analyst's Perspective

Dipaditya Das · Published in Geek Culture · Jun 6, 2021 · 10 min read

In this article, we are going to talk about the confusion matrix and how cybersecurity analysts use it to make their systems safer.

In May of 2017, a nasty cyber attack hit more than 200,000 computers in 150 countries over the course of just a few days. Dubbed “WannaCry,” it exploited a vulnerability that was first discovered by the National Security Agency (NSA) and later stolen and disseminated online.

It worked like this: After successfully breaching a computer, WannaCry encrypted that computer’s files and rendered them unreadable. In order to recover their imprisoned material, targets of the attack were told they needed to purchase special decryption software. Guess who sold that software? That’s right, the attackers.

Source: Wikipedia

The so-called “ransomware” siege affected individuals as well as large organizations, including the U.K.’s National Health Service, Russian banks, Chinese schools, Spanish telecom giant Telefonica and the U.S.-based delivery service FedEx. By some estimates, total losses approached $4 billion.

Source: Splunk

Other types of cyber invasions, such as “cryptojacking,” are more insidious and less damaging, but still costly. Cryptojacking is a technique where cyber-criminals disseminate malware on multiple computers or servers. The hack seizes control of a machine’s processing power to mine cryptocurrency — a process that voraciously consumes both computing power and electricity — and then sends that crypto back to the perpetrators.

Even high-profile companies with strong cybersecurity protocols aren’t immune, as evidenced by this 2018 scare at Tesla that was remedied thanks to a vigilant third-party team of cybersecurity experts.

What is Cybercrime?

Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device. Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations.

Some cybercriminals are organized, use advanced techniques and are highly technically skilled. Others are novice hackers. Rarely, cybercrime aims to damage computers for reasons other than profit. These could be political or personal.


Types of cybercrime

Here are some specific examples of the different types of cybercrime:

  • Email and internet fraud.
  • Identity fraud (where personal information is stolen and used).
  • Theft of financial or card payment data.
  • Theft and sale of corporate data.
  • Cyberextortion (demanding money to prevent a threatened attack).
  • Ransomware attacks (a type of cyberextortion).
  • Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
  • Cyberespionage (where hackers access government or company data).

Most cybercrime falls under two main categories:

  • Criminal activity that targets computers, networks, or devices.
  • Criminal activity that uses computers to commit other crimes.

Machine Learning in Cybersecurity

Machine learning has become a vital technology for cybersecurity. Machine learning preemptively stamps out cyber threats and bolsters security infrastructure through pattern detection, real-time cyber crime mapping and thorough penetration testing.

What is a Confusion Matrix and why do we need it?

In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa — both variants are found in the literature. The name stems from the fact that it makes it easy to see whether the system is confusing two classes (i.e. commonly mislabeling one as another).

It is a special kind of contingency table, with two dimensions (“actual” and “predicted”), and identical sets of “classes” in both dimensions (each combination of dimension and class is a variable in the contingency table).

Let’s make it simple. When we get the data, after cleaning, pre-processing and wrangling it, the first step is to feed it to a model and, of course, get output in probabilities. But hold on! How can we measure the effectiveness of our model? The better the effectiveness, the better the performance, and that’s exactly what we want. This is where the confusion matrix comes into the limelight. A confusion matrix is a performance measurement for machine learning classification.


A confusion matrix presents a table layout of the different outcomes of the prediction and results of a classification problem and helps visualize its outcomes. It plots a table of all the predicted and actual values of a classifier.
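
To make this concrete, here is a minimal sketch of how a confusion matrix could be computed for a binary classifier using scikit-learn. The labels are made up for illustration: 1 means “compromised” and 0 means “not compromised”.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and model predictions for ten systems:
# 1 = compromised, 0 = not compromised
y_actual    = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
y_predicted = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]

# Rows are actual classes, columns are predicted classes.
# labels=[1, 0] puts the positive class first, so the layout is:
# [[TP, FN],
#  [FP, TN]]
cm = confusion_matrix(y_actual, y_predicted, labels=[1, 0])
print(cm)

tp, fn = cm[0]
fp, tn = cm[1]
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
```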


Terminologies and Derivations from the Confusion Matrix

  • Condition Positive (P)
    The number of real positive cases in the data.
  • Condition Negative (N)
    The number of real negative cases in the data.
  • True Positive (TP) | Hits
    The number of cases where the predicted value is positive and the actual value is also positive. For example: systems which are compromised and are correctly predicted as compromised.
  • True Negative (TN) | Correct Rejection
    The number of cases where the predicted value is negative and the actual value is also negative. For example: systems which are not compromised and are correctly predicted as not compromised.
  • False Positive (FP) | False Alarm | Type I Error | Overestimation
    The number of cases where the predicted value is positive but the actual value is negative. For example: systems which are not compromised but are predicted as compromised.
  • False Negative (FN) | Miss | Type II Error | Underestimation
    The number of cases where the predicted value is negative but the actual value is positive. For example: systems which are compromised but are predicted as not compromised.
  • Sensitivity | Recall | Hit Rate | True Positive Rate (TPR)
    It measures the proportion of positives that are correctly identified (i.e. the proportion of those who have some condition (affected) who are correctly identified as having the condition).
  • Specificity | Selectivity | True Negative Rate (TNR)
    It measures the proportion of negatives that are correctly identified (i.e. the proportion of those who do not have the condition (unaffected) who are correctly identified as not having the condition).
  • Precision | Positive Predictive Value (PPV)
    The fraction of predicted positive cases that are actually positive, i.e. TP / (TP + FP). The ideal value of the PPV, with a perfect test, is 1 (100%), and the worst possible value is zero. It is the complement of the false discovery rate (FDR).
  • Negative Predictive Value (NPV)
    The ratio of true negatives to the total number of predicted negative cases, i.e. TN / (TN + FN). With a perfect test, one which returns no false negatives, the value of the NPV is 1 (100%), and with a test which returns no true negatives the NPV is zero. It is the complement of the false omission rate (FOR).
  • Accuracy (ACC)
    Accuracy is a statistical measure of how well a binary classification test correctly identifies or excludes a condition. That is, accuracy is the proportion of correct predictions (both true positives and true negatives) among the total number of cases examined, i.e. (TP + TN) / (P + N). In this context it is sometimes referred to as the “Rand accuracy” or “Rand index”.

A detailed chart of the metrics that can be derived from a two-class confusion matrix. (Source: Wikipedia)
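
As a quick illustration of the terminology above, here is a minimal sketch in plain Python that derives the main metrics directly from the four cell counts. The numbers continue the hypothetical compromised/not-compromised example and are assumptions, not real data.

```python
# Hypothetical cell counts from a two-class confusion matrix
# (positive class = "compromised").
tp, fn = 4, 1   # compromised systems: correctly flagged vs. missed
fp, tn = 1, 4   # clean systems: falsely flagged vs. correctly cleared

p = tp + fn                # condition positive
n = fp + tn                # condition negative

tpr = tp / p               # sensitivity / recall / hit rate
tnr = tn / n               # specificity / selectivity
ppv = tp / (tp + fp)       # precision / positive predictive value
npv = tn / (tn + fn)       # negative predictive value
acc = (tp + tn) / (p + n)  # accuracy

print(f"TPR={tpr:.2f}, TNR={tnr:.2f}, PPV={ppv:.2f}, NPV={npv:.2f}, ACC={acc:.2f}")
```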

Two Types of Errors

In statistical test theory, the notion of a statistical error is an integral part of hypothesis testing. The test involves choosing between two competing propositions: the null hypothesis, denoted by H0, and the alternative hypothesis, denoted by H1. This is conceptually similar to the judgement in a court trial. The null hypothesis corresponds to the position of the defendant: just as he is presumed innocent until proven guilty, the null hypothesis is presumed true until the data provide convincing evidence against it. The alternative hypothesis corresponds to the position against the defendant. Typically, the null hypothesis states the absence of a difference or the absence of an association; it can never be that there is a difference or an association.

If the result of the test corresponds with reality, then a correct decision has been made. However, if the result of the test does not correspond with reality, then an error has occurred. There are two ways the decision can be wrong: the null hypothesis may be true, yet we reject H0; or the alternative hypothesis H1 may be true, yet we fail to reject H0.

Type I error

The first kind of error is the rejection of a true null hypothesis as the result of a test procedure. This kind of error is called a type I error (false positive) and is sometimes called an error of the first kind.

Type II error

The second kind of error is the failure to reject a false null hypothesis as the result of a test procedure. This sort of error is called a type II error (false negative) and is also referred to as an error of the second kind.

As you can see, depending on the problem statement, a Type I error can be more tolerable than a Type II error, or vice versa. The two error rates trade off against each other: decreasing the Type I error rate increases the Type II error rate, and vice versa. The risk of committing a Type I error is represented by your alpha level (the p-value threshold below which you reject the null hypothesis). The commonly accepted α = .05 means that you will incorrectly reject a true null hypothesis approximately 5% of the time. To decrease your chance of committing a Type I error, you can make your alpha level stricter (for example, .01). Increasing the sample size, on the other hand, reduces the Type II error rate at a given alpha.

Similarly, if our dataset concerns cyber-threat analysis, a false negative (a compromised system labelled as not compromised) is usually far more costly than a false alarm, so we should tune the classifier to keep Type II errors low even at the cost of more Type I errors. Something like spam filtering requires the opposite approach, since flagging a legitimate email as spam is typically worse than letting one spam message through.

The results obtained from negative samples (left curve) overlap with the results obtained from positive samples (right curve). By moving the decision cutoff (vertical bar), the rate of false positives (FP) can be decreased at the cost of raising the number of false negatives (FN), or vice versa.
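
Here is a minimal sketch in plain Python showing that trade-off in action. The anomaly scores and labels are made up for illustration: higher scores mean “more suspicious”, and label 1 marks a system that was actually compromised.

```python
# Hypothetical anomaly scores from a detector (higher = more suspicious)
# and the corresponding ground truth (1 = compromised, 0 = clean).
scores = [0.10, 0.25, 0.30, 0.42, 0.48, 0.55, 0.61, 0.70, 0.82, 0.95]
labels = [0,    0,    0,    1,    0,    1,    0,    1,    1,    1]

def fp_fn_at(cutoff):
    """Count false positives and false negatives at a given score cutoff."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    return fp, fn

for cutoff in (0.3, 0.5, 0.7):
    fp, fn = fp_fn_at(cutoff)
    print(f"cutoff={cutoff:.1f}: FP={fp}, FN={fn}")

# Raising the cutoff lowers FP (fewer false alarms) but raises FN (more
# missed attacks); lowering it does the opposite.
```

With these made-up numbers, a cutoff of 0.3 gives three false alarms but misses nothing, while a cutoff of 0.7 gives no false alarms but misses two compromised systems.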

Cyberattack detection is a classification problem, in which we separate the normal patterns of a system's behaviour from the abnormal (attack) patterns.

SDF is a very powerful and popular data-mining algorithm for decision-making and classification problems. It has been used in many real-life applications such as medical diagnosis, radar signal classification, weather prediction, credit approval, and fraud detection.

A parallel Support Vector Machine (pSVM) algorithm was proposed for the detection and classification of cyber attack datasets.

The performance of a support vector machine depends greatly on the kernel function it uses. Therefore, we modified the Gaussian kernel function in a data-dependent way to improve the efficiency of the classifiers. The results of the two classifiers were then compared to verify the theoretical expectations, and the analysis shows that pSVM performs better than SDF.

The classification accuracy of pSVM improves remarkably (the accuracy for both the Normal and DoS classes is almost 100%), with comparable false-alarm rates and training and testing times.
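
As an illustration of this kind of detection pipeline (a generic RBF-kernel SVM in scikit-learn, not the pSVM or the data-dependent kernel from the study), here is a minimal sketch on synthetic network-traffic features, with the confusion matrix as the evaluation step. The features, class centres and parameters are all assumptions made up for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical dataset: each row is a connection record with a few numeric
# features (e.g. duration, bytes sent, failed logins); 1 = attack, 0 = normal.
rng = np.random.default_rng(42)
normal  = rng.normal(loc=[1.0, 2.0, 0.1], scale=0.5, size=(200, 3))
attacks = rng.normal(loc=[3.0, 6.0, 2.0], scale=0.7, size=(200, 3))
X = np.vstack([normal, attacks])
y = np.array([0] * 200 + [1] * 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize features, then fit an SVM with a Gaussian (RBF) kernel.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
clf.fit(scaler.transform(X_train), y_train)

# Evaluate with a confusion matrix: rows = actual, columns = predicted.
y_pred = clf.predict(scaler.transform(X_test))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["normal", "attack"]))
```

In a real deployment the features would come from flow records or host telemetry, and the confusion matrix would be read with the Type I/Type II trade-off above in mind.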

Conclusion

Machine learning does some things really well, such as quickly scanning large amounts of data and analyzing it using statistics. Cybersecurity systems generate reams of data, so it’s no wonder the technology is such a useful tool.

“We have more and more data available, and the data is generally telling a story,” Raffael Marty, chief research and intelligence officer at cybersecurity firm Forcepoint, tells Built In. “If you understand how to analyze the data, you should be able to come up with the deviations from the norm.”

And those deviations sometimes reveal threats. Thanks to that important function, the use of machine learning is surging in multiple sectors. It’s employed for tasks that require image recognition and speech recognition. It has even defeated the world’s top Go player at his own game.

But while it has improved cybersecurity, Marty says, humans are still crucial.

“There’s this promise that you can just look at past data to predict the future — forgetting that domain expertise is really important in this equation,” he says. “There are groups of people who think you can learn everything from the data, but that’s simply not true.”

Over-reliance on AI in cybersecurity can create a false sense of safety, Marty adds. That’s why, in addition to judiciously applied algorithms, his firm employs cybersecurity experts, data scientists, and psychologists. As with all current artificial intelligence, machine learning supplements and enhances human efforts, rather than replacing them.

I hope you liked this article. In the future, I am going to write more articles explaining the core concepts of Machine Learning and Statistics.


Dipaditya Das ● MLOps Engineer ● Linux Administrator ● DevOps and Cloud Architect ● Kubernetes Administrator ● AWS Community Builder ● Google Cloud Facilitator ● Author