Shreya Vikas Khedkar
7 min read · Jun 5, 2021

Task-05

What is a Confusion Matrix?

A confusion matrix is a technique for summarizing the performance of a classification algorithm.

Classification accuracy alone can be misleading if you have an unequal number of observations in each class, or if you have more than two classes in your dataset.

Calculating a confusion matrix can give you a better idea of what your classification model is getting right and what types of error it is making.

The number of correct and incorrect predictions is summarized with count values and broken down by each class. This is the key to the confusion matrix. The confusion matrix shows the ways in which your classification model is confused when it makes predictions. It gives you insight not only into the errors being made by your classifier but, more importantly, into the types of errors that are being made.

How to Calculate a Confusion Matrix

1. You need a test dataset or a validation dataset with expected outcome values.

2. Make a prediction for each row in your test dataset.

3. From the expected outcomes and predictions, count:

  • The number of correct predictions for each class.
  • The number of incorrect predictions for each class, organised by the class that was predicted.

These numbers are then organised into a table, or a matrix, as follows:

  • Expected down the side: each row of the matrix corresponds to an actual (expected) class.
  • Predicted across the top: each column of the matrix corresponds to a predicted class.

The counts of correct and incorrect classifications are then filled into the table.

The total number of correct predictions for a class goes into the expected row for that class value and the predicted column for that same class value.

In the same way, the total number of incorrect predictions for a class goes into the expected row for that class value and the predicted column of the class that was wrongly predicted.

This matrix can be used for a 2-class problem, where it is very easy to understand, but it can easily be applied to problems with three or more class values by adding more rows and columns to the confusion matrix.

2-Class Confusion Matrix

Here we often look to discriminate between observations with a specific outcome (an event) and normal observations (no event).

We can assign the event row as “positive” and the no-event row as “negative”. We can then assign the predicted event column as “true” and the predicted no-event column as “false”.

This gives us:

  • True positives for correctly predicted event values.
  • False positives for incorrectly predicted event values.
  • True negatives for correctly predicted no-event values.
  • False negatives for incorrectly predicted no-event values.

Confusion Matrix in Python with scikit-learn

The scikit-learn library for machine learning in Python can calculate a confusion matrix.

Given an array or list of expected values and a list of predictions from your machine learning model, the confusion_matrix() function will calculate a confusion matrix and return the result as an array.
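A minimal sketch of this, using made-up labels for a binary problem (note that scikit-learn puts the actual classes down the rows and the predicted classes across the columns):

```python
from sklearn.metrics import confusion_matrix

expected  = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]   # actual class labels
predicted = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0]   # model predictions

# Rows are actual classes, columns are predicted classes (scikit-learn's default).
cm = confusion_matrix(expected, predicted)
print(cm)

# For a binary problem, ravel() unpacks the four cells in a fixed order.
tn, fp, fn, tp = cm.ravel()
```

Comparing the two lists by hand confirms the counts: 3 true positives, 1 false negative, 2 false positives and 4 true negatives.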

True/False Positives/Negatives

All estimation parameters of the confusion matrix are based on four basic counts, namely true positives, false positives, true negatives and false negatives.

Understanding the Confusion Matrix

A confusion matrix is used for classification tasks where the output of the algorithm is in two or more classes. While confusion matrices can be as wide and tall as the chosen number of classes, we’ll keep things simple for now and just look at the confusion matrix for a binary classification task: a 2×2 confusion matrix.

Let’s say that our classifier wants to predict whether a patient has a given disease based upon the symptoms (the features) fed into the classifier. This is a binary classification task, so the patient either has the disease or they don’t.

The left-hand side of the confusion matrix displays the class predicted by the classifier, while the top row of the matrix stores the actual class labels of the examples. (Note that this layout is the transpose of scikit-learn’s default, which puts the actual classes down the rows.)

You can look at where the values intersect to see how the model performed. The number of correct positive predictions (true positives) is located in the upper left corner.

Meanwhile, if the classifier called an example positive, but the example was actually negative, this is a false positive, and it is found in the upper right corner.

The lower-left corner stores the number of examples classified as negative that were actually positive (false negatives), and finally the lower right corner stores the number of correctly predicted negative examples, or true negatives.

Upper left: True positives

Upper right: False positives

Lower left: False negatives

Lower right: True negatives

If there are more than two classes, the matrix just grows by the respective number of classes. For instance, if there are four classes, it would be a 4×4 matrix.

No matter the number of classes, the principle is still the same: the left-hand side is the predicted values and the top the actual values. Just check where they intersect to see the number of predicted examples for any given class against the actual number of examples for that class. You should also note that the instances of correct predictions run down a diagonal from top-left to bottom-right. From this matrix you can calculate the key predictive metrics: sensitivity (also called recall), specificity and precision.
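A hypothetical three-class sketch of this growth: correct predictions land on the diagonal, so summing the diagonal counts the correct examples. (As before, scikit-learn puts actual classes down the rows, the transpose of the layout described above.)

```python
from sklearn.metrics import confusion_matrix

actual    = ["cat", "cat", "dog", "dog", "bird", "bird", "cat"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat",  "cat"]

labels = ["bird", "cat", "dog"]            # fixes the row/column order
cm = confusion_matrix(actual, predicted, labels=labels)
print(cm)

correct = cm.trace()                       # sum of the diagonal = correct predictions
```

With three classes the matrix is 3×3, and here 5 of the 7 examples fall on the diagonal.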

Types of Errors

Confusion matrices have two types of errors: Type 1 and Type 2.

  • The best way to remember them is to re-read “false positive” and “false negative”. A false positive is a Type 1 error: it reads as “falsely true”, which contains only one “false”.
  • A false negative is a Type 2 error: it reads as “falsely false”, so there are two “false”s, making it Type 2.

From our confusion matrix, we can calculate five different metrics measuring the validity of our model.

  • Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)
  • Misclassification (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
  • Precision (true positives / predicted positives) = TP / (TP + FP)
  • Recall (true positives / all actual positives) = TP / (TP + FN)
  • Specificity (true negatives / all actual negatives) = TN / (TN + FP)
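The five formulas above can be written out in plain Python (no libraries needed); the counts in the example call are arbitrary:

```python
def metrics(tp, tn, fp, fn):
    """Compute the five confusion-matrix metrics from the four basic counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy":          (tp + tn) / total,
        "misclassification": (fp + fn) / total,
        "precision":         tp / (tp + fp),
        "recall":            tp / (tp + fn),
        "specificity":       tn / (tn + fp),
    }

# Example with arbitrary counts: 3 TP, 4 TN, 2 FP, 1 FN.
print(metrics(3, 4, 2, 1))
```

Note that accuracy and misclassification always sum to 1, since every prediction is either correct or incorrect.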

Example

I took a random sample of 500 women to check whether they are diabetic or not. Of these, 50 are actually diabetic. I predicted 100 diabetic women in total, 45 of whom are actually diabetic.

Our task is to:

  • Identify the TP, TN, FP and FN values and construct a confusion matrix, and
  • Calculate the accuracy, misclassification, precision, sensitivity and specificity.

I predicted 100 diabetics, so our predicted-diabetic row should add up to 100. We know that 45 of the 100 were indeed diabetic, so we can put 45 in the predicted diabetic, actual diabetic spot: a true positive.

Additionally, 50 people in my sample are actually diabetic, so my actual-diabetic column should add up to 50. Since we already have 45 in this column, we put 5 in the predicted not diabetic, actual diabetic spot: a false negative.

I predicted 100 diabetics, but only 45 of those were actually diabetic. So of all those that I predicted, how many did I falsely predict? The answer is 55, which is my false positive count, because I falsely predicted the positive outcome.

Finally, the number of true negatives can be determined by adding 45, 55 and 5 together and subtracting the result from the total sample of 500. This leaves us with 395 true negatives. Once our numbers are filled in, we can double-check ourselves by adding up all of our squares and ensuring they add up to 500.

We also know that 55, our false positives, is our Type 1 error, because I falsely predicted 55 diabetics out of my total 100 predicted diabetics. We know that 5, our false negatives, is our Type 2 error, because I falsely predicted that 5 of the 50 total actual diabetics were not diabetic.
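The arithmetic above can be re-checked in a few lines, using the counts given in the text:

```python
# Counts from the worked example: 45 TP, 5 FN, 55 FP, out of 500 women.
tp, fn, fp = 45, 5, 55
total = 500
tn = total - (tp + fn + fp)       # everything else is a true negative

accuracy    = (tp + tn) / total   # (45 + 395) / 500
precision   = tp / (tp + fp)      # 45 / 100
recall      = tp / (tp + fn)      # 45 / 50
specificity = tn / (tn + fp)      # 395 / 450
print(tn, accuracy, precision, recall, specificity)
```

This confirms the 395 true negatives and, as discussed below the example, a precision of only 45%.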

Confusion matrices help us to evaluate the performance of these kinds of models.

In this instance, with our diabetes example, a high incidence of false positives is the worst possible outcome. This makes precision the metric I would like to focus on, and it is pretty terrible at just 45%. This means that we are falsely predicting diabetes, informing women of non-existent diabetes, 55% of the time.

Cybercrime

What is cybercrime?

Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device.

Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organisations.

Some cybercriminals are organised, use advanced techniques and are highly technically skilled. Others are novice hackers.

Types of cybercrime :

  • Email and internet fraud.
  • Identity fraud (where personal information is stolen and used)
  • Theft of financial or card payment data.
  • Theft and sale of corporate data.
  • Cyberextortion (demanding money to prevent a threatened attack)
  • Ransomware attacks(a type of cyberextortion)
  • Cryptojacking (where hackers mine cryptocurrency using resources they do not own)
  • Cyberespionage (where hackers access government or company data)

Most cybercrime falls under two main categories:

  • Criminal activity that targets computers.
  • Criminal activity that uses computers to commit other crimes.

How to protect yourself against cybercrime

  • Keep your software and OS updated.
  • Use antivirus software and keep it updated.
  • Use strong passwords.
  • Never open attachments in spam emails.
  • Do not click on links in spam emails or on untrusted websites.
  • Do not give out personal information unless it is secure to do so.
  • Contact companies directly about suspicious requests.
  • Be mindful of which website URLs you visit.
  • Keep an eye on your bank statements.

Conclusion

A confusion matrix is a powerful tool for predictive analysis, enabling you to visualise predicted values against actual values. It will take some time to get used to interpreting a confusion matrix, but once you have, it will be an important part of your toolkit.

Thank you!
