
Confusion Matrix in Data Science: Meaning and Example

What is a Confusion Matrix?
A confusion matrix is a performance-measurement technique for an ML classification model.

Why do we need a Confusion Matrix? Isn't measuring accuracy enough?
A confusion matrix shows the real performance of your model, which plain accuracy can hide. For example, suppose I want to classify whether a person is suffering from a very rare disease (1 in 100,000). Even if I build a very bad model that labels everyone as 'not suffering' from the disease, the accuracy of the model will still be about 99.999%. But that model is of no use because it does not actually solve the classification problem.
Here comes the confusion matrix, which is a 2×2 matrix of predicted and actual values.

Here the columns denote the Predicted values and the rows denote the Actual values.
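To make this concrete, here is a minimal sketch in Python (it assumes scikit-learn and NumPy are available; the labels are invented just to mimic the rare-disease example above):

```python
# Made-up example: 100,000 people, only 1 actually has the disease (label 1)
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_actual = np.zeros(100_000, dtype=int)
y_actual[0] = 1

# A "very bad" model that labels everyone as 'not suffering' (0)
y_predicted = np.zeros(100_000, dtype=int)

print(accuracy_score(y_actual, y_predicted))    # 0.99999 -> looks great, tells you nothing
print(confusion_matrix(y_actual, y_predicted))  # rows = actual, columns = predicted
# [[99999     0]
#  [    1     0]]
```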

Take the example of a fire alarm:
True Positive – The model predicts there is a fire in the building and there actually is a fire. That's fine.
False Negative – There is an actual fire in the building, but the model says there is no fire, so the alarm stays silent. This is catastrophic. The same goes for the disease example, i.e. the person is infected but the model is unable to identify it. This is a Type II error.
False Positive – The building is not on fire, but the model says it is. This is still acceptable; you can live with it. Example: the person is not infected by the virus but your model says they are, so they go for a few check-ups and confirm that they are safe :). This is a Type I error.
True Negative – No fire, no alarm – all chill.
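As a quick sketch (toy labels, again assuming scikit-learn; 1 means 'fire' and 0 means 'no fire'), the four cells can be read straight out of the matrix:

```python
from sklearn.metrics import confusion_matrix

y_actual    = [1, 1, 0, 0, 1, 0, 0, 1]  # what really happened
y_predicted = [1, 0, 0, 1, 1, 0, 0, 1]  # what the model/alarm said

# With labels=[0, 1], ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted, labels=[0, 1]).ravel()
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=3, FN=1, FP=1, TN=3
```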

Accuracy = (TP+TN)/(TP+TN+FP+FN)
Precision = TP/(TP+FP) i.e. the accuracy of the positive predictions
Recall (Sensitivity) = TP/(TP+FN) i.e. the coverage of actual positive results
Specificity = TN/(TN+FP) i.e. the coverage of actual negative results
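Plugging some made-up cell counts into these formulas (a minimal sketch, not results from any real model):

```python
# Made-up cell counts just to show the formulas in action
tp, fn, fp, tn = 40, 10, 20, 30

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # 0.70
precision   = tp / (tp + fp)                    # 0.666...
recall      = tp / (tp + fn)                    # 0.80 (sensitivity)
specificity = tn / (tn + fp)                    # 0.60

print(accuracy, precision, recall, specificity)
```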

When is precision more important than recall?
Suppose there is a zombie apocalypse and you are screening people to let into your safe zone (here 'positive' means predicted healthy and safe to admit). You want to let in as many healthy people as possible, but even a single infected person slipping in is dangerous, so you aim for high precision, i.e. as few false positives as possible.
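One common way to push a model towards higher precision is to raise its decision threshold. The sketch below assumes a model that outputs probability-like scores; both the scores and labels are invented for illustration:

```python
from sklearn.metrics import precision_score, recall_score

y_actual = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]  # 1 = healthy (safe to admit)
scores   = [0.05, 0.2, 0.35, 0.4, 0.55, 0.6, 0.65, 0.8, 0.9, 0.95]  # model's confidence

for threshold in (0.5, 0.8):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    p = precision_score(y_actual, y_pred)
    r = recall_score(y_actual, y_pred)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.5: precision=0.67, recall=0.80
# threshold=0.8: precision=1.00, recall=0.60
```

Raising the threshold from 0.5 to 0.8 removes the false positives at the cost of missing some genuinely healthy people, which is exactly the trade-off you want in this scenario.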

What do they ask in top Data Science Interviews: 5 Complete Data Science Real Interviews Q and A
What do they ask in Top Data Science Interview Part 2: Amazon, Accenture, Sapient, Deloitte, and BookMyShow

Keep Learning 🙂

The Data Monk

