Confusion Matrix in Data Science, meaning and example
What is a Confusion Matrix?
A confusion matrix is a technique for measuring the performance of an ML classification model.
Why do we need a Confusion Matrix? Is measuring accuracy not enough?
A confusion matrix shows how your model actually performs, which accuracy alone can hide. For example, suppose I want to classify whether a person is suffering from a very rare disease (1 in 100,000). Even if I build a very bad model that labels everyone as ‘not suffering’ from the disease, the accuracy of the model will still be around 99.999%. But that model is of no use, because it never identifies a single patient and so does not solve the classification problem.
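To see this accuracy trap in numbers, here is a minimal sketch. It assumes a made-up population of 100,000 people with exactly one patient, and a "model" that ignores its input and labels everyone as healthy. The variable names are only for illustration.

```python
# Hypothetical population: 100,000 people, exactly 1 has the disease.
population = 100_000
actual = [1] + [0] * (population - 1)  # 1 = has the disease, 0 = healthy

# A useless "model" that always predicts healthy.
predicted = [0] * population

# Accuracy = share of people whose label was predicted correctly.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / population

# How many real patients did the model actually find?
patients_found = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)

print(f"accuracy = {accuracy:.5f}")
print(f"patients found = {patients_found}")
```

The accuracy looks excellent, yet the model finds no patients at all, which is exactly what accuracy hides.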
This is where the confusion matrix comes in. For a two-class problem it is a 2×2 matrix of predicted and actual values.
Here the columns denote the predicted values and the rows denote the actual values.
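The layout described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `confusion_matrix` and the sample labels are assumptions, with 1 standing for the positive class and 0 for the negative class.

```python
def confusion_matrix(actual, predicted):
    """Build a 2x2 matrix: rows = actual class, columns = predicted class.

    Index 0 is the positive class and index 1 is the negative class.
    """
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        row = 0 if a == 1 else 1  # actual value picks the row
        col = 0 if p == 1 else 1  # predicted value picks the column
        matrix[row][col] += 1
    return matrix


# Made-up labels for eight samples.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

matrix = confusion_matrix(actual, predicted)
print("                 predicted +  predicted -")
print(f"actual +         {matrix[0][0]:<12} {matrix[0][1]}")
print(f"actual -         {matrix[1][0]:<12} {matrix[1][1]}")
```

The diagonal holds the correct predictions and the off-diagonal cells hold the two kinds of mistakes.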
Take the example of a fire alarm
True Positive – The alarm rings and there is actually a fire in the building. That’s fine.
False Negative – There is an ‘actual’ fire in the building but your model says there is none, so the alarm never rings. This is catastrophic. The same goes for the disease example, i.e. the person is infected but the model fails to identify it. This is a Type II error.
False Positive – The building is not on fire but the model says that it is. This is still acceptable, meaning you can live with it. For example, the person is not infected by the virus but your model says they are; you will go for a few check-ups and confirm that you are safe :). This is a Type I error.
True Negative – No fire, no alarm – all chill.
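The four fire alarm cases can be written down as a small lookup. This is a sketch only; the function name `alarm_outcome` is hypothetical.

```python
def alarm_outcome(fire: bool, alarm: bool) -> str:
    """Name the confusion-matrix cell for one building check."""
    if fire and alarm:
        return "True Positive"
    if fire and not alarm:
        return "False Negative (Type II error)"
    if not fire and alarm:
        return "False Positive (Type I error)"
    return "True Negative"


# Walk through all four combinations of reality and prediction.
for fire in (True, False):
    for alarm in (True, False):
        print(f"fire={fire}, alarm={alarm} -> {alarm_outcome(fire, alarm)}")
```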
Accuracy = (TP+TN)/(TP+TN+FP+FN)
Precision = TP/(TP+FP) i.e. the accuracy of the positive predictions
Recall (Sensitivity) = TP/(TP+FN) i.e. coverage of actual positive results
Specificity = TN/(TN+FP) i.e. coverage of actual negative results
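As a quick worked example, the four formulas can be computed from the cell counts. The counts below are made up for illustration, the function name `classification_metrics` is hypothetical, and the sketch assumes none of the denominators is zero.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and specificity from the four counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),    # share of positive predictions that are right
        "recall": tp / (tp + fn),       # share of actual positives that are found
        "specificity": tn / (tn + fp),  # share of actual negatives that are found
    }


# Hypothetical counts for 1,000 screened people.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=945)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.985 while recall is only 0.8, so one in five actual positives is still missed.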
When is precision more important than recall?
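Suppose there is a zombie apocalypse and you are deciding whom to take in. Here a positive prediction means ‘this person is normal’. You want to take in as many normal people as possible, but even a single infected person who gets through is dangerous, so you look for high precision, i.e. as few false positive cases as possible.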
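To make the trade-off concrete, here is a small sketch comparing two hypothetical screening policies. All numbers are made up: 100 normal people and 10 infected people arrive, and a positive prediction means "normal, let them in".

```python
def precision(tp, fp):
    """Share of admitted people who are really normal."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of normal people who get admitted."""
    return tp / (tp + fn)


# tp = normal and admitted, fp = infected but admitted, fn = normal but turned away.
lenient = {"tp": 98, "fp": 4, "fn": 2}   # lets almost everyone in
strict = {"tp": 80, "fp": 0, "fn": 20}   # turns away anyone doubtful

for name, gate in (("lenient", lenient), ("strict", strict)):
    p = precision(gate["tp"], gate["fp"])
    r = recall(gate["tp"], gate["fn"])
    print(f"{name}: precision={p:.3f}, recall={r:.3f}, infected admitted={gate['fp']}")
```

The strict policy turns away more normal people (lower recall) but admits no infected person, which is the behaviour you want in this scenario.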
Keep Learning 🙂
The Data Monk