Confusion Matrix - Is "accuracy (metric)" flawed?

Hello friends! As a machine learning student, I grappled with this concept and finally realized 😁 that accuracy (the metric) may not give the right insight into how effective your model is.

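Reference confusion matrix

|                    | Actual Positive | Actual Negative |
| ------------------ | --------------- | --------------- |
| Predicted Positive | True +ve        | False +ve       |
| Predicted Negative | False -ve       | True -ve        |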
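
If you want to see how the four cells get filled in, here is a minimal Python sketch. The function name `confusion_counts` and the toy labels are my own assumptions for illustration; they are not data from the examples in this post.

```python
def confusion_counts(actual, predicted):
    """Count (TP, FP, FN, TN) for binary labels, where True means positive."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # predicted positive, actually positive
        elif p and not a:
            fp += 1      # predicted positive, actually negative
        elif a:
            fn += 1      # predicted negative, actually positive
        else:
            tn += 1      # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical toy labels, only to exercise the function.
actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 1)
```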


Let's start with some examples:

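Example 1: Consider a class of 500 school students, where you are trying to predict which of them brush their teeth before coming to school.

|                    | Actual Positive | Actual Negative | Total |
| ------------------ | --------------- | --------------- | ----- |
| Predicted Positive | 250             | 50              | 300   |
| Predicted Negative | 0               | 200             | 200   |
| Total              | 250             | 250             |       |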

Total sample size: 500

  • Ground Truth:
    • Actual positive: 250, Actual negative: 250
  • Prediction:
    • Predicted positive: 300, Predicted negative: 200

Accuracy: (True +ve + True -ve) / Total samples = (250 + 200) / 500 = 90% accuracy
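
The same arithmetic as a small Python sketch. The helper name `accuracy` is my own choice; the counts are the ones from Example 1.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Counts taken from the Example 1 confusion matrix.
example1_accuracy = accuracy(tp=250, fp=50, fn=0, tn=200)
print(f"{example1_accuracy:.0%}")  # 90%
```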

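Example 2: Consider the same class of 500 school students, but now you are trying to predict which of them are infected with covid so that you can recommend them for isolation/quarantine.

|                    | Actual Positive | Actual Negative | Total |
| ------------------ | --------------- | --------------- | ----- |
| Predicted Positive | 5               | 0               | 5     |
| Predicted Negative | 5               | 490             | 495   |
| Total              | 10              | 490             |       |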

Total sample size: 500

  • Ground Truth:
    • Actual positive: 10, Actual negative: 490
  • Prediction:
    • Predicted positive: 5, Predicted negative: 495
Accuracy: (True +ve + True -ve) / Total samples = (5 + 490) / 500 = 495/500 = 99% accuracy

Hurray! You got an accuracy of 99%, not bad at all. However, when you review the data again, you realize that even though your model is 99% accurate, it failed to identify 5 students who were actually positive and classified them as negative. You know the consequence of this miss 😒

Whom should you blame? -> the metric
Why did it happen? -> imbalanced data: the number of negative-class samples (490) far outweighed the positive-class samples (10)
What should you do next? -> find more robust metric(s)
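
To see how strongly the imbalance can fool accuracy, here is a small sketch of my own (not one of the examples in this post): a hypothetical "model" that simply predicts negative for every student in Example 2. It would still score 98% accuracy while catching none of the 10 positive students.

```python
# Ground truth from Example 2: 10 positive students, 490 negative.
actual_positive, actual_negative = 10, 490

# Hypothetical baseline (an assumption for illustration): predict "negative" for everyone.
tp, fp = 0, 0
fn, tn = actual_positive, actual_negative

baseline_accuracy = (tp + tn) / (tp + fp + fn + tn)
print(f"accuracy = {baseline_accuracy:.0%}, positives missed = {fn}")
# accuracy = 98%, positives missed = 10
```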

Search for robust metric(s)

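1) Precision (of the positive class): the percentage of predicted positives that are actually positive.

|           | True +ve | Total +ve (predicted) | Precision = True +ve / Total +ve (predicted) |
| --------- | -------- | --------------------- | -------------------------------------------- |
| Example 1 | 250      | 300                   | 83.3%                                        |
| Example 2 | 5        | 5                     | 100%                                         |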
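
A minimal Python sketch of the precision formula, using the counts from both examples. The helper name `precision` and the choice to return `None` when nothing is predicted positive are my own assumptions.

```python
def precision(tp, fp):
    """True positives divided by everything predicted positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return None  # undefined: the model predicted no positives at all
    return tp / predicted_positive


precision_example1 = precision(tp=250, fp=50)  # 250 / 300
precision_example2 = precision(tp=5, fp=0)     # 5 / 5
print(f"{precision_example1:.1%}")  # 83.3%
print(f"{precision_example2:.1%}")  # 100.0%
```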


 

😩 Sadly, the precision metric is not able to catch the problem in Example 2. Let's try recall...

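2) Recall (of the positive class): the percentage of actual positives (ground truth) that the model correctly predicts as positive.

|           | True +ve | Total +ve (actual) | Recall = True +ve / Total +ve (actual) |
| --------- | -------- | ------------------ | -------------------------------------- |
| Example 1 | 250      | 250                | 100%                                   |
| Example 2 | 5        | 10                 | 50%                                    |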
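
And the recall formula as a matching Python sketch, again with the counts from both examples. The helper name `recall` and the `None` return for a dataset with no actual positives are my own assumptions.

```python
def recall(tp, fn):
    """True positives divided by everything that is actually positive."""
    actual_positive = tp + fn
    if actual_positive == 0:
        return None  # undefined: there are no actual positives to find
    return tp / actual_positive


recall_example1 = recall(tp=250, fn=0)  # 250 / 250
recall_example2 = recall(tp=5, fn=5)    # 5 / 10
print(f"{recall_example1:.0%}")  # 100%
print(f"{recall_example2:.0%}")  # 50%
```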




We got it this time: recall is able to capture the poor quality of the positive-class predictions in Example 2.

Does this mean we don't need precision? Not really; let me explain this in my next post 😁

Have a great day!


