Confusion Matrix - "accuracy(metric)" flawed ?

March 01, 2022

Hello friends, this is my second post on confusion matrix metrics.

In my previous post, I explained how the accuracy metric can mislead us when judging a classification model from its confusion matrix. We also saw how the Recall metric (true positives over actual positives) in Example2 was able to catch the poor quality of the positive-class predictions.

We ended the earlier post with this thought...

Does this mean we don't need precision?

The answer is no, we still need Precision. Let's see another example for the same use case.

Example3: Consider the same class of 500 school students. You are trying to predict how many are affected by covid so that you can recommend them for isolation/quarantine.

This time you want to make sure that no student is declared negative if he/she "may" be positive, so you add some additional checks during testing. The following was the result:

	Actual Positive	Actual Negative	Total
Predicted Positive	10	10	20
Predicted Negative	0	480	480
Total	10	490	500

Total sample size: 500

Ground Truth:

Actual positive: 10, Actual negative: 490

Prediction:

Predicted positive: 20, Predicted negative: 480

Accuracy: (True +ve + True -ve) / Total samples = (10 + 480)/500 = 490/500 = 98%

Recall: True +ve / Total +ve (actual) = 10/10 = 100%

Precision: True +ve / Total +ve (predicted) = 10/(10 + 10) = 50%

Notice that Recall was not able to catch the 10 additional samples that were negative but classified as positive (false positives). This is where Precision helps.
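The Example3 arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the variable names (tp, fp, fn, tn) are my own shorthand for the four cells of the confusion matrix and are not part of any library.

```python
# Counts taken from the Example3 confusion matrix
tp = 10   # actual positive, predicted positive
fp = 10   # actual negative, predicted positive
fn = 0    # actual positive, predicted negative
tn = 480  # actual negative, predicted negative

total = tp + fp + fn + tn          # 500 students

accuracy = (tp + tn) / total       # correct predictions over all samples
recall = tp / (tp + fn)            # true +ve over total +ve (actual)
precision = tp / (tp + fp)         # true +ve over total +ve (predicted)

print(f"Accuracy:  {accuracy:.0%}")   # 98%
print(f"Recall:    {recall:.0%}")     # 100%
print(f"Precision: {precision:.0%}")  # 50%
```

Only Precision has the false positives (fp) in its denominator, which is why it is the one metric that drops here.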

Let's summarize the results for all three examples (the first 2 examples are from my previous post).

	True +ve	Total +ve (actual)	Total +ve (predicted)	Accuracy	Precision = True +ve / Total +ve (predicted)	Recall = True +ve / Total +ve (actual)
Example1	250	250	-	90%	-	100%
Example2	5	10	-	99%	-	50%
Example3	10	10	20	98%	50%	100%

Final summary for all the 3 examples

Based on all the 3 examples we can infer the following:

1) Accuracy may not be the right metric, since real-world data is often skewed toward one class.

2) Recall is able to catch cases where the model predicts fewer positives than there actually are, i.e. it misses some actual positives.

For non-mission-critical use cases, like predicting how many students brushed their teeth in the morning, a lower Recall may be OK, but for use cases like covid infection we would like Recall to be close to 100%. A Recall of 100% means that every actual +ve sample was correctly identified (possibly with some tradeoff for the -ve class).

3) Precision: A Precision value lower than 100% indicates that some -ve class samples were incorrectly identified as +ve. This may be OK even for mission-critical use cases (like covid prediction or cancer detection), since we can advise a retest. However, if a positive case is misclassified as negative, it may prove to be fatal.
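To make point 1 concrete, here is a small sketch comparing Example3 with a hypothetical "always negative" model on the same class of 500 students (10 actual positives, 490 actual negatives). The always-negative model and the metrics() helper are my own illustration, not something from the earlier examples.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, recall, precision) from confusion matrix counts.

    Recall or precision is None when its denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    return accuracy, recall, precision

# Hypothetical model that declares every student negative
always_negative = metrics(tp=0, fp=0, fn=10, tn=490)

# Example3 from this post
example3 = metrics(tp=10, fp=10, fn=0, tn=480)

print(always_negative)  # (0.98, 0.0, None)
print(example3)         # (0.98, 1.0, 0.5)
```

Both models score 98% accuracy, yet one finds none of the infected students and the other finds all of them. Accuracy alone cannot tell them apart on skewed data; Recall and Precision can.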

Now, this leads to another question: what if different models give different Recall and Precision scores for the same data set?

If the results are aligned (all models point in the same direction on Recall and Precision), we are good. However, if the Precision and Recall metrics are not aligned (as in the example table below), it will not be an easy decision.

	Precision	Recall
Classification Model1	Good	OK
Classification Model2	OK	Good

Don't worry, there is another metric that can help us in this situation. Let's see it in the next post 😉

References:

My previous post on Precision/Recall

Great YouTube video on Confusion metrics
