Hello friends! As a machine learning student I grappled with this concept until it finally clicked 😁: accuracy, as a metric, may not give you the right insight into how effective your model is.
Reference confusion matrix:

| | Actual Positive | Actual Negative |
|---|---|---|
| Predicted Positive | True +ve | False +ve |
| Predicted Negative | False -ve | True -ve |
Let's start with some examples:
Example 1: Consider a class of 500 school students, where you are trying to predict which of them brush their teeth before coming to school.

| | Actual Positive | Actual Negative | Total |
|---|---|---|---|
| Predicted Positive | 250 | 50 | 300 |
| Predicted Negative | 0 | 200 | 200 |
| Total | 250 | 250 | 500 |
Total sample size: 500
- Ground Truth:
- Actual positive: 250, Actual negative: 250
- Prediction:
- Predicted positive: 300, Predicted negative: 200
Accuracy: (True +ve + True -ve) / Total samples = (250 + 200) / 500 = 90%
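To double-check the arithmetic, here is a minimal Python sketch of the accuracy calculation. The function name `accuracy` and its argument names are my own choices for illustration, not something from a particular library.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("the confusion matrix has no samples")
    return (tp + tn) / total


# Counts taken from the Example 1 confusion matrix
example1_accuracy = accuracy(tp=250, fp=50, fn=0, tn=200)
print(f"Example 1 accuracy: {example1_accuracy:.0%}")  # prints "Example 1 accuracy: 90%"
```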
Example 2: Consider the same class of 500 school students, but now you are trying to predict which of them are affected by covid so that you can recommend them for isolation/quarantine.

| | Actual Positive | Actual Negative | Total |
|---|---|---|---|
| Predicted Positive | 5 | 0 | 5 |
| Predicted Negative | 5 | 490 | 495 |
| Total | 10 | 490 | 500 |
Total sample size: 500
- Ground Truth:
- Actual positive: 10, Actual negative: 490
- Prediction:
- Predicted positive: 5, Predicted negative: 495
Accuracy: (True +ve + True -ve) / Total samples = (5 + 490) / 500 = 99%
Hurray! You got an accuracy of 99%, not bad at all. However, when you review the data again you realize that even though your model is 99% accurate, it failed to identify 5 of the 10 students who were actually positive and incorrectly classified them as negative. You know the consequence of this miss 😒
Whom should you blame? -> the metric
Why did it happen? -> imbalanced data: the number of negative class samples (490) far outweighed the number of positive class samples (10)
What should you do next? -> find a more robust metric (or metrics)
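To see how much the class imbalance distorts accuracy, consider a small sketch. It uses a hypothetical "model" (my own illustration, not part of the examples above) that ignores the students entirely and always predicts negative, evaluated against the same 10 positive / 490 negative ground truth as Example 2.

```python
# Ground truth for Example 2: 10 positive students and 490 negative ones
# (1 = positive, 0 = negative)
y_true = [1] * 10 + [0] * 490

# Hypothetical baseline that always predicts "negative", whatever the input
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
baseline_accuracy = correct / len(y_true)

# How many of the truly positive students did the baseline flag?
found_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(f"Baseline accuracy: {baseline_accuracy:.0%}")  # prints "Baseline accuracy: 98%"
print(f"Positive students found: {found_positives} of {sum(y_true)}")
```

Even this do-nothing baseline scores 98% accuracy while finding none of the positive students, which is why accuracy alone is not enough on data like this.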
Search for a more robust metric:
1) Precision (of the positive class): the percentage of predicted positives that are actually positive.
| | True +ve | Total +ve (predicted) | Precision = True +ve / Total +ve (predicted) |
|---|---|---|---|
| Example 1 | 250 | 300 | 83.3% |
| Example 2 | 5 | 5 | 100% |
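Here is a minimal sketch of the precision calculation for both examples. The helper name `precision` is my own; the counts come from the two confusion matrices.

```python
def precision(tp: int, fp: int) -> float:
    """Of everything predicted positive, what fraction is truly positive?"""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        raise ValueError("precision is undefined when nothing is predicted positive")
    return tp / predicted_positive


precision_ex1 = precision(tp=250, fp=50)  # 250 / 300
precision_ex2 = precision(tp=5, fp=0)     # 5 / 5

print(f"Example 1 precision: {precision_ex1:.1%}")  # prints "Example 1 precision: 83.3%"
print(f"Example 2 precision: {precision_ex2:.1%}")  # prints "Example 2 precision: 100.0%"
```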
😩 Sadly, the precision metric is not able to catch the problem: Example 2 still scores 100%. Let's try recall...
2) Recall (of the positive class): the percentage of actual positives (ground truth) that the model correctly predicted as positive.
| | True +ve | Total +ve (actual) | Recall = True +ve / Total +ve (actual) |
|---|---|---|---|
| Example 1 | 250 | 250 | 100% |
| Example 2 | 5 | 10 | 50% |
We got it this time: recall captures the poor quality of the positive class prediction in Example 2.
Does this mean we don't need precision? Not really. Let me explain this in my next post 😁
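The same check can be sketched in Python for recall. Again, the helper name `recall` is just my choice for illustration, and the counts are read off the two confusion matrices.

```python
def recall(tp: int, fn: int) -> float:
    """Of everything that is truly positive, what fraction did the model find?"""
    actual_positive = tp + fn
    if actual_positive == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return tp / actual_positive


recall_ex1 = recall(tp=250, fn=0)  # 250 / 250
recall_ex2 = recall(tp=5, fn=5)    # 5 / 10

print(f"Example 1 recall: {recall_ex1:.0%}")  # prints "Example 1 recall: 100%"
print(f"Example 2 recall: {recall_ex2:.0%}")  # prints "Example 2 recall: 50%"
```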
Have a great day!