Posts

Confusion Matrix - Is "accuracy(metric)" flawed? (Post #2)

Hello friends, this is my second post about confusion matrix measurement metrics. In my previous post, I explained how the accuracy metric can mislead you when you judge a classification model's performance from its confusion matrix. We also saw how the Recall metric (true positives over actual positives) in Example 2 caught the poor quality of the positive-class predictions. We left the earlier post with this thought: Does this mean we don't need precision? The answer is no. Let's look at another example for the same use case. Example 3: Consider the same class of 500 school students. You are trying to predict how many are affected by COVID so that you can recommend them for isolation/quarantine. This time you want to make sure no student is declared negative if he/she "may" be positive, so you add some extra checks during testing. The following was the result: Actual ...
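To see why a cautious test still needs precision, here is a minimal Python sketch that computes both metrics from confusion-matrix counts. The counts are hypothetical (they are not the Example 3 results) and the function names are my own. They describe a test that flags every truly positive student but also flags many healthy ones.

```python
def precision(tp: int, fp: int) -> float:
    """Precision: true positives over predicted positives (TP + FP)."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall (sensitivity, TPR): true positives over actual positives (TP + FN)."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


# Hypothetical counts for 500 students (NOT the Example 3 numbers):
# no positive student is missed (fn = 0), but 150 healthy students are flagged.
tp, fp, fn, tn = 50, 150, 0, 300

print(f"recall    = {recall(tp, fn):.2f}")     # 1.00: every positive caught
print(f"precision = {precision(tp, fp):.2f}")  # 0.25: most flags are false alarms
```

A perfect recall here says nothing about the 150 students sent into quarantine unnecessarily. Precision is the metric that exposes that cost.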

ROC-AUC explained

Hello friends, this is the first post (of several) on one of the most important classification metrics: ROC/AUC. A later post will be hands-on and demonstrate ROC/AUC using Python and associated libraries. Let's set the baseline first:
1) Precision is true positives over predicted positives.
2) Recall is true positives over actual positives (recall is also called the true positive rate (TPR) or sensitivity).
3) Let's introduce another term, specificity. Specificity is the true negative rate (TNR): true negatives over actual negatives.
Sensitivity and specificity trade off against each other. Moving the classification threshold to raise one generally lowers the other, although the relationship is not strictly inversely proportional.
False positive rate (FPR): false positives over actual negatives. It can also be calculated as FPR = 1 - Specificity.
For ROC we need the true positive rate (TPR) and the false positive rate (FPR). Now let's visually compare the performance of different models (with threshold @0.5). Perfe...
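To make the TPR/FPR definitions concrete, here is a minimal pure-Python sketch of my own (not code from the post). It sweeps the threshold over a model's scores and computes TPR and FPR = 1 - specificity at each step. It then integrates the resulting ROC curve with the trapezoidal rule to get AUC. The labels and scores are hypothetical, and the code assumes both classes are present.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, sweeping the threshold from high to low."""
    points = []
    # +inf first (nothing predicted positive), then every distinct score, descending.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    for t in thresholds:
        tp, fp, fn, tn = confusion_counts(y_true, scores, t)
        tpr = tp / (tp + fn)          # recall / sensitivity
        specificity = tn / (tn + fp)  # true negative rate
        fpr = 1 - specificity         # equivalent to fp / (fp + tn)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = positive) and model scores, for illustration only.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.1]

pts = roc_points(y_true, scores)
print(pts)
print("AUC =", auc(pts))  # 0.75 for this data
```

In practice, scikit-learn's `sklearn.metrics.roc_curve` and `roc_auc_score` do this for you. The hand-rolled version only shows what is being computed at each threshold.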

Confusion Matrix - Is "accuracy(metric)" flawed?

Hello friends, as a machine learning student I grappled with this concept and was finally enlightened 😁 to the fact that the accuracy metric may not give the right insight into how effective your model is.
Reference confusion matrix:
                      Actual Positive   Actual Negative
Predicted Positive    True +ve          False +ve
Predicted Negative    False -ve         True -ve
Let's start with some examples.
Example 1: Consider a class of 500 school students. You are trying to predict how many brush their teeth before coming to school.
                      Actual Positive   Actual Negative   Total
Predicted Positive    250               50                300
Predicted Negative    0                 200               200
Total                 250               250               500
Total sample size: 500
Ground truth: actual positive: 250, actual negative: 250
Prediction: predicted positive: 300, predicted negative: 200
Accuracy: (True +ve + True -ve) / Total samples = (250 + 200) / 500 = 90%
Example 2: Consider the same class of 500 school students and now you are tryin...
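Here is a quick Python check of the Example 1 arithmetic. It also computes precision and recall from the same counts, which the excerpt does not show. The variable names are my own.

```python
# Example 1 counts: rows are predictions, columns are actual classes.
tp, fp = 250, 50   # predicted positive row
fn, tn = 0, 200    # predicted negative row

total = tp + fp + fn + tn        # 500 students
accuracy = (tp + tn) / total     # (250 + 200) / 500
precision = tp / (tp + fp)       # 250 / 300
recall = tp / (tp + fn)          # 250 / 250

print(f"accuracy  = {accuracy:.1%}")   # 90.0%
print(f"precision = {precision:.1%}")  # 83.3%
print(f"recall    = {recall:.1%}")     # 100.0%
```

The 90% accuracy hides the 50 false positives. Those 50 students show up only as the drop in precision.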

R2 or Adjusted R2 (Where is the adjustment?)

Hello friends, this post is about "adjusted R2", and it starts with a small, embarrassing story. My professor explained R2 and its limitations, then introduced us to "adjusted R2". I understood R2 very well, but adjusted R2 was hazy, and I could not gather the courage to raise my hand 🙋‍♂️ and ask more questions to clarify my understanding. Consequence 🤦‍♂️..... I had to spend extra time going through "innumerable" online posts and videos to understand the concept. Finally, I got some insights and am happy to share them 🙂 First things first: What is R2? It is a measure of the proportion of the variability in the data that your model is able to explain. Let's take some examples: 1. Say you bought 10 notebooks, recorded the number of pages against weight, and fit a linear model. In my manufactured example data, you can see how poor R2 is, only 7.2%, because the data is spread everywhere and I chose to apply a l...
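The excerpt stops before the adjustment itself, so here is a minimal Python sketch using the standard textbook formulas: R2 = 1 - SS_res / SS_tot and adjusted R2 = 1 - (1 - R2)(n - 1) / (n - p - 1), where n is the number of samples and p the number of predictors. The data points are invented for illustration (not the notebook data from the post), and the function names are my own.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(y, y_pred):
    """R2 = 1 - SS_res / SS_tot: share of the variance in y the model explains."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                # total
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R2 for n samples and p predictors (intercept not counted).

    The (n - 1) / (n - p - 1) factor grows with p, so each extra predictor
    must raise R2 enough to offset the penalty.
    """
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical data: one predictor x and a target y (not the notebook data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
y_pred = [a + b * xi for xi in x]
r2 = r_squared(y, y_pred)
adj = adjusted_r_squared(r2, n=len(x), p=1)

print(f"R2          = {r2:.3f}")   # 0.600
print(f"Adjusted R2 = {adj:.3f}")  # 0.467
```

With the same R2, adding predictors (larger p) lowers the adjusted value. For example, R2 = 0.9 with n = 10 gives 0.8875 for p = 1 but 0.85 for p = 3. That penalty is the "adjustment".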