Posts

Showing posts from February, 2022

Confusion Matrix - Is "accuracy (metric)" flawed?

Hello friends, as a machine learning student I grappled with this concept and was finally enlightened 😁 to the fact that accuracy (the metric) may not give the right insight into how effective your model is.

Reference confusion matrix:

                      Actual Positive   Actual Negative
Predicted Positive    True +ve          False +ve
Predicted Negative    False -ve         True -ve

Let's start with some examples:

Example 1: Consider a class of 500 school students, and you are trying to predict how many brush their teeth before coming to school.

                      Actual Positive   Actual Negative   Total
Predicted Positive    250               50                300
Predicted Negative    0                 200               200
Total                 250               250

Total sample size: 500
Ground truth: Actual positive: 250, Actual negative: 250
Prediction: Predicted positive: 300, Predicted negative: 200
Accuracy: (True +ve + True -ve) / Total samples = (250 + 200) / 500 = 90% accuracy

Example 2: Consider the same class of 500 school students and now you are tryin...
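The accuracy arithmetic from Example 1 can be sketched in a few lines of Python. This is only an illustration: the variable and function names are my own, and the counts are the ones from the Example 1 table (rows are predictions, columns are ground truth).

```python
# Confusion-matrix counts from Example 1 (rows = predicted, columns = actual).
true_pos = 250   # predicted positive, actually positive
false_pos = 50   # predicted positive, actually negative
false_neg = 0    # predicted negative, actually positive
true_neg = 200   # predicted negative, actually negative


def accuracy(tp, fp, fn, tn):
    """Share of all samples that the model labelled correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total


example1_accuracy = accuracy(true_pos, false_pos, false_neg, true_neg)
print(f"Accuracy: {example1_accuracy:.0%}")  # Accuracy: 90%
```

Note the parentheses around the numerator: the two "true" counts are added first and only then divided by the total sample size.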

R2 or Adjusted R2 (Where is the adjustment?)

Hello friends, this post is about "adjusted R2", starting with a small embarrassing story. My professor explained R2, followed by its limitations, and introduced us to "adjusted R2". I understood R2 very well, but adjusted R2 was hazy, and I could not gather the courage to raise my hand 🙋‍♂️ and ask additional questions to clarify my understanding. Consequence 🤦‍♂️..... now I have to spend extra time going through "innumerable" online posts and videos to understand the concept. Finally, I got some insights and am happy to share 🙂

First things first: What is R2? -> It is a measure of the proportion of the variability in the target variable that your model is able to explain.

Let's take some examples:

1. Say you bought 10 notebooks, recorded the number of pages against the weight of each, and fitted a linear model. In my manufactured example data, you can observe how poor R2 is, only 7.2%, and that is because the data is spread everywhere and I chose to apply a l...
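To make the "proportion of variability explained" idea concrete, here is a minimal sketch that fits a straight line by least squares and computes R2 as 1 - SS_res / SS_tot. The notebook numbers below are hypothetical values I made up for illustration; they are not the data behind the 7.2% figure, and I am assuming weight is the quantity being predicted from the number of pages. The function names are my own.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y):
    """R2 = 1 - (unexplained variation) / (total variation in y)."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data for 10 notebooks (made up, deliberately scattered).
pages = [40, 60, 80, 100, 120, 140, 160, 180, 200, 220]
weight_g = [310, 120, 450, 90, 380, 150, 290, 400, 110, 330]

r2 = r_squared(pages, weight_g)
print(f"R2 = {r2:.1%}")
```

When the points are scattered with no clear trend, the fitted line explains little of the variation and R2 stays close to 0; when the points sit exactly on a line, SS_res is 0 and R2 is 1.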