Looking at R-Squared – Erika D

by globalresearchsyndicate
May 16, 2020
in Data Analysis

Erika D

In data science we create regression models to see how well we can predict one variable using one or more other variables. The hope of a regression line is that we will be able to predict our dependent variable by plugging our independent variables into our best-fit line equation. But how do we know that this line is predicting the y values accurately? This is where R-Squared comes into play.

What is R-Squared?

R-Squared, also known as the Coefficient of Determination, measures how well our regression line fits our data. For a least-squares line with an intercept, evaluated on the data it was fitted to, it is a value between 0 and 1. R-Squared can be interpreted as the percent of variance in our dependent variable that can be explained by our model. The closer R-Squared is to 1, or 100%, the better our regression line fits the data. This can be a little confusing, so to truly understand R-Squared we must look into how it is calculated.

How to Calculate R-Squared

The R-Squared formula compares our fitted regression line to a baseline model. This baseline model is considered the “worst” model: a flat line that predicts every value of y will be the mean value of y. R-Squared checks whether our fitted regression line predicts y better than the mean does.

The top of the fraction in our formula, R² = 1 − SSres / SStot, is the residual sum of squared errors of our regression model (SSres). So if the actual y value was 5 but we had predicted it would be 6, then the squared residual would be (5 − 6)² = 1, and we would add that to the rest of the squared residuals for the model.

The bottom of the fraction is the total sum of squared errors (SStot). This compares the actual y values to our baseline model, the mean. So we square the difference between each actual y value and the mean and add the results together.

Image via: https://learn.co/tracks/data-science-career-v2/module-1-python-for-data-science/section-07-introduction-to-linear-regression/coefficient-of-determination
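To make the calculation concrete, here is a minimal sketch of the formula in plain Python. The function name r_squared and the four data points are made up for illustration; the predictions are simply assumed values, not the output of a fitted model.

```python
def r_squared(y_actual, y_predicted):
    """Return R-Squared = 1 - SSres / SStot."""
    mean_y = sum(y_actual) / len(y_actual)
    # SSres: squared differences between actual and predicted values
    ss_res = sum((y - p) ** 2 for y, p in zip(y_actual, y_predicted))
    # SStot: squared differences between actual values and their mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_actual)
    return 1 - ss_res / ss_tot


# Hypothetical data: the first point repeats the "actual 5, predicted 6" case
y_actual = [5, 7, 9, 12]
y_predicted = [6, 7, 9, 11]

score = r_squared(y_actual, y_predicted)
print(round(score, 3))  # about 0.925

# The baseline model (always predicting the mean) scores exactly 0
baseline = r_squared(y_actual, [sum(y_actual) / len(y_actual)] * len(y_actual))
print(baseline)
```

Here SSres is 2 and SStot is 26.75, so R-Squared is 1 − 2 / 26.75. Predicting the mean for every point makes SSres equal to SStot, which is why the baseline lands at 0.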

It is interesting to note that the R-Squared formula does not use the independent variables directly. They enter only through the predicted y values.

Is a Low R-Squared Always Bad?

Just because a model has a low R-Squared does not mean it is a bad model. R-Squared is often said to measure the goodness of fit of a regression line; however, this can be misleading. Some areas of study will always have a greater amount of unexplained variation, for example, studies trying to predict human behavior. If you have a low R-Squared but the independent variables are still statistically significant, you can still draw conclusions about the relationships between the variables, and your regression model may be the best fit for a given dataset. If the residuals are widely dispersed and the variance (σ²) is high, your R-Squared will inevitably be smaller, but the regression line may still be the best way to describe the relationship between variables.

Image via: https://blog.minitab.com/blog/adventures-in-statistics-2/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values
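A small sketch can show how residual spread alone drives R-Squared down. The data below is synthetic and deliberately constructed: the noise pattern sums to zero and is uncorrelated with x, so the least-squares line recovers the same slope no matter how large the noise is. The helper names fit_line, r_squared, and simulate are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = list(range(8))
# Constructed noise: sums to zero and is uncorrelated with xs
noise_pattern = [1, -1, -1, 1, 1, -1, -1, 1]


def simulate(noise_size):
    """Fit y = 1 + 0.5x + noise and return (slope, R-Squared)."""
    ys = [1 + 0.5 * x + noise_size * e for x, e in zip(xs, noise_pattern)]
    slope, intercept = fit_line(xs, ys)
    preds = [intercept + slope * x for x in xs]
    return slope, r_squared(ys, preds)


low_noise_slope, low_noise_r2 = simulate(0.5)
high_noise_slope, high_noise_r2 = simulate(5.0)
print(round(low_noise_slope, 3), round(low_noise_r2, 3))    # slope 0.5, R-Squared about 0.84
print(round(high_noise_slope, 3), round(high_noise_r2, 3))  # slope 0.5, R-Squared about 0.05
```

In both runs the fitted slope is the true 0.5, yet R-Squared falls from roughly 0.84 to roughly 0.05 purely because the residuals are more widely dispersed. Real data will not be this tidy, and this sketch says nothing about statistical significance, which needs a proper test.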

Is a High R-Squared Always Good?

R-Squared may also be high, but does that mean the model is accurate? Not necessarily. There are many reasons why your R-Squared may be high that have nothing to do with the predictive validity of your model. One case is that you can be predicting one variable by unintentionally using a different form of the same variable. For example, you may be predicting the temperature in Fahrenheit while one of your independent variables is the temperature in Celsius. You would obviously get a very high R-Squared, but a model that predicts one variable using the same variable in another form is useless.

Image via: https://newonlinecourses.science.psu.edu/stat501/node/251/

Another possibility is that there are too many variables in your model compared to the number of observations. This will lead to an overfitted model: although it can predict the modeled data well, it will not predict new data well. To ensure you are not over-fitting your data to achieve a high R-Squared, you can split your data into training and test sets, or you can look at the Predicted R-Squared.

Predicted R-Squared can be calculated using a statistical program that removes a data point from the data set, re-calculates the regression line, checks how well the line predicts the missing data point, and then repeats this process for each data point in the dataset. If your Predicted R-Squared is significantly lower than your R-Squared value, then you can assume your model is over-fitting to the data and you may need to remove some independent variables from your model.
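The leave-one-out procedure can be sketched in a few lines of plain Python for a single predictor. This assumes the common definition Predicted R-Squared = 1 − PRESS / SStot, where PRESS is the sum of squared errors on the held-out points; the data and the function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def ordinary_r_squared(xs, ys):
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / total_sum_of_squares(ys)


def predicted_r_squared(xs, ys):
    press = 0.0
    for i in range(len(xs)):
        # Remove point i, refit the line on the remaining points
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        # Check how well the refitted line predicts the removed point
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return 1 - press / total_sum_of_squares(ys)


# Hypothetical data, roughly linear
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

r2 = ordinary_r_squared(xs, ys)
pred_r2 = predicted_r_squared(xs, ys)
print(round(r2, 4), round(pred_r2, 4))
```

For a least-squares fit, the Predicted R-Squared can never exceed the ordinary R-Squared, so what matters is the size of the gap. On data this close to a straight line the two values stay near each other.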

We may also see situations where R-Squared is close to 1 but the model is completely wrong. In the figure below, the model has an R-Squared of 0.92, but a look at the graph shows that the regression line does not fit the data well and another model should be used.

Image via: https://newonlinecourses.science.psu.edu/stat501/node/258/
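We can reproduce this kind of situation without a plot by fitting a straight line to data that is actually curved and then inspecting the residuals. The data below (y = x² for x from 0 to 10) is a made-up example, not the dataset from the figure.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical curved data: y is exactly x squared
xs = list(range(11))
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

mean_y = sum(ys) / len(ys)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

print(round(r2, 3))  # about 0.928
print([round(r, 1) for r in residuals])
```

R-Squared comes out at about 0.93, yet the residuals are positive at both ends and negative in the middle. That U-shaped pattern is the sign that a straight line is the wrong model, even though the summary statistic looks good.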

Another problem with R-Squared is that if you keep adding more and more independent variables, R-Squared will never go down and will usually go up. However, that does not mean these additional variables have a predictive quality. For example, let’s say you are creating a model to predict the weight of a person using their height. You then decide to add the variable eye color and find that your R-Squared goes up. Does this mean that someone with brown eyes is likely to weigh more or less than someone with blue eyes? Probably not, but in order to confirm this we could use Adjusted R-Squared.

Adjusted R-Squared takes into account the number of independent variables you employ in your model and can help indicate whether a variable is useless. The more variables without predictive quality you add to your model, the lower your Adjusted R-Squared will be. You can see that the number of independent variables, k, is included in the Adjusted R-Squared formula: Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k − 1), where n is the number of observations.

Image via: https://slideplayer.com/slide/8485605/

Adjusted R-Squared will always be lower than or equal to R-Squared. It usually falls between 0 and 1, although it can dip below 0 for a model that explains very little. You want the difference between R-Squared and Adjusted R-Squared to be as small as possible.
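As a rough illustration of the penalty, here is the standard Adjusted R-Squared formula, 1 − (1 − R²)(n − 1) / (n − k − 1), applied to the height and eye color example. The R-Squared values and the sample size of 30 are hypothetical numbers chosen for the sketch, not results from a real model.

```python
def adjusted_r_squared(r2, n_observations, n_predictors):
    """Adjusted R-Squared for n observations and k predictors."""
    n, k = n_observations, n_predictors
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical results: adding eye color nudges R-Squared from 0.700 to 0.705
height_only = adjusted_r_squared(0.700, n_observations=30, n_predictors=1)
with_eye_color = adjusted_r_squared(0.705, n_observations=30, n_predictors=2)

print(round(height_only, 4))     # about 0.6893
print(round(with_eye_color, 4))  # about 0.6831
```

Although R-Squared rose slightly when eye color was added, Adjusted R-Squared fell, which suggests the extra variable is not pulling its weight.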

Final Thoughts

R-Squared is a useful statistic when determining if your regression model can accurately predict a variable, but it must be used carefully. We cannot simply throw away a model because its R-Squared is low or assume we have a great model because its R-Squared is high. We must look at the spread of our residuals, what type of predictor variables we are using, and how many we are using. It is also helpful to compare the Predicted R-Squared and Adjusted R-Squared to our original R-Squared. Keep in mind that R-Squared is not the only way to measure our prediction error, and it may be useful to look at other statistics like the Mean Squared Error.

Copyright © 2024 Globalresearchsyndicate.com
