Global Research Syndicate

How To Code Linear Regression Models With R

by globalresearchsyndicate
November 24, 2019
in Data Analysis

Regression is one of the most common data science problems, and it is widely used in artificial intelligence and machine learning. In machine learning, regression techniques predict continuous values, such as salaries, ages or profits. Linear regression is the type of regression in which the relationship between the dependent and independent factors is modeled as a linear function.

In this article, we build templates for three commonly used regression models in machine learning (ML):



  • Simple Linear Regression
  • Multiple Linear Regression
  • Support Vector Machine Regression

Here are the pre-requisites:

Simple Linear Regression

Simple linear regression is the simplest regression model of all. The model is used when there are only two factors, one dependent and one independent.




For example, the model can predict an employee's salary from his/her age or years of experience. Given a dataset with two columns, age (or years of experience) and salary, the model can be trained to learn a linear relationship between the two factors. Using the fitted line, it can then predict the salary for any given age or experience.
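For reference, the fitted line can be written as follows, with intercept β0 and slope β1. lm() estimates them by ordinary least squares, i.e. by minimizing the sum of squared differences between observed and predicted values; for a single independent variable this has the closed-form solution below, where x̄ and ȳ are the sample means over the n training observations.

```latex
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x, \qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```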

Here’s The Code:

Simple Linear Regression is handled by R's built-in function ‘lm’ (from the stats package).

Creating the Linear Regression Model and fitting it to training_set

regressor = lm(formula = Y ~ X, data = training_set)

This line creates a regressor and provides it with the data set to train.

*   formula : separates the dependent variable (left of ‘~’) from the independent variable(s) (right of ‘~’). With multiple independent variables, the variables are joined with the ‘+’ symbol, e.g. Y ~ X1 + X2 + X3 + …

*   X : the independent variable (factor), given by its column label.

*   Y : the dependent variable, given by its column label.

*   data : the data the model is trained on, here training_set.
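The template assumes that training_set and test_set already exist. As a minimal, self-contained sketch, the code below builds them from synthetic data: the experience/salary relationship is hypothetical, the columns are named X and Y to match the template, and the 80/20 split with sample() is just one common choice, not something the template prescribes. It then checks that the coefficients from lm() agree with the textbook closed-form least-squares formulas for the slope and intercept.

```r
# Synthetic (hypothetical) data using the template's column names X and Y
set.seed(42)
n = 50
dataset = data.frame(X = runif(n, min = 0, max = 20))        # e.g. years of experience
dataset$Y = 30000 + 2500 * dataset$X + rnorm(n, sd = 2000)   # e.g. salary plus noise

# Assumed 80/20 split into training_set and test_set
train_rows = sample(seq_len(n), size = round(0.8 * n))
training_set = dataset[train_rows, ]
test_set = dataset[-train_rows, ]

# Fit the simple linear regression exactly as in the template
regressor = lm(formula = Y ~ X, data = training_set)

# Closed-form least-squares estimates, computed by hand for comparison
x = training_set$X
y = training_set$Y
b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 = mean(y) - b1 * mean(x)

print(coef(regressor))   # (Intercept) and X slope from lm()
print(c(b0, b1))         # same values, up to floating-point error
```

The split is done once with a fixed seed so the example is reproducible; with real data you would load your own dataset instead of simulating one.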

Predicting the values for the test set

predicted_Y = predict(regressor, newdata = test_set)

This line predicts the values of the dependent factor for new values of the independent factor.

*   regressor : the regressor model that was previously created for training.

*   newdata : a data frame with the new observations you want to predict Y for; it must contain a column with the same name as the independent variable X.
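As a small sketch of the prediction step, assume a toy training set in which Y = 2 + 3X holds exactly (hypothetical data chosen so the result is easy to check). Note that newdata is a data frame whose column name matches the predictor used in the formula:

```r
# Hypothetical training data where Y = 2 + 3 * X exactly
training_set = data.frame(X = c(1, 2, 3, 4), Y = c(5, 8, 11, 14))
regressor = lm(formula = Y ~ X, data = training_set)

# newdata must be a data frame with a column named like the predictor (X)
test_set = data.frame(X = c(5, 10))
predicted_Y = predict(regressor, newdata = test_set)
print(predicted_Y)   # approximately 17 and 32
```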

Visualizing training set predictions

install.packages('ggplot2')  # install once
library(ggplot2)             # load the library
# Training points in black, fitted regression line in red.
# Every layer, including the title and axis labels, must be chained with '+'
ggplot() +
  geom_point(aes(x = training_set$X, y = training_set$Y), colour = 'black') +
  geom_line(aes(x = training_set$X, y = predict(regressor, newdata = training_set)), colour = 'red') +
  ggtitle('Y vs X (Training Set)') +
  xlab('X') +
  ylab('Y')

Visualizing test set predictions

# Test points in blue; the red line is the same model fitted on the training set
ggplot() +
  geom_point(aes(x = test_set$X, y = test_set$Y), colour = 'blue') +
  geom_line(aes(x = training_set$X, y = predict(regressor, newdata = training_set)), colour = 'red') +
  ggtitle('Y vs X (Test Set)') +
  xlab('X') +
  ylab('Y')

These two blocks of code plot the data in a graph. The ggplot2 library is used to draw the data points and the regression line.

The first block plots the training_set points with the fitted regression line; the second plots the test_set points against the same line, which was fitted on the training set.

*   geom_point() : draws a scatter plot of all data points in a 2D graph

*   geom_line() : draws the regression line in the 2D graph

*   ggtitle() : sets the title of the graph

*   xlab() : labels the X-axis

*   ylab() : labels the Y-axis

Replace every X and Y with the column labels of your independent and dependent factors, respectively.

Multiple Linear Regression

Multiple Linear Regression extends simple linear regression to the case where several independent factors contribute to one dependent factor. It models the relationship between one continuous dependent variable and two or more independent variables, which can be continuous or categorical (encoded as dummy variables).
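To make the point about categorical variables concrete, here is a minimal sketch with hypothetical data: when a column is a factor, lm() encodes it as dummy (0/1) variables, using the first level (alphabetically, by default) as the baseline. model.matrix() shows the design matrix that lm() actually fits.

```r
# Hypothetical data: one continuous and one categorical independent variable
training_set = data.frame(
  experience = c(1, 3, 5, 2, 4, 6),
  department = factor(c("IT", "Sales", "IT", "HR", "Sales", "HR")),
  salary     = c(40, 52, 60, 45, 58, 66)
)

# Design matrix: the factor becomes one dummy column per non-baseline level
# (HR is the baseline here, so only departmentIT and departmentSales appear)
design = model.matrix(salary ~ experience + department, data = training_set)
print(design)

# lm() uses the same encoding, so its coefficients carry the same names
regressor = lm(salary ~ experience + department, data = training_set)
print(coef(regressor))
```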

Having more than one independent variable raises a new challenge: identifying which of them are most strongly related to the dependent variable. Backward Elimination is one method for this. First, a significance level is chosen, most commonly 0.05. The fitted model reports a p-value for each independent variable; the variable with the highest p-value is removed if that p-value exceeds the significance level, and the model is refitted so that the p-values are updated. The process repeats until every remaining variable's p-value is at or below the significance level.
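The p-values used by Backward Elimination come from the coefficient table that summary() computes. The sketch below performs one elimination step on synthetic data in which only X1 truly affects Y (an assumption made purely for illustration): it extracts the p-values and checks whether the least significant variable exceeds the significance level.

```r
set.seed(1)
n = 100
# Hypothetical data: Y depends on X1 only; X2 and X3 are pure noise
training_set = data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
training_set$Y = 5 + 3 * training_set$X1 + rnorm(n)

regressor = lm(formula = Y ~ X1 + X2 + X3, data = training_set)

# The "Pr(>|t|)" column of the coefficient table holds the p-values;
# [-1, ] drops the intercept row so only the independent variables remain
pValues = coef(summary(regressor))[-1, "Pr(>|t|)"]
print(pValues)

SL = 0.05
# One Backward Elimination step: look at the highest p-value only
worst = names(which.max(pValues))
if (pValues[[worst]] > SL) {
  cat("Remove", worst, "and refit\n")
} else {
  cat("All variables are significant at level", SL, "\n")
}
```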

This model can be used, for example, to predict an employee's salary from multiple factors such as experience, employee_score, etc.



Here’s The Code:

Multiple Linear Regression is also handled by the function lm.

Creating the Multiple Linear Regressor and fitting it to the Training Set

regressor = lm(formula = Y ~ ., data = training_set)

The expression ‘Y ~ .’ uses every column of training_set except Y as an independent variable.
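A quick way to confirm what ‘Y ~ .’ expands to is to fit it alongside the explicit formula. In this sketch (hypothetical data with two independent variables), both fits produce identical coefficients, and terms() lists the variables that the dot picked up:

```r
# Hypothetical data with two independent variables and the dependent variable Y
training_set = data.frame(X1 = c(1, 2, 3, 4, 5, 6),
                          X2 = c(2, 1, 4, 3, 6, 5),
                          Y  = c(3.1, 3.9, 7.2, 6.8, 11.1, 10.9))

fit_dot      = lm(formula = Y ~ ., data = training_set)        # dot shorthand
fit_explicit = lm(formula = Y ~ X1 + X2, data = training_set)  # written out

print(coef(fit_dot))
print(coef(fit_explicit))

# The terms object records which columns the dot was expanded into
print(attr(terms(fit_dot), "term.labels"))
```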

Predicting the values for the test set

predicted_Y = predict(regressor, newdata = test_set)

Using Backward Elimination to Find the Most Significant Factors

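# Backward Elimination: refit the model and drop the least significant
# independent variable until every remaining p-value is at or below sl.
# x: data frame holding the dependent variable Y and numeric independent variables
backwardElimination <- function(x, sl) {
  repeat {
    regressor = lm(formula = Y ~ ., data = x)
    coefs = coef(summary(regressor))
    # p-values of the independent variables (the intercept row is skipped)
    predictors = setdiff(rownames(coefs), "(Intercept)")
    pValues = coefs[predictors, "Pr(>|t|)"]
    # Stop when all remaining variables are significant or only one is left
    if (max(pValues) <= sl || length(predictors) == 1) break
    # Drop the column with the highest p-value, matched by name
    worst = predictors[which.max(pValues)]
    x = x[, setdiff(names(x), worst), drop = FALSE]
  }
  return(summary(regressor))
}
SL = 0.05
# Keep only the independent factors plus the dependent variable Y;
# 'X1', 'X2', 'X3' are placeholders for your own column labels
dataset = dataset[, c('X1', 'X2', 'X3', 'Y')]
backwardElimination(dataset, SL)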

This block applies Backward Elimination. The independent variable with the highest p-value is removed whenever that p-value exceeds the chosen significance level, and the model is refitted after each removal, until every remaining variable is significant. The function returns the summary of the final model.

Support Vector Regression

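Support Vector Regression (SVR) is the regression variant of the Support Vector Machine (SVM), which is best known as a classification model. Whereas an SVM classifier predicts categories, SVR applies the same principle to continuous values: it fits a function that keeps the training points within a margin of width ε (epsilon) where possible, ignoring errors smaller than ε and penalizing larger ones.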
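The "same principle" can be made concrete through the epsilon-insensitive loss that epsilon-SVR minimizes (together with a penalty that keeps the fitted function flat): residuals smaller than ε cost nothing, and larger ones are penalized linearly by the amount they exceed ε. In e1071, this ε is the epsilon argument of svm() (default 0.1). The helper below, eps_insensitive_loss, is a hypothetical illustration of the loss, not a function from e1071:

```r
# Hypothetical helper: epsilon-insensitive loss as used by epsilon-SVR.
# Residuals inside the epsilon tube cost 0; outside, cost = |residual| - epsilon
eps_insensitive_loss = function(y, y_hat, epsilon = 0.1) {
  pmax(0, abs(y - y_hat) - epsilon)
}

y     = c(10, 10, 10)
y_hat = c(10.05, 9.5, 11)   # residuals of 0.05, 0.5 and 1
losses = eps_insensitive_loss(y, y_hat, epsilon = 0.1)
print(losses)   # 0.0 0.4 0.9
```

The first prediction falls inside the ε-tube, so it contributes no loss at all, which is what distinguishes SVR from ordinary least squares.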

Here’s The Code:

The e1071 package provides Support Vector Regression in R through its svm() function.

Installing and Importing the Library

install.packages('e1071') #install once
library(e1071) #importing the library

Creating the Support Vector Regressor and fitting it with Training Set

svr_regressor = svm(formula = Y ~ ., data = training_set, type = 'eps-regression')

This line creates a Support Vector Regressor and provides the data to train.

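*   type : for regression, svm() accepts ‘eps-regression’ or ‘nu-regression’. ‘eps-regression’ selects epsilon-SVR; it is also the default when the dependent variable is numeric. Note that svm() uses a radial basis kernel by default; pass kernel = 'linear' for a linear SVR.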

Predicting the values for the test set

predicted_Y = predict(svr_regressor, newdata = test_set)

Outlook

The R programming language has been gaining popularity in the ever-growing field of AI and machine learning. Its libraries and extensive packages are tailored to solving real-world problems, making it a strong alternative to Python. Linear regression models are the perfect starter pack for machine learning enthusiasts, and this tutorial gives you a template for three of the most common regression models in R that you can apply to any regression dataset.


Copyright © 2020 Globalresearchsyndicate.com.
