GLOBAL RESEARCH SYNDICATE

Hands-On Guide To Spline Regression

by globalresearchsyndicate
October 27, 2020
in Data Analysis

Linear regression is one of the first algorithms taught to beginners in machine learning. It shows how machine learning works at a basic level: it establishes a relationship between a dependent variable and an independent variable by fitting a straight line through the data points. In real-world data science, however, purely linear relationships are rare, so a straight line is often a poor fit.

Polynomial regression addresses this by adding powers of the input as extra features. Its main drawback is that every increase in degree adds another feature, and a single high-degree polynomial fit to the whole dataset becomes hard to control and tends to overfit. Spline regression was introduced to reduce these drawbacks.
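The growth in features can be seen in a small sketch. It uses synthetic data and a hypothetical helper named train_mse, neither of which comes from the article. The training error can only go down as the degree rises, which is why a low training error alone says nothing about overfitting.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.size)

def train_mse(degree):
    """Fit a polynomial of the given degree and return (feature count, training MSE)."""
    # np.vander builds the feature columns 1, x, x^2, ..., x^degree
    features = np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    residuals = y - features @ coef
    return features.shape[1], float(np.mean(residuals ** 2))

for degree in (1, 3, 9):
    n_features, mse = train_mse(degree)
    print(f"degree={degree} features={n_features} training MSE={mse:.4f}")
```

Each extra degree adds one column to the feature matrix, so a degree-9 fit already has ten coefficients to estimate from thirty points.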

In this article, we will discuss spline regression and implement it in Python.



What is Spline Regression?

Spline regression is a non-linear regression technique that tries to overcome the difficulties of linear and polynomial regression. In linear regression, a single model is fit to the entire dataset at once. In spline regression, the range of the input variable is divided into bins, and a separate model is fit to the data in each bin. The points where the range is divided are called knots. Because a separate function is fit to each bin, the overall model is a piecewise function; the simplest version uses piecewise step functions.

What are Piecewise Step Functions?

Piecewise step functions are functions that remain constant within each interval and change value only at the boundaries between intervals. Fitting an individual step to each bin avoids using one model for the entire dataset. We break the range of the feature X into bins using cut points c1, c2, ..., cK and apply the following functions.

C0(X) = I(X < c1)
C1(X) = I(c1 <= X < c2)
...
CK(X) = I(cK <= X)

Here, the cut points c1, ..., cK split the range of X into K + 1 bins, and each function C0, C1, ..., CK is built from the indicator function I(). This indicator returns 1 if its condition is true and 0 otherwise.
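The indicator functions can be sketched in a few lines of NumPy. The toy data, the cut points and the helper name step_basis are assumptions made for illustration. Regressing y on the indicator columns by least squares gives one constant per bin, and that constant is the mean of y in the bin.

```python
import numpy as np

def step_basis(x, cut_points):
    """Return one 0/1 indicator column per bin defined by the sorted cut points."""
    x = np.asarray(x, dtype=float)
    # np.digitize returns 0 for x < c1, 1 for c1 <= x < c2, ..., K for x >= cK
    bin_index = np.digitize(x, cut_points)
    n_bins = len(cut_points) + 1
    return (bin_index[:, None] == np.arange(n_bins)).astype(float)

# Toy data (hypothetical): two points in each of three bins
x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([10.0, 12.0, 20.0, 22.0, 30.0, 34.0])
cut_points = [3.0, 6.0]

basis = step_basis(x, cut_points)
# Least squares on indicator columns: each coefficient is the mean of y in its bin
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
print(coef)
```

Every row of the basis matrix contains exactly one 1, because each observation falls into exactly one bin.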

Although step functions can capture non-linearity, binning alone does not describe the relationship between input and output well, because the fitted value is constant within each bin. So, we need to include basis functions, which are discussed below.

Basis functions and piecewise polynomials

Instead of fitting a constant in each bin, it is more effective to fit a non-linear function. To do this, a general family of functions is applied to the input variable. This family should be neither so flexible that it overfits nor so rigid that it cannot fit the data at all.

The functions in such a family are called basis functions.

y = a0 + a1*b1(x) + a2*b2(x) + ... + aK*bK(x)

In the above function, if the basis functions are powers of x, the model is a polynomial, and when a separate polynomial of this kind is fit in each bin, it is called a piecewise polynomial function. For example, with degree 2:

y = a0 + a1*x + a2*x^2
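One common choice of basis for a cubic spline is the truncated power basis: the powers 1, x, x^2, x^3 plus one term (x - knot)^3 per knot that is zero to the left of the knot. The sketch below is an illustration under that assumption, not the basis patsy uses internally. The helper name truncated_power_basis and the noise-free toy target are hypothetical.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=3):
    """Columns 1, x, ..., x^degree plus one truncated power term per knot."""
    x = np.asarray(x, dtype=float)
    # Global polynomial terms shared by all bins
    columns = [x ** p for p in range(degree + 1)]
    # Each knot adds a term that is zero before the knot and cubic after it
    columns += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(columns)

x = np.linspace(0.0, 10.0, 200)
knots = [5.0]
# Noise-free toy target that is itself a cubic spline with a knot at 5
y = 1.0 + 0.5 * x + 2.0 * np.clip(x - 5.0, 0.0, None) ** 3

basis = truncated_power_basis(x, knots)
# y = a0 + a1*b1(x) + ... is linear in the coefficients, so least squares applies
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
fitted = basis @ coef
print(np.round(coef, 6))
```

Because the model is linear in the coefficients, the fit is an ordinary least-squares problem even though the fitted curve is non-linear in x.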

Now that we have understood the overall concept of spline regression, let us implement it.

Implementation

We will implement cubic spline regression on the Boston housing dataset. This dataset is most commonly used to demonstrate linear regression, but here we will fit cubic splines to it. It contains house prices in Boston together with features that affect those prices. The code below reads it as a CSV file directly from a GitHub URL.

We will load the dataset now. 

import pandas as pd
from patsy import dmatrix
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt

# Read the Boston housing data directly from the CSV file on GitHub
dataset = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
# Display the DataFrame (shown automatically in a notebook)
dataset

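Let us now plot age against the house prices, which are stored in the medv column, and check how the relationship looks. In this dataset, age is the proportion of owner-occupied units built before 1940, and medv is the median home value in thousands of dollars.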

plt.scatter(dataset['age'], dataset['medv'])

Clearly, there is no linear relationship between these points. So we will use spline regression as follows:

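Cubic and natural splines

# Cubic B-spline basis (bs uses degree 3 by default) with knots at 20, 30, 40 and 50
spline_cube = dmatrix('bs(x, knots=(20,30,40,50))', {'x': dataset['age']})
# GLM with its default Gaussian family is an ordinary least-squares fit
spline_fit = sm.GLM(dataset['medv'], spline_cube).fit()
# Natural cubic spline basis with the same knots
natural_spline = dmatrix('cr(x, knots=(20,30,40,50))', {'x': dataset['age']})
spline_natural = sm.GLM(dataset['medv'], natural_spline).fit()

Here, we have used a generalized linear model (GLM) to fit the cubic and natural splines. The dmatrix call turns age into a design matrix of spline basis columns, and the knots have to be passed to it explicitly. The knots are the points where the range of age is divided into bins. The knots used above are 20, 30, 40 and 50, so they all lie in the lower half of the age range, which runs up to 100. Values above 50 therefore fall into a single bin.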
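To see what such a design matrix looks like, the sketch below builds both bases on a synthetic grid of values from 0 to 100 instead of the real age column (an assumption, so that the snippet runs without downloading the dataset). With four knots and the default degree of 3, bs produces seven spline columns, and dmatrix adds an intercept column in front of them.

```python
import numpy as np
from patsy import dmatrix

# Synthetic stand-in for the age column (assumption): 50 values from 0 to 100
x = np.linspace(0.0, 100.0, 50)

bs_design = dmatrix('bs(x, knots=(20,30,40,50))', {'x': x})
cr_design = dmatrix('cr(x, knots=(20,30,40,50))', {'x': x})

# One row per observation, one column per basis function plus the intercept
print(bs_design.shape)
print(cr_design.shape)
print(bs_design.design_info.column_names)
```

The regression then estimates one coefficient per column, so the number of knots directly controls how many parameters the model has.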

Creating a prediction grid

Next, we will create an evenly spaced grid of age values between the minimum and maximum in the dataset using np.linspace. Then, we will use this grid to make predictions with the models above.

# 50 evenly spaced ages; named age_grid to avoid shadowing Python's built-in range
age_grid = np.linspace(dataset['age'].min(), dataset['age'].max(), 50)
# Rebuild each design matrix on the grid with the same knots used for fitting
cubic_line = spline_fit.predict(dmatrix('bs(x, knots=(20,30,40,50))', {'x': age_grid}))
line_natural = spline_natural.predict(dmatrix('cr(x, knots=(20,30,40,50))', {'x': age_grid}))

Plot the graph

Finally, after the predictions are made, it is time to plot the spline regression curves and check how the models fit the data in each bin.

plt.plot(age_grid, cubic_line, color='r', label='Cubic spline')
plt.plot(age_grid, line_natural, color='g', label='Natural spline')
plt.legend()
plt.scatter(dataset['age'], dataset['medv'])
plt.xlabel('age')
plt.ylabel('medv')
plt.show()
[Figure: scatter plot of age against medv with the cubic spline (red) and natural spline (green) curves]

As you can see, the curves change shape from one bin to the next: the segments between the knots at 20, 30, 40 and 50 each bend differently. This is because a different polynomial piece is fit to each bin of the data, and the pieces are joined at the knots. The result follows the overall trend of the points more closely than a single straight line could.
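The predictions above rebuild the design matrix by repeating the formula on the grid, which works because the grid spans the same minimum and maximum as the training data. For new values that cover a different range, one option is to reuse the design_info stored on the fitted matrix through patsy's build_design_matrices, so that the knots and boundaries stay identical. The sketch below shows this on synthetic data (an assumption) and uses NumPy least squares in place of the GLM fit.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

# Synthetic training data (assumption): a noisy curve over the range 0 to 100
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 100.0, size=200)
y_train = np.sin(x_train / 15.0) + rng.normal(scale=0.1, size=200)

# Fit by least squares on the cubic B-spline design matrix
design = dmatrix('bs(x, knots=(20,30,40,50))', {'x': x_train})
coef, *_ = np.linalg.lstsq(np.asarray(design), y_train, rcond=None)

# New points covering only part of the training range
x_new = np.linspace(25.0, 60.0, 8)
# Reuse the stored knots and boundaries instead of re-deriving them from x_new
(new_design,) = build_design_matrices([design.design_info], {'x': x_new})
y_pred = np.asarray(new_design) @ coef
print(y_pred)
```

The new points must still lie inside the range seen during fitting, since the B-spline basis is only defined between its outermost knots.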

Conclusion

In this article, we saw how spline regression extends linear and polynomial regression to fit non-linear relationships. It can model relationships between variables that are not linear, which makes it useful for real-world problems.

The complete code of the above implementation is available as a notebook in AIM's GitHub repository.


Copyright © 2024 Globalresearchsyndicate.com
