GLOBAL RESEARCH SYNDICATE
No Result
View All Result
  • Login
  • Latest News
  • Consumer Research
  • Survey Research
  • Marketing Research
  • Industry Research
  • Data Collection
  • More
    • Data Analysis
    • Market Insights
  • Latest News
  • Consumer Research
  • Survey Research
  • Marketing Research
  • Industry Research
  • Data Collection
  • More
    • Data Analysis
    • Market Insights
No Result
View All Result
globalresearchsyndicate
No Result
View All Result
Home Data Analysis

Learn why decision trees are more flexible than linear models

globalresearchsyndicate by globalresearchsyndicate
December 11, 2019
in Data Analysis
0
Learn why decision trees are more flexible than linear models
0
SHARES
5
VIEWS
Share on FacebookShare on Twitter

This blog post will examine a hypothetical dataset of website visits and customer conversion, to illustrate how decision trees are a more flexible mathematical model than linear models such as logistic regression.

Imagine you are monitoring the webpage of one of your products. You are keeping track of how many times individual customers visit this page, the total amount of time they’ve spent on the page across all their visits, and whether or not they bought the product. Your goal is to be able to predict, for future visitors, how likely they are to buy the product, based on the page visit data. You are considering presenting a discount, or some other kind of offer, to customers you think are likely to buy the product but haven’t yet.

Get to know more about decision trees and linear models!

If you are interested in building your knowledge to prepare data for regularized logistic regression and random forest algorithms, read our book Data Science Projects with Python written by Stephen Klosterman. This book will give you practical guidance on industry-standard data analysis and machine learning tools in Python, with the help of realistic data. You will also learn how to use pandas and Matplotlib to critically examine a dataset with summary statistics and graphs and extract the insights you seek to derive. 

After logging the data on many customers, you visualize them and see the following, including some jitter to help see all the data points:

There are several interesting patterns visible here. We see that in general, the longer someone spends on the page, the more likely they are to purchase the item. However, this effect seems to depend on the number of visits, in a complex way. Someone who visited the page once and spent at least two minutes there (i.e. two minutes per visit) seems likely to buy, at least up until 18 or so minutes. But someone who visited 10 times as much as this seems likely to buy after only 12 minutes cumulative time (1.2 minutes per visit).

Additionally, there is a phenomenon of customers who spend a relatively long time (at least 18 or 19 minutes) over a relatively small number of visits (just one or two), who don’t buy. Maybe they opened the page, but then walked away from their computer, and closed the page as soon as they came back.

Whatever the reason, the patterns in this data set are interesting and complicated. If you want to create a predictive model of these data, you should consider the likely success of non-linear models, such as decision trees, versus linear models, such as logistic regression (for more information see chapter 3 of my book, Data Science Projects with Python).

Linear Regression as a linear model

At a high level, linear models will take the feature space (the two-dimensional space where time is on the x-axis and number of visits is on the y-axis, as in the graph above), and seek to draw a straight line somewhere that creates an accurate division of the two classes of the response variable (“Bought” or “Did not buy”). Consider how likely this is to work. Where would you draw a straight line on the graph above, so that the two regions on either side of the line would primarily contain responses of only one class?

It should be apparent that this is not likely to be an entirely successful task. The best you could probably do, would be to draw a line that isolates non-buying customers who spent relatively little time on the page, represented by the region of dots to the left of the graph, from the blue dots representing buying customers to the right. While this would basically ignore the little group of customers to the lower right, it’s probably the best you could do overall for most customers, using the straight-line approach.

In fact, this is essentially what a logistic regression classifier looks like when the model is calibrated to these data.

The above graph shows the regions of prediction (“Unlikely to buy” and “Likely to buy”) as red or blue shading in the background. Deeper colors indicate a higher likelihood for either class. The conceptual straight-line decision boundary that divides the two regions mentioned above, would run right through the white portion of the background, where the probability of belonging to either class is very low. In other words, the model is “uncertain” about what prediction to make in this region.

From the above graph, it can be seen that in addition to ignoring the small group of non-buying customers in the lower right, a straight line is also not a great model for isolating the non-buying customers on the left of the graph. While you can imagine that a curve might be able to define this boundary, a straight line cannot.

Decision Trees as a non-linear model

How can we do better? Enter non-linear models. Decision trees are a prime example of non-linear models. Decision trees work by dividing the data up into regions based on the “if-then” type of questions. For example, if a user spends less than three minutes over two or fewer visits, how likely are they to buy? Graphically, by asking many of these types of questions, a decision tree can divide up the feature space using little segments of vertical and horizontal lines. This approach can create a much more complex decision boundary, as shown below.

It should be clear that decision trees can be used with more success, to model this data set. Given this, you would have a better model for the likelihood of customer conversion and could then proceed to design offers designed to increase conversion (for more information see chapter 5 of my book, Data Science Projects with Python).

In conclusion, this post has shown how non-linear models, such as decision trees, can more effectively describe relationships in complex data sets than linear models, such as logistic regression. It should be noted that linear models can be extended to non-linearity by various means including feature engineering. On the other hand, non-linear models may suffer from overfitting, since they are so flexible. Nonetheless, approaches to prevent decision trees from overfitting have been formulated using ensemble models such as random forests and gradient boosted trees, which are among the most successful machine learning techniques in use today. As a final caveat, note this blog post presents a hypothetical, synthetic data set, which can be modeled almost perfectly with decision trees. Real-world data is messier, but the same principles hold.

I hope you found this conceptual discussion helpful. For a more detailed explanation of how decision trees and logistic regression work “under the hood” with real-world data, and the python code for a similar hypothetical example to that shown here, check out my book Data Science Projects with Python.

Author Bio

Stephen Klosterman is a machine learning data scientist at CVS Health and is the author of the book Data Science Projects with Python. He enjoys helping to frame problems in a data science context and delivering machine learning solutions that business stakeholders understand and value. His education includes a Ph.D. in biology from Harvard University, where he was an assistant teacher of the data science course.

About the Book

This book Data Science Projects with Python will help you understand the working and output of algorithms like pandas and Matplotlib and gain insight into not only the predictive capabilities of the models but also their reasons for making these predictions.

The book also provides a detailed financial insight on how to build a multiclass classification model and how to conduct a financial analysis to find the optimal threshold for binary classification. This will help you with financial budgeting and operational strategy for a well-optimized usage model.

At the end of this book, you will be able to confidently use various machine learning algorithms to perform detailed data analysis and extract meaningful insights from unstructured data.

Read Next

Netflix open-sources Metaflow, its Python framework for building and managing data science projects

What does a data science team look like?

Get Ready for Open Data Science Conference 2019 in Europe and California

How to learn data science: from data mining to machine learning

Dr.Brandon explains Decision Trees to Jon

Related Posts

How Machine Learning has impacted Consumer Behaviour and Analysis
Consumer Research

How Machine Learning has impacted Consumer Behaviour and Analysis

January 4, 2024
Market Research The Ultimate Weapon for Business Success
Consumer Research

Market Research: The Ultimate Weapon for Business Success

June 22, 2023
Unveiling the Hidden Power of Market Research A Game Changer
Consumer Research

Unveiling the Hidden Power of Market Research: A Game Changer

June 2, 2023
7 Secrets of Market Research Gurus That Will Blow Your Mind
Consumer Research

7 Secrets of Market Research Gurus That Will Blow Your Mind

May 8, 2023
The Shocking Truth About Market Research Revealed!
Consumer Research

The Shocking Truth About Market Research: Revealed!

April 25, 2023
market research, primary research, secondary research, market research trends, market research news,
Consumer Research

Quantitative vs. Qualitative Research. How to choose the Right Research Method for Your Business Needs

March 14, 2023
Next Post
Research for AZ Blues plan show patients want digital offerings

Research for AZ Blues plan show patients want digital offerings

Categories

  • Consumer Research
  • Data Analysis
  • Data Collection
  • Industry Research
  • Latest News
  • Market Insights
  • Marketing Research
  • Survey Research
  • Uncategorized

Recent Posts

  • Ipsos Revolutionizes the Global Market Research Landscape
  • How Machine Learning has impacted Consumer Behaviour and Analysis
  • Market Research: The Ultimate Weapon for Business Success
  • Privacy Policy
  • Terms of Use
  • Antispam
  • DMCA

Copyright © 2024 Globalresearchsyndicate.com

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Cookie settingsACCEPT
Privacy & Cookies Policy

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Non-necessary
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
SAVE & ACCEPT
No Result
View All Result
  • Latest News
  • Consumer Research
  • Survey Research
  • Marketing Research
  • Industry Research
  • Data Collection
  • More
    • Data Analysis
    • Market Insights

Copyright © 2024 Globalresearchsyndicate.com