GLOBAL RESEARCH SYNDICATE

What is ensemble learning? – TechTalks

by globalresearchsyndicate · November 12, 2020 · Data Analysis

[Figure: Ensemble methods combine several machine learning models to improve results]

This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI.

The principle of “the wisdom of the crowd” shows that a large group of people with average knowledge of a topic can provide reliable answers to questions involving quantity estimation, spatial reasoning, and general knowledge. The aggregate results cancel out the noise and can often be superior to those of highly knowledgeable experts. The same rule can apply to artificial intelligence applications that rely on machine learning, the branch of AI that predicts outcomes based on mathematical models.

In machine learning, crowd wisdom is achieved through ensemble learning. For many problems, the result obtained from an ensemble, a combination of machine learning models, can be more accurate than any single member of the group.

How does ensemble learning work?

Say you want to develop a machine learning model that predicts inventory stock orders for your company based on historical data gathered from previous years. You train four machine learning models, each using a different algorithm: linear regression, a support vector machine, a regression decision tree, and a basic artificial neural network. But even after much tweaking and configuration, none of them achieves your desired 95 percent prediction accuracy. These machine learning models are called “weak learners” because they fail to converge to the desired level.

[Figure: Single machine learning models do not provide the desired accuracy]

But weak doesn’t mean useless. You can combine them into an ensemble. For each new prediction, you run your input data through all four models, and then compute the average of the results. When examining the new result, you see that the aggregate results provide 96 percent accuracy, which is more than acceptable.

The reason ensemble learning is effective is that your machine learning models work differently. Each model might perform well on some data and less accurately on others. When you combine them all, they cancel out each other’s weaknesses.
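A rough sketch of this averaging ensemble, assuming scikit-learn is available (the three model types and the synthetic data here are illustrative, not the article's actual inventory data):

```python
# Averaging an ensemble of diverse regressors (illustrative sketch;
# the data is synthetic, not real inventory history).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three different algorithms, each trained on the same data.
models = [LinearRegression(), SVR(), DecisionTreeRegressor(random_state=0)]
for m in models:
    m.fit(X_train, y_train)

# Each model predicts independently; the ensemble output is the mean.
predictions = np.stack([m.predict(X_test) for m in models])
ensemble_pred = predictions.mean(axis=0)
```

Running the input through every model and averaging is all there is to this simplest form of ensembling; no extra training step is needed for the aggregation itself.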

You can apply ensemble methods to both prediction problems, like the inventory prediction example we just saw, and classification problems, such as determining whether a picture contains a certain object.

[Figure: Ensemble machine learning combines several models to improve the overall results]

Ensemble methods

For a machine learning ensemble, you must make sure your models are independent of each other (or as independent of each other as possible). One way to do this is to create your ensemble from different algorithms, as in the above example.

Another ensemble method is to use several instances of the same machine learning algorithm and train them on different data sets. For instance, you can create an ensemble composed of 12 linear regression models, each trained on a subset of your training data.

There are two key methods for sampling data from your training set. “Bootstrap aggregation,” aka “bagging,” takes random samples from the training set “with replacement.” The other method, “pasting,” draws samples “without replacement.”

To understand the difference between the sampling methods, here’s an example. Say you have a training set with 10,000 samples and you want to train each machine learning model in your ensemble on 9,000 samples. If you’re using bagging, for each of your machine learning models, you take the following steps:

  1. Draw a random sample from the training set.
  2. Add a copy of the sample to the model’s training set.
  3. Return the sample to the original training set.
  4. Repeat the process 8,999 times.
[Figure: Bagging draws samples from the training set and replaces them]

When using pasting, you go through the same process, with the difference that samples are not returned to the training set after being drawn. Consequently, the same sample might appear in a model’s training set several times when using bagging but only once when using pasting.
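In scikit-learn, the difference between the two sampling methods comes down to one flag on `BaggingRegressor`: `bootstrap=True` samples with replacement (bagging), `bootstrap=False` samples without (pasting). A minimal sketch on synthetic data:

```python
# Bagging vs. pasting: identical ensembles of 12 trees that differ only
# in whether samples are drawn with replacement (illustrative sketch).
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=1000, n_features=8, noise=5.0, random_state=0)

# Bagging: each tree sees 90% of the data, drawn WITH replacement.
bagging = BaggingRegressor(n_estimators=12, max_samples=0.9,
                           bootstrap=True, random_state=0).fit(X, y)

# Pasting: each tree sees 90% of the data, drawn WITHOUT replacement.
pasting = BaggingRegressor(n_estimators=12, max_samples=0.9,
                           bootstrap=False, random_state=0).fit(X, y)

bag_score = bagging.score(X, y)    # R^2 on the training data
paste_score = pasting.score(X, y)
```

With the estimator left at its default, each ensemble member is a decision tree; the 12-model, 90-percent-sample setup mirrors the numbers used in the text.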

After training all your machine learning models, you’ll have to choose an aggregation method. If you’re tackling a classification problem, the usual aggregation method is the “statistical mode,” the class predicted most often by the models. In regression problems, ensembles usually use the average of the predictions made by the models.
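The two aggregation rules can be sketched with nothing but the standard library (the model outputs below are made up for illustration):

```python
# Aggregating ensemble outputs: mode for classification, mean for regression.
from statistics import mean, mode

# Hypothetical outputs of five classifiers for one input:
class_votes = ["cat", "dog", "cat", "cat", "bird"]
ensemble_class = mode(class_votes)   # the most frequently predicted class

# Hypothetical outputs of four regressors for one input:
value_preds = [102.0, 98.5, 101.5, 100.0]
ensemble_value = mean(value_preds)   # the average prediction
```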

[Figure: Pasting draws samples from the training set without returning them]

Boosting methods

Another popular ensemble technique is “boosting.” In contrast to classic ensemble methods, where machine learning models are trained in parallel, boosting methods train them sequentially, each new model building on the previous one and correcting its shortcomings.

AdaBoost (short for “adaptive boosting”), one of the more popular boosting methods, improves the accuracy of ensemble models by adapting new models to the mistakes of previous ones. After training your first machine learning model, you single out the training examples misclassified or wrongly predicted by the model. When training the next model, you put more emphasis on these examples. This results in a machine learning model that performs better where the previous one failed. The process repeats itself for as many models as you want to add to the ensemble. The final ensemble contains several machine learning models of different accuracies, which together can provide better accuracy. In boosted ensembles, the output of each model is given a weight proportionate to its accuracy.
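scikit-learn ships this algorithm as `AdaBoostClassifier`; a minimal sketch on a synthetic task (the data and the choice of 50 estimators are illustrative):

```python
# AdaBoost on a synthetic classification task (illustrative sketch).
# Each new weak learner is weighted toward the examples the previous
# ones got wrong, and the final prediction is a weighted vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, y_train)
accuracy = ada.score(X_test, y_test)
```

The per-model weights the article mentions are exposed after fitting as `ada.estimator_weights_`.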

Random forests

One area where ensemble learning is very popular is decision trees, a machine learning algorithm valued for its flexibility and interpretability. Decision trees can make predictions on complex problems, and they can also trace their outputs back to a series of very clear steps.

The problem with decision trees is that they don’t create smooth boundaries between different classes unless you break them down into too many branches, in which case they become prone to “overfitting,” a problem that occurs when a machine learning model performs very well on training data but poorly on novel examples from the real world.

This is a problem that can be solved through ensemble learning. Random forests are machine learning ensembles composed of multiple decision trees (hence the name “forest”), each typically trained on a different random sample of the data. Using random forests ensures that a machine learning model does not get caught up in the specific confines of a single decision tree.

Random forests have their own independent implementation in Python machine learning libraries such as scikit-learn.
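Using scikit-learn's implementation, a forest is a drop-in replacement for a single tree; the sketch below compares the two on synthetic data (the data set and tree count are illustrative):

```python
# A random forest vs. a single decision tree (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One deep tree, prone to overfitting the training data.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# An ensemble of 100 trees, each fit on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
```

On held-out data the forest's averaged vote usually generalizes better than any single tree, which is the overfitting remedy described above.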

Challenges of ensemble learning

While ensemble learning is a very powerful tool, it also has some tradeoffs.

Using ensembles means you must spend more time and resources on training your machine learning models. For instance, a random forest with 500 trees provides much better results than a single decision tree, but it also takes much more time to train. Running ensemble models can also become problematic if the algorithms you use require a lot of memory.

Another problem with ensemble learning is explainability. While adding new models to an ensemble can improve its overall accuracy, it makes it harder to investigate the decisions made by the AI algorithm. A single machine learning model such as a decision tree is easy to trace, but when you have hundreds of models contributing to an output, it is much more difficult to make sense of the logic behind each decision.

As with most things you’ll encounter in machine learning, ensemble learning is one of many tools you have for solving complicated problems. It can get you out of difficult situations, but it’s not a silver bullet. Use it wisely.

Copyright © 2024 Globalresearchsyndicate.com